Find potential wheat homeologs (best hits with >90% identity and an alignment covering >60% of the CDS length) and their putative functions, based on the top BLAST hit in Arabidopsis (At) and rice (Os).
Please paste gene IDs (e.g. TraesCS5A02G391700) below, one gene per line.
To start a new job, click the “Clear” button below and resubmit (this is faster than refreshing the page).
Database to search:
Output below, or export to CSV.

| WheatGeneID | Best Wheat matches | Wheat %identity | Best At matches | At %identity | At align length | At description | Best Os matches | Os %identity | Os align length | Os description |
|---|---|---|---|---|---|---|---|---|---|---|
Update: the `blastp` method now uses `-seg yes` to match Ensembl BLAST output (this only affects some top hits in Arabidopsis).

Methods
Here are the commands I used to prepare the homeologs and the best hits in Arabidopsis and rice. Arabidopsis and rice sequences were downloaded from Ensembl Plants. Kronos cDNAs were downloaded from Zenodo. CS IWGSC annotation v1.1 HC cDNAs were downloaded from Wheat URGI.
## homeolog search by self blast
### blast self
blastn -task blastn -db ../blastdb/Kronos.v1.0.all.cds.fa -query ../blastdb/Kronos.v1.0.all.cds.fa -outfmt "6 std qlen slen" -perc_identity 90 -word_size 20 -num_threads 40 -out out_Kronos_v1.0_cdna_self_wordsize20.txt &
blastn -task blastn -query /Users/galaxy/blastdb/IWGSC_v1.1_HC_20170706_cds.fasta -db /Users/galaxy/blastdb/IWGSC_v1.1_HC_20170706_cds.fasta -outfmt "6 std qlen slen" -perc_identity 90 -word_size 20 -num_threads 40 -out out_CS_v1.1_HC_self_wordsize20.txt &
### organize results: self3, use 0.6 length as cut point, due to splice variation
gawk '$4>$13*0.6 {split($1,aa,"."); split($2,bb,"."); qq=aa[1]; ss=bb[1]; if(!(qq"\t"ss in cc)) {cc[qq"\t"ss]++; printf("%s\t%s\t%.f\t%s\n",qq,ss,$3,$4)} }' out_CS_v1.1_HC_self_wordsize20.txt > filtered_CS_v1.1_HC_self3.txt
gawk '$4>$13*0.6 {split($1,aa,"."); split($2,bb,"."); qq=aa[1]; ss=bb[1]; if(!(qq"\t"ss in cc)) {cc[qq"\t"ss]++; printf("%s\t%s\t%.f\t%s\n",qq,ss,$3,$4)} }' out_Kronos_v1.0_cdna_self_wordsize20.txt > filtered_Kronos_self3.txt
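To make the filtering step above easier to follow, here is a minimal sketch that runs the same logic on a tiny made-up table. It assumes the "6 std qlen slen" layout, so column 3 is percent identity, column 4 is alignment length and column 13 is query length. The `ToyGene*` IDs and all hit values are hypothetical, and plain `awk` is used in place of `gawk` only for portability.

```shell
#!/bin/sh
# Toy BLAST table in the "6 std qlen slen" layout (14 tab-separated columns):
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen
# All hit values and the ToyGene* IDs are made up for illustration.
toy=$(mktemp)
{
  # self-hit: passes the filter, because the original command does not exclude it
  printf 'TraesCS5A02G391700.1\tTraesCS5A02G391700.1\t100.000\t1000\t0\t0\t1\t1000\t1\t1000\t0.0\t1847\t1000\t1000\n'
  # alignment 950 > 0.6 * 1000: kept
  printf 'TraesCS5A02G391700.1\tToyGeneB.1\t96.400\t950\t34\t0\t1\t950\t1\t950\t0.0\t1500\t1000\t980\n'
  # second isoform of the same gene pair: long enough, but the pair was already reported
  printf 'TraesCS5A02G391700.2\tToyGeneB.1\t96.100\t700\t27\t0\t1\t700\t1\t700\t0.0\t1100\t900\t980\n'
  # alignment 400 < 0.6 * 1000: dropped
  printf 'TraesCS5A02G391700.1\tToyGeneD.1\t95.200\t400\t19\t0\t1\t400\t1\t400\t1e-150\t600\t1000\t990\n'
} > "$toy"

# Keep hits whose alignment covers more than 60% of the query length,
# strip the isoform suffix (".1", ".2") and report each gene pair once.
filtered=$(awk '$4 > $13*0.6 {
    split($1, aa, "."); split($2, bb, ".")
    qq = aa[1]; ss = bb[1]
    if (!((qq "\t" ss) in cc)) {
        cc[qq "\t" ss]++
        printf("%s\t%s\t%.f\t%s\n", qq, ss, $3, $4)
    }
}' "$toy")

printf '%s\n' "$filtered"
rm -f "$toy"
```

Only the first hit of each gene pair is kept, so the reported identity and alignment length come from whichever isoform pair BLAST listed first. The >90% identity cutoff is not repeated here because `-perc_identity 90` already applied it during the BLAST run.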
## blast Os and At
# update 2024-09-18: add '-seg yes'
### Kronos
blastp -db ../blastdb/Arabidopsis_thaliana.TAIR10.pep.all.fa -query ../blastdb/Kronos.v1.0.all.pep.fa -outfmt "6 std qlen slen stitle" -max_target_seqs 6 -word_size 3 -num_threads 40 -out out_Kronos_v1.0_against_Arabidopsis_TAIR10_pep_wordsize3.txt -seg yes &
blastn -task blastn -db /Users/galaxy/blastdb/Oryza_sativa.IRGSP-1.0.cds.all.fa -query ../blastdb/Kronos.v1.0.all.cds.fa -outfmt "6 std qlen slen stitle" -max_target_seqs 6 -word_size 15 -num_threads 40 -out out_Kronos_v1.0_against_rice_IRGSP-1.0_cdna_wordsize15.txt &
gawk 'bb[$1]<1{bb[$1]=1; print}' out_Kronos_v1.0_against_Arabidopsis_TAIR10_pep_wordsize3.txt > top1hit_out_Kronos_v1.0_against_Arabidopsis_TAIR10_pep_wordsize3.txt
sed -i 's/ gene:/\t/g;s/ gene_symbol:/\t/g;s/ description:/\t/g;s/ \[Source/\t/g' top1hit_out_Kronos_v1.0_against_Arabidopsis_TAIR10_pep_wordsize3.txt
gawk 'bb[$1]<1{bb[$1]=1; print}' out_Kronos_v1.0_against_rice_IRGSP-1.0_cdna_wordsize15.txt > top1hit_out_Kronos_v1.0_against_rice_IRGSP-1.0_cdna_wordsize15.txt
sed -i 's/ gene:/\t/g;s/ gene_biotype:/\t/g; s/ gene_symbol:/\t/g;s/ description:/\t/g' top1hit_out_Kronos_v1.0_against_rice_IRGSP-1.0_cdna_wordsize15.txt
### CS
blastp -db ../blastdb/Arabidopsis_thaliana.TAIR10.pep.all.fa -query ../blastdb/Triticum_aestivum.IWGSC.pep.all.fa -outfmt "6 std qlen slen stitle" -max_target_seqs 6 -word_size 3 -num_threads 40 -out out_CS_v1.1_against_Arabidopsis_TAIR10_pep_wordsize3.txt -seg yes &
blastn -task blastn -db /Users/galaxy/blastdb/Oryza_sativa.IRGSP-1.0.cds.all.fa -query /Users/galaxy/blastdb/IWGSC_v1.1_HC_20170706_cds.fasta -outfmt "6 std qlen slen stitle" -max_target_seqs 6 -word_size 11 -num_threads 40 -out out_CS_v1.1_against_rice_IRGSP-1.0_cdna_wordsize11.txt &
gawk 'bb[$1]<1{bb[$1]=1; print}' out_CS_v1.1_against_Arabidopsis_TAIR10_pep_wordsize3.txt > top1hit_out_CS_v1.1_against_Arabidopsis_TAIR10_pep_wordsize3.txt
sed -i 's/ gene:/\t/g;s/ gene_symbol:/\t/g;s/ description:/\t/g;s/ \[Source/\t/g' top1hit_out_CS_v1.1_against_Arabidopsis_TAIR10_pep_wordsize3.txt
gawk 'bb[$1]<1{bb[$1]=1; print}' out_CS_v1.1_against_rice_IRGSP-1.0_cdna_wordsize11.txt > top1hit_CS_v1.1_against_rice_IRGSP-1.0_cdna_wordsize11.txt
sed -i 's/ gene:/\t/g; s/ gene_biotype:/\t/g; s/ gene_symbol:/\t/g; s/ description:/\t/g' top1hit_CS_v1.1_against_rice_IRGSP-1.0_cdna_wordsize11.txt
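The top-hit and header-splitting steps above can be sketched on toy data as follows. This assumes BLAST lists the best hit first for each query, which is what the "first line per query" idiom relies on. The `Toy*` IDs and the subject titles are hypothetical and only imitate the `gene:` / `gene_symbol:` / `description:` layout that the `sed` expressions target. A literal tab held in a shell variable replaces `\t`, and a pipe replaces `sed -i`, because both of those behave differently between GNU and BSD `sed`.

```shell
#!/bin/sh
# Toy BLAST table in the "6 std qlen slen stitle" layout (15 tab-separated columns).
# All IDs, scores and titles are made up for illustration.
tab=$(printf '\t')
toy=$(mktemp)
{
  printf 'ToyWheat1.1\tToyAt1.1\t62.5\t200\t75\t0\t1\t200\t1\t200\t1e-80\t250\t210\t205\tToyAt1.1 pep gene:ToyAt1 gene_symbol:TOY1 description:toy kinase [Source:toy]\n'
  printf 'ToyWheat1.1\tToyAt2.1\t48.0\t190\t99\t1\t5\t194\t3\t192\t1e-50\t170\t210\t220\tToyAt2.1 pep gene:ToyAt2 gene_symbol:TOY2 description:toy kinase-like [Source:toy]\n'
  printf 'ToyWheat2.1\tToyAt3.1\t55.0\t300\t135\t2\t1\t300\t1\t298\t1e-90\t310\t320\t305\tToyAt3.1 pep gene:ToyAt3 gene_symbol:TOY3 description:toy transporter [Source:toy]\n'
} > "$toy"

# Step 1: keep only the first line seen for each query ID (column 1).
# Step 2: turn the " gene:", " gene_symbol:", " description:" and " [Source"
#         markers in the subject title into tabs, so each becomes its own column.
top1=$(awk 'seen[$1] < 1 { seen[$1] = 1; print }' "$toy" |
  sed "s/ gene:/${tab}/g;s/ gene_symbol:/${tab}/g;s/ description:/${tab}/g;s/ \[Source/${tab}/g")

printf '%s\n' "$top1"
rm -f "$toy"
```

After the split, columns 16 to 18 hold the gene ID, gene symbol and description, and the remainder of the `[Source...]` tag is left in a trailing column.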
## then I prepared a sqlite3 database for the webtool
Acknowledgment