Find potential wheat homeologs

Find potential wheat homeologs (hits with >90% identity and an alignment covering >60% of the CDS length) and their likely functions, based on the top BLAST hit in Arabidopsis (At) and rice (Os).

Please paste gene IDs (e.g. TraesCS5A02G391700) below, one gene per line.

To start a new job, click the “Clear” button below and resubmit (this is faster than refreshing the page).

Database to search:


Output below

or Export to CSV
WheatGeneID | Best Wheat matches | Wheat %identity | Best At matches | At %identity | At align length | At description | Best Os matches | Os %identity | Os align length | Os description

Updates

  • 2024-09-18: modified the blastp method (added -seg yes) to match the Ensembl BLAST output (this only affects some of the top Arabidopsis hits).
  • 2024-09-18: added some low-confidence genes that are hits of high-confidence genes. For example, the B homeolog of PLATZ-A1 (TraesCS6A02G156600) is a low-confidence gene.
  • 2024-11-01: added the BLAST alignment length for At and Os hits. Without the alignment length, we cannot tell which wheat gene is the best At/Os homolog.
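To illustrate why the alignment length helps: several wheat genes can share the same top At (or Os) hit, and a long alignment is stronger evidence of homology than a short one. The sketch below is only an illustration under stated assumptions. The four-column table layout (wheat gene, At hit, %identity, alignment length), the function names, and all IDs and numbers are made up, and plain `awk` is used so that it runs without gawk. For each At gene it keeps the wheat gene with the longest alignment.

```shell
# Toy table: wheat_gene <TAB> At_hit <TAB> %identity <TAB> align_length
# All IDs and numbers are hypothetical, for illustration only.
toy_hits() {
  printf 'WheatGeneA\tAT1G00001\t62.5\t120\n'
  printf 'WheatGeneB\tAT1G00001\t58.1\t410\n'
  printf 'WheatGeneC\tAT2G00002\t70.0\t300\n'
}

# For each At gene (column 2), keep the wheat gene (column 1)
# whose alignment length (column 4) is the longest.
best_wheat_per_at() {
  awk -F'\t' '
    !($2 in len) || $4 + 0 > len[$2] { len[$2] = $4 + 0; best[$2] = $1 }
    END { for (at in best) printf("%s\t%s\t%d\n", at, best[at], len[at]) }
  ' | sort
}

toy_hits | best_wheat_per_at
```

Here WheatGeneA has the higher identity to AT1G00001, but WheatGeneB is reported because its alignment is more than three times longer.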

Methods

Here are the commands I used to prepare the homeologs and the best hits in Arabidopsis and rice. Arabidopsis and rice sequences were downloaded from Ensembl Plants. Kronos cDNAs were downloaded from Zenodo. CS IWGSC annotation v1.1 HC cDNAs were downloaded from Wheat URGI.
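Before the full commands, here is a toy illustration of the homeolog filter they apply. It is a sketch, not the pipeline itself: the three BLAST rows, the gene names, and the function names are invented, and plain `awk` stands in for gawk. The rows follow the `6 std qlen slen` layout, so column 4 is the alignment length and column 13 is the query length. The >90% identity threshold is applied by blastn itself (`-perc_identity 90`), so only the length test and the one-row-per-gene-pair step are shown.

```shell
# Three made-up rows in BLAST "6 std qlen slen" layout:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen
toy_blast() {
  printf 'geneA.1\tgeneB.1\t95.2\t900\t43\t0\t1\t900\t1\t900\t0.0\t1400\t1000\t1000\n'
  printf 'geneA.1\tgeneB.2\t95.0\t880\t44\t0\t1\t880\t1\t880\t0.0\t1350\t1000\t950\n'
  printf 'geneA.1\tgeneC.1\t93.0\t400\t28\t0\t1\t400\t1\t400\t1e-150\t600\t1000\t1200\n'
}

# Keep hits whose alignment is longer than 60% of the query length,
# strip the transcript suffix (".1", ".2"), and print each gene pair once.
filter_hits() {
  awk -F'\t' '$4 > $13 * 0.6 {
    split($1, q, "."); split($2, s, ".")
    key = q[1] "\t" s[1]
    if (!(key in seen)) {
      seen[key] = 1
      printf("%s\t%s\t%.f\t%s\n", q[1], s[1], $3, $4)
    }
  }'
}

toy_blast | filter_hits
```

Only one row survives: the second row is another transcript of the same gene pair, and the third row aligns over just 400 of 1000 query bases.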

## homeolog search by self blast
### blast self
blastn -task blastn -db ../blastdb/Kronos.v1.0.all.cds.fa -query ../blastdb/Kronos.v1.0.all.cds.fa -outfmt "6 std qlen slen" -perc_identity 90 -word_size 20 -num_threads 40 -out out_Kronos_v1.0_cdna_self_wordsize20.txt &
blastn -task blastn -query /Users/galaxy/blastdb/IWGSC_v1.1_HC_20170706_cds.fasta -db /Users/galaxy/blastdb/IWGSC_v1.1_HC_20170706_cds.fasta -outfmt "6 std qlen slen" -perc_identity 90 -word_size 20 -num_threads 40 -out out_CS_v1.1_HC_self_wordsize20.txt &

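### organize results (self3): keep hits with alignment length > 0.6 x query length ($13 = qlen), to allow for splice variation; strip transcript suffixes and keep one row per gene pair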
gawk '$4>$13*0.6 {split($1,aa,"."); split($2,bb,"."); qq=aa[1]; ss=bb[1]; if(!(qq"\t"ss in cc)) {cc[qq"\t"ss]++; printf("%s\t%s\t%.f\t%s\n",qq,ss,$3,$4)} }' out_CS_v1.1_HC_self_wordsize20.txt > filtered_CS_v1.1_HC_self3.txt
gawk '$4>$13*0.6 {split($1,aa,"."); split($2,bb,"."); qq=aa[1]; ss=bb[1]; if(!(qq"\t"ss in cc)) {cc[qq"\t"ss]++; printf("%s\t%s\t%.f\t%s\n",qq,ss,$3,$4)} }' out_Kronos_v1.0_cdna_self_wordsize20.txt > filtered_Kronos_self3.txt

## blast Os and At
# update 2024-09-18: add '-seg yes'
### Kronos
blastp -db ../blastdb/Arabidopsis_thaliana.TAIR10.pep.all.fa -query ../blastdb/Kronos.v1.0.all.pep.fa -outfmt "6 std qlen slen stitle" -max_target_seqs 6 -word_size 3 -num_threads 40 -out out_Kronos_v1.0_against_Arabidopsis_TAIR10_pep_wordsize3.txt -seg yes &
blastn -task blastn -db /Users/galaxy/blastdb/Oryza_sativa.IRGSP-1.0.cds.all.fa -query ../blastdb/Kronos.v1.0.all.cds.fa -outfmt "6 std qlen slen stitle" -max_target_seqs 6 -word_size 15 -num_threads 40 -out out_Kronos_v1.0_against_rice_IRGSP-1.0_cdna_wordsize15.txt &

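# keep only the first (top) hit per query, then split the Ensembl header fields into tab-separated columns
gawk 'bb[$1]<1{bb[$1]=1; print}' out_Kronos_v1.0_against_Arabidopsis_TAIR10_pep_wordsize3.txt > top1hit_out_Kronos_v1.0_against_Arabidopsis_TAIR10_pep_wordsize3.txt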
sed -i 's/ gene:/\t/g;s/ gene_symbol:/\t/g;s/ description:/\t/g;s/ \[Source/\t/g' top1hit_out_Kronos_v1.0_against_Arabidopsis_TAIR10_pep_wordsize3.txt

gawk 'bb[$1]<1{bb[$1]=1; print}' out_Kronos_v1.0_against_rice_IRGSP-1.0_cdna_wordsize15.txt > top1hit_out_Kronos_v1.0_against_rice_IRGSP-1.0_cdna_wordsize15.txt
sed -i 's/ gene:/\t/g;s/ gene_biotype:/\t/g; s/ gene_symbol:/\t/g;s/ description:/\t/g' top1hit_out_Kronos_v1.0_against_rice_IRGSP-1.0_cdna_wordsize15.txt

### CS
blastp -db ../blastdb/Arabidopsis_thaliana.TAIR10.pep.all.fa -query ../blastdb/Triticum_aestivum.IWGSC.pep.all.fa -outfmt "6 std qlen slen stitle" -max_target_seqs 6 -word_size 3 -num_threads 40 -out out_CS_v1.1_against_Arabidopsis_TAIR10_pep_wordsize3.txt -seg yes &
blastn -task blastn -db /Users/galaxy/blastdb/Oryza_sativa.IRGSP-1.0.cds.all.fa -query /Users/galaxy/blastdb/IWGSC_v1.1_HC_20170706_cds.fasta -outfmt "6 std qlen slen stitle" -max_target_seqs 6 -word_size 11 -num_threads 40 -out out_CS_v1.1_against_rice_IRGSP-1.0_cdna_wordsize11.txt &

gawk 'bb[$1]<1{bb[$1]=1; print}' out_CS_v1.1_against_Arabidopsis_TAIR10_pep_wordsize3.txt > top1hit_out_CS_v1.1_against_Arabidopsis_TAIR10_pep_wordsize3.txt
sed -i 's/ gene:/\t/g;s/ gene_symbol:/\t/g;s/ description:/\t/g;s/ \[Source/\t/g' top1hit_out_CS_v1.1_against_Arabidopsis_TAIR10_pep_wordsize3.txt

gawk 'bb[$1]<1{bb[$1]=1; print}' out_CS_v1.1_against_rice_IRGSP-1.0_cdna_wordsize11.txt > top1hit_CS_v1.1_against_rice_IRGSP-1.0_cdna_wordsize11.txt
sed -i 's/ gene:/\t/g; s/ gene_biotype:/\t/g; s/ gene_symbol:/\t/g; s/ description:/\t/g'  top1hit_CS_v1.1_against_rice_IRGSP-1.0_cdna_wordsize11.txt

## then I prepared a sqlite3 database for the webtool
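
The database step itself is not shown above. As a rough sketch only, one way to load a filtered four-column table (such as the self3 files) into SQLite with the `sqlite3` command-line shell is given below. The table name `homeologs`, its column names, the function names, and the toy rows are all assumptions for illustration, not the schema used by the web tool.

```shell
# Hypothetical schema: table "homeologs" and its column names are assumptions.
build_db() {  # usage: build_db DB_FILE FILTERED_TSV
  rm -f "$1"
  sqlite3 "$1" <<EOF
CREATE TABLE homeologs (query TEXT, subject TEXT, identity INTEGER, align_len INTEGER);
CREATE INDEX idx_homeologs_query ON homeologs (query);
.mode tabs
.import "$2" homeologs
EOF
}

# Look up all hits for one gene ID, longest alignment first.
# The ID is pasted into the SQL text, which is acceptable only for trusted input;
# a web tool should use bound parameters instead.
lookup() {  # usage: lookup DB_FILE GENE_ID
  sqlite3 -separator "$(printf '\t')" "$1" \
    "SELECT subject, identity, align_len FROM homeologs WHERE query = '$2' ORDER BY align_len DESC;"
}

# Toy run with made-up rows (skipped when sqlite3 is not installed).
if command -v sqlite3 >/dev/null 2>&1; then
  tmp="$(mktemp -d)"
  printf 'geneA\tgeneB\t95\t900\ngeneA\tgeneD\t92\t700\n' > "$tmp/toy.tsv"
  build_db "$tmp/toy.db" "$tmp/toy.tsv"
  lookup "$tmp/toy.db" geneA
fi
```

The index on the query column keeps per-gene lookups fast when many IDs are pasted at once.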

Acknowledgment