class: center, middle, inverse, title-slide

# Molecular Marker Development
## A simple introduction and computer demo
### Junli Zhang
### 2017/11/11

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/9/96/Polymerase_chain_reaction.svg)

???

Image credit: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File%3APolymerase_chain_reaction.svg)

---

# Molecular Markers

## Wikipedia's definition

In genetics, a **molecular marker** (identified as genetic marker) is a fragment of DNA that is associated with a certain location within the genome.

## Reflect Polymorphism in a Population

---

# Steps of Marker Development

1. Find DNA sequence variations within a population or between two parents
1. Find a way to screen the variations in a population: chips, sequencing or PCR

## Here I will focus only on the PCR method, that is, we need to .red[design primers]!

---

# DNA sequence mutations

### A mutation is a change in the "normal" base pair sequence

- #### a single base pair substitution

.remark-code[
CGAGCTTGA**T**GACGAAGAAGGAG</br>
CGAGCTTGA![:color red](**A**)GACGAAGAAGGAG
]

--

- #### a small deletion or insertion

.remark-code[
CGAGCTTGA**TGA**CGAAGAAGGAG</br>
CGAGCTTGA![:color red](**---**)CGAAGAAGGAG
]

--

- #### a larger insertion, deletion or rearrangement
  - inversions, duplications, translocations etc.

???

Credit: YouTube lecture [DNA Sequence Variation](https://youtu.be/ZXQuQJjdPvk)

---

## Types of molecular markers

- RFLP: Restriction Fragment Length Polymorphism
- RAPD: Random Amplified Polymorphic DNA
- AFLP: Amplified Fragment Length Polymorphism
- SSR: Simple Sequence Repeats
  - or microsatellite
  - ~1-5 bp core unit
- SNP: Single Nucleotide Polymorphism

### A very good review on genetic markers and QTL mapping

.cite[Collard, B.C.Y., Jahufer, M.Z.Z., Brouwer, J.B. et al. Euphytica (2005) 142: 169. https://doi.org/10.1007/s10681-005-1681-5]

---
class: middle, center

# Dominant vs. 
Co-dominant markers

![dorminant-markers](/files/dorminant-codorminant-markers.svg)

---

# Common PCR-based genotyping methods for SNP markers

- CAPS:
  - Cleaved Amplified Polymorphic Sequences
  - A Single Nucleotide Polymorphism (SNP) where one allele creates (or removes) a naturally occurring restriction site
  - Codominant

--

- dCAPS:
  - Derived Cleaved Amplified Polymorphic Sequences
  - For SNPs that do not create a natural restriction site.
  - Uses mismatches in one PCR primer to create or remove a restriction site for one allele
  - Codominant

--

- KASP:
  - Kompetitive Allele Specific PCR
  - A homogeneous, fluorescence-based genotyping variant of PCR.
  - Based on allele-specific oligo extension and fluorescence resonance energy transfer for signal generation
  - Codominant

---

![](/files/CAPS.png)

---
background-image: url(https://www.ncbi.nlm.nih.gov/core/assets/probe/images/dCAPS_example.png)

---

# Genotyping - dCAPS

- Derived CAPS uses a mismatched PCR primer to create or remove a restriction site based on the genotype of a SNP.

--

- Advantages:
  - Can be used when the SNP does not create a natural CAPS/RFLP marker.
  - Can be used to change a natural CAPS marker from a site using an expensive or rare enzyme to a cheap or common enzyme.

--

- Disadvantages:
  - Mismatches in the primer lower PCR specificity.
  - Laborious compared to hybridization with gene chip methods for SNP detection.
  - Need to find the right enzyme. The website http://helix.wustl.edu/dcaps/dcaps.html can be used to find dCAPS primers for SNPs.

???
Credits: [dCAPS genotyping slides](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=10&ved=0ahUKEwj3y86F-bHXAhVEllQKHQMpBF8QFghPMAk&url=http%3A%2F%2Fwww.genetics-gsa.org%2Feducation%2Fpdf%2FHultman%2520and%2520Mellgren%25202014%2520Fetching%2520SNPs%2520Supplemental%25202%2520-%2520dCAPS%2520genotyping%2520slides.pptx&usg=AOvVaw2Hb0zyAzAfqRomIx4MkizO)

---

# dCAPS

**IWB1998**: CGAGCTTGATGACGAAGAAGGAGA[**T/C**]CGGGCAGACCCACGACGT

**EcoRV**: GAT'ATC

Allele | Seq
--- | ---
C | CGAGCTTGATGACGAAGAAG![:color blue](GAGA**C**C)GGGCAGACCCACGACGT
T | CGAGCTTGATGACGAAGAAG![:color blue](GAGA**T**C)GGGCAGACCCACGACGT
Primer | CGAGCTTGATGACGAAGAAGGA![:color red](**T**)A

--

**After PCR**

Allele | Seq
--- | ---
C | CGAGCTTGATGACGAAGAAG![:color blue](GA)![:color red](**T**)![:color blue](A**C**C)GGGCAGACCCACGACGT
T | CGAGCTTGATGACGAAGAAG![:color blue](GA)![:color red](**T**)![:color blue](A**T**C)GGGCAGACCCACGACGT

---

# dCAPS example: IWB1998

.pull-left[
#### Steps

1. PCR with dCAPS primers
1. Digest products with EcoRV
1. Run on gel

#### Expected results for three genotypes:

- Homozygous C/C: 181 bp
- Homozygous T/T: 157, 24 bp
- Heterozygous C/T: 181, 157, 24 bp
]

.pull-right[
![](/files/dCAPS-gel.svg)
]

???

The 20 bp product will run off the gel, since we run the gel long enough to resolve between 163 and 143 bp

---

# CAPS gel scoring

![:scale 100%](/files/CAPS-scoring.png)

---

# KASP components

![:scale 100%](/files/kasp-components.png)

---

![:scale 95%](/files/kasp-steps.png)

---
class: center

![:scale 60%](/files/KASP-results.png)

---
name: inverse
class: center, middle, inverse

## That is all for the introduction.
## ![:emoji smiley] Let us develop markers ![:emoji smiley]

---

# Steps in primer design

.wide-left[
1. Find the locations of the flanking markers on the wheat pseudomolecule
    - Blast the sequences of your markers against the wheat pseudomolecule
1. 
Find all the variations between the two parents of your population in the region between the two flanking markers
    - You already have the exon capture data for both parents on [T3](https://triticeaetoolbox.org/wheat/display_genotype.php?trial_code=2017_WheatCAP)
    - Extract variations only in the target regions
1. Design primers for selected variations to screen recombinants
    - The majority will be SNPs or small insertions or deletions.
    - We will focus on CAPS/dCAPS/KASP
]

.narrow-right[
![](/files/QTl7AS-4.svg)
]

---
class: graylist

# Requirements of PCR primers

- ## Specific

--

- Length: 18-25 nt

--

- Melting Temperature: around 60 °C

--

- GC Clamp: G or C bases within the last five bases at the 3' end help promote specific binding, but more than 3 G/C should be avoided

--

- NO Secondary Structures

--

- Avoid Template Secondary Structure or other complex regions, such as retrotransposons

--

- Amplicon Length: qPCR and dCAPS amplicons are short (< 300 bp); other markers are usually < 1 kb

--

- Primer Pair Tm Difference < 5 °C

[More information](http://www.premierbiosoft.com/tech_notes/PCR_Primer_Design.html)

---
class: graylist

# Primer Design Tips

- The first base at the 3' end is the most important; avoid changing this one to introduce mutations in dCAPS primer design

.remark-code[CGAGCTTGATGACGAAGAAGGA![:color red](**T**)]

--

- To be more specific, aim for two base differences within the first 4 bases from the 3' end

.remark-code[CGAGCTTGATGACGAAGAA![:color red](**GGAT**)]

--

- Sometimes we can introduce 1 mutation at the 3rd base from the 3' end to make the primer more specific

.remark-code[CGAGCTTGATGACGAAGAAG.red.bold[G]A.red.bold[T]]

--

- We can add a tail to make a dCAPS primer longer, so that the products separate better after digestion

.remark-code[.blue[GAAGGTGACCAAGTTCATGCT]CGAGCTTGATGACGAAGAAG.red[**G**]AT]

[More reading](http://www.genomecompiler.com/tips-for-efficient-primer-design/)

---

# Primer design software

- Primer3 (http://primer3.ut.ee/)
- Polymarker for KASP in wheat 
(https://github.com/TGAC/bioruby-polyploid-tools)
- CAPS Designer (https://solgenomics.net/tools/caps_designer/caps_input.pl)
- dCAPS Finder 2.0 (http://helix.wustl.edu/dcaps/dcaps.html)
- GSP (Genome Specific Primers) (https://probes.pw.usda.gov/GSP/)
- **SNP Primer Design** (a Galaxy tool, http://169.237.215.34/galaxy/)

---

# SNP Primer Design

### A pipeline to design .red[KASP/CAPS/dCAPS] primers for SNPs in wheat

### A Python script which incorporates:

- **Muscle**: Multiple sequence alignment program (http://www.drive5.com/muscle/)
- **Primer3**: program for designing PCR primers (http://primer3.sourceforge.net/)
- **blast+**: BLAST the wheat genome (https://blast.ncbi.nlm.nih.gov/Blast.cgi)

### I have a GitHub repository for this tool

https://github.com/pinbo/getKASP_pipeline

---

# How does .blue[SNP Primer Design] work?

1. Blast each SNP sequence against the pseudomolecule and keep hits that have
    - at least 90% similarity and
    - at least 90% of the length of the best hit AND > 50 bp
1. Get 500 bp on each side of the SNP for all the hits
1. Multiple Sequence Alignment of the homeolog sequences with MUSCLE
1. Find all the variation sites that can distinguish the target from the other homeologs
1. Use these sites as the forced 3' end in Primer3 and design homeolog-specific primers
1. Blast all the primers against the pseudomolecule v1.0 with word length 7 to see whether they also hit other chromosomes
    - Criterion for a match: < 2 mismatches in the first 4 bp from the 3' end

---

# Test Math Expressions

You can write LaTeX math expressions inside a pair of dollar signs, e.g. \$\alpha+\beta\$ renders $\alpha+\beta$ and \\( \alpha+\beta \\)

You can use the display style with double dollar signs:

```
$$\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$$
```

$$\bar{X}=\frac{1}{n}\sum_{i=1}^nX_i$$

When $a \ne 0$, there are two solutions to `\(ax^2 + bx + c = 0\)` and they are

$$x = {-b \pm \sqrt{b^2-4ac} \over 2a}$$
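
---

# In-silico check of the IWB1998 dCAPS example

The IWB1998 dCAPS slide can be checked with a few lines of Python. This is only a minimal sketch and is not part of the SNP Primer Design pipeline. The helper names `amplify_with_primer` and `digest` are hypothetical, PCR is modeled simply as the primer overwriting the 5' end of the template, and only the 43 bp window shown on the slide is used, so the fragment sizes differ from the 181/157/24 bp bands on the gel.

```python
# EcoRV recognition site from the slide (GAT'ATC): blunt cut after GAT.
ECORV_SITE = "GATATC"
ECORV_CUT = 3

def amplify_with_primer(template, primer):
    """Simplified PCR model (assumption): the forward primer replaces
    the first len(primer) bases of the template in the product."""
    return primer + template[len(primer):]

def digest(seq, site=ECORV_SITE, cut=ECORV_CUT):
    """Return fragment lengths after cutting seq at every site."""
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        fragments.append(pos + cut - start)
        start = pos + cut
        pos = seq.find(site, pos + 1)
    fragments.append(len(seq) - start)
    return fragments

# Sequences copied from the IWB1998 slide.
FLANK_5 = "CGAGCTTGATGACGAAGAAGGAGA"
FLANK_3 = "CGGGCAGACCCACGACGT"
PRIMER = "CGAGCTTGATGACGAAGAAGGATA"  # mismatched T near the 3' end

alleles = {a: FLANK_5 + a + FLANK_3 for a in "CT"}
products = {a: amplify_with_primer(s, PRIMER) for a, s in alleles.items()}

for allele, product in products.items():
    print(allele, digest(product))
# C [43]
# T [23, 20]
```

Only the T allele gains the primer-created GATATC site, so EcoRV cuts it into 23 and 20 bp pieces while the C allele stays intact at 43 bp. Neither original template carries the site before the mismatched primer is used.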