class: center, middle, inverse, title-slide

# Marker Development
## with Galaxy
### Junli Zhang
### 2017/11/11

---
class: center, middle, inverse, title-slide

# Molecular Marker Development
## A simple introduction and computer demo
### Junli Zhang
### 2017/11/11

---
background-image: url(https://upload.wikimedia.org/wikipedia/commons/9/96/Polymerase_chain_reaction.svg)

???

Image credit: [Wikimedia Commons](https://commons.wikimedia.org/wiki/File%3APolymerase_chain_reaction.svg)

---

## Molecular Markers

### Wikipedia's definition

In genetics, a **molecular marker** (identified as genetic marker) is a fragment of DNA that is associated with a certain location within the genome.

### Reflect Polymorphism in a Population

---

## Steps of Marker Development

1. Find DNA sequence variations within a population or between two parents
1. Pick a method to screen the variations in a population: chips, sequencing or PCR

## Here I will just focus on the PCR method, that is, we need to .red[design primers]!

---

## DNA sequence mutations

### A mutation is a change in the "normal" base pair sequence

- #### a single base pair substitution

.remark-code[
CGAGCTTGA**T**GACGAAGAAGGAG</br>
CGAGCTTGA![:color red](**A**)GACGAAGAAGGAG
]

--

- #### a small deletion or insertion

.remark-code[
CGAGCTTGA**TGA**CGAAGAAGGAG</br>
CGAGCTTGA![:color red](**---**)CGAAGAAGGAG
]

--

- #### a larger insertion or deletion or rearrangement
  - inversions, duplications, translocations etc.

???

Credit: YouTube lecture [DNA Sequence Variation](https://youtu.be/ZXQuQJjdPvk)

---

## Types of molecular markers

- RFLP: Restriction Fragment Length Polymorphism
- RAPD: Random Amplified Polymorphic DNA
- AFLP: Amplified Fragment Length Polymorphism
- SSR: Simple Sequence Repeats
  - or microsatellite
  - ~1-5 bp core unit
- SNP: Single Nucleotide Polymorphism

#### A very good review on genetic markers and QTL mapping

.cite[Collard, B.C.Y., Jahufer, M.Z.Z., Brouwer, J.B. et al. Euphytica (2005) 142: 169. https://doi.org/10.1007/s10681-005-1681-5]

---
class: middle, center

## Dominant vs. Co-dominant markers

![dominant-markers](files/dorminant-codorminant-markers.svg)

---

## Common PCR-based genotyping methods for SNP markers

- CAPS:
  - Cleaved Amplified Polymorphic Sequences
  - A Single Nucleotide Polymorphism (SNP) where one allele creates (or removes) a naturally occurring restriction site
  - Codominant

--

- dCAPS:
  - Derived Cleaved Amplified Polymorphic Sequences
  - For SNPs that do not create a natural restriction site
  - Uses mismatches in one PCR primer to create a restriction site for one allele
  - Codominant

--

- KASP:
  - Kompetitive Allele Specific PCR
  - A homogeneous, fluorescence-based genotyping variant of PCR
  - Based on allele-specific oligo extension and fluorescence resonance energy transfer for signal generation
  - Codominant

---

![](files/CAPS.png)

---
background-image: url(https://www.ncbi.nlm.nih.gov/core/assets/probe/images/dCAPS_example.png)

---

## Genotyping - dCAPS

- Derived CAPS uses a mismatched PCR primer to create or remove a restriction site based on the genotype of a SNP.

--

- Advantages:
  - Can be used when the SNP does not create a natural CAPS/RFLP marker.
  - Can be used to change a natural CAPS marker from a site using an expensive or rare enzyme to a cheap or common enzyme.

--

- Disadvantages:
  - Mismatches in the primer lower PCR specificity.
  - Finding the right enzyme. The web tools at http://helix.wustl.edu/dcaps/dcaps.html or http://indcaps.kieber.cloudapps.unc.edu/ can find dCAPS primers for SNPs.

???
Credits: [dCAPS genotyping slides](https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=10&ved=0ahUKEwj3y86F-bHXAhVEllQKHQMpBF8QFghPMAk&url=http%3A%2F%2Fwww.genetics-gsa.org%2Feducation%2Fpdf%2FHultman%2520and%2520Mellgren%25202014%2520Fetching%2520SNPs%2520Supplemental%25202%2520-%2520dCAPS%2520genotyping%2520slides.pptx&usg=AOvVaw2Hb0zyAzAfqRomIx4MkizO)

---

## dCAPS

**IWB1998**: CGAGCTTGATGACGAAGAAGGAGA[**T/C**]CGGGCAGACCCACGACGT

**EcoRV**: GAT'ATC

Allele | Seq
--- | ---
C | CGAGCTTGATGACGAAGAAG![:color blue](GAGA**C**C)GGGCAGACCCACGACGT
T | CGAGCTTGATGACGAAGAAG![:color blue](GAGA**T**C)GGGCAGACCCACGACGT
Primer | CGAGCTTGATGACGAAGAAGGA![:color red](**T**)A

--

**After PCR**

Allele | Seq
--- | ---
C | CGAGCTTGATGACGAAGAAG![:color blue](GA)![:color red](**T**)![:color blue](A**C**C)GGGCAGACCCACGACGT
T | CGAGCTTGATGACGAAGAAG![:color blue](GA)![:color red](**T**)![:color blue](A**T**C)GGGCAGACCCACGACGT

---

## dCAPS example: IWB1998

.pull-left[
#### Steps

1. PCR with dCAPS primers
1. Digest products with EcoRV
1. Run on gel

#### Expected results for three genotypes:

- Homozygous C/C: 181 bp
- Homozygous T/T: 157, 24 bp
- Heterozygous C/T: 181, 157, 24 bp
]

.pull-right[
![](files/dCAPS-gel.svg)
]

???

The 24 bp product will run off the gel, since we run the gel long enough to resolve the 181 and 157 bp bands.

---

# CAPS gel scoring

![:scale 100%](files/CAPS-scoring.png)

---

## KASP components

![:scale 100%](files/kasp-components.png)

---

![:scale 95%](files/kasp-steps.png)

---
class: center

![:scale 60%](files/KASP-results.png)

---
name: inverse
class: center, middle, inverse

## That is all for the introduction.
## ![:emoji smiley] Let us develop markers ![:emoji smiley]

---

## Steps in primer design

.wide-left[
1. Find the locations of the flanking markers on the wheat pseudomolecule
    - Blast the sequences of your markers against the wheat pseudomolecule
1. Find all the variations between the two parents of your population in the region between the two flanking markers
    - You already have the exon capture data for both parents on [T3](https://triticeaetoolbox.org/wheat/display_genotype.php?trial_code=2017_WheatCAP)
    - Extract variations only in the target regions
1. Design primers for selected variations to screen recombinants
    - The majority will be SNPs and small insertions or deletions.
    - We will focus on CAPS/dCAPS/KASP
]

.narrow-right[
![](files/QTl7AS-4.svg)
]

---
class: graylist

## Requirements of PCR primers

- ## Specific

--

- Length: 18 - 25 nt

--

- Melting Temperature: around 60 °C

--

- GC Clamp: G or C bases within the last five bases at the 3' end help promote specific binding, but more than 3 G/C should be avoided

--

- NO Secondary Structures

--

- Avoid Template Secondary Structure or other complex regions, such as retrotransposons

--

- Amplicon Length: qPCR and dCAPS are short (< 300 bp), other markers usually < 1 kb

--

- Primer Pair Tm Difference < 5 °C

[More information](http://www.premierbiosoft.com/tech_notes/PCR_Primer_Design.html)

---
class: graylist

## Primer Design Tips

- The first nt at the 3' end is the most important; avoid changing this one when introducing mutations in dCAPS primer design
.remark-code[CGAGCTTGATGACGAAGAAGGA![:color red](**T**)]

--

- To be more specific, aim for a two-nt difference within the first 4 nt from the 3' end
.remark-code[CGAGCTTGATGACGAAGAA![:color red](**GGAT**)]

--

- Sometimes we can introduce 1 mutation at the 3rd nt to make the primer more specific
.remark-code[CGAGCTTGATGACGAAGAAG.red.bold[G]A.red.bold[T]]

--

- We can add a tail to make a dCAPS primer longer, so the fragments separate better after digestion
.remark-code[.blue[GAAGGTGACCAAGTTCATGCT]CGAGCTTGATGACGAAGAAG.red[**G**]AT]

[More reading](http://www.genomecompiler.com/tips-for-efficient-primer-design/)

---

## Primer design software

- Primer3 (http://primer3.ut.ee/)
- Polymarker for KASP in wheat (https://github.com/TGAC/bioruby-polyploid-tools)
- CAPS Designer (https://solgenomics.net/tools/caps_designer/caps_input.pl)
- dCAPS Finder 2.0 (http://helix.wustl.edu/dcaps/dcaps.html)
- GSP (Genome Specific Primers) (https://probes.pw.usda.gov/GSP/)
- **SNP Primer Design** (a Galaxy tool, https://galaxy.triticeaetoolbox.org/)

---

## SNP Primer Design

### A pipeline to design .red[KASP/CAPS/dCAPS] primers for SNPs in wheat

### A Python script which incorporates:

- **Muscle**: Multiple sequence alignment program (http://www.drive5.com/muscle/)
- **Primer3**: program for designing PCR primers (http://primer3.sourceforge.net/)
- **blast+**: BLAST the wheat genome (https://blast.ncbi.nlm.nih.gov/Blast.cgi)

### I have a GitHub repository for this tool

https://github.com/pinbo/getKASP_pipeline

---

## How does .blue[SNP Primer Design] work?

1. Blast each SNP sequence against the CS RefSeqv1.0 and keep hits with
    - at least 90% similarity and
    - at least 90% of the length of the best hit AND > 50 bp
1. Get 500 bp on each side of the SNP for all the hits (.red[your SNP is at 501])
1. Multiple Sequence Alignment of the homeolog sequences with MUSCLE
1. Find all the variation sites that can distinguish the target from the other homeologs
1. Use these sites as the forced 3’ end in Primer3 and design homeolog-specific primers
1. Blast all the primers against the CS RefSeqv1.0 with word length 7 to see whether they also hit other chromosomes
    - Criterion of matches: < 2 mismatches in the first 4 bp from the 3' end

---

## Design KASP markers in wheat with T3 Galaxy

We will use T3 Galaxy: it supports both CS RefSeq v1.0 and v2.0.

- https://galaxy.triticeaetoolbox.org
- Need to register first (different from your T3 account)

---

## Test KASP

After receiving the KASP primers, we need to test them before running them on a big population.

- Hydrate the primers to 100 μM
- Make KASP primer assay mix (100 μL)

| Primer (100 μM) | Volume (μL) |
| :--- | :----: |
| FAM | 12 |
| VIC | 12 |
| Common | 30 |
| Water | 46 |

---

## Test KASP (cont.)

- PCR setup (5 μL reaction for both 96-well and 384-well plates)
- DNA concentration: anything from 20 ng/μL to 200 ng/μL works, but lower is better.

.pull-left[
| Components | Volume (μL) |
| :--- | :----: |
| 2x Master Mix | 2.5 |
| KASP primer assay mix | 0.07 |
| Water | 1.5 |
| DNA | 1 |
]

---

## Test KASP (cont.)

- PCR program

|Step| Description |Temp. (°C) |Time |No. Cycles |
|---|---|---|---|:----:|
|1 | Enzyme activation | 94 |15 min | 1|
||||||
|2 | Template denaturation | 94 |20 secs |10|
| | Annealing and extension | 65 -> 57 |60 secs (-0.8 °C/cycle)||
|3 | Denaturation | 94 |20 secs |35|
| | Annealing and extension | 57 |60 secs| |

- If more cycles are needed

.pull-left[
|Temp. (°C) |Time |No. Cycles |
|---|---|:----:|
| 94 |20 secs |5 - 10|
| 57 |60 secs| |
]

---

## Test KASP (cont.)

Include at least: 4 parent A, 4 parent B, 2 heterozygous (or a mix of A and B) and 2 water controls.

.pull-left[
- Failed

![:scale 80%](files/KASP-test-example-failed.png)
]

.pull-right[
- Works

![:scale 80%](files/KASP-test-example-works.png)
]

---

### Characteristics of KASP primers designed by LGC and 3crbio (PACE)

I checked KASP primers designed by LGC and 3crbio (PACE) with Primer3 default settings. Here are their characteristics:

- Tm: 58 °C - 70 °C, average 63 °C
- Tm difference between left and right primers: 0 - 8 °C, average 2.6 °C
- Hairpin: 0 - 75
- Product length: 39 - 93 bp, average 57 bp
- Primer length: 15 to 33 bp

---

### Summary for using the Galaxy pipeline

1. **SNP Position to Polymarker Input**: make sure to use the correct genome version (1.0 or 2.0). If you are not sure, ask the person or company that called the SNPs for the commands they used and for the versions of the wheat genome and all software. You will need this for your paper later.
2. **SNP Primer Design**: it does NOT matter which genome version you use, so you can use the latest one.
3. For the output of the primers: your SNP is at **501** on the target template.
4. For all primers: specificity is the most important; everything else (such as length, hairpin, Tm etc.) can be ignored.

---

### Summary for using the Galaxy pipeline (cont.)

5. For CAPS: 1) specificity: at least 2 nucleotides (nt) difference at the 3' end for at least 1 primer; if not, introduce one at the 3rd nt; 2) product size can be < 500 bp to reduce PCR time; 3) check the size difference after digestion: it should be at least 50 bp.
6. For dCAPS: 1) the primer with the introduced mutation is at around position 500 bp (primer A); check the specificity of the other primer, because you cannot do anything about primer A; 2) product size < 300 bp; 3) add a FAM tail to the 5' end of primer A to bring the size difference to about 40 bp after the cut; 4) since primer A has an introduced mutation, use 55 °C as the annealing temperature.
7. For KASP: 1) the common primer decides the specificity of your marker, so just make sure its 3' end is specific to your template. In my experience this is enough and there is no need to introduce a mutation; 2) small product size, ideally < 100 bp. You can set "Pick primer anyway?" to Yes to get all possible common primers, so you have more choices.

---

### Links

- KASP marker development for wheat: https://link.springer.com/article/10.1007/s00122-020-03608-x
- T3 Galaxy: https://galaxy.triticeaetoolbox.org/
- Manually design KASP primers with Primer3: https://junli.netlify.app/apps/design-primers-with-primer3/
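
---

### Appendix: checking the IWB1998 dCAPS primer in Python

The dCAPS idea from the IWB1998 slides (one introduced mismatch in the primer creates an EcoRV site for only one allele) can be checked with a few lines of Python. This is a minimal sketch and not part of the SNP Primer Design pipeline. The sequences and the EcoRV site are copied from the slides. The function names (`template_for`, `amplicon_for`, `primer_mismatches`) are made up for illustration. It only models the region shown on the slide, not the full PCR product.

```python
# EcoRV recognition site as given on the dCAPS slide (GAT'ATC).
ECORV_SITE = "GATATC"

# IWB1998 sequence from the slide, split at the [T/C] SNP.
LEFT_FLANK = "CGAGCTTGATGACGAAGAAGGAGA"
RIGHT_FLANK = "CGGGCAGACCCACGACGT"

# dCAPS primer from the slide; it carries one introduced T near its 3' end.
DCAPS_PRIMER = "CGAGCTTGATGACGAAGAAGGATA"


def template_for(allele):
    """Genomic sequence around the SNP for the given allele ('C' or 'T')."""
    return LEFT_FLANK + allele + RIGHT_FLANK


def amplicon_for(allele):
    """Sequence after PCR: the primer replaces the template over its own length.

    Assumption: the primer anneals at the very start of LEFT_FLANK, as drawn
    on the slide.
    """
    template = template_for(allele)
    return DCAPS_PRIMER + template[len(DCAPS_PRIMER):]


def primer_mismatches():
    """Positions (1 = last base at the 3' end) where primer and template differ."""
    n = len(DCAPS_PRIMER)
    return [n - i
            for i, (p, t) in enumerate(zip(DCAPS_PRIMER, LEFT_FLANK))
            if p != t]


if __name__ == "__main__":
    print("mismatch positions from the 3' end:", primer_mismatches())
    for allele in "CT":
        natural = ECORV_SITE in template_for(allele)
        derived = ECORV_SITE in amplicon_for(allele)
        print(allele, "| natural EcoRV site:", natural, "| site after PCR:", derived)
```

Running it shows that neither allele has a natural EcoRV site, that the single introduced mismatch sits at the 2nd nt from the 3' end (so the 3' terminal nt is untouched), and that only the T allele gains the `GATATC` site after PCR, which is why only T-allele products are cut.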