Y-Chromosome Single Nucleotide Polymorphism testing

From FamilySearch Wiki

Y-Chromosome Single Nucleotide Polymorphism testing aka Y-SNP testing explained

Y-Chromosome Single Nucleotide Polymorphisms, or Y-SNPs (SNP is pronounced “snip”), are used in Y-chromosomal DNA (Y-DNA) testing for Haplogroup and Haplotype confirmation.[1] A SNP is defined as: “A single-nucleotide polymorphism (SNP is pronounced snip) is a DNA sequence variation occurring when a single nucleotide adenine (A), thymine (T), cytosine (C), or guanine (G) in the genome (or other shared sequence) differs between members of a species or paired chromosomes in an individual.” SNPs must be confirmed by specific SNP tests, and they are resolved as a series, running from the basic Haplogroup down to the specific haplotypes or sub-types. The ages of Haplogroups are measured in thousands and tens of thousands of years.[2]

Haplogroups can generally be estimated[3] from the first 10 or so STRs (short tandem repeats)[4] or Y-DNA markers, but the actual haplotype must be confirmed with a specific SNP test.[5]

Color coding and confusion

DNA testing companies such as FTDNA.com (Family Tree DNA) use the color green to represent a positive ( + ), or derived, SNP test result. Haplogroups shown in red are estimated; no test has been taken for that SNP. SNPs tested in the past are also listed in green, showing the last positive ( + ) SNP tested at that time. This means that someone who SNP tested many years ago has a lower-resolution result than today's SNP testing provides. The newer SNPs are confirmed by tests such as the Big Y test at FTDNA. Without this understanding, a Y-DNA genetic group with a common genealogical ancestry can appear to contain several different levels or types of positive SNPs when that is not actually the case. The confusion comes from comparing the older, lower-resolution SNP tests with the newer ones available today.

For an example, please see the Carpenter Cousins Y-DNA Project[6] results page at FTDNA: https://www.familytreedna.com/public/carpenter%20cousins%20%20dna/default.aspx?section=ycolorized - Group 2 shows different green SNP names that were tested at different times.

Background

Before we resolve this SNP confusion, the next few paragraphs provide a quick review of some important points.

It is very important to understand that “Genetic Genealogy”[7] is the use of genealogical techniques together with specific DNA testing to help focus research or overcome choke points in genealogical research, and that this is often done by triangulation.[8] “Genetic Genealogy” is NOT genetic anthropology, the DNA study of humankind.[9]

The time before genealogical records is often referred to as "Deep Ancestry."[10] Any DNA or genetic ancestry testing at this level is anthropological; it deals with the data points of genetic anthropology. Genealogy uses personal data (documentation) to make familial connections from one person to another, whereas anthropology uses impersonal, unrelated data points and computational models, involving things such as the geological time and place of material, to determine a logical sequence for those data points.[11]

Y Chromosomal SNP (Y-SNP) testing[12] determines the Haplogroup and its sub-types or haplotypes, from the very deepest human ancestry towards the present. This is done in a mathematical progression that is subject to several variables, so the results must be taken as estimates. One example: YSNP prediction closely follows the classic S-curve of the binary logistic regression formula: y=exp(a+b*x)/(1+exp(a+b*x)).[13]

Y-SNPs are expressed in either a longhand format or a shorthand format, and this is where most of the confusion in understanding SNPs occurs.
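As a rough illustration of the S-curve shape, the logistic formula can be evaluated directly. This is only a minimal sketch: the coefficients a and b, and the reading of x as a generic numeric predictor (for example, a score derived from STR values), are assumptions made for illustration and are not taken from the cited article.

```python
import math

def logistic(x: float, a: float, b: float) -> float:
    """Evaluate y = exp(a + b*x) / (1 + exp(a + b*x))."""
    z = a + b * x
    # Use the algebraically equivalent form for z >= 0 to avoid overflow.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical coefficients, chosen only so the curve is centred at x = 5.
a, b = -5.0, 1.0

# y rises slowly, then steeply, then levels off: the classic S-curve.
curve = [round(logistic(x, a, b), 3) for x in range(0, 11)]
for x, y in enumerate(curve):
    print(x, y)
```

Because the output always lies between 0 and 1, it reads naturally as an estimated probability rather than a confirmed result, which is why a prediction still needs a specific SNP test.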

Longhand versus Shorthand SNP designations

The longhand format was a reorganization of more than 15 very different regional models, completed by the Y Chromosome Consortium (YCC)[14] via a scholarship group that provided compiled information on the YCC Repository, NRY polymorphisms and changes in nomenclature. The nomenclature system, published in 2002 and updated in 2008, is widely used in papers on Y chromosome variation today. While the YCC is no longer active, the International Society of Genetic Genealogy (ISOGG)[15] web-based Y-DNA Haplogroup Tree[16] continues the longhand methodology of the YCC nomenclature.

The shorthand format focuses on the major Haplogroup followed by the name of the estimated (unconfirmed, shown in red) or confirmed (derived ( + ), shown in green) SNP. This focus on the SNP name can cause major confusion when one does not understand the Y-DNA Tree in general. Simply put, Haplogroup A comes before B, and this sequence is generally (mostly!) followed down to Haplogroup T. The estimated ages of various Haplogroups have since changed, causing adjustments to be made. To see the basic phylogenetic structure, see: https://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroup#Phylogenetic_structure

Using Haplogroup R as an example, it has sub-groups or haplotypes defined by confirmed SNPs. A brief view of its sub-tree can be seen at: https://en.wikipedia.org/wiki/Haplogroup_R_(Y-DNA)#Structure – Please note that the pedigree shows the defining SNPs along with their longhand classification.

Another example of the Y-Chromosome Phylogenetic Tree (Y-Tree) breakdown comes from the Carpenter Cousins Y-DNA Project[17] and the example of FTDNA kit number 5734 at: https://carpentercousins.com/RealDeepAncestry.pdf[18] This example follows the longhand progression from Haplogroups to their haplotypes, along with the specific SNPs tested and an estimate of when each of those SNPs occurred. I cite that specific portion here with its references removed. For its references, see the link cited just above.

Y Chromosome Phylogenetic Tree breakdown by
Major Haplogroups then Haplotypes:
A (Sample SNPs M42, PR2921, M94, etc) – abt 140,000 years ago
B (SNP M168, P9, M181) – abt 65,000 years ago
F (SNP M89, M213) – abt 50,000 years ago
K (SNP M9, P128) – abt 48,000 years ago
P (SNP M45, M74) – abt 39,000 years ago
R (SNP M207, M306) – abt 32,000 years ago
R1 (SNP M173, M306, P225) – abt 26,000 years ago
R1a (SNP M511, M513, M420) – abt 23,000 years ago
R1a1 (SNP M459, SRY 10831.2) – abt 21,000 years ago
R1a1a (SNP M17, M198) – abt 15,000 years ago
R1a1a1 (SNP M417) – abt 7,000 years ago
R1a1a1b (SNP Z645, S441, CTS4385) – abt 6,500 years ago
R1a1a1b1 (SNP Z283, S339, PF6162) – abt 6,000 years ago
R1a1a1b1a (SNP Z282, S198) – abt 5,500 years ago - shorthand code example: R-Z282
R1a1a1b1a3~ (SNP Y2395) – abt 5,000 years ago
R1a1a1b1a3b? (SNP Z284) – maybe abt 4,000 years ago
R1a1a1b1a3c~ (SNP YP694) – maybe abt 3,000 years ago
R1a1a1b1a3c~? (SNP YP6281) – maybe abt 2,500 years ago
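
The relationship between the two formats can be sketched in a few lines of Python. The table copies a few rows from the breakdown above, taking the first SNP listed on each row as its representative. That choice, the function names, and the use of the first letter as the major Haplogroup are assumptions for illustration, not an official ISOGG or FTDNA method.

```python
# Longhand name -> first SNP listed for it in the breakdown above.
DEFINING_SNP = {
    "R": "M207",
    "R1": "M173",
    "R1a1a": "M17",
    "R1a1a1": "M417",
    "R1a1a1b": "Z645",
    "R1a1a1b1": "Z283",
    "R1a1a1b1a": "Z282",
}

def shorthand(longhand: str) -> str:
    """Build the shorthand form: major Haplogroup letter, hyphen, SNP name."""
    return f"{longhand[0]}-{DEFINING_SNP[longhand]}"

def upstream(longhand: str) -> list[str]:
    """List the table entries that sit above this one on the tree.

    In the longhand format every ancestor's name is a prefix of the
    descendant's name, so a simple prefix test is enough here.
    """
    return [name for name in DEFINING_SNP
            if longhand.startswith(name) and name != longhand]

print(shorthand("R1a1a1b1a"))   # R-Z282
print(upstream("R1a1a1"))       # ['R', 'R1', 'R1a1a']
```

The longhand name carries the whole path down the tree, while the shorthand name carries only the last tested SNP, so the shorthand form cannot be placed on the tree without looking the SNP up.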

For the current view of the R HaploTree from ISOGG, please see: https://isogg.org/tree/HaplogroupR2019.html - A link there lets you download the current version into Excel or a similar spreadsheet. For example, use a keyword search or find function for Z282 to see the example cited above.

Conclusion

In conclusion, the older, lower-resolution SNPs are higher on the tree. The lower a defined SNP sits on the tree, the farther it is in time from the top of the tree and the closer it is to the present. This is where you will find the newer, higher-resolution SNPs.
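
One way to picture this: two test results of different resolution agree when the longhand name of one is the beginning of the longhand name of the other. The sketch below assumes exactly that prefix rule and ignores the renaming that happens when the tree is revised. The function names are hypothetical.

```python
def normalize(longhand: str) -> str:
    """Drop the provisional markers (~ and ?) seen on some longhand names."""
    return longhand.rstrip("~?")

def same_branch(a: str, b: str) -> bool:
    """True when the deeper result refines the shallower one."""
    shorter, longer = sorted((normalize(a), normalize(b)), key=len)
    return longer.startswith(shorter)

def higher_resolution(a: str, b: str) -> str:
    """Return the result that sits lower on the tree (the longer name)."""
    return max(a, b, key=lambda name: len(normalize(name)))

older = "R1a1a"        # e.g. an older test that stopped at SNP M17
newer = "R1a1a1b1a"    # e.g. a newer test that reached SNP Z282

print(same_branch(older, newer))         # True
print(higher_resolution(older, newer))   # R1a1a1b1a
print(same_branch(older, "R1b1a1a2"))    # False
```

Under this rule, group members who show different green SNP names are not in conflict as long as their results lie on the same branch; they were simply tested to different depths.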

Please remember that SNPs are useful for "Deep Ancestry," or for the time BEFORE the genealogical time period.

References

  1. See the International Society of Genetic Genealogy (ISOGG) Wiki article: “Single-nucleotide polymorphism” See more at: https://isogg.org/wiki/Single-nucleotide_polymorphism
  2. See the ISOGG Wiki article: “Haplogroup” at: https://isogg.org/wiki/Haplogroup

    See also: The Wikipedia article on “Human Y-Chromosome DNA haplogroup” at: https://en.wikipedia.org/wiki/Human_Y-chromosome_DNA_haplogroup

  3. See the ISOGG Wiki article: “Y-DNA tools” at: https://isogg.org/wiki/Y-DNA_tools
  4. See the ISOGG Wiki article: “Short tandem repeat” at: https://isogg.org/wiki/Short_tandem_repeat
  5. See the ISOGG Wiki article: “Y-SNP testing” at: https://isogg.org/wiki/Y-SNP_testing
  6. Carpenter Cousins Y-DNA Project is a surname project handling genealogy and DNA testing. The Carpenter Cousins Y-DNA Project shows the results of genetic triangulation in its results tables that are cross linked (via Table 1) to the related individual and group lineages. See more at: https://carpentercousins.com
  7. See the ISOGG Wiki article: “Genetic genealogy” at: https://isogg.org/wiki/Genetic_genealogy
  8. See the ISOGG Wiki article: “Triangulation” at: https://isogg.org/wiki/Triangulation
  9. See the ISOGG Wiki article: “Genetic anthropology” at: https://isogg.org/wiki/Genetic_anthropology

    See also the related Wikipedia article: “Molecular anthropology” at: https://en.wikipedia.org/wiki/Molecular_anthropology

    See also the related article: “Population Genetics” from the University of California Davis Anthropology Department at: http://anthropology.ucdavis.edu/research/evolutionary-anthropology-research/population-genetics

  10. See the University College London article: “Understanding genetic ancestry testing” by the Molecular and Cultural Evolution Lab at: https://www.ucl.ac.uk/mace-lab/debunking/understanding
  11. Article: “Inferring Genetic Ancestry: Opportunities, Challenges, and Implications” by Charmaine D. Royal, John Novembre, Stephanie M. Fullerton, David B. Goldstein, Jeffrey C. Long, Michael J. Bamshad, and Andrew G. Clark, dated 14 May 2010, at: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2869013/
  12. See the ISOGG Wiki article: “Y-SNP testing” at: https://isogg.org/wiki/Y-SNP_testing
  13. Article: “Binary Logistic Regression - The mathematical model for YSNP prediction based on YSTR signatures” By Robert Casey, Revised Version, March 7, 2017.
  14. See the ISOGG Wiki article: “Y Chromosome Consortium” at: https://isogg.org/wiki/Y_Chromosome_Consortium
  15. International Society of Genetic Genealogy (ISOGG) - The mission of the International Society of Genetic Genealogy is to: Advocate for and educate about the use of genetics as a tool for genealogical research while promoting a supportive network for genetic genealogists. – See: https://isogg.org/
  16. ISOGG Y-DNA Haplogroup Tree 2019 – see: https://isogg.org/tree/
  17. Carpenter Cousins Project at: https://carpentercousins.com
  18. For another example, see: “R-M269 verses R1b1a1a2” at: https://www.carpentercousins.com/R-M269.pdf