09 October 2017

DNA Analysis: Random is Most Important Factor

Correctly analyzing DNA matches for genetic genealogy is much harder than most researchers may think.

What is the most important thing to remember when interpreting DNA matches to determine relationships?


CC0 License, Debbie Parker Wayne, Random DNA Word Cloud

Researchers must remember that, in general, random recombination and mutations make it impossible to predict exactly how much DNA, if any, two people will share. The charts giving shared percentages of 50%, 25%, 12.5%, and so on are based on statistical probabilities. Real life seldom matches a statistical probability exactly. One exception is that each person does inherit one-half of his or her autosomal DNA from each parent.
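As a rough illustration of where the chart percentages come from, here is a minimal Python sketch. It assumes the simplest case, a direct-line ancestor, where the expected share halves with each generation. The function name is made up for this example, and the numbers it prints are averages only, not what any one person actually inherits.

```python
def expected_shared_percent(generations: int) -> float:
    """Average percentage of autosomal DNA expected from a direct ancestor.

    Assumption for this sketch: the expected share simply halves with each
    generation (parent = 1, grandparent = 2, and so on). Only the parent
    figure is exact; random recombination moves every other figure up or down.
    """
    if generations < 1:
        raise ValueError("generations must be 1 or more")
    return 100 * 0.5 ** generations


labels = ["parent", "grandparent", "great-grandparent"]
for generations, label in enumerate(labels, start=1):
    print(f"{label}: {expected_shared_percent(generations)}% on average")
```

The sketch reproduces the 50%, 25%, and 12.5% figures from the charts. Beyond the parent level, treat each one as the center of a range rather than a value to expect.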

Any reader of a mailing list, forum, or Facebook group will constantly see questions such as, "I share XYZ% of DNA with personXYZ. What relationship do we share?" And that reader will see tons of responses such as, "You must be XYZ relationship." The more savvy researchers will indicate there are several likely relationships and point to charts such as The Shared cM Project.1 There are also some tools, such as the matrices on GEDmatch.com and the relationship predictions made by the testing companies, that use the statistical shared percentages to predict relationships.

To analyze DNA findings accurately, researchers must remember to use these predictions only as clues and not as hard-and-fast limits.

The first chart below uses GEDmatch matrix tools to demonstrate how even full siblings can share widely varying amounts of DNA with a DNA match. Four full siblings are compared to a known fourth cousin. One sibling shares only 12.4 cM of atDNA, one shares 19.8 cM, one shares 50.2 cM, and one shares 52.1 cM. The second chart shows that the number of generations between the test-takers, as predicted by the GEDmatch generations matrix tool, varies from 4 to over 7. (GEDmatch changes the order of the siblings in the different matrix views.)


© 2017, Debbie Parker Wayne,
GEDmatch Shared atDNA Matrix, Siblings to 4C

© 2017, Debbie Parker Wayne,
GEDmatch Generations Matrix, Siblings to 4C
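To put numbers on the spread among the four siblings, the following short Python sketch summarizes the shared cM values quoted above. The values come from this post. The variable names and the choice of summary figures are only one illustrative way to look at them.

```python
from statistics import mean

# Shared atDNA (in cM) between each of four full siblings and the same
# known fourth cousin, as reported in this post.
shared_cm = [12.4, 19.8, 50.2, 52.1]

lowest = min(shared_cm)
highest = max(shared_cm)
average = mean(shared_cm)
ratio = highest / lowest  # how many times more the top sibling shares

print(f"lowest:  {lowest} cM")
print(f"highest: {highest} cM")
print(f"average: {average:.1f} cM")
print(f"highest / lowest: {ratio:.1f}x")
```

The sibling with the largest match shares roughly four times as much DNA with the cousin as the sibling with the smallest match, even though all four have exactly the same genealogical relationship to that cousin.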

Blaine T. Bettinger published "The Shared cM Project" data using a Creative Commons License, which gives permission for others to use and adapt the data as long as the adaptation is also made freely available and follows a few other restrictions.

Jonny Perl at DNA Painter adapted the data to create a Shared cM Project tool that highlights relationships that have been shown to share a specified amount of DNA. There are some differences in the highlighted relationships for 12.4 and 52.1 shared cM as shown in the images below.


CC0 License, Jonny Perl,
DNA Painter Shared cM Project Tool, Heading


CC0 License, Jonny Perl,
DNA Painter Shared cM Project Tool, 52.1 shared cM


CC0 License, Jonny Perl,
DNA Painter Shared cM Project Tool, 12.4 shared cM

The moral of the story is, as the Genetic Genealogy Standards indicate, there may be more than one way to interpret DNA test results:
19. Interpretation of DNA Test Results. Genealogists understand that there is frequently more than one possible interpretation of DNA test results. Sometimes, but not always, these possible explanations can be narrowed by additional testing and/or documentary genealogical research. Genealogists further understand that any analysis of DNA test results is necessarily dependent upon other information, including information from the tester, and that the analysis is only as reliable as the information upon which it is based.2


1. Blaine T. Bettinger, "The Shared cM Project," The Genetic Genealogist (https://thegeneticgenealogist.com/). Search the blog posts for the most recent update to the project.
2. Genetic Genealogy Standards Committee, Genetic Genealogy Standards (http://www.geneticgenealogystandards.com/).


To cite this blog post: Debbie Parker Wayne, "DNA Analysis: Random is Most Important Factor," Deb's Delvings, 9 October 2017 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2017, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved

3 comments:

  1. Thanks Debbie - this should be required reading in any class on autosomal DNA.

  2. I use the probabilities in this table to determine which "group" of relationships is most likely for each match.
    http://thednageek.com/the-limits-of-predicting-relationships-using-dna/

    I've got a very rough version of a tool that will do the same for multiple matches. Hoping to be able to share it soon!
