10 October 2017

Selected References for Math, Biology, and DNA Testing Company Algorithms and Features

My blog post on "DNA Analysis: Random is Most Important Factor" generated a discussion that resulted in some questions on where to find more about probability and the algorithms companies are using when analyzing our DNA data.

I have been working to update and merge my online DNA bibliography and the bibliography I provide to students attending the DNA courses at institutes, but it is still a work in progress. The following list is far from comprehensive, but it includes selected resources that provide useful information.

I tried to find public links for all of these, but some may require you to log in to the website to access the papers. All URLs were accessed 10 October 2017.

Any Topic Related to Genetic Genealogy

Whenever I want to learn more about any topic related to genetic genealogy, I check the following sources first. These are all currently active and written by experienced genetic genealogists who also have scientific or engineering backgrounds.

Probability and Statistics

Biology and Genetics

Newer editions of these are available. I wanted them for the basics of biology and genetics, so these older versions covered everything I needed and were more economical.

  • Robert J. Brooker, Genetics: Analysis & Principles, 4th ed. (New York: McGraw Hill, 2012; https://www.amazon.com/Robert-J-Brooker-Genetics-Principles/dp/B008UBBDDY/).

  • Robert J. Brooker and Johnny El-Rady, Student Study Guide / Solutions Manual to accompany Genetics: Analysis & Principles, 4th ed. (New York: McGraw Hill, 2012; not available on Amazon when I checked recently).



Family Tree DNA


And, of course, I recommend the book that Blaine T. Bettinger and I co-authored:
Blaine T. Bettinger and Debbie Parker Wayne, Genetic Genealogy In Practice, published in September 2016 by the National Genealogical Society (NGS).

To order the print version, click here, then click the cover image on the displayed page or go directly to the online store. Price is $30.06 for NGS members, $36.05 for non-members. The print version is best for working the exercises.

For the Kindle version ($9.99), click here.

Note: As an author I receive royalties on sold copies of Genetic Genealogy In Practice. I receive no incentives from any other entities named in this post.

11 October 2017: Corrected spelling of the name of one author.

To cite this blog post: Debbie Parker Wayne, "Selected References for Math, Biology, and DNA Testing Company Algorithms and Features," Deb's Delvings, 10 October 2017 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2017, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved

09 October 2017

DNA Analysis: Random is Most Important Factor

Correctly analyzing DNA matches for genetic genealogy is much harder than most researchers may think.

What is the most important thing to remember when interpreting DNA matches to determine relationships?

CC0 License, Debbie Parker Wayne, Random DNA Word Cloud

Researchers must remember that random recombination and mutations make it impossible, in general, to predict exactly how much DNA, if any, two people will share. The charts giving shared percentages of 50%, 25%, 12.5%, and so on are based on statistical probabilities. Real life seldom matches a statistical probability exactly. One exception is that each person does inherit one-half of his or her autosomal DNA from each parent.
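For readers who like to see the arithmetic, here is a minimal Python sketch of where the 50%, 25%, 12.5% figures come from and why real results scatter around them. The function names and the "independent blocks" model are assumptions made for illustration only. Real recombination does not work in equal, independent blocks, and this is not how any testing company calculates anything.

```python
import random


def expected_share(generations: int) -> float:
    """Expected fraction of autosomal DNA inherited from one direct ancestor.

    generations = 1 is a parent, 2 a grandparent, 3 a great-grandparent.
    The expectation simply halves with each generation.
    """
    return 0.5 ** generations


def simulate_grandparent_share(blocks: int, rng: random.Random) -> float:
    """Toy model (an assumption, not real biology).

    The half of the genome a person gets from one parent is treated as
    `blocks` equal, independent pieces. Each piece comes from one of that
    parent's two parents with probability 1/2. Returns the fraction of the
    person's autosomal DNA that came from one specific grandparent.
    """
    from_grandparent = sum(1 for _ in range(blocks) if rng.random() < 0.5)
    return 0.5 * from_grandparent / blocks


if __name__ == "__main__":
    for g in (1, 2, 3):
        print(f"{g} generation(s) back: expected {expected_share(g):.1%}")

    rng = random.Random(2017)  # fixed seed so the run is repeatable
    shares = [simulate_grandparent_share(30, rng) for _ in range(10)]
    print("Simulated grandparent shares:",
          ", ".join(f"{s:.1%}" for s in shares))
```

The expected values are exact halvings, but the simulated shares land above and below 25% from one run to the next, which is the point: the charts describe averages, not guarantees.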

Any reader of a mail list, forum, or Facebook will constantly see questions such as, "I share XYZ% of DNA with personXYZ. What relationship do we share?" And that reader will see tons of responses such as, "You must be XYZ relationship." The more savvy researchers will indicate there are several likely relationships and point to charts such as The Shared cM Project.1 There are also some tools, such as the matrices on GEDmatch.com and the relationship predictions made by the testing companies, that use the statistical shared percentages to predict relationships.

To analyze DNA findings accurately, researchers must remember to use these predictions only as clues and not as hard-and-fast limits.

The first chart below uses GEDmatch matrix tools to demonstrate how even full siblings can share widely varying amounts of DNA with a DNA match. Four full siblings are compared to a known fourth cousin. One sibling shares only 12.4 cM of atDNA, one shares 19.8 cM, one shares 50.2 cM, and one shares 52.1 cM. The second chart shows that the number of generations between the test-takers, as predicted by the GEDmatch generations matrix tool, varies from 4 to over 7. (GEDmatch changes the order of the siblings in the different matrix views.)

© 2017, Debbie Parker Wayne,
GEDmatch Shared atDNA Matrix, Siblings to 4C

© 2017, Debbie Parker Wayne,
GEDmatch Generations Matrix, Siblings to 4C
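The spread among the four siblings can be summarized with a few lines of Python. This is only a sketch of the arithmetic on the four shared cM values quoted above; the variable names are chosen for illustration.

```python
from statistics import mean

# Shared atDNA (in cM) between each of four full siblings and the same
# known fourth cousin, as reported in the GEDmatch matrix described above.
shared_cm = [12.4, 19.8, 50.2, 52.1]

lowest = min(shared_cm)
highest = max(shared_cm)
average = mean(shared_cm)
spread_ratio = highest / lowest  # how many times more the top sibling shares

print(f"Lowest:  {lowest} cM")
print(f"Highest: {highest} cM")
print(f"Average: {average:.1f} cM")
print(f"Highest is about {spread_ratio:.1f} times the lowest")
```

Four people with the same parents and the same relationship to the match differ by a factor of more than four, so a single shared cM value from one sibling can point toward a very different set of likely relationships than the value from another.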

Blaine T. Bettinger published "The Shared cM Project" data under a Creative Commons license, which gives permission for others to use and adapt the data as long as the adaptation is also made freely available and follows a few other restrictions.

Jonny Perl at DNA Painter adapted the data to create a Shared cM Project tool that highlights relationships that have been shown to share a specified amount of DNA. There are some differences in the highlighted relationships for 12.4 and 52.1 shared cM as shown in the images below.

CC0 License, Jonny Perl,
DNA Painter Shared cM Project Tool, Heading

CC0 License, Jonny Perl,
DNA Painter Shared cM Project Tool, 52.1 shared cM

CC0 License, Jonny Perl,
DNA Painter Shared cM Project Tool, 12.4 shared cM

The moral of the story is, as the Genetic Genealogy Standards indicate, there may be more than one way to interpret DNA test results:
19. Interpretation of DNA Test Results. Genealogists understand that there is frequently more than one possible interpretation of DNA test results. Sometimes, but not always, these possible explanations can be narrowed by additional testing and/or documentary genealogical research. Genealogists further understand that any analysis of DNA test results is necessarily dependent upon other information, including information from the tester, and that the analysis is only as reliable as the information upon which it is based.2

1. Blaine T. Bettinger, "The Shared cM Project," The Genetic Genealogist (https://thegeneticgenealogist.com/). Search the blog posts for the most recent update to the project.
2. Genetic Genealogy Standards Committee, Genetic Genealogy Standards (http://www.geneticgenealogystandards.com/).

To cite this blog post: Debbie Parker Wayne, "DNA Analysis: Random is Most Important Factor," Deb's Delvings, 9 October 2017 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2017, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved