11 September 2021

DNA Painter's Shared cM Tool — Ranges, Probabilities, and Histograms


Many genealogists are familiar with using the tools at DNA Painter to find clues to the relationship between two DNA test takers when that relationship is not known. The clues are based on how much DNA, measured in centiMorgans (cM), the two share. The tools can also provide evidence to support or disprove a known relationship. Anyone who has worked with DNA for very long knows that what we “know” or “believe to be true” may be disproven by DNA evidence. Even when we think a relationship is known based on family memories, the conclusion is more sound with supporting DNA evidence.

The data displayed by “The Shared cM Project 4.0 tool v4” at DNA Painter (https://dnapainter.com/tools/sharedcmv4) is the product of a collaboration between Blaine T. Bettinger, Leah LaPerle Larkin, and Jonny Perl.

“The Shared cM Project 4.0 tool v4” provides three data elements for analysis of relationship possibilities based on the amount of autosomal DNA (atDNA) shared by two test takers:
  1. the average amount of atDNA shared and the range of values (low-to-high) reported by actual project participants,
  2. the probabilities (based on simulated data) for various relationship possibilities based on the amount of shared atDNA entered, and
  3. a histogram of the reported data for a given relationship as calculated from the Shared cM Project reported data.1
Details are provided here for using these data elements on an example case where BJ04 shares 111.32 cM of DNA with DP5, known to be a first cousin twice removed (1c2r).

Using the Shared DNA amounts

Comparing the DNA shared by two people to the amount expected for a specific relationship is easily done using the chart at https://dnapainter.com/tools/sharedcmv4. Obtain the number of cM shared by the two test takers, find the box on the chart for the relationship they are believed to share, and check whether the shared amount falls within the range of values listed in that box.

Entering 111.32 into the tool highlights the blocks for many possible relationships (some shown in figure 1) that can share this amount of DNA. The 1c2r block is one of those highlighted. This block indicates 1c2r share an average of 221 cM, with the reported range being 33-471 cM.
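The range check described above can be sketched in a few lines of Python. This is only an illustration and is not part of the DNA Painter tool: the function name check_shared_cm and the dictionary layout are assumptions made for this example, and the only values used are the 1c2r average and range quoted above.

```python
# Illustrative sketch only; the names here are hypothetical, not part of DNA Painter.
# The 1c2r values are those quoted in the text (average 221 cM, range 33-471 cM).
SHARED_CM = {
    "1c2r": {"average": 221.0, "low": 33.0, "high": 471.0},
}


def check_shared_cm(relationship: str, shared_cm: float) -> dict:
    """Report whether shared_cm falls inside the reported range for a relationship."""
    stats = SHARED_CM[relationship]
    return {
        "within_range": stats["low"] <= shared_cm <= stats["high"],
        "difference_from_average": round(shared_cm - stats["average"], 2),
    }


result = check_shared_cm("1c2r", 111.32)
print(result)
```

For BJ04 and DP5 the value is inside the reported range but well below the average, which is why the probabilities and the histogram are examined next.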


Figure 1. Shared cM Tool on DNA Painter with only potential relationship blocks highlighted.


Using the Relationship Probabilities

Figure 2 illustrates the relationship probability display. The probabilities, displayed after entering a cM value in the “Filter” box, are best used to determine the most likely place to begin looking for a family link when the relationship is unknown. When a relationship is known, the probabilities serve primarily to indicate whether that relationship is possible or not possible. A displayed probability of zero (0) means the relationship is impossible for that amount of shared DNA (assuming no pedigree collapse or endogamy). Any non-zero probability, even a very low one, means the relationship is possible, although low probabilities generally need more supporting evidence.

Random recombination of DNA can result in some test takers sharing more or less than the expected amount of DNA for a specific relationship. Relationships more distant than second cousins may result in no shared DNA at all even though the genealogical relationship is real.2

The probability numbers should be evaluated within the context of all of the evidence, DNA and documentary. The probabilities should not be used alone as confirmation of a relationship; again, any probability other than 0 indicates the relationship is possible. Probabilities for relationships for which more DNA is shared are more likely to be accurate. For more distant relationships the smaller amount of shared DNA can be typical of multiple relationships. Some probability percentages may be lower, but are consistent with the hypothetical relationship as long as the probability is not zero.

In the example shown in figure 2, there is a documented 1c2r relationship between two test takers who share 111.32 cM of atDNA. The probability that they are 1c2r is 9%, the fourth-highest percentage offered: twelve relationships are more likely, four other relationships are as likely as 1c2r, and three relationships are less likely but still possible. All of these are possibilities because the probability percentage is not zero.

In cases like this, with a 9% probability, more information will be needed.
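The way the probability percentages are read in this section can be summarized in a short Python sketch. It is an illustration only: the function name interpret_probability is hypothetical, and the 10% cutoff for a “low” probability is an assumption made for this example, since no numeric threshold is defined in this post.

```python
# Illustrative sketch only. The 10% cutoff for "low" is an assumption made
# for this example; the text does not define a numeric threshold.
LOW_PROBABILITY_CUTOFF = 10.0  # percent, hypothetical


def interpret_probability(percent: float) -> str:
    """Turn a displayed probability percentage into a plain-language reading."""
    if percent == 0:
        return "not possible at this cM value (absent pedigree collapse or endogamy)"
    if percent < LOW_PROBABILITY_CUTOFF:
        return "possible, but more supporting evidence is needed"
    return "possible; still weigh with all other DNA and documentary evidence"


print(interpret_probability(9))   # the 1c2r probability for BJ04 and DP5
print(interpret_probability(0))
```

No branch of the sketch returns “confirmed,” which reflects the point that a probability alone never confirms a relationship.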


Figure 2. Relationship probability display for 111.32 shared cM from the Shared cM Tool on DNA Painter.


Using the Histograms

Histograms are displayed by clicking on a relationship box in the Shared cM Chart on the DNA Painter website. The histograms, illustrated in figure 3, are used to determine if the amount of shared DNA is at the peak of the curve, within the curve, or an outlier on the far shoulders of the histogram curve. Shared cM values that fall far out on the shoulders of the histograms or outside of the reported range are known as outliers. Outliers require additional investigation and more evidence to be accepted as true; often more test takers are needed. Outliers will often lead to more tentative or qualified hypotheses and need more explanation in the analysis.3

Figure 3 is an annotated image of the histogram displayed for the 1c2r relationship. The added red markings indicate the peak of the curve, position of values described as “within the curve,” and outliers (on the shoulders) of the curve. Numbers outside of the range of shared values are also considered outliers.

To determine where on the curve a DNA match falls, look along the horizontal axis for the numbers closest to the amount of DNA shared by two test takers. Mentally note the location for the shared cM number on the horizontal axis. Mentally draw a line up to intersect the curve. Determine if the amount of DNA shared by the two test takers is at the peak or within the curve of the histogram or if the value may be an outlier (on the shoulders of the curve or outside of the range of the curve).

Using figure 3, the number 111 falls closer to the number 100 than to 150; 111 is not far out on the shoulder of the curve, but it is not high within or near the peak of the curve.
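As a rough numeric companion to reading the histogram by eye, the sketch below measures how far a value sits between the edge of the reported range and the reported average. This is not how DNA Painter builds or reads its histograms: it assumes the 1c2r average of 221 cM is a reasonable stand-in for the peak of the curve, and the function name position_toward_average is hypothetical.

```python
# Illustrative sketch only; not how DNA Painter draws or reads its histograms.
# Assumption: the reported average (221 cM) is used as a stand-in for the peak.
LOW, AVERAGE, HIGH = 33.0, 221.0, 471.0  # 1c2r values quoted earlier in this post


def position_toward_average(shared_cm: float) -> float:
    """Return 0.0 at the edge of the reported range and 1.0 at the average."""
    if not LOW <= shared_cm <= HIGH:
        raise ValueError("value is outside the reported range (an outlier)")
    if shared_cm <= AVERAGE:
        return (shared_cm - LOW) / (AVERAGE - LOW)
    return (HIGH - shared_cm) / (HIGH - AVERAGE)


print(round(position_toward_average(111.32), 2))  # 0.42
```

A result of about 0.42 places 111.32 cM a little less than halfway from the low end of the range to the average, which agrees with the visual reading: not far out on the shoulder, but not near the peak.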


Figure 3. Annotated image of the histogram displayed by the Shared cM Tool on DNA Painter for the 1c2r relationship.


Table 1 correlates the information gathered so far about the relationship between BJ04 and DP5—shared DNA amounts from the testing company website and also Shared cM average and range, probabilities, and placement on the histogram for the known relationship from DNA Painter.



Again, more evidence is needed. Additional family members can be tested or the DNA match list may contain serendipitous matches that add evidence to this analysis.

Gathering More Evidence

In this case, multiple siblings and a first cousin of DP5 were tested or were already listed in the DNA match lists. The shared DNA amounts between these test takers are shown in table 2. BJ04 is a 1c2r to all other persons listed in the table. DJP, RAP, DP4, and DP5 are full siblings. DGS is a half sibling to DJP, RAP, DP4, and DP5. PCF is a first cousin to DJP, RAP, DP4, DP5, and DGS.

The full siblings DJP, RAP, DP4, and DP5 all share between 2526.74 and 2932.92 cM (yellow highlights in table 2). The Shared cM Tool predicts a 97% to 100% likelihood of a full sibling relationship. The half sibling, DGS, shares between 1638.74 and 2077.44 cM with the other siblings (bold, red values in table 2). The Shared cM Tool predicts a 90% to 100% likelihood of a half sibling relationship. The siblings share between 865.03 and 997.11 cM with their first cousin (see the row or column for PCF in table 2). The Shared cM Tool predicts a 97% to 100% likelihood of a first cousin relationship. These probabilities indicate the known relationships are likely true.

These numbers also demonstrate the amount of shared DNA between the siblings and their first cousin is consistent with their relationships to each other. The numbers are within the Shared cM ranges and well within the histogram curves.



All of these added cousins share more DNA with BJ04 than DP5 shares (see the row or column for BJ04 in table 2). Entering these cM numbers into the Shared cM Tool on DNA Painter indicates a 1c2r relationship probability of between 34% and 62% (see table 3).

The probabilities that BJ04 is a 1c2r to the others in the study are higher than the 9% probability of being a 1c2r to DP5. These higher probabilities add credibility to the conclusion that BJ04 is a 1c2r to all the others in this study, including DP5.

Table 3 correlates the information gathered so far about the relationship between BJ04 and all other test takers in this study—shared DNA amounts from the testing company website and also Shared cM average and range, probabilities, and placement on the histogram curve for the known relationship from DNA Painter.



The documentary research (not detailed in this blog post for privacy purposes) clearly supports the known relationships. The birth certificates, census enumerations, death certificates, obituaries, and memories of living family members are all consistent with the claimed relationships.

Together, the DNA and documentary evidence strongly support the hypothesized relationships even though, alone, the amount of DNA shared by BJ04 and DP5 was not conclusive.

This is a clear case where testing siblings and first cousins can provide DNA evidence to help answer some questions. The 111.32 cM shared by BJ04 and DP5 is low on the histogram curve. This number is not far out on the shoulders of the histogram, but it is definitely on the waning, low end of the curve.

Conclusion

Astronomer Carl Sagan made the statement, “extraordinary claims require extraordinary evidence.” Wikipedia calls this statement the “Sagan standard” and tells us, “The standard illustrates a core principle of the scientific method and skepticism and can be used to assess the validity of a claim.” Sagan wasn't the first to make this claim, but Cosmos may have been the first exposure for many of us.4

When the probability of a relationship is low, as it is when only 111.32 cM are shared by 1c2r, it is necessary to provide supporting evidence to increase the credibility of the conclusion. Here, more evidence came from the amount of DNA that additional siblings and their first cousin share with the person known to be their first cousin twice removed (1c2r). If the only two test takers available were DP5 and BJ04, a genealogist could assume the 1c2r relationship might be an error. The low likelihood of the 1c2r relationship between DP5 and BJ04 needs supporting evidence to make it more credible, even though the 9% probability indicates the relationship is possible. Adding the DNA evidence from more close cousins and the documentary evidence for these relationships provides strong evidence that the relationship has been correctly identified.

For more information on the randomness of DNA inheritance see my earlier blog post, "DNA Analysis - Random is the Most Important Factor," Deb’s Delvings in Genealogy, 9 October 2017 (http://debsdelvings.blogspot.com/2017/10/dna-analysis-random-is-most-important.html).


1. Blaine T. Bettinger, Leah LaPerle Larkin, and Jonny Perl, “The Shared cM Project 4.0 tool v4,” DNA Painter (https://dnapainter.com/tools/sharedcmv4).
     Bettinger provided the Shared cM data, self-reported by actual test takers (and likely to contain some errors). Blaine T. Bettinger, “The Shared cM Project, Version 4.0 (March 2020),” The Genetic Genealogist, PDF online (https://thegeneticgenealogist.com/wp-content/uploads/2020/03/Shared-cM-Project-Version-4.pdf). Page 5 covers data collection methods. Pages 8-51 cover the histograms. Also see Blaine T. Bettinger, “Shared cM Project,” The Genetic Genealogist (https://thegeneticgenealogist.com/2020/03/27/version-4-0-march-2020-update-to-the-shared-cm-project/). Also see “Collecting Sharing Information for Known Relationships,” The Genetic Genealogist (https://thegeneticgenealogist.com/2015/03/04/collecting-sharing-information-for-known-relationships/) and “Collecting Sharing Information for Known Relationships – Part II” (https://thegeneticgenealogist.com/2015/04/06/collecting-sharing-information-for-known-relationships-part-ii/).
     Larkin provided the underlying data for the probability indications based on Ancestry.com simulated, statistical data. Also see Leah LaPerle Larkin, “The Limits of Predicting Relationships Using DNA,” 19 December 2016, The DNA Geek (https://thednageek.com/the-limits-of-predicting-relationships-using-dna/); probabilities based on simulated data citing “AncestryDNA Matching White Paper,” AncestryDNA, 31 March 2016 (https://www.ancestry.com/dna/resource/whitePaper/AncestryDNA-Matching-White-Paper).
     Perl developed the website where these tools are housed, the user interfaces, and other tools on the site. These include the Chromosome Mapper that was the initial offering at DNA Painter as well as the Cluster Auto Painter, Inferred Segment Generator, cM Estimator, several tree and pedigree tools, and more. Also see the online help at https://dnapainter.com/help.

2. Blaine T. Bettinger, “Q&A: Everyone Has Two Family Trees – A Genealogical Tree and a Genetic Tree,” The Genetic Genealogist, 10 November 2009 (https://thegeneticgenealogist.com/2009/11/10/qa-everyone-has-two-family-trees-a-genealogical-tree-and-a-genetic-tree/). Also “Cousin statistics,” International Society of Genetic Genealogists Wiki (https://isogg.org/wiki/Cousin_statistics).

3. Blaine T. Bettinger, “The Shared cM Project Version 4.0 (March 2020),” PDF online (https://thegeneticgenealogist.com/2020/03/27/version-4-0-march-2020-update-to-the-shared-cm-project/), 8–18.

4. "Sagan standard," Wikipedia, The Free Encyclopedia (https://en.wikipedia.org/wiki/Sagan_standard).



All statements made in this blog are the opinion of the post author. This blog is not sponsored by any entity other than Debbie Parker Wayne nor is it supported through free or reduced price access to items discussed unless so indicated in the blog post. Hot links to other sites are provided as a courtesy to the reader and are not an endorsement of the other entities except as clearly stated in the narrative.


To cite this blog post:
Debbie Parker Wayne, "DNA Painter's Shared cM Tool — Ranges, Probabilities, and Histograms," Deb's Delvings in Genealogy, 11 September 2021 (http://debsdelvings.blogspot.com/2021/09/dna-painters-shared-cm-tool-ranges.html : accessed [date]).

© 2021, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved
