30 July 2018

Texas State GS 2018 Annual Conference Schedule

Texas State Genealogical Society (TxSGS) has announced the conference schedule for San Antonio on 2-4 November 2018. This year celebrates 300 years of history since the founding of San Antonio with three days (and over 3,000 minutes) of great genealogy education!


Registration is available at http://www.txsgs.org/txsgs-2018-conference-registration/.

Lodging and venue information is available at http://www.txsgs.org/2018-conference-lodging-and-venue/.

Nearby restaurant information is available at http://www.txsgs.org/2018-conference-restaurants/. Note that lunches are not available through the conference this year.

The conference schedule details and list of sessions are available at http://www.txsgs.org/2018-txsgs-schedule-overview/. Exciting sessions start at 9:30 a.m. on Friday and run through 4:30 p.m. on Sunday.

If you are like me and need the conference schedule in a grid format to make your session choices, you can use the grid I created at http://debbiewayne.com/temp/Txsgs_2018_schedule_grid.pdf.

This year I will be presenting a two-hour workshop on autosomal DNA analysis from 1:30 to 3:30 Friday afternoon. This workshop has limited seating available and an add-on cost of $30.

I will be presenting "Organizing Genetic Genealogy" at 11:00 on Saturday and "Documenting DNA Analysis" at 2:00 on Saturday. I am scheduled before and after lunch; it will be a busy mid-day on Saturday.

Our plan is to unveil our new Early Texas DNA Project website at this conference. I will be answering questions and featuring the website at a TxSGS booth when I am not speaking.


Speakers include Mic Barnette, Jim Brewster, Evan Christensen, Schelly Talalay Dardashti, Debra Dudek, Mary Esther Escobedo, Patti Gillespie, Sharon Gillins, Sara Gredler, Colleen Greene, Tony Hanson, Kevin Klaus, Devon Noel Lee, Janice Lovelace, Bernard N. Meisner, Kelvin Meyers, Betsy Mills, Laurel Neuman, David Passman, Lisa Reed, Diane L. Richard, Mary Kircher Roddy, Lisa Toth Salinas, Carl Smith, Kathy Strauss, Michael L. Strauss, Pam Vestal, Eric Wells, and Ari Wilkins.

Regular sessions cover diverse topics, with research tips on African Americans, DNA, Germans, Hispanics, land, methodology, military, publishing and preservation, records and repositories, Russian Jewish immigrants, and technology.

Workshops include Autosomal DNA Analysis, Metadata and Digital Archiving Your Family History Photos and Documents, Researching Your World War II Ancestors, and Spreadsheets 101—An Excel-lent Hands-on Tutorial.

There is something for every researcher at every knowledge level. I hope to see you there.



Debbie Parker Wayne will receive remuneration as a speaker for this conference and is a board member as the DNA Project Chair.

All statements made in this blog are the opinion of the post author. This blog is not sponsored by any entity other than Debbie Parker Wayne nor is it supported through free or reduced price access to items discussed unless so indicated in the blog post. Hot links to other sites are provided as a courtesy to the reader and are not an endorsement of the other entities except as clearly stated in the narrative.


To cite this blog post:
Debbie Parker Wayne, "Texas State GS 2018 Annual Conference Schedule," Deb's Delvings, 30 July 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2018, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved

21 July 2018

Learning DNA and Getting Help with Analysis Tools

More and more people are jumping into DNA testing and genetic genealogy with no experience in DNA or genealogy before taking that first DNA test. Joining a social media group, a mailing list, or a forum exposes you to so many programs, tools, terms, and techniques that it can feel like a fire hose is aimed at you at full blast.

It is great to jump in. It is great to ask questions to learn. But you never know how much the person answering you knows. And they may not even know they are giving you information that is not completely accurate because they misunderstood your question.

Below are some places to (1) learn more about DNA and (2) get better help when one of the DNA tools does not work as you expected.

To learn the basics of DNA you can

Start small when learning something new and build up to higher levels. This applies to studying DNA using the recommendations above and to learning new tools.


When learning a new tool or process, test first with a small dataset. For example, when I first downloaded the version of Progeny Charting Companion that creates DNA analysis charts, I created a small RootsMagic database with only four DNA test takers and the direct lines back to their shared ancestors (as shown in the chart above). I created a dummy CSV file with the minimum amount of data needed for those test takers' DNA data (as defined in the Charting Companion's help files). I used this small dataset to play with the charts offered by Charting Companion until I understood how the options worked to get the output I desired. Once I was comfortable using the tool, I moved on to my full RootsMagic database after adding the new facts needed for DNA charts to work properly (like DNA kit numbers for each test taker).
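A throwaway dataset like this can also be generated with a few lines of script. The sketch below is illustrative only: the column names, kit numbers, and cM value are invented placeholders, and this is not the CSV layout Charting Companion expects (its help files define that).

```python
import csv
import io

# Hypothetical column names chosen for illustration; the real layout is
# defined in the help files of whatever tool you are testing.
FIELDS = ["kit_a", "kit_b", "shared_cm"]

# Four made-up test takers, mirroring the "small dataset" idea in the text.
KITS = ["T0001", "T0002", "T0003", "T0004"]


def build_dummy_rows(kits, placeholder_cm=100.0):
    """Return one row per unordered pair of kits with a placeholder cM value."""
    rows = []
    for i, kit_a in enumerate(kits):
        for kit_b in kits[i + 1:]:
            rows.append({"kit_a": kit_a, "kit_b": kit_b, "shared_cm": placeholder_cm})
    return rows


def write_csv(rows, handle):
    """Write the rows as CSV, header first, to any file-like object."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


rows = build_dummy_rows(KITS)
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the experiment away from real files; swapping in `open("dummy.csv", "w", newline="")` would produce a file to feed to the tool under test.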


After you begin using a new tool, it may not always work as expected and you may need help. To get better help when one of the DNA analysis tools does not work as expected (most of this applies to any program or app):
  • read the instructions (built-in help files, a user's guide, how-to instructions on the program's website)
  • really read the instructions—do not just scan them—and be sure you followed every step carefully, including the steps that are linked into or referenced from the first help page you access (most problems are due to not following instructions; trust me on this, I worked tech support and trained computer users for much of my "life before genealogy")
  • if you followed the instructions carefully and still have problems, make note of any error messages displayed (or failure mode) and step-by-step what you did just before the failure or error
  • use Google or another engine to search for the error message or failure mode (if the program uses Facebook to offer technical support, use Facebook's "Search this group" feature)
  • if potential solutions are found try them
  • if no solution is found by searching or the solutions found do not work for you, then post a message asking for help; include
    • the tool name and version of the tool you are using (also indicate if you recently updated the tool)
    • the error message received or exactly what you saw that was not "right"
    • the step-by-step list of what was done before the error message was received or the program failed
    • whether you are using a Windows, Mac, Android, iOS, or other device and the version of that operating system
    • whether this is something that worked in the past or this is your first time to try this procedure
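The checklist above can be turned into a small helper that assembles a help request so nothing gets left out. This is a minimal sketch: the function name, the tool name "ExampleTool", and the sample error text are all hypothetical. The operating system details come from Python's standard `platform` module, which reports on the computer running the script, so on a phone or tablet you would fill that line in by hand.

```python
import platform


def build_help_request(tool, version, error_message, steps,
                       worked_before, recently_updated):
    """Assemble the details a support volunteer usually needs into one message."""
    lines = [
        f"Tool: {tool} (version {version})",
        f"Recently updated: {'yes' if recently_updated else 'no'}",
        f"Operating system: {platform.system()} {platform.release()}",
        f"Error or unexpected result: {error_message}",
        f"Worked in the past: {'yes' if worked_before else 'no, first attempt'}",
        "Steps taken before the problem:",
    ]
    # Number the steps so helpers can refer to them individually.
    lines += [f"  {n}. {step}" for n, step in enumerate(steps, start=1)]
    return "\n".join(lines)


# All values below are made up for illustration.
message = build_help_request(
    tool="ExampleTool",
    version="1.2.3",
    error_message="Chart is blank after import",
    steps=["Opened the database", "Imported the CSV file", "Clicked the chart button"],
    worked_before=False,
    recently_updated=True,
)
print(message)
```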

These recommendations should help you get better technical support and help you learn new programs and DNA analysis more productively.

Update 23 July 2018: Fixed minor typo, added NGS online training courses, and added to disclaimer royalties for courses and books.



Debbie Parker Wayne receives royalties for the NGS course she authored on autosomal DNA analysis and books for which she is an author or editor.


To cite this blog post:
Debbie Parker Wayne, "Learning DNA and Getting Help with Analysis Tools," Deb's Delvings, 21 July 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2018, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved

DNA Simulation added to DNA Matrix in Progeny Charting Companion

Pierre Clouthier, president of Progeny Genealogy, has been raising the bar for DNA analysis charts in a genealogy program for the last year or so. He was the first to automate the "McGuire Chart" in his "DNA Matrix" tools. See http://debsdelvings.blogspot.com/2017/03/wanted-genetic-genealogy-analysis-tools.html and http://debsdelvings.blogspot.com/2017/06/one-dna-analysis-chart-process.html for more info.

Progeny just released Charting Companion version 7 with a major addition to help during DNA analysis, as described in the announcement below. (I changed the URLs to go directly to the Progeny website rather than to the advertising site used in the email sent to me, so as not to skew the stats from email accesses.)

Charting Companion 7 features a new technology to help place adoptees and orphans in a family tree: the DNA Simulation. Based on the DNA Matrix, the DNA Simulation will construct a Descendant tree, then will systematically try to link the "orphan" to every person in the tree, one at a time. Charting Companion will validate the tree by calculating the expected centiMorgan (cM) implied by the hypothetical relationship, and comparing it to the actual laboratory DNA test results. Each iteration is called a "scenario". If the DNA test results are outside the cM range, the scenario is bad, will be discarded, and Charting Companion will advance to the next possible position of the orphan in the tree. If the DNA results are consistent, the good scenario will be recorded. All possible scenarios can then be reviewed for further investigation. (see video [at https://youtu.be/yBe6Pd8g5no]).


In addition to linking to existing persons, Charting Companion will also insert hypothetical or placeholder spouses and children, and attempt to link the orphan to these additional people. The added persons represent potential extramarital relationships, previous unknown marriages, unknown children, children given up to adoption, non-paternal events, etc. They are meant to suggest possible connections that would otherwise be very time-consuming to evaluate manually.

The DNA Simulation is available in Charting Companion 7. See video [at https://youtu.be/yBe6Pd8g5no] for detailed explanation.

If you have an earlier version of Charting Companion, our upgrade policy is:

If you purchased within the last year, you get a free upgrade (contact [Progeny]).
If you purchased within two years, you get a 20% discount (contact [Progeny]).
Within three years, purchase a Registration Key [at http://progenygenealogy.com/Products/Family-Tree-Charts].

Charting Companion works with all genealogy programs: Family Tree Maker, RootsMagic, Legacy, Ancestral Quest, Family Historian, GEDCOM, etc.

Progeny Genealogy
10037-20 Silver Fox Ave. New Minas, Nova Scotia B4N 5K1 Canada
(902) 681-3102
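The scenario loop described in the announcement can be sketched in a few lines. This is not Progeny's code, only an illustration of the idea under loud assumptions. The tree and names are invented. Each person has a single recorded parent, so every relationship runs through one shared ancestor rather than a couple. The expected cM is simply halved for each generational step from a rough parent-child figure. The 50% tolerance band is arbitrary. The sketch also omits the placeholder spouses and children that the announcement mentions.

```python
# Rough parent-child total used for illustration only; companies report
# different totals.
PARENT_CHILD_CM = 3400.0

# Hypothetical descendant tree from one ancestor, stored as child -> parent.
PARENT = {
    "Ann": "Founder",
    "Ben": "Founder",
    "Cara": "Ann",
    "Dan": "Ben",
}


def ancestors(person, parent):
    """Return [person, parent, grandparent, ...] up to the top of the tree."""
    chain = [person]
    while chain[-1] in parent:
        chain.append(parent[chain[-1]])
    return chain


def meioses_between(a, b, parent):
    """Count generational steps linking a and b through their nearest shared ancestor."""
    chain_a = ancestors(a, parent)
    chain_b = ancestors(b, parent)
    for steps_a, person in enumerate(chain_a):
        if person in chain_b:
            return steps_a + chain_b.index(person)
    return None  # not related within this tree


def expected_cm(meioses):
    """Halve the parent-child amount for each step beyond the first (assumed model)."""
    return PARENT_CHILD_CM / (2 ** (meioses - 1))


def scenario_fits(orphan, candidate, observed, parent, tolerance):
    """Link the orphan as a child of candidate and test every observed result."""
    trial = dict(parent)
    trial[orphan] = candidate  # the hypothetical link for this scenario
    for relative, actual_cm in observed.items():
        steps = meioses_between(orphan, relative, trial)
        if steps is None:
            return False
        expected = expected_cm(steps)
        if not expected * (1 - tolerance) <= actual_cm <= expected * (1 + tolerance):
            return False  # outside the range, so the scenario is discarded
    return True


def simulate(orphan, observed, parent, tolerance=0.5):
    """Try the orphan under every person in the tree; return the scenarios that fit."""
    people = sorted(set(parent) | set(parent.values()))
    return [p for p in people if scenario_fits(orphan, p, observed, parent, tolerance)]


# Made-up test results: cM the orphan shares with two tested people.
observed = {"Cara": 1750.0, "Dan": 420.0}
good = simulate("Orphan", observed, PARENT)
print(good)  # ['Ann']
```

With these invented numbers only one placement survives. With real results several scenarios usually remain, which is why the announcement speaks of reviewing all good scenarios for further investigation.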





To cite this blog post:
Debbie Parker Wayne, "DNA Simulation added to DNA Matrix in Progeny Charting Companion," Deb's Delvings, 21 July 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2018, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved

23 May 2018

DNA Standards - Pedigree Analysis (Tree Analysis)

20 June 2018: Updated to link to new page for the survey on BCG website.

This is part of a series on Genealogy Standards for using DNA. This series represents the opinions and interpretation of the proposed standards by this author and does not necessarily reflect BCG’s official position. The proposed standards are not being addressed in numerical order, but all articles will be linked. For other parts in the series see

You can participate in a survey and provide your opinion on the Proposed DNA Standards through a Google Docs survey linked from https://bcgcertification.org/proposed-dna-standards-for-public-comment/. Please leave comments by 23 July 2018 explaining your agreement or disagreement with the proposed standards. Comments will be used to modify the standards as needed before acceptance and publication.

Proposed new DNA standard (proposed standard numbers may change before acceptance and publication) #3 is:



What does this all mean? Standards are written formally, and most of us understand informal language better. Breaking down each segment makes the meaning clearer.

The first thing many researchers who share DNA do is compare pedigrees (family trees) or surname lists looking for a common ancestor (or an ancestral couple; the term common ancestor will be used for simplicity even when an ancestral couple may be the source of a DNA segment). The first common surname, person, or couple found is often assumed to be the source of the shared DNA segments. That assumption may be right or wrong. More evidence is needed to determine which is more likely.

Genealogists must also consider that two test takers who share DNA may not have inherited all of that DNA from one common ancestral couple. If two test takers are related in more than one way (such as through pedigree collapse or endogamy) this can be difficult to determine except with thorough research and correlation of the DNA and documentary evidence.


© 2018, Debbie Parker Wayne

Three concepts are critical when analyzing pedigrees: accuracy, depth, and gaps. Some common things to review for each are listed here.

  1. Accuracy of the pedigree: a pedigree either has the correct ancestors linked for each generation or it does not. If the pedigree of any DNA test-taker under analysis is inaccurate then the common ancestors may never be identified.

    Accurate pedigrees are the result of research that meets the Genealogical Proof Standard (GPS; see "Useful References" below). Summarized and paraphrased, the GPS calls for a focused research question, thorough research, correctly cited sources, thorough and competent analysis and correlation of all evidence that is pertinent to the question, resolution of any conflicting evidence, and a sound written conclusion.

    Researchers can analyze the accuracy of pedigrees by confirming the consistency of assertions (no children born when a parent would be too young, too old, deceased for more than nine months, in a different location at the time of conception, etc.) and that the most credible sources support each assertion.

    See “Accuracy” at the bottom left of the pedigree image.

  2. Depth of the pedigree: ideally, each DNA test taker’s pedigree chart should be complete back to the level of the hypothesized common ancestor, and preferably a few generations further back. If two DNA test takers are predicted to be third cousins, then both pedigrees should be complete at least back to the second-great-grandparents (the hypothesized common ancestral level). An extra generation or three in each tree helps if the test takers inherited more than the statistical average amount of DNA; in that case they may actually be fourth or fifth cousins instead of the predicted third cousins.

    See “Depth” at the top left of the image. In this example, all names are complete up to the great-grandparent level that would be shared with second cousins. However, all of the missing information on the birth, marriage, and death of many of these ancestors indicates this tree is not deep enough or verifiably accurate enough even at this level.

  3. Gaps in the pedigree: ideally, each pedigree will be complete with no gaps. In the real world many researchers have brick walls on some lines or just have not had time to research every possible line yet. Add to that the fact that every time a new ancestor is identified the next step is to identify that ancestor’s parents, which makes genealogy truly a never-ending search.

    See “Gaps” at the top right of the image. Those gaps in the tree may be hiding the common ancestor or perhaps a second (and third, fourth, and so on) common ancestral line shared by two DNA test takers. Our conclusion may be easily overturned if we do not consider those other possible shared ancestors. We can address the gaps by one or more of the following:

    • Doing further documentary research to fill in the gaps—we would want to do this eventually as we work on our pedigree, but a specific DNA match may focus our research on a specific line now

    • Target test more cousins, or find more test takers in our match list who share the same ancestor, to gain more DNA evidence to support the conclusion—in some cases (like burned counties) there may be little to no documentary evidence to be found. DNA evidence may help answer the question, but more than two or three DNA test takers will be needed to credibly support most conclusions

    • Clear explanations may justify a conclusion that a gap is irrelevant to the research question—perhaps the pedigree gap is in a line that originated or resided in a locale that is irrelevant to the focus question, or it is a line with a different biogeographical origin, or the gap is so far back in the pedigree it is not relevant based on the DNA evidence, and there are other possibilities

    • Segment triangulation does not work in every situation, but when it exists it can be strong evidence—all cousins will not share every triangulated segment, but groups of cousins may share one triangulated segment, while some of those cousins may also share segments with cousins in a different group—showing how each of the groups overlaps may support a conclusion

    • Clustering and genetic networks work in a similar way to triangulated segment groups. Many names are used for clusters or networks: shared matches, in common with groups, DNA circles, matches who share DNA with both of two kits, and more—for example, a group of cousins share DNA with each other, a second group of cousins share DNA, and there may be some cousins who are in both groups providing a link to the common ancestor
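Two of the checks described above, consistency of parent-child dates (accuracy) and completeness of a generation (depth and gaps), are mechanical enough to sketch in code. This is a minimal illustration, not a standard. The age thresholds are assumptions chosen only to flag records for a closer look. Treating any death of a mother before the birth as a flag is my own simplification. The pedigree is keyed by Ahnentafel number, with generation 1 as the parents (2-3), generation 2 as the grandparents (4-7), and so on.

```python
from datetime import date, timedelta

# Assumed thresholds for flagging records to review; they are not rules.
MIN_PARENT_AGE_YEARS = 12
MAX_MOTHER_AGE_YEARS = 55
POSTHUMOUS_LIMIT = timedelta(days=280)  # roughly nine months


def review_flags(child_birth, parent_birth, parent_death=None, parent_is_mother=False):
    """Return a list of reasons a parent-child link deserves a second look."""
    flags = []
    age = (child_birth - parent_birth).days / 365.25
    if age < MIN_PARENT_AGE_YEARS:
        flags.append("parent too young")
    if parent_is_mother and age > MAX_MOTHER_AGE_YEARS:
        flags.append("mother too old")
    if parent_death is not None:
        # A father may die up to about nine months before the birth; a mother may not.
        limit = timedelta(0) if parent_is_mother else POSTHUMOUS_LIMIT
        if child_birth - parent_death > limit:
            flags.append("parent deceased before child could be born")
    return flags


def missing_in_generation(ahnentafel, generation):
    """List Ahnentafel numbers with no identified ancestor in one generation."""
    start = 2 ** generation
    return [n for n in range(start, 2 * start) if n not in ahnentafel]


# Made-up example: a father who died 485 days before the recorded birth.
flags = review_flags(date(1850, 5, 1), date(1820, 1, 1), parent_death=date(1849, 1, 1))
print(flags)  # ['parent deceased before child could be born']

# Made-up pedigree with every ancestor identified except numbers 30 and 31.
pedigree = {n: f"Ancestor {n}" for n in range(1, 30)}
# Predicted third cousins share second-great-grandparents: generation 4.
gaps = missing_in_generation(pedigree, 4)
print(gaps)  # [30, 31]
```

A flag is only a prompt to go back to the sources; resolving it still takes the research and correlation described above.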

Useful References:

Board for Certification of Genealogists, "Ethics and Standards," scroll down to "Genealogical Proof Standard (GPS)" (https://bcgcertification.org/ethics/ethics-standards/).

Board for Certification of Genealogists, Genealogy Standards, 50th anniversary ed. (Nashville, Tennessee: Turner Publ., 2014; https://bcgcertification.org/product/bcg-genealogy-standards/).

Genetic Genealogy Standards Committee, Genetic Genealogy Standards, http://geneticgenealogystandards.com/.





Full disclosure:

I have held Certified Genealogist® credentials from BCG since September 2010. I helped form the BCG Genetic Genealogy Committee to discuss DNA standards. I resigned from the committee due to personal commitments, but have continued to participate as an adviser, reviewer, and in other ways.

I support the adoption of standards to be used when incorporating DNA analysis into a genealogical conclusion. I support BCG seeking input on the proposed standards from the greater genealogical community using DNA. I see this as a positive step to ensure newly adopted standards will meet the needs of the entire research community. No matter what is adopted, updates will certainly be needed just as research methodology and documentary research standards have evolved over the decades.



To cite this blog post:
Debbie Parker Wayne, "DNA Standards - Pedigree Analysis (Tree Analysis)," Deb's Delvings, 23 May 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2018, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved

DNA Analysis Standards

20 June 2018: Updated link to DNA standards survey page.

This is part of a series on Genealogy Standards for using DNA. This series represents the opinions and interpretation of the proposed standards by this author and does not necessarily reflect BCG’s official position. The proposed standards are not being addressed in numerical order, but all articles will be linked. For other parts in the series see

Way back in 2013 an ad hoc committee formed to develop genetic genealogy standards. Those standards were released in January 2015 and are available at http://geneticgenealogystandards.com/.[1] These standards are recommended by many organizations and by most speakers covering DNA topics.

Those original standards primarily deal with ethical issues. The plan was to eventually add technical standards with more details on depth of testing, resolution of tests, and many other critical elements of using DNA test results to answer genealogical questions. As with so many other things, life got in the way and the additional work was never completed.

In the intervening years, we have learned a lot more about using DNA test results effectively and how varied and "random" the results can be from one family to another. Real life results do not always match the statistical average predictions. By definition, an "average" is the typical result in a data set, but that means there are real results on either side of that average. This leads to many questions. How many men need to be tested in a Y-DNA line to prove or disprove a theory? How many markers should be tested? How many markers can differ? How big should an X-DNA segment be before you spend time searching for the common ancestor who passed it down to the people living today? There is no definitive answer to these questions. Many variables will affect the answer for a specific family under investigation although there are some general guidelines to consider.


Many of us think we need defined standards for using DNA evidence to reach a genealogical conclusion even though there is no "magic number" answer to many questions. What should a thorough researcher do when incorporating DNA evidence into a genealogical conclusion? What do you look for other than the name of the same ancestor when analyzing another person's family tree? How do you document the analysis?


Years ago researchers had similar questions related to documentary research. The community responded with books to provide guidance to researchers. A selected list includes Genealogy as Pastime and Profession (1930, revised 1968),[2] Genealogical Research: Methods and Sources (1960, revised 1980),[3] Genealogical Evidence (1979),[4] Genealogical Standards of Evidence (2010),[5] and Elements of Genealogical Analysis (2014).[6]

[Added: Mea culpa. I left off one of the best and newest books: Thomas W. Jones, Mastering Genealogical Proof (now Falls Church, VA: National Genealogical Society, 2013). And don't forget the analysis chapters at the beginning of Elizabeth Shown Mills, Evidence Explained, 3d. ed. (Baltimore, MD: Genealogical Publ. Co., 2015).]

The Board for Certification of Genealogists (BCG) published The BCG Genealogical Standards Manual in 2000.[7] This was reorganized, updated, and published as Genealogy Standards in 2014.[8] These standards reflect best practices for the genealogical research community, not just those applying for BCG credentials. Some genealogists think these standards are all we need—that we do not need more specifics for DNA.


My colleague, Harold Henderson, CG, makes an excellent point as to why DNA standards should also be spelled out (paraphrased and used with permission): A highly competent genealogist would be able to formulate standards based only on the elements of the Genealogical Proof Standard (GPS).[9] By expanding the concepts of the GPS into the Genealogy Standards, BCG saved time for us all. Each researcher can understand the fine points of performing quality documentary research without having to recreate the standards. Defined DNA Standards provide the same service for those seeking to incorporate DNA analysis.

DNA standards will help members of the general community:
  • Researchers adding DNA analysis to their skill set
  • Authors incorporating DNA evidence
  • DNA test takers and those requesting others to take tests
  • Instructors teaching others to analyze DNA test results

DNA standards will also provide benefits for BCG:
  • Applicants and those renewing credentials will know what is expected when incorporating DNA
  • BCG judges will all be judging to the same published standards for DNA
  • Updated Genealogy Standards will reflect the current state of research (we have been using DNA for genealogy for over twenty years now and testing has increased exponentially in recent years)

The BCG Genetic Genealogy Committee has drafted a set of DNA Standards that reflect the practices of some of the most experienced genealogists using DNA today. BCG is surveying the community for input on these proposed standards. Some current Genealogy Standards are modified and expanded to more clearly define the needs when using DNA. New DNA Standards address DNA testing, interpreting DNA test results, identifying shared ancestry, accessing test results, and integrating DNA and documentary evidence. These standards are focused to provide specific guidance yet broad enough to allow for differing family composition and random factors encountered with DNA.

You can participate in the survey and provide your opinion through a Google Docs survey linked from https://bcgcertification.org/proposed-dna-standards-for-public-comment/. Please leave comments by 23 July 2018 explaining your agreement or disagreement with the proposed standards. Comments will be used to modify the standards as needed before acceptance and publication. There is also a link from which you can download a PDF file with the proposed standards.



Feel free to leave comments here, but only comments submitted through the official portal above will be considered by the committee.




1. Genetic Genealogy Standards Committee, Genetic Genealogy Standards, http://geneticgenealogystandards.com/.
2. Donald Lines Jacobus, Genealogy as Pastime and Profession (1930, revised 1968; reprint, Baltimore, Maryland: Genealogical Publishing, 1999).
3. Genealogical Research: Methods and Sources, 2 vols. (Vienna, Virginia: American Society of Genealogists, 1980-1983).
4. Noel C. Stevenson, Genealogical Evidence: A Guide to the Standard of Proof Relating to Pedigrees, Ancestry, Heirship and Family History (Laguna Hills, California: Aegean Park Press, 1979).
5. Brenda Dougall Merriman, Genealogical Standards of Evidence (Toronto: Ontario Genealogical Society, 2010).
6. Robert Charles Anderson, Elements of Genealogical Analysis: How to Maximize Your Research Using the Great Migration Study Project Method (Boston: New England Historic Genealogical Society, 2014).
7. BCG Genealogical Standards Manual (Washington, DC: Board for Certification of Genealogists, 2000).
8. Board for Certification of Genealogists, Genealogy Standards, 50th anniversary ed. (Nashville, Tennessee: Turner Publ., 2014; https://bcgcertification.org/product/bcg-genealogy-standards/).
9. Board for Certification of Genealogists, "Ethics and Standards," scroll down to "Genealogical Proof Standard (GPS)" (https://bcgcertification.org/ethics/ethics-standards/).


Full disclosure:

I have held Certified Genealogist® credentials from BCG since September 2010. I helped form the BCG Genetic Genealogy Committee to discuss DNA standards. I resigned from the committee due to personal commitments, but have continued to participate as an adviser, reviewer, and in other ways. I support the adoption of standards to be used when incorporating DNA analysis into a genealogical conclusion.

I support BCG seeking input on the proposed standards from the greater genealogical community using DNA. I see this as a positive step to ensure newly adopted standards will meet the needs of the entire research community. No matter what is adopted, updates will certainly be needed just as research methodology and documentary research standards have evolved over the decades.



To cite this blog post:
Debbie Parker Wayne, "DNA Analysis Standards," Deb's Delvings, 23 May 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2018, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved

15 May 2018

Whole Genome Sequence (Part 2) - Analysis Tools

For earlier parts see
Whole Genome Sequence (Part 1) — YSEQ.net options and files received

I am investigating bioinformatics[1] tools to analyze Whole Genome Sequence (WGS) data. I have access to a WGS for someone who has also tested at several genealogy testing companies. I want to do some comparisons between the raw data from the genealogy testing companies and the WGS, checking for accuracy of the reads. To satisfy my curiosity, I plan to investigate some of the medical implications and traits discussed in scientific papers.
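A comparison of this kind might start with something like the sketch below. It assumes both data sets have already been reduced to the same simple layout (tab-separated rows of rsid, chromosome, position, and genotype, with comment lines starting with `#`), that both use the same reference build and strand, and that `--` marks a no-call. Those are assumptions for illustration, a real WGS would first need to be converted from its delivered format, and the sample rows are invented.

```python
def parse_raw_data(text):
    """Parse tab-separated rows of rsid, chromosome, position, genotype.

    Lines starting with '#' are treated as comments. Calls are keyed by
    (chromosome, position) so two files can be lined up without relying on rsids.
    """
    calls = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rsid, chromosome, position, genotype = line.split("\t")
        calls[(chromosome, int(position))] = genotype
    return calls


def normalize(genotype):
    """Ignore allele order so 'AG' and 'GA' compare as equal."""
    return "".join(sorted(genotype))


def concordance(chip_calls, wgs_calls, no_call="--"):
    """Count positions called in both sets and list where the genotypes differ."""
    compared = 0
    mismatches = []
    for key, chip_genotype in chip_calls.items():
        wgs_genotype = wgs_calls.get(key)
        if wgs_genotype is None or no_call in (chip_genotype, wgs_genotype):
            continue  # skip positions not called in both data sets
        compared += 1
        if normalize(chip_genotype) != normalize(wgs_genotype):
            mismatches.append(key)
    return compared, mismatches


# Invented sample data, not real test results.
chip_text = (
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rsA\t1\t1000\tAG\n"
    "rsB\t1\t2000\tCC\n"
    "rsC\t2\t3000\t--\n"
    "rsD\t2\t4000\tTT\n"
)
wgs_text = "rsA\t1\t1000\tGA\nrsB\t1\t2000\tCT\nrsC\t2\t3000\tGG\n"

compared, mismatches = concordance(parse_raw_data(chip_text), parse_raw_data(wgs_text))
print(compared, mismatches)  # 2 [('1', 2000)]
```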

Once I have multiple WGSs from relatives, I plan to check whether segments that the testing companies report as matching really do match completely in the higher-resolution data. I am interested in how closely the statistical predictions on linkage disequilibrium and crossovers mirror what is seen in real family multi-generational studies. For example, in the shared segments marked below, not every SNP is tested. A number of SNPs in a segment are tested and we assume the non-tested SNPs match based on statistical predictions.
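One simple way to test a reported segment against higher-resolution data, assuming unphased genotypes, is to look for positions inside the segment where the two people share no allele at all (for example, one is AA and the other GG). The sketch below is an illustration under that assumption and is not how any testing company defines a match. The genotype dictionaries, keyed by chromosome and position, and the segment boundaries are invented.

```python
def shares_allele(genotype_a, genotype_b):
    """True if two unphased genotypes such as 'AG' and 'GG' have an allele in common."""
    return bool(set(genotype_a) & set(genotype_b))


def conflicts_in_segment(calls_a, calls_b, chromosome, start, end):
    """Return positions in the segment where the two people share no allele."""
    conflicts = []
    for (chrom, position), genotype_a in sorted(calls_a.items()):
        if chrom != chromosome or not start <= position <= end:
            continue
        genotype_b = calls_b.get((chrom, position))
        if genotype_b is None:
            continue  # position not read in both data sets
        if not shares_allele(genotype_a, genotype_b):
            conflicts.append(position)
    return conflicts


# Invented genotypes for two relatives, keyed by (chromosome, position).
person_a = {("1", 100): "AA", ("1", 150): "AG", ("1", 200): "CC", ("1", 900): "TT"}
person_b = {("1", 100): "AG", ("1", 150): "GG", ("1", 200): "TT", ("1", 900): "CC"}

# Check a hypothetical reported segment on chromosome 1 from position 50 to 500.
conflicts = conflicts_in_segment(person_a, person_b, "1", 50, 500)
print(conflicts)  # [200]
```

A conflict does not by itself disprove a match, since read errors occur in any data set, but a run of conflicts inside a reported segment would be worth a closer look.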


My previous career in software development, testing, and support made me familiar with open-source software, so I look for available tools before spending time writing my own. Tools I am checking out include Samtools, National Center for Biotechnology Information's (NCBI) Genome Workbench, and Broad Institute's Integrative Genomics Viewer (IGV).

This new bioRxiv paper is timely for my quest:

"A large-scale analysis of bioinformatics code on GitHub," by Pamela H Russell, Rachel L Johnson, Shreyas Ananthan, Benjamin Harnke, and Nichole E Carlson, doi: https://doi.org/10.1101/321919. The meaty data is in the supplemental material which consists of several large files (some over 200MB) linked from the article abstract.

By the way, just as with some of the best genealogy articles, the reference notes in this article led me to several additional sources I now need to consult.

As a woman, I find this sentence especially depressing: "... the proportion of female contributors decreases for high-profile repositories and with seniority level in author lists".[2] I hope this changes and more women participate in bioinformatics.



geralt, Pixabay (https://pixabay.com/en/learn-mathematics-child-girl-2405206/ : accessed 15 May 2018), CC0 Creative Commons.

I am impressed with how many databases and tools are out there for DNA analysis. I did not realize there are over 1,700 bioinformatics repositories and "23 'high profile' GitHub repositories containing source code for popular and highly respected bioinformatic tools."[3] "Our analysis points to simple recommendations for selecting bioinformatic tools from among the thousands available."[4] Some of these will not be useful for genealogy, but some will.

One tool aimed at the genetic genealogy community is Thomas Krahn's tool for annotating a BigY VCF file and identifying derived and novel SNPs.5 Thomas kindly shared this tool so others can do the analysis instead of having it done by his company YSEQ.net.

Some of the discussions in the scientific world parallel those we are having in the genealogy world.

"In recent years, the explosion of genomic data and bioinformatic tools has been accompanied by a growing conversation around reproducibility of results and usability of software. Reproducibility requires that authors publish original data and a clear protocol to allow repetition of the analysis in a paper."6 In the genealogy world we are discussing publicly available DNA data, such as on GEDmatch.com, allowing DNA analysis to be reproduced and referenced from a publication.



OpenClipart-Vectors, Pixabay (https://pixabay.com/en/analysis-biology-biotechnology-2025786/ : accessed 15 May 2018), CC0 Creative Commons.

"The bioinformatics field embraces a culture of sharing — for both data and source code — that supports rapid scientific and technical progress."7 In the genealogy world we are discussing privacy issues versus sharing data, especially with the recent proliferation of stories on law enforcement use of genealogy databases.

I have been musing on whether to learn Python or Ruby. A recent discussion with a young programmer had me leaning towards Python. Since the "greatest amount of code in the main dataset was in Javascript, followed by Java, Python, C++, and C"8 maybe I will stay with Javascript and Java, which I already know, if I develop any new tools for web usage. I have a few tools I wrote in Perl for my own use that I hope to clean up and share eventually.

In addition to DNA adding to my knowledge of my family tree, it is forcing me to upgrade my data analysis knowledge and computer tools familiarity. I hope all of this study helps keep my mind active and reduces those "senior moments" that seem to occur more frequently with the years.



1. The science of collecting and analyzing complex biological data such as genetic code.
2. Pamela H Russell, et al., "A large-scale analysis of bioinformatics code on GitHub," 15 May 2018, BioRxiv pre-publication, https://doi.org/10.1101/321919, line 35.
3. Ibid., line 27.
4. Ibid., line 148.
5. Thomas Krahn, "bigY_hg38_pipeline.sh," GitHub Gist (https://gist.github.com/tkrahn/283462028c61cd213399ba7f6b773893).
6. Russell, "A large-scale analysis of bioinformatics code on GitHub," line 84.
7. Ibid., line 120.
8. Ibid., line 208.


All statements made in this blog are the opinion of the post author. This blog is not sponsored by any entity other than Debbie Parker Wayne nor is it supported through free or reduced price access to items discussed unless so indicated in the blog post. Hot links to other sites are provided as a courtesy to the reader and are not an endorsement of the other entities except as clearly stated in the narrative.


To cite this blog post:
Debbie Parker Wayne, "Whole Genome Sequence (Part 2) - Analysis Tools," Deb's Delvings, 15 May 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).

© 2018, Debbie Parker Wayne, Certified Genealogist®, All Rights Reserved

25 April 2018

Whole Genome Sequence (Part 1)

This first article on Whole Genome Sequence (WGS) analysis is posted today to celebrate DNA Day, 25 April 2018.

This is the first in a continuing series on the files received when a person's entire genome is sequenced, the contents of those files, the tools needed to access the file data, and some things a genealogist can do with the data.

I now have access to the WGS data for several people who have also tested at most of the genealogy companies offering DNA tests. I am excited to be able to analyze these files so others can decide if a WGS test may be right for them now that prices are below $1000 and probably going lower "soon." When ordering higher resolution sequencing that is consistent with medical testing the price may be over $1000.


The first WGS I have access to was done through YSEQ.net. This is the company of Thomas and Astrid Krahn, who are well known in the genetic genealogy community. YSEQ.net has excellent explanations of the options and processes on its website and is very responsive to questions via Facebook and its online contact form.

YSEQ.net was chosen as the testing company because they
  • provide the data files on a micro SD card as an option
  • offer 15x, 30x, and 50x options for coverage (30x coverage is generally the minimum used for medical purposes; the test taker wanted to be able to use this for health purposes and did not want to pay for additional sequencing later unless it becomes possible to phase data as it is sequenced)
  • provide privacy acceptable to the test taker (the outsourced sequencing does not have the test taker's name attached, the outsource sequencing company will not use the DNA data for other purposes, raw data is archived at YSEQ.net where German law prohibits the data being used without permission from the test taker)
  • have owners with a strong reputation



Adapted by Debbie Parker Wayne from mcmurryjulie, chromosomes, Pixabay (https://pixabay.com/en/chromosomes-genetics-dna-genes-2817314/ : accessed 15 November 2017), CC0 Creative Commons.

A kit was ordered from YSEQ.net on 15 November 2017, four swabs arrived on 18 November 2017, the kit was returned on 20 November 2017, and it was received by the lab on 27 November 2017. Online mtDNA results were available 39 days later on 5 January 2018. Online WGS results were available 24 days after that on 29 January 2018. That is only about 70 days from the date the kit was returned, including mail time between the USA and Germany. The micro SD card was received later.
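For anyone who wants to check the turnaround arithmetic, here is a minimal Python sketch using the dates given above. The variable names are only illustrative.

```python
from datetime import date

# Dates taken from the timeline in this post.
ordered = date(2017, 11, 15)
kit_returned = date(2017, 11, 20)
lab_received = date(2017, 11, 27)
mtdna_online = date(2018, 1, 5)
wgs_online = date(2018, 1, 29)

# Subtracting two dates gives a timedelta; .days is the whole-day count.
print("Lab receipt to mtDNA results:", (mtdna_online - lab_received).days)
print("mtDNA results to WGS results:", (wgs_online - mtdna_online).days)
print("Kit returned to WGS results:", (wgs_online - kit_returned).days)
print("Order placed to WGS results:", (wgs_online - ordered).days)
```

The first two lines print 39 and 24 days. Counting from the day the kit was mailed back gives 70 days, and counting from the day the order was placed gives 75 days.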

The files received consisted of
  • a text file with information on how to download the online DNA data, an mtDNA comparison to the rCRS, and Y chromosome analysis if the test taker is male
  • a text file with 23andMe V3-style data with about 958,000 lines that could be used with third-party DNA websites and tools; any test taker who has also tested at other testing companies can compare the two files to see if both companies found the same allele values at all locations
  • an mtDNA FASTA file (this is also a plain text file format); any test taker who has also tested the full mtDNA sequence can compare the two files to see if both companies found the same allele values at all locations
  • a very large Variant Call Format (VCF) file with a complete set of extracted mutations - about 695MB - readable with a text file reader such as NoteTab Pro, although the size may slow down your system; this has interesting information on the length of data read from the test taker's chromosomes and the mutations of this test taker (provided as a GZIP-compressed file, which you must unzip, along with a TBI index file)
  • a BAM and BAI (BAM Index) file with the WGS data - these require special tools to view as the files are compressed (this will be covered more in a later post; BAM is a compressed binary version of a SAM file; a SAM (Sequence Alignment/Map) file is a text-based format for storing biological sequences aligned to a reference sequence; Samtools is available for LINUX systems at http://www.htslib.org/)
  • a BAM.stats and a BAM.idxstats.tsv file - both are small (450 to 5,000 lines) and readable in any plain text file reader (stats is an abbreviation for statistics; idx is a common abbreviation for index in the computer world; TSV is a tab-separated-value file similar to the CSV comma-separated-values files we use all of the time in DNA analysis)
  • what seems to be the mtDNA data in a BAM file format along with a BAM index
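The comparison of two 23andMe-style files mentioned in the list above could be done along these lines. This is a minimal Python sketch, not a finished tool. It assumes the usual 23andMe raw data layout (comment lines starting with #, then tab-separated rsid, chromosome, position, and genotype columns) and assumes both companies report alleles on the same strand, which is not always true. The function names, rsids, and genotypes below are made up for illustration.

```python
import io


def read_genotypes(lines):
    """Parse 23andMe-style raw data lines into a dict of rsid -> genotype."""
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        rsid, _chrom, _pos, genotype = fields[:4]
        genotypes[rsid] = genotype
    return genotypes


def compare_genotypes(first, second):
    """Return (same count, list of differences, no-call count) for shared rsids."""
    same, different, no_call = 0, [], 0
    for rsid in sorted(first.keys() & second.keys()):
        a, b = first[rsid], second[rsid]
        if "-" in a or "-" in b:
            no_call += 1  # one company could not read this location
        elif sorted(a) == sorted(b):
            same += 1  # 'AG' and 'GA' are the same unphased call
        else:
            different.append((rsid, a, b))
    return same, different, no_call


# Made-up sample data standing in for two companies' files.
file_a = io.StringIO(
    "# rsid\tchromosome\tposition\tgenotype\n"
    "rs0000001\t1\t1000\tAG\n"
    "rs0000002\t1\t2000\tCC\n"
    "rs0000003\t2\t3000\tTT\n"
    "rs0000004\t2\t4000\t--\n"
)
file_b = io.StringIO(
    "rs0000001\t1\t1000\tGA\n"
    "rs0000002\t1\t2000\tCT\n"
    "rs0000003\t2\t3000\tTT\n"
    "rs0000004\t2\t4000\tGG\n"
    "rs0000005\t3\t5000\tAA\n"
)

same, different, no_call = compare_genotypes(
    read_genotypes(file_a), read_genotypes(file_b)
)
print(same, different, no_call)
```

With real files, each StringIO object would be replaced by an opened file, for example `with open(path) as handle: read_genotypes(handle)`.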


Image by Debbie Parker Wayne

I am in the process of installing Samtools on my LINUX system so I can read the BAM files. I suspect many genealogists will not do this unless they have experience with LINUX/UNIX systems. There are some Windows/Mac-based genome analysis tools also.

Even without Samtools there is a lot of interesting information here to analyze in the coming months and compare to data from the genealogy testing companies. If you are interested in learning more about BAM files see Samtools and NCBI Genome Workbench.
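As one example of what can be done without Samtools, the small idxstats TSV file can be summarized with a short script. The Python sketch below assumes the file follows the four-column layout written by the `samtools idxstats` command (reference name, sequence length, mapped reads, unmapped reads); whether the file from a given company matches that layout exactly is an assumption to verify. The function name and sample numbers are made up.

```python
import csv
import io


def read_idxstats(lines):
    """Parse idxstats-style tab-separated lines into a list of dicts."""
    rows = []
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        name, length, mapped, unmapped = fields
        rows.append(
            {
                "name": name,
                "length": int(length),
                "mapped": int(mapped),
                "unmapped": int(unmapped),
            }
        )
    return rows


# Made-up sample in the assumed layout; the '*' row holds reads with no placement.
sample = io.StringIO(
    "chr1\t1000\t300\t2\n"
    "chr2\t800\t200\t1\n"
    "chrM\t16\t50\t0\n"
    "*\t0\t0\t7\n"
)

rows = read_idxstats(sample)
total_mapped = sum(row["mapped"] for row in rows)
total_unmapped = sum(row["unmapped"] for row in rows)

for row in rows:
    print(row["name"], row["length"], row["mapped"])
print("Total mapped:", total_mapped, "Total unmapped:", total_unmapped)
```

For a real file, the StringIO object would be replaced with `open(path, newline="")`.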






To cite this blog post:
Debbie Parker Wayne, "Whole Genome Sequence (Part 1)," Deb's Delvings, 25 April 2018 (http://debsdelvings.blogspot.com/ : accessed [date]).
