Wednesday, January 28, 2015

Small Segments (and Endogamy)

The issue of small segments
When we compare autosomal matches on the basis of individual chromosomes, there is a natural tendency to concentrate on the larger segments. If you match someone on 30 or 40 centiMorgans (cM), it is clearly a good match. Small matches of one or two cM get overlooked - often deliberately.

When I first began looking at segment matches, I wondered about that. All my DNA, even the smallest segments, came from my parents, grandparents and great-grandparents. If I match someone who got those same segments from his ancestors, perhaps both of us got them from a common ancestor.

After I raised this question several times at the GRIP course last summer, CeCe Moore convinced me otherwise by agreeing with me. That is, she agreed that these small segments - I prefer the term slivers - had to have come from somewhere in my past but since they are small, they probably came from so far back that searching for a common ancestor on that basis would not be a productive use of my time.

FamilyTreeDNA's chromosome browser starts showing matching segments with a minimum of five cM, but you can raise it to ten cM or lower it to three or even one cM. GEDmatch suggests seven cM, but you can change that to whatever you wish. But when you download raw data or total matches, you can pretty much do as you like.

There are researchers who begin any examination of matches by deleting all segments that are less than whatever minimum threshold they set for themselves, never looking at those small segments again. Some go so far as to say that it is wrong to look at small segments.

When I met with Kitty Cooper and Gaye Tannenbaum in Salt Lake City last summer, we discussed the logic of starting with nine or ten cM. Once you have a segment of that size, other smaller matches - perhaps even four cM - become relevant.
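For readers who work with downloaded segment data, here is a minimal sketch of that two-tier idea in Python. It is one possible reading of the approach, not a description of what FamilyTreeDNA or GEDmatch do. The function name, the tuple layout and the sample rows are all invented for illustration, and the 9 cM and 4 cM defaults simply echo the figures mentioned above.

```python
def umbrella_filter(segments, large_cm=9.0, small_cm=4.0):
    """Keep small segments only for matches that also have a large one.

    segments: iterable of (match_name, chromosome, start, end, cm) tuples.
    This layout is an assumption, not any vendor's export format.
    """
    segments = list(segments)
    # Matches with at least one segment at or above the large threshold.
    anchored = {seg[0] for seg in segments if seg[4] >= large_cm}
    kept = []
    for seg in segments:
        name, cm = seg[0], seg[4]
        # Large segments always stay; smaller ones stay only under an "umbrella".
        if cm >= large_cm or (name in anchored and cm >= small_cm):
            kept.append(seg)
    return kept


# Made-up rows, for illustration only.
example = [
    ("Match A", 20, 1_000_000, 9_000_000, 12.3),
    ("Match A", 7, 5_000_000, 7_500_000, 4.6),
    ("Match A", 3, 100_000, 900_000, 1.8),
    ("Match B", 11, 2_000_000, 4_000_000, 5.1),
]
kept = umbrella_filter(example)
```

With these invented rows, Match A keeps the 12.3 cM and 4.6 cM segments, while Match B's lone 5.1 cM segment is set aside because nothing larger supports it.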

Not everyone takes this approach. One of the most consistent and convincing champions of using small segments is Roberta Estes of DNA Explained. Roberta has advocated the use of very small segments for triangulation and is rightly proud of her successes in having done so. Last week, Roberta posted a long blog after several weeks of laying the foundation. As you can see from what she writes, Roberta is a friend of this blog and I want to make a number of comments on what she wrote.

Moshe Hersch (And you thought we were finished with him!)
But first I want to show you something I found during the last few days in my own work which demonstrates the importance of very small segments.

Some weeks back, I concluded a discussion of two men named Moshe Hersch Pikholz, whom I thought might be the same man. Great-great-grandchildren of one (Charles and Leonora, second cousins to one another) and great-great-grandchildren of the other (Jane and Nan, also second cousins to one another) did Family Finder tests.

The maternal grandmothers of Charles and Leonora (sisters) are the daughters of two Pikholz parents whose relationship to one another is unknown. Aside from that, Leonora's maternal grandfather also has two Pikholz parents, in this case first cousins. So on one hand, Charles and Leonora have extra doses of Pikholz DNA, but on the other hand it makes it very difficult to say for certain which ancestor contributed what, more so than with normal European-Jewish endogamy.

Nonetheless, I concluded that the genetic match between the two pairs of second cousins was good enough to demonstrate that the two Moshe Hersch Pikholz are indeed the same person.

This week, I took a closer look at chromosome 20 of the four cousins, using GEDmatch at a threshold of 5 cM.
The bar graph is illustrative but it is not at all proportional.
In the first segment of chromosome 20 (the left side of the bar graph and the top row in the two charts above) Charles and Jane have a large match of 34.1 cM. Both match Leonora on the first part of that segment and both match Nan on the second part. Leonora and Nan do not match each other, but that is not a problem. I don't need all four to match.

The second segment (the right side of the bar graph and the second row in the two charts) is not so simple. Here we have a match of 60 cM between Jane and Nan, part of which matches Charles and part of which matches Leonora. Charles and Leonora are not a match. However, nearly a quarter of Charles' match with Nan and Jane overlaps with Leonora's match with Nan and Jane. If this description is complete and correct, something must be wrong, because it is inconsistent.
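The overlap reasoning can be sketched in a few lines of Python. The positions below are invented purely to show the arithmetic and are not the actual chromosome 20 coordinates of these four cousins. The function names are also hypothetical.

```python
def overlap(a, b):
    """Return the shared (start, end) of two intervals, or None if they are disjoint."""
    start, end = max(a[0], b[0]), min(a[1], b[1])
    return (start, end) if start < end else None


def overlap_fraction(a, b):
    """Fraction of interval a that is also covered by interval b."""
    common = overlap(a, b)
    if common is None:
        return 0.0
    return (common[1] - common[0]) / (a[1] - a[0])


# Hypothetical base-pair positions, chosen only so the overlap is a quarter.
charles_vs_jane = (30_000_000, 46_000_000)
leonora_vs_jane = (42_000_000, 58_000_000)

shared = overlap(charles_vs_jane, leonora_vs_jane)
share_of_charles = overlap_fraction(charles_vs_jane, leonora_vs_jane)
```

If two people each match the same third person across a shared stretch, and all of it sits on the same copy of the chromosome, we would expect them to match each other there too. A non-empty `shared` range with no reported match between them is the kind of inconsistency described above.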

I asked Roberta what she thought and she suggested that I lower the threshold as far as possible. Perhaps, she suggested, there are some small segments that explain the inconsistency.

So I lowered the threshold to one cM.

The long blue bar at the top right is the 60 cM match between Jane and Nan. The medium-sized blue bar at the right of the third line is the 24.5 cM match between Jane and Leonora. At the bottom right, where there is supposedly no match between Charles and Leonora, we see a series of about a dozen small matches in the same segment where Jane and Nan match. It is as though the long matching segment, to use Roberta's phrase, "has been chopped up." Or if you prefer, disintegrated.

If we ignore the red breaks, Charles' match with Leonora extends nearly all the way to the right end of Jane's match with Nan. Not only that, but Leonora's match with Charles extends Leonora nearly all the way to the left. If we count the small segments, all four line up very well together. To me it is clear that the 60 cM segment that Nan and Jane share came from Moshe Hersch (or his wife, assuming he had only one) and that it began to break down somewhere along the ancestors of Charles and Leonora, perhaps as early as their great-grandmother.
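Ignoring the breaks by eye can be imitated with a small Python sketch that joins neighboring slivers when the gap between them is short. This is only an aid for looking at the data, under the assumption that short breaks may hide one older, longer segment. It is not something GEDmatch does, and a merged run is not in itself proof of a true match. The function name, the gap tolerance and the positions are all invented.

```python
def merge_across_breaks(segments, max_gap):
    """Join segments on one chromosome whose gaps are no wider than max_gap.

    segments: list of (start, end) base-pair positions.
    """
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] <= max_gap:
            # Short break: extend the current run instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up slivers, for illustration only.
slivers = [
    (40_000_000, 41_500_000),
    (41_800_000, 43_000_000),
    (43_200_000, 45_100_000),
    (50_000_000, 51_000_000),
]
runs = merge_across_breaks(slivers, max_gap=500_000)
```

Here the first three slivers collapse into one run and the fourth stays separate because the gap before it is too wide. How wide a gap to tolerate is a judgment call, and a conservative researcher would keep it small.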

This may not always work so neatly and so conclusively, but to repeat a mantra of Roberta's, if you throw out the small segments even before you begin your analysis, you will never see this obvious result.

But that does not mean that I have totally signed on to Roberta's attachment to small segments. When you are talking about matches that are only small segments, the kind that do not overlap large ones, CeCe is probably right. It's generally not a productive use of my time to examine them.

That is even more valid when talking about endogamous populations where we know in advance that there are distant common ancestors simply by virtue of our being Jewish. For us, the strategy I discussed with Kitty Cooper, looking at smaller segments once you have a large match as an umbrella, is still the way to go. How large is large and how small is small is still a matter of personal preference - and mine is to be conservative. To quote myself in another context "If it might be wrong, it doesn't belong."

For the non-endogamous, such as Roberta, you can probably afford to be more liberal.

A Study Using Small Segment Matching, by Roberta Estes
I am going to step through Roberta's blog and comment as I go along.

Sherlock Holmes is quoted as saying "When you have eliminated the impossible, whatever remains, however improbable, must be the truth." That does not mean that if we have nothing to go on aside from DNA, then DNA must contain a usable truth. Maybe yes and maybe no.

Roberta writes "So we need to establish guidelines and ways to know if those small segments are reliable or not." I say, very carefully. Different circumstances require different tools and also create different opportunities. I want to read what all the experienced experts have to say but then I want to make my own decisions for my own families. Usually I will write about those decisions and will entertain debate. Ridicule, not so much. Genetic genealogy is way too new to have hard and fast rules, especially ones that begin "You can't..."

Roberta is obviously correct when she says "assuming the position that something can’t be done simply assures that it won’t be." That is true for an individual project which discards small segments according to some rule, as well as studies on small segment research as a genre. Roberta says correctly "The only way we, as a community, are ever going to figure out how to work with small segments successfully and reliably is to, well, work with them." To that I add if you have a few cases that are proven based on small segments, there are almost certainly many others which are not proven because those small segments were never examined.

I am well aware that my work is different from that of most others because I am not looking for "new" relatives, rather looking to figure out how the ones I know fit together. One-name studies is a legitimate field with its own requirements and opportunities.

Finding three people who match on the same segment may be "the commonly accepted gold standard of autosomal DNA triangulation within the industry" but among the endogamous, we strive for the platinum standard. There are too many ways to be wrong if you have only three people using segments that are not large enough and not numerous enough.

Sometimes I want to get more than one triangulation within a potential family group. I suppose that has to do with endogamy. I think of these multiple triangulation scenarios like this.

Roberta's Sarah Hickerson article "was meant to be an article encouraging people to utilize genetic genealogy for not only finding their ancestor and proving known connections, but breaking down brick walls." Absolutely. Many of us read to find not only ideas but encouragement. And some of us write not to show how smart we are but to bring others to the point where they say "I can do this too."

Roberta, please note - for some of us 5-6 generations does not qualify as "low hanging fruit."  And still our small segments can be useful.

I can understand that FTDNA and the other companies must draw a line dividing matches from non-matches. But it would be very, very helpful if we could get at our non-matches on FTDNA's chromosome browser. Not everyone is on GEDmatch.

I think that will do.

1 comment:

  1. Israel, this is very helpful and highly relevant to me. I am trying to figure out how a presumed 2-3 cousin is related to my mother and brother and another very probably second cousin of my mother. Don Worth suggested I contact you. Would you be willing to give me some guidance? I am a fellow TTT member and you also show up as a cousin on my brother's matches on GEDmatch. I am just having a hell of a time using GEDmatch and using DNAgedcom as well as FTDNA. Thanks.