Thursday, November 29, 2018

When GEDmatch and The Testing Company Are Far Apart

I have autosomal DNA tests with four companies. I am not going to say which one this refers to. They can all say "This isn't us. Must be someone else."

When you test with the DNA companies, you know that you will have matches available to you. Some companies send you notices; others do not but let you look for yourself. When I see such matches for my own test results, I contact the owner of the matching kit and ask if the kit is on GEDmatch. That would let me see who else in my family matches this kit. Is this new match on my father's side or my mother's side? Which grandparent or great-grandparent? The Hungarians, the Slovakians, the Galicianers or the Russians?

Some of these new matches reply, some do not. When they do, I take their GEDmatch kit and see how it lines up with my family. Usually nothing comes of it, but sometimes we get some sense of geography, if not specific family connections.

But sometimes the match does not show up on GEDmatch at all. And this problem is very common - perhaps 30% of the time - with one particular company. I have lost count of how many times I have written:
Unfortunately, despite what the company says, GEDmatch does not show a match with me at all. Or my brother. Or my sisters. Or my father's sister and brother. Or my first cousins.
Here is a recent example. The company labelled the match as a suggested "second cousin to fifth cousin." They said that we share 70.5 cM spread across nine segments, the longest of them 13.5 cM. It looked worth an inquiry, though not particularly promising.

I wrote to the man, and his wife came back to me with his GEDmatch number. I looked at his GEDmatch results and I was nowhere on his one-to-many. I tried the Tier1 one-to-many, searching all his matches, not just the 2000 that are the basic GEDmatch limit. And there I was: a match with a total of 18.5 cM and a longest segment of 11.2 cM. There were only two segments.

I dropped the search threshold to 3 cM and found our match with four more small segments bringing the total to 32.5 cM. This is less than half of what the company showed. And the company's longest segment is 13.5 cM while GEDmatch shows only 11.2 cM.
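The effect of lowering the threshold is plain arithmetic: the reported total is the sum of whichever segments clear the minimum. Here is a minimal Python sketch of that, assuming a match can be represented as a list of segment sizes in cM. Only the 11.2 cM segment and the 18.5 cM and 32.5 cM totals come from GEDmatch; the 7.3 cM segment is inferred from the two-segment total, the four small segments are invented for illustration, and the 7 cM cutoff is a hypothetical stand-in for the default threshold.

```python
def total_cm(segments_cm, min_cm):
    """Sum the segments at or above min_cm, as a match-list total would.
    Returns (total cM, number of segments kept, longest kept segment)."""
    kept = [s for s in segments_cm if s >= min_cm]
    return round(sum(kept), 1), len(kept), max(kept, default=0.0)

# 11.2 is the real longest segment; 7.3 is implied by the 18.5 cM
# two-segment total; the four small values are hypothetical.
segments = [11.2, 7.3, 4.1, 3.6, 3.2, 3.1]

print(total_cm(segments, min_cm=7.0))  # (18.5, 2, 11.2)
print(total_cm(segments, min_cm=3.0))  # (32.5, 6, 11.2)
print(32.5 / 70.5 < 0.5)               # True: under half the company's 70.5 cM
```

Note that lowering the threshold can only add small segments; it cannot make the longest segment grow from 11.2 cM to the company's 13.5 cM.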

But it gets worse.

The longest segment on GEDmatch is on chromosome 7, a segment where the company shows only 6.0 cM.  The company's longest segment is on chromosome 15; GEDmatch has nothing at all on chromosome 15.

In total, GEDmatch has six segments to the company's nine. Most of those segments are on different chromosomes entirely.
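Laying the two segment lists side by side, chromosome by chromosome, makes this kind of disagreement easy to see. A minimal Python sketch, assuming each list is a set of (chromosome, cM) pairs. Only the three segments named above are filled in, because the rest of each list is not given here, and the function names are hypothetical.

```python
from collections import defaultdict

def by_chromosome(segments):
    """Group (chromosome, cM) pairs into chromosome -> total cM."""
    totals = defaultdict(float)
    for chrom, cm in segments:
        totals[chrom] += cm
    return dict(totals)

def compare(company, gedmatch):
    """Per chromosome, what each source shows (0.0 means nothing there)."""
    a, b = by_chromosome(company), by_chromosome(gedmatch)
    return [(chrom, a.get(chrom, 0.0), b.get(chrom, 0.0))
            for chrom in sorted(set(a) | set(b))]

# Only the segments mentioned in the text; the full lists are not given.
company = [(7, 6.0), (15, 13.5)]
gedmatch = [(7, 11.2)]

for chrom, c, g in compare(company, gedmatch):
    print(f"chr {chrom}: company {c} cM, GEDmatch {g} cM")
```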

We know that there is much to be done as DNA for genealogy emerges from its infancy. Basic consistency would be a good place to start.

Wednesday, November 28, 2018

Cheryl - A GEDmatch Case Study - Part Two

Cheryl matches my families
A few days ago, I wrote about my neighbor Eric, whose wife Cheryl has a promising match on GEDmatch with a Skalat Pikholz descendant named Gene. Aside from Gene, Cheryl has a number of other matches with my families.

There are five segments of interest, none very close, but together they may be useful, both for Cheryl herself and as examples of how to use GEDmatch.

Eric was here for a little over an hour, so after we looked at Gene, we went straight to the old reliable one-to-many search, which looks at Cheryl's top 2000 matches. I was looking for segments of ten or more centiMorgans where Cheryl has multiple matches in my families. Segments smaller than ten cM may be real, but they are certainly too distant to be useful for families such as ours, where we have few surnames or records before 1800. And I insist on multiple matches because I want some evidence that the match is from "our side" and not from the other side of some second, third or fourth cousin.
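The filter described above (segments of at least 10 cM, shared by more than one kit in the family) can be sketched in Python. This assumes each match segment is a record with a kit label, chromosome, start and end position. The kit labels and positions are hypothetical, and grouping by overlapping positions on the same chromosome is a simplifying assumption, not how GEDmatch itself groups matches.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    kit: str      # hypothetical kit label
    chrom: int
    start: float  # hypothetical position in Mb
    end: float
    cm: float

def clusters(segments, min_cm=10.0, min_kits=2):
    """Keep segments >= min_cm, merge overlapping ones per chromosome,
    and return only the clusters shared by at least min_kits kits."""
    big = sorted((s for s in segments if s.cm >= min_cm),
                 key=lambda s: (s.chrom, s.start))
    groups, current = [], []
    for s in big:
        overlaps = (current and s.chrom == current[-1].chrom
                    and s.start <= max(x.end for x in current))
        if overlaps:
            current.append(s)
        else:
            if current:
                groups.append(current)
            current = [s]
    if current:
        groups.append(current)
    return [g for g in groups if len({s.kit for s in g}) >= min_kits]

segments = [
    Segment("A", 1, 50.0, 60.0, 11.0),
    Segment("B", 1, 52.0, 61.0, 11.3),
    Segment("C", 5, 20.0, 28.0, 10.7),  # only one kit here: dropped
    Segment("D", 9, 30.0, 40.0, 8.0),   # under 10 cM: dropped
]
shared = clusters(segments)
print([[s.kit for s in g] for g in shared])  # [['A', 'B']]
```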

The five segments of interest
On chromosome 1, Cheryl has about 11 cM with Aunt Betty and Uncle Bob, my father's sister and brother. This is not a large segment, and we cannot tell whether it is on my grandfather's side or my grandmother's. Given its size, the common ancestor probably lived well before 1800.

On chromosome 5, Cheryl matches my brother, one of my sisters and me on a segment of 10.7 cM. No cousins on either side, so this match could come from anywhere. And it is not large.

On chromosome 9, Cheryl matches my brother, three of my sisters, me and my second cousin Roz, on my father's father's side, with a segment of about 12 cM, plus a nearly-adjacent segment of about 5.6 cM. We all match each other, so we have triangulation. So this segment comes from either my grandfather's Pikholz father or his Kwoczka/Pollak mother, all from the Tarnopol area of east Galicia.

On chromosome 11, Cheryl has a segment of about 15 cM with two of my second cousins on my mother's side. One is Beth, whose grandfather is my grandmother's brother, and the other is Liya, whose grandmother is my grandmother's sister. They match each other, so we have triangulation. My grandmother's descendants do not appear here, nor do Beth and Liya's two first cousins. Is there something here? Maybe. But we have only one surname in my grandmother's family - Rosenbloom from Borisov in Belarus.
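Triangulation, as used here, means that each pair of kits matches on the same stretch of the same chromosome. A simplified Python sketch, assuming each pairwise match is given as (chromosome, start, end): three kits triangulate when all three pairs exist on one chromosome and the intervals share a common overlap. The names come from the text, but the positions are hypothetical.

```python
from itertools import combinations

def triangulates(kits, pair_segments):
    """True if every pair of kits shares a segment on the same chromosome
    and all those segments overlap in one common region.
    pair_segments maps frozenset({kit1, kit2}) -> (chrom, start, end)."""
    segs = []
    for a, b in combinations(kits, 2):
        seg = pair_segments.get(frozenset((a, b)))
        if seg is None:
            return False              # some pair does not match at all
        segs.append(seg)
    if len({chrom for chrom, _, _ in segs}) != 1:
        return False                  # matches on different chromosomes
    start = max(s for _, s, _ in segs)
    end = min(e for _, _, e in segs)
    return start < end                # a shared overlap exists

kits = ["Cheryl", "Beth", "Liya"]
pairs = {  # hypothetical positions in Mb on chromosome 11
    frozenset(("Cheryl", "Beth")): (11, 40.0, 55.0),
    frozenset(("Cheryl", "Liya")): (11, 42.0, 56.0),
    frozenset(("Beth", "Liya")): (11, 30.0, 70.0),
}
print(triangulates(kits, pairs))  # True
```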

Finally, on the X, Cheryl has 20.5 cM with Aunt Betty and 14.9 cM on the same segment with my cousin Roz. They triangulate, so this is a real match. Aunt Betty could not have gotten the X from her father's father and Roz would not be expected to match Aunt Betty's mother, so the match must be from the Kwoczka/Pollak side. We cannot know if it is from the same common ancestor as the segment on chromosome 9, but the possibility is intriguing.
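The X-chromosome reasoning rests on one rule: a man inherits his X only from his mother, so X-DNA never passes from father to son. A minimal Python sketch that checks whether a line of descent can carry X-DNA, assuming the line is written as a list of sexes from the descendant up to the ancestor; the function name is hypothetical.

```python
def x_path_possible(sexes):
    """sexes: sexes along a line of descent, from the descendant up to the
    ancestor, e.g. ["F", "M", "M"] for a woman, her father, his father.
    X-DNA cannot pass from father to son, so any male whose parent on
    the path is also male breaks the X line."""
    for child, parent in zip(sexes, sexes[1:]):
        if child == "M" and parent == "M":
            return False
    return True

# Aunt Betty -> her father -> his father: blocked at the father-to-son step
print(x_path_possible(["F", "M", "M"]))  # False
# Aunt Betty -> her father -> his mother (the Kwoczka/Pollak side): open
print(x_path_possible(["F", "M", "F"]))  # True
```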

Digging deeper with a better way
At this point, last week's discussion of the GEDmatch Tier1 one-to-many kicks in. I told Eric that after he signs up for Tier1, he should see if any of Cheryl's matches beyond the initial 2000 can tell us more about these five segments.

But there is a better way. GEDmatch has a tool called "Multiple Kit Analysis." It is marked as NEW, but it has been around for quite a while. And it is not a Tier1 tool, so it is freely available.

In the Multiple Kit Analysis there are two tabs. Choose the one on the right: "Manual Kit Selection/Entry." If I enter Cheryl's GEDmatch kit in the first box ("Kit 1"), I can compare her to all the kits I enter in the subsequent boxes, whether or not they are in Cheryl's first 2000 matches. For instance, if I enter all the kits from my mother's side, I can see if anyone besides Beth and Liya matches Cheryl on the segment on chromosome 11. But entering all those kits by hand is tedious and prone to error. Here I have a shortcut that Eric cannot use, since he does not know the GEDmatch numbers for all my relevant kits.

I use a free program called ShortKeys, which I have set up to fill out this form for each of my families. When I did that for my Borisov family (the ShortKeys code includes many Borisov residents who are not specifically related to me), I got one more second cousin on the segment with Beth and Liya - Liya's first cousin Lydia. This strengthens this segment as a useful connection between Cheryl and my grandmother's family beyond what we had on the basic one-to-many. Eric could have done this using the Tier1 one-to-many, but I find this easier for matches with my families.
Cheryl matches Beth, Lydia and Liya together
That same search gave me another bit of information regarding my mother's side. On chromosome 5, we see the three 10.7 cM matches that I mentioned above - my brother, one sister and me. But we also have 12 cM with my first cousin Mike (line 5), on my mother's side. He triangulates with us here, so this is real.

There are no second cousins on the segment so we don't know which of my mother's sides is represented here.

Uncle Bob, Aunt Betty and Pinchas line up together on chromosome 1
I also did the Multiple Kit Analysis for the other three segments - the ones that show matches between Cheryl and my father's side. On chromosome 1, where Cheryl's one-to-many showed a match with Aunt Betty and Uncle Bob, the Multiple Kit Analysis gave us one more name: my third cousin Pinchas. Pinchas' great-grandfather is the brother of my great-grandmother Jutte Leah Kwoczka, whose mother is a Pollak.

This match triangulates so it clearly comes from the Kwoczka/Pollak side - not Pikholz and not my grandmother's Hungarian/Slovakian families.

Chromosome 9 showed us nothing new, so we have only the original matches with my second cousin Roz. Maybe Pikholz, maybe Kwoczka/Pollak.
Five of my parents' children and our second cousin Roz (on line 6). Not large but definitely my father's father's side.

Chromosome 23, the X, gave us two new matches - Rhoda and Terry.
The two large yellow matches (20.5 cM) are Aunt Betty and Roz' first cousin Rhoda. The first green match is my second cousin Terry, with 15.2 cM. The second green match is Roz, with 14.9 cM. They all triangulate. This is clearly a Kwoczka/Pollak match.

This X match and the Kwoczka/Pollak match on chromosome 1 may or may not be from the same ancestral source.

The ball is now in Eric's court. He can do Matching Segments and write to others who share these matches. But perhaps more important, he can get some of Cheryl's first and second cousins to test, so he can see which of Cheryl's ancestors provided these segments.

When he comes back to me, I'll report it here.

Housekeeping notes
I shall be speaking, in Hebrew, for the Rishon LeZion branch of the Israel Genealogical Society on Monday, 14 January at 7 PM at the Rishon LeZion Museum, 2 Ahad Haam Street. This is not a DNA presentation, though there are a few DNA references. The topic is

Beyond a Reasonable Doubt
What We Know vs. What We Can Prove

Monday, November 26, 2018

Cheryl - A GEDmatch Case Study - Part One

I see many people - both on Facebook and in personal correspondence - who have no idea what to do with GEDmatch matches or even if the whole GEDmatch experience is worthwhile. So here is the beginning of a case study.

Yesterday morning I received an inquiry from Eric, a fellow here in the neighborhood. His wife Cheryl had just done a Family Finder test and he had uploaded the results to GEDmatch. It seems that one of her top matches is with someone in my Pikholz Project who appears on the GEDmatch one-to-many with a match of 124 cM and a longest segment of 40.5 cM. Eric wanted to know how this match sounds to me.

A single segment of 40 cM looks like a third cousin or closer - maybe a fourth. In any case, probably not something from the 1700s. In a word, promising.

I had a look and saw that this particular match is a Pikholz descendant from Skalat, a man named Gene whose mother is descended from Berl Pikholz (~1789-1877). Berl's precise relationship to the other Pikholz Skalat families of that period is unclear and there is no one to test for Y-DNA. Gene has Pikholz third cousins who have tested and his kit is managed by a cousin on his father's side. My obvious first suggestion was that Eric contact Gene's cousin (whose email appears on Gene's FTDNA kit).

Then I looked more closely. First of all, Gene is Cheryl's fourteenth best match on GEDmatch, though it is not obvious why the other thirteen are better matches.

I sorted on the "Longest cM" column and Gene is Cheryl's third longest matching segment.

I assumed that the other 84 cM in Cheryl's match with Gene was made up of many small segments, so I did a one-to-one between Cheryl and Gene. I was surprised to see that this showed only four segments, including one of 24 cM.

(The total is different - 89.7 cM as opposed to 124 cM - because the one-to-many counts smaller segments than the one-to-one does. Perhaps the thirteen "better matches" don't have so many of these small segments.)

This looked promising, even though I could not tell if these four segments represent one, two, three or four different common ancestors. I also looked at Cheryl's other matches to my families and saw some of interest. I suggested that Eric come over so I could show him how I think he should proceed. He came later that day.

Although he is a regular reader of my blog, this was Eric's first hands-on interaction with GEDmatch's Tier1. We looked at Cheryl's matches on chromosome 22 using the Matching Segments tool, with the minimum set to 12 cM.

There are eleven kits which match Cheryl on parts of the segment she shares with Gene. (Gene is far and away her largest match.) They don't necessarily all match Gene, as some may match Cheryl's mother with others matching her father - and we have no idea which side Gene is on. I left it to Eric to triangulate to see which of the eleven match both Cheryl AND Gene. (One of those eleven is my father's cousin Shabtai and his kit does triangulate.)

Those eleven kits are listed with names and contact emails and I suggested to Eric that he contact at least the ones that triangulate. But first we did Matching Segments again, this time for the 24.3 cM segment on chromosome 2.

Here Cheryl has many more matches. What I wanted to see is whether any of the people she matches on chromosome 22 also match her on chromosome 2. That would hint at the common ancestor being the same as on chromosome 22. There are none - at least not with 12 cM or more.
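Checking whether anyone matches Cheryl on both segments amounts to a set intersection over the two Matching Segments lists. A minimal Python sketch; the kit IDs are hypothetical placeholders, not the actual kits from Cheryl's results, and the function name is likewise hypothetical.

```python
def shared_kits(*segment_match_lists):
    """Kits that appear in every Matching Segments list passed in."""
    sets = [set(kits) for kits in segment_match_lists]
    return set.intersection(*sets) if sets else set()

# Hypothetical kit IDs standing in for the real Matching Segments output.
chr22 = ["kit01", "kit02", "kit03"]
chr2 = ["kit04", "kit05", "kit06", "kit07"]
print(shared_kits(chr22, chr2))  # set()
```

Calling it on other pairs of lists, such as chromosome 8 against chromosome 22, covers the remaining segments the same way.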

I left it to Eric to look at smaller matches as well as the matter of triangulation. I also left it to Eric to look at Cheryl's matches with Gene on chromosomes 8 (10.7 cM) and 15 (14.4 cM) in the same manner, to see if any of them are the same people as those on chromosomes 2 or 22.

I suggested he do all this before contacting Gene's cousin - the one who manages his kit. My suspicion is that the cousin cannot help much with this, as he himself does not seem to appear among Cheryl's matches.

As a matter of due diligence, we also looked at Cheryl's matches with one of Gene's Pikholz third cousins, who shows up with a small match on Cheryl's one-to-many. The third cousin indeed shows up with 5.5 cM on Cheryl's segment with Gene on chromosome 2, but the segment does not triangulate, so this is not meaningful.

This is meant to be a series, so I expect to be back with the results of Eric's efforts. I will also have some things to say about Cheryl's matches with the rest of my family, perhaps later this week.

Housekeeping notes
With the help of my Russian-speaking colleague Galit, we have ordered a test from FTDNA during their current $39 sale, for a Pikholz descendant in St. Petersburg. He is a nephew of one of our mystery Pikholz descendants whose DNA results raised more questions than they answered.

We may have a second new one as well.

Wednesday, November 21, 2018

(These) GEDmatch Inconsistencies - SOLVED

The problem
Last week I reported in this space about a problem I was having with GEDmatch. A match named Lauren had eleven matches with my families (using the one-to-many search) which her father George did not share, but which were definitely not from her mother. When I dug deeper, I saw that George in fact matched all eleven when I used the one-to-one search.

I sent a link to that blog to the GEDmatch team and gave them the relevant kit numbers. Since then, I have been going back and forth with John Olson and I am pleased to report that we have a solution which John asked me to pass on to my readers.

How the basic one-to-many works
As we know, most of us endogamous folk have a few tens of thousands of matches on GEDmatch, but they only show the first 2000. (Early GEDmatch showed only 1500 matches, which proved inadequate.) "First" in this case means the lowest numbers in the "autosomal generations" column, which is the default sorting key. Other matches are available on the one-to-one searches, but when you manage a large number of kits, as I do, looking for those one-to-ones is not practical.

Most of my kits are given a name beginning with "*0Pikh..." so they will all sort together near the top, and I had always understood that when I sorted on the name column, GEDmatch would show the first 2000 names from the entire match list. It turns out that this is not the case. The first 2000 matches are fixed and any sorting works only within that set of matches.
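The gap between the assumed behaviour and the actual one comes down to the order of two steps: cut to 2000 and then sort, versus sort and then cut. A small Python model of the behaviour described above, with a limit of 3 instead of 2000. The match list and function names are hypothetical, and this models the observed behaviour, not GEDmatch's actual code.

```python
def basic_view(matches, limit):
    """Observed behaviour: keep the `limit` closest matches (lowest
    estimated generations) first, and only then sort them by name."""
    closest = sorted(matches, key=lambda m: m["gen"])[:limit]
    return sorted(closest, key=lambda m: m["name"])

def assumed_view(matches, limit):
    """Assumed behaviour: sort the whole list by name, then take `limit`."""
    return sorted(matches, key=lambda m: m["name"])[:limit]

# Hypothetical matches; "*0Pikh" names sort first, as in the text.
matches = [
    {"name": "Smith", "gen": 3.1},
    {"name": "*0Pikh-A", "gen": 3.5},
    {"name": "Jones", "gen": 3.8},
    {"name": "*0Pikh-B", "gen": 4.2},
]
print([m["name"] for m in basic_view(matches, 3)])
# ['*0Pikh-A', 'Jones', 'Smith']
print([m["name"] for m in assumed_view(matches, 3)])
# ['*0Pikh-A', '*0Pikh-B', 'Jones']
```

The `*0Pikh-B` match, at 4.2 generations, falls outside the closest three and so never appears in the name-sorted view, much as George's eleven kits beyond 3.9 generations did not.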

In this specific case, George's first 2000 matches go up to 3.9 generations while Lauren's go up to 4.5 generations.

Here are the last four matches of each of them:

George's matches with the eleven "missing" kits are all further than the last of the 3.9 generations that are displayed.

This may be a problem peculiar to endogamous populations, where the number of matches is huge. Perhaps for non-endogamous kits, the first 2000 matches reach 5.0 generations or more.

And George may have more matches under 4.0 than most endogamous kits. But I see that I also go up to 3.9 generations and my two first cousins (not siblings) with one Jewish parent, both go to 4.4 generations. Frankly, 3.9 generations is not enough, nor is 4.4, so we need a way to enlarge the match list.

The solution
The way to solve this is by using the Tier1 one-to-many. Tier1 is a set of seven (at last count) GEDmatch tools which are available to those who make a donation to GEDmatch. This is not a subscription. You can do a single month for $10 each time you need it. (I think they deserve ten dollars a month just on general principle, so I am always signed in to Tier1.)

The Tier1 one-to-many gives you a choice among seven match limits, from a low of 500 up to 100,000. Both George and I have a bit more than 40,000 total matches. My two first cousins with one non-Jewish parent have about 29,000 and 33,500 total matches. And it covers all the matches, with the same sorting capacity that I have gotten used to.

The 100,000 match limit search took me less than a minute, so it's not terribly burdensome.

So henceforth all my one-to-many searches will be with Tier1.

Thursday, November 15, 2018

GEDmatch Inconsistencies

MyHeritage and George
Every Sunday, I receive a notice of up to ten matches from MyHeritage. I write to the matches and ask them if they have GEDmatch numbers, so I can do a proper comparison with my whole family set. Last Sunday, one of my matches was with George.

A second-fifth cousin, with 83.8 cM and a longest segment of 21.2 cM
His daughter Lauren responded and gave me the GEDmatch numbers for her father and herself. I went to GEDmatch to see the match and to see who else in my family George matches - which parent, which grandparent, etc. I started with a one-to-one - just George and me.
Nothing at all over 7 cM. Certainly no longest segment of 21.2 cM.

This itself was no surprise. There are a few like this from MyHeritage every week. False match alerts.

Nonetheless, I compared him to the rest of my family kits, using the one-to-many function. There were thirty-seven matches across my related families. But not my brother or my sisters. He matches my father's sister and her son but not my father's brother. Or any of my other first cousins. No second cousins on my father's side and four on my mother's side. And he matches a few more distant Pikholz descendants, both from Skalat and from Rozdol.

I ran the "new" one-to-one on Tier1 using the 5000-match threshold and it gave me the same thirty-seven matches.

GEDmatch and Lauren
While I was looking, I compared Lauren to my family kits on a GEDmatch one-to-many. She has thirty-five matches with my group, including eleven matches that George does not have. Those eleven are three Pikholz descendants from Rozdol who are not particularly close to one another, my brother, my sisters Judith and Amy, Judith's son and her late twin's son and three other Skalat Pikholz descendants whose connection to me is unclear.

The >10 cM matches with my brother, sisters and one nephew are all on one segment, on the right side of chromosome 6. (My father's sister is on that same segment.) The others are all scattered. But the numbers for the group of eleven are not insignificant.

I figured that Lauren must have gotten those from her mother, so I looked at her mother's GEDmatch results. There was one small, obscure match with someone who has nothing much to do with any of us. Lauren matches both of her own parents as expected, so where did she get these eleven matches?

One-to-one with George
I learned some time ago that occasionally a match with a single segment will not show up on the GEDmatch one-to-many but will show up on the one-to-one.

So I looked at George's one-to-one matches with each of the eleven.

George - who has, as expected, the same four matches on chromosome 6 as Lauren - matches all eleven of Lauren's matches, even though GEDmatch does not show them on George's one-to-many.

This is not good.
This is not right

On one level, it is a question that I would like GEDmatch to address. Quickly.

But more important, at least for now, this is a phenomenon that we researchers have to be aware of. There are meaningful matches that show up on the one-to-one but not on the one-to-many - neither the old reliable version nor the new, improved Tier1 version.

Housekeeping notes
I shall be speaking, in Hebrew, for the Rishon LeZion branch of the Israel Genealogical Society on Monday, 14 January at 7 PM at the Rishon LeZion Museum, 2 Ahad Haam Street. This is not a DNA presentation, though there are a few DNA references. The topic is
Beyond a Reasonable Doubt

What We Know vs. What We Can Prove