Sunday, June 28, 2015

Large Segments

The survey
The genetic genealogy community has been known to disagree about the usefulness of small segments. Although less contentious, large segments are also "a thing."

Blaine Bettinger asked on Facebook a couple of weeks ago:
How many matches do you have using a threshold of 25 cM for a GEDmatch One-to-Many autosomal DNA comparison?
Of late, Blaine has taken to crowdsourcing Facebook for statistics in order to create databases for comparison. Most recently Blaine developed his Shared cM Project, where he asked people to tell him the sizes of matches they had with known relatives. The chart on the right shows the results.

It is anecdotal and self-selecting, to be sure, but for many people it feels better than the theoretical tables that we have been working with until now. ISOGG has even added it to their Autosomal DNA Statistics page. 

So Blaine's most recent crowdsourcing challenge has been large segments, something which I admit I have not paid a lot of attention to. It is one thing to evaluate our matches by total matching segments and quite another to look at individual large segments.

The default threshold for GEDmatch is for kits which have a match of 7 cM or more, but as I mentioned recently in passing, we can change that threshold to suit our own needs.
For this exercise, Blaine wanted us to choose 25 cM.
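
As a rough illustration of what changing the threshold does, here is a minimal Python sketch that filters a match list by its largest segment. The `Match` record and the sample values are hypothetical, made up for the example; this is not GEDmatch's data format or API, and it assumes the threshold applies to the largest single shared segment.

```python
from dataclasses import dataclass


@dataclass
class Match:
    name: str           # hypothetical kit alias
    largest_cm: float   # largest single shared segment, in cM
    total_cm: float     # total shared cM


def filter_by_largest_segment(matches, threshold_cm=7.0):
    """Keep matches whose largest segment is at least threshold_cm."""
    return [m for m in matches if m.largest_cm >= threshold_cm]


# Made-up sample data, for illustration only.
sample = [
    Match("Kit A", 31.2, 95.0),
    Match("Kit B", 12.4, 140.3),
    Match("Kit C", 25.0, 48.7),
    Match("Kit D", 7.1, 7.1),
]

default_view = filter_by_largest_segment(sample)      # 7 cM default
large_only = filter_by_largest_segment(sample, 25.0)  # threshold for this exercise
print([m.name for m in large_only])
```

In this made-up list, "Kit B" has the highest total but no single segment of 25 cM, so it drops out at the higher threshold. That is the difference between evaluating total matching segments and looking at individual large segments.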

So I looked at my own matches and there were twenty-six. For eighteen close relatives (up to second cousins) the range was from thirteen to thirty-seven. But of course most of those were just us matching each other. Once I eliminated the matches up to second cousins, I was left with eight. Most of the others were in the 10-18 range. Aunt Betty had twenty-two, Herb had twenty-three and oddly enough, one of my sisters had nineteen.

Meantime, other people were reporting back to Blaine that after removing close relatives, they were getting segments of 25 cM or more with fifty, sixty, even a hundred other people. This surprised me, so I began looking a bit deeper.

Matches with strangers
My interest in this exercise was not my usual how-are-we-related-to-these-other-Pikholz-descendants, but rather the strangers. I suppose that decision was trivial because none of my family members had matches of 25 cM or more with any Pikholz from Rozdol or with any descendants of Nachman or Peretz Pikholz. Or, for that matter, Vladimir or Joyce.

My list of strangers was the shortest - only four people. The first thing I did was to look at the strangers who appeared as matches for several of us. Eva, for example, matches six of us at 25 cM or more. So I looked at Eva's matches from 15 cM.

It was no surprise that all six are on the same segment. It was a bit of a surprise that she didn't match any other Pikholz at 15-25 cM. I would have thought there would be a few there. The key here is Rhoda, who makes it clear that this match is on the side of my father's paternal grandparents, but with no additional matches, that's as far as I can go.

Another match named Al showed quite the same sort of results.

Another match on my grandfather's side but with no smaller matches and not much else to say.

A third one told a different story. There are two, actually - a mother and daughter. This is the mother.
The first seven are more or less the usual group, some descendants of my great-grandparents. But they are followed by three more distant Pikholz descendants, two of whom have matches in the 18-19 cM range. Those two would be Judy and Leonora who are related to me through both parents of my great-grandfather.

Anna is a bit more specific. She is a fourth cousin of mine on my great-grandfather's mother's side. I am not sure how important that is because my great-grandfather's parents are some kind of cousins, but nonetheless it gives a bit of direction.

The daughter's matches are about the same. I wrote, hoping to find some names or geography we could work with. But I was disappointed. These matches are from the mother's unknown father. They are hoping for some direction from me. We are corresponding but for now, I don't think anything will come of it.

But while I am mentioning Anna
Our matches with our fourth cousin Anna are unusual, to say the least. Anna and her half-brother (who have both done Family Finder tests) are related to us through their Pikholz-descended father. Both their mothers are non-Jewish, so any Jewish DNA comes from the father that they share.

To confirm that there is no significant Jewish DNA from the mothers, I simply counted their matches. I have 4471 Family Finder matches and other members of my family have more or less (mostly more) than that.  Anna has 2366 and her half-brother has 2115. These numbers are consistent with having one non-Jewish parent.

If we look at the chromosomes below, we see that Anna matches everyone in my family except my sister Sarajoy and me. Her brother does not match the two of us nor does he match our second cousins Rhoda and Terry.

On Chromosome 8, both have a nice set of matches with Aunt Betty, Uncle Bob and Herb - who, remember, are their third cousins once removed.

Both have very large matches with Marty on Chromosome 15 - Anna's is 50 cM!

But Chromosome 3 is remarkable. Anna's brother has a nice set of matches with five of us, two of which are a bit more than 20 cM. But Anna has seven matches, all over 30 cM and four of them are 57-69 cM! This is huge for four fourth cousins and three third cousins once removed. And keep in mind that Sarajoy and I are not there at all.

If Anna were not known to be a cousin, these numbers would jump off the page - but only by looking at the largest segments or the individual chromosomes.

In fact, if we only looked at the Family Finder match list (on the right), we would see nothing remarkable at all. We would not even see that Anna's matches with us are significantly different from her brother's.

There are lessons here galore. Lessons about looking specifically at the large matches. Lessons about looking at the chromosomes, not just at the total cMs and the overall suggested relationships. 

And perhaps most important is the lesson about testing cousins and siblings. Before Anna tested, her brother's results were anything but inspiring. If someone had said "Why do we need her? We have her brother!" look what we would have lost out on. 

And even with Anna, if all we had from our side had been Uncle Bob, Terry, Rhoda, Lee, Judith, Sarajoy and me, it would have been a fine test collection of seven people but we would have missed the best results.

I referred to Anna as a known fourth cousin. That is true now. It wasn't true six months ago, before we had seen Anna's results. For it was this set of results that clarified our relationship with Anna's family.

Sunday, June 21, 2015

Let's Be Realistic

Let's be realistic. Your run-of-the-mill researcher has no business expecting that the genetic test he just ordered will bring contacts with actual relatives.

Sure it happens. There are success stories. Adoptees find someone who tests as a cousin and that gives an initial lead where nothing was known previously. And occasionally a "new" "close" cousin will pop out of the woodwork.

But most genealogy researchers will already know their first and second cousins and often some of the thirds, and the ones who aren't interested in being found aren't usually the ones out there taking Family Finder tests. (And don't get me started on those who test but do not list their ancestral surnames!)

My cousin Sam did a Y-37 test and found a grand total of three matches. His haplogroup is J-M172 and his three matches are at a genetic distance of two, three and four.

My cousin Leonard (E-L117) did a Y-37 and has five matches at zero genetic distance (two of them from one family) and ten with a genetic distance of one - this with a surname which we know goes back three hundred years. There is no one close there either and none of his matches shares that common surname.

Aunt Betty (H10a1b) has nine MtDNA matches, including three zeroes, but not a one with a Family Finder match at any level.

My cousin Joe (K2a2a1) has eighty-two MtDNA matches with zero genetic distance, but only three Family Finder matches among them - and they all appear remote.

I have over seventy suggested second-fourth cousins and over six hundred suggested third-fifth cousins - aside from known family members - and that is after Family Tree DNA has invoked their magic algorithm that supposedly accounts for endogamy. WHO ARE ALL THESE PEOPLE? And more important, where are all the real third-fourth cousins who are surely out there someplace?

What all this overlooks, of course, is the perspective of timing and numbers. I did my Full MtDNA test (U1b1) and at the time I had six matches with zero genetic distance and one match with a genetic distance of one. Now, four years later, I have fourteen of the former and four of the latter. Essentially that means that when I joined FTDNA, there were seven matches "waiting for me." And my time waiting for new matches has brought eleven more.

I started with about 2200 matches on Family Finder three years ago and now I have 4471. My matches have doubled in three years.

If we look ahead another ten years, my matches could increase say three or fourfold. From that vantage point, the vast majority of my matches will not be people whom I found waiting for me, but people who found me waiting for them.
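
To put rough numbers on that, here is a small Python sketch of the arithmetic. It uses the two counts given above (about 2200 at the start, 4471 now) and treats the "three or fourfold" growth as a pure guess about the future, not a prediction.

```python
initial_matches = 2200   # approximate Family Finder matches at the start
current_matches = 4471   # matches three years later


def share_waiting(initial, projected_total):
    """Fraction of a projected match list that was already there at the start."""
    return initial / projected_total


for factor in (3, 4):
    projected = current_matches * factor
    share = share_waiting(initial_matches, projected)
    print(f"{factor}x: {projected} matches, {share:.1%} were waiting for me")
```

Under those assumptions, only about 12 to 16 percent of the eventual list would be people who were already in the database on the first day.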

Sam has three matches, none closer than a genetic distance of two, but ten years from now, he may well have a dozen or more, including one or two with zero genetic distance. This is particularly true of people who are part of non-American populations and therefore less exposed to the idea of genetic testing. Here in Israel, genetic testing seems to have a very small following among the veteran Ashkenazic population, so many of our cousins may be late coming to the game.

So the realistic view is that with only five years of autosomal testing in the various companies' databases, we should not think that we are testing to find our relatives. We are testing so that when our relatives test someday, we will be there waiting to be found. In the meantime, we check our new matches every week or two. That "someday" may be this week.

Sunday, June 14, 2015

Finding Max Greenberg

The family
Fifteen years ago, back before JRI-Poland began working with the AGAD archives in Warsaw on east Galician records, Jacob Laor and I had our own project to collect records from the Pikholz strongholds Rozdol and Skalat. One of the searches we ordered produced a three-page list of records and we ordered all of them.

Among the Skalat families we were able to reconstruct was Jakob Pikholz and his wife Henie Malka Ginsberg, the daughter of Abish and Lea Mariem. In time, we gathered what appears to be the whole set of births for this couple.

Two of those records are for the eldest daughter, Leie Mariem, one dated 22 September 1877 and the other dated 1 November 1877. When we saw the actual records, it turned out that the second was a death record dated 4 November.

Second was a son Perec (=Peretz), born 1878. He went to New York in 1902 and we are in touch with his granddaughter. In New York, he was known as "Barney."

Third was a daughter Jente Rachel, born 1880. Seven years ago, we learned that she arrived in Jerusalem during WWII and died here in 1970. She has one granddaughter and I have met her several times. The granddaughters of Perec and Jente Rachel are the same age and both work in the legal profession.

Fourth was a daughter whom AGAD listed as Bassie, born 29 August 1882. The actual birth record calls her Bassie Rosa.

Fifth was a daughter whom AGAD called Roze, born 28 July 1884. The conflict between daughters named Roze and Bassie Rosa was obvious and we assumed that Bassie Rosa had died and the name Roze was recycled to the next daughter. Fifteen years ago, I was quite the greenhorn.

Sixth was a son Abysch Abraham, born 1886 and died 1889.

Seventh was a son Szyje Izak, born 30 September 1888. Shammai Segal told me that he knew a butcher by this name, but knew nothing about his family.

Last were a daughter Cirl Ester, born 22 November 1890, and a son Berysz, born 18 January 1894, about whom we know nothing at all.

Rose in New York
In 1907, Rosa went to New York on the President Lincoln. She is clearly identified on the passenger list as Rosa Pickholz, age 22 from Skalat, daughter of Jakob Pickholz who lived in Skalat.

In 1912, she married Samuel Greenberg, a fellow Skalater. He spelled it Gruenberg in Skalat. The parents' names are correct and she gives her age as 26 instead of 28.

In the 1920 census, they are in the Bronx with a son Max, age four years and some illegible number of months. Rose is thirty-four years old and both she and Samuel are clearly identified as born in Skalat.

And there, dear readers, the story ended for me. I could not find any of the three anywhere, including in the Social Security Death Index. Neither of Rose's great-nieces had ever heard of her or of Max. I would look at records from time to time, but either I was not seeing them or they were not there. Sam, Rose and Max Greenberg are common names and that certainly didn't help. It's not like looking for Pikholz.

When I had another look at the 1930 census three months ago, I saw Max, age fifteen born in New York, with his widowed mother Rose, age forty-one born in Austria. They were living in Brooklyn.

I moved on to the 1940 census and found them, still in Brooklyn. Max is a twenty-five year old law clerk (the law runs in this family!) and has acquired the middle initial "M."

Rose is married to Morris Gross and his daughter, son-in-law and granddaughter are part of the household. Rose is fifty-two. I found the 1937 marriage record of Rose and Morris, which named her parents and thus nailed down Rose's identity.

I enlisted the help of Renee Steinig, who is way better at finding living people in the United States than I will ever be, and she found Morris Gross' granddaughter. The granddaughter did not know Max, but she did know of Martin. She had a name for his wife, who predeceased him by some fifteen years. There is - or was - a second wife before Martin M. Greenberg, the attorney, died in 1991. He had one daughter who died at fifty-six in Montgomery County, Maryland. Her son is on Facebook, but has not yet responded to my attempts at contact.

Rose, who died in 1965, Martin and Martin's first wife are buried in adjacent graves in Montefiore Cemetery. Rose's age is seventy-seven. Samuel is elsewhere and I have not yet identified him among the 1920s New York deaths.

I acquired a copy of the probate file and it contains Martin's death certificate, excerpted below. The informant was the second wife.

His mother is not Rosa. His mother is Basha Rosa. Bassie Rosa, the older sister born in 1882? But how can this be? If Bassie Rosa was alive, how was the next daughter called Roze?

Similar names
I went back to have a look at Roze's birth record, but rather than rummage through my printed records from fifteen years ago, I went through JRI-Poland. There I saw that the indexer in Warsaw had written "Raze," not "Roze," and that was, in fact, a much better transcription of the original. Raze (pronounced "Rah-tze") is not a form of Rosa. It is a distinct Yiddish name. So in fact there is no conflict between the names of the two sisters, Bassie Rosa and Raze. I have no idea what became of Raze.

It is important to work with original documents whenever possible. Judy Russell, The Legal Genealogist, addressed this issue a few weeks ago, and not for the first time.
Of course, it helps to know what you are talking about. Fifteen years ago, I did not know the name Raze, so looking at the original more carefully would not have prevented my error.

That brings me back to one of my favorite points. I am supposed to know what I am doing. Other members of the family assume that I do and are hardly likely to recheck my work. Heck, if not for Martin's death certificate, I wouldn't have rechecked it either!

Sunday, June 7, 2015

Improved Strategies with GEDmatch

It's been not quite two years since I first began uploading raw autosomal and X-chromosome data to GEDmatch. Because of the nature of my research, I really want all the kits I manage to show up together when sorted alphabetically on a "one-to-many" comparison. For that reason, I assigned each of my kits an alias that begins with "Pikholz" followed by initials or a nickname. My own kit was called "Pikholz - IP." Aunt Betty's was "Pikholz - AB" and Gary's was "Pikholz - GZP."

For some reason, GEDmatch treats all these as though they have an asterisk in front of them (i.e., "*Pikholz - IP"). In an alphabetical sort, the names with an asterisk come before those without and that's fine with me.

When a few of my mother's Gordon family began testing, I added a "G" after "Pikholz" to help both with the identification and the sorting.

Some months ago, I decided that I wanted the Rozdol Pikholz descendants (there are twelve of these now) to sort together, so I added "Roz" to their aliases. Gary is now "*Pikholz - Roz - GZP."

Recently, I made two other changes. I have nearly fifty Skalat kits and it was getting cumbersome. First of all, I added "Sk" to all the Skalat Pikholz descendants - and further added coding for descendants of my great-grandfather Hersch Pikholz and for descendants of Peretz and Nachman Pikholz. I became "*Pikholz - SkH - IP" and others begin with "*Pikholz - SkP -" and "*Pikholz - SkN." Other Skalaters begin with "*Pikholz - Sk -."

I did one other thing. All my kits now begin with the number "1." My alias is now "*1Pikholz - SkH - IP." I did this of course because I wanted my kits to sort near the top. But it isn't just an issue of convenience.

GEDmatch processes all the data but only shows the first 1500 results. When you want the results to sort to show your closest matches, 1500 is plenty, even as the number of kits in the system has grown. But for those of us who sort alphabetically, that's not good enough because often our matches will not make the cut. For a while I have been raising the threshold to 8 cM (the default is 7 cM) in order to reduce my matches, but often that is not enough and frequently I have to raise the threshold to 9 cM to reduce the number of matches displayed even further.

Sorting by email doesn't help because the email that represents all my kits begins with "israelp@," which comes out somewhere in the middle. I could change my email to something beginning with "ZZZisraelP@" and sort in reverse, but that seemed like a lot of trouble.

So the aliases of all my kits now begin with "*1Pikholz" and I can go back to the default threshold of 7 cM. Eventually, enough other people will figure this out and perhaps I'll have to change them to "*00Pikholz," but for now this will do.
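
For anyone who wants to see why the leading "1" matters, here is a minimal Python sketch of the idea. It assumes a plain character-by-character sort and that only the first 1500 rows of the sorted list survive; I do not know GEDmatch's actual collation rules, and the stranger aliases are invented for the example.

```python
DISPLAY_LIMIT = 1500  # number of results shown, as described above


def visible_after_alpha_sort(aliases, limit=DISPLAY_LIMIT):
    """Sort aliases by character code and keep only the first `limit`."""
    return sorted(aliases)[:limit]


# Hypothetical population: 2000 strangers whose aliases also start with "*".
strangers = [f"*Kit{n:04d}" for n in range(2000)]

old_alias = "*Pikholz - SkH - IP"
new_alias = "*1Pikholz - SkH - IP"   # same alias with the leading "1"

old_view = visible_after_alpha_sort(strangers + [old_alias])
new_view = visible_after_alpha_sort(strangers + [new_alias])

print(old_alias in old_view)   # False: 2000 "*Kit..." aliases sort ahead of "*P..."
print(new_alias in new_view)   # True: "1" sorts before any letter
```

Under the same assumption, "*00Pikholz" would sort ahead of "*1Pikholz," since "0" comes before "1."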