Cullens of Upton
Jim Cullen


DNA Results for the Cullens of Upton, Nottinghamshire


NEWS: Recent changes in the ISOGG Haplo-I tree (10JUL2011) have changed our designation in the tree yet again. No longer I2b2, we are now known as I2a2b or, more correctly, I2a2b*, unless we have the L533 mutation, in which case we would be I2a2b1. There is an error in ISOGG's tree, so their current designation of our subclade is not correct. SNP (Single Nucleotide Polymorphism) testing on the DNA of the Cullens of Upton has been completed! Our results are L38+, L39+, L40+, and L65+, confirming our I2b2 or Hg I-L38 status. More recent SNP testing has also been completed: we are L533- and L272.3+. So far, everyone in I2b2 who has been tested for the L272.3 SNP has tested positive. It may be, then, that the L272.3 SNP is redundant to L38, L39, and L40. The results have also come in for the new SNP L460, which was derived, so we are L460+ as expected. Technically, the L65 SNP is now known as L65.1+ for I2b2, since a parallel mutation has been located within Haplogroup J. These tests have been carried out in several stages over the last year or so. The results are as illustrated in the latest Haplo-I SNP Tree:

Haplo-I SNP Tree showing Cullen of Upton SNP results: M170+ P19+ P217+ M223- L39+ L40+

The M170+ and P19+ SNP's define Haplogroup-I, mutations we share with everyone else in the Haplo-I Tree. FTDNA tested my DNA in a 'backbone test' and found positive results for both M170 and P19, as was entirely expected. The tree is split into I1, defined by M253, and I2, defined by P215. It was clear that my haplotype was in the I2 tree, so FTDNA started my testing there. I2 has two main branches: I2a, defined by P37.2, and I2b, defined by P217. Again, the suspicion of our haplotype being I2b led directly to the testing of P217, which was found to be positive. In cases where there are two possible paths, such as P37.2 vs P217, if you test positive for one then you must be negative for the other; you can't have positives for both at once, so testing every single SNP is not necessary in most cases. Within I2b there are the I2b1 branches, which share M223. My DNA tested negative for M223. This left the possibility of I2b*, meaning we are positive for P217 and no other downstream SNP's. This actually was our haplogroup designation prior to the recent discovery of the L38-L40 series of SNP's. My results for L38, L39 and L40 were positive, confirming our I2b2 or 'I(x) Cluster' status. I've also tested positive for the new L65.1 I2b2 SNP. So far, all L65.1 results indicate that this SNP is likely redundant to L38-L40 for everyone in I2b2. An unknown proportion of I2b2 will test positive for L533 but my sample tested negative... research is ongoing. Current results suggest that DYS454=11 instead of 12 is indicative of L533+ status. My results are now located at FTDNA's I2b2 L38+ DNA Project. Additional Haplo-I SNP Tree information can be found at ISOGG's Haplo-I Tree. There is also an excellent Haplogroup I tutorial located at GeneBase.
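
For those who like to see the logic spelled out, here is a minimal Python sketch of how a set of SNP results walks down the tree described above. The tree is only the small slice mentioned in this paragraph, it uses the older I2b2-era branch names, and the function names are my own inventions for illustration; this is not how FTDNA or ISOGG actually assign haplogroups.

```python
# Simplified slice of the Haplo-I tree as described in the text above.
# Each branch lists its parent and the SNPs that define it.
TREE = {
    "I":    {"parent": None,  "snps": ["M170", "P19"]},
    "I1":   {"parent": "I",   "snps": ["M253"]},
    "I2":   {"parent": "I",   "snps": ["P215"]},
    "I2a":  {"parent": "I2",  "snps": ["P37.2"]},
    "I2b":  {"parent": "I2",  "snps": ["P217"]},
    "I2b1": {"parent": "I2b", "snps": ["M223"]},
    "I2b2": {"parent": "I2b", "snps": ["L38", "L39", "L40", "L65.1"]},
}

# Results quoted on this page ("+" = positive, "-" = negative).
RESULTS = {"M170": "+", "P19": "+", "P217": "+", "M223": "-",
           "L38": "+", "L39": "+", "L40": "+", "L65.1": "+"}


def depth(branch):
    """Number of steps from the root of the tree down to this branch."""
    d = 0
    while TREE[branch]["parent"] is not None:
        branch = TREE[branch]["parent"]
        d += 1
    return d


def is_positive(branch, results):
    """True if at least one defining SNP was tested and every tested one is '+'."""
    calls = [results[s] for s in TREE[branch]["snps"] if s in results]
    return bool(calls) and all(c == "+" for c in calls)


def deepest_branch(results):
    """Return the most downstream branch with a positive result, or None."""
    confirmed = [b for b in TREE if is_positive(b, results)]
    return max(confirmed, key=depth) if confirmed else None


print(deepest_branch(RESULTS))
```

Note that P215 never had to be tested: a positive result on a downstream branch implies the upstream ones, which is the same shortcut FTDNA took in my testing. Leaving the L38 series out of the results drops the answer back to I2b, our old I2b* designation.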


Y-STR testing can be compared to genetic 'stuttering', where short segments of DNA are repeated over and over again. The number of 'stutters' is recorded as our marker value and this value can change over the course of many generations depending on mutation rate. SNP testing on the other hand can be compared to a single letter having suffered a transcribing error and is a much more permanent kind of mutation; either you have the transcription error or you do not. This error or mutation is passed on genetically to all male descendants and the mutation rate is so extremely low that each SNP is expected to have happened just once in all of human history. This makes the SNP a most dependable marker for DNA classification and so the human phylogenetic tree is defined by these SNP mutations. According to the latest Haplo-I SNP tree published by ISOGG and FTDNA, our current haplogroup is I2b2, defined by the recently discovered SNP's L38, L39, L40, and L65.1. The shorthand notation would be I-L38 though many of us still prefer 'I(x)', 'X-Cluster', or sometimes 'Lichtensteiners'. For some cutting-edge research into the origins of I-L38, please refer to the website created by Hans De Beule, In Search of the Origin of I-L38 (aka I2b2). You will find there four research papers, 'Origin, Distribution and Migrations of I2b*-Subclades' (Sep2008), 'Origins of Hg I-L38 (I2b2) Subclades' (Apr2009), 'Early Bronze Age Origin and Late Iron Age (La Tène) Migrations of I-L38' (Nov2009), and 'Phylogenetic Relations and Geographic Distribution of I-L38 (aka I2b2)' (Jun2010). If you are a member of I-L38, these papers should be considered required reading.

The results of the 67-marker upgrade to the Y-STR DNA results for the Cullens of Upton have been received and are displayed in the table below. Our other Cullen descended from the Upton, Nottinghamshire line remains at the 37-marker level. The DNA results of a Chris Cullen of Devon, England indicate a near-perfect match between our 37-marker haplotypes, putting our estimated time to a common ancestor well within the historical period and most likely early in the Upton family line, as was suspected. Chris' results have provided much-needed confirmation of our shared DNA signature and also indicate that some unique DNA markers may be able to identify Cullens of this group specifically. Further details are available below.

To date, only three Cullens descended from the Upton, Nottinghamshire line of Cullens have been DNA tested. There is still a need to confirm the family's genetic signature by testing other Cullens who are not closely related to me but are still descended from the same line. If our signature is correct, your results will be very similar to mine but off on maybe two of the values in the tables below. It would also be VERY interesting to see results of DNA tests on Cullens descended from the Manorhamilton, Co Leitrim line. A connection between these two families has long been suspected and now we have the ability to actually make a strong case for or against the connection. If you are at all able to take part, please contact Bernie Cullen or myself with any questions you may have about DNA testing. For anyone descended from the Cullens of Co Leitrim there is a special incentive. I'm so keen on seeing the results that, if you're the first to agree to a 37-marker test, I'll split the cost with you 50/50. Contact me for details.

I highly recommend, at the very least, a 25-marker test! Current testing and analysis on Y-STR haplotypes, along with the recognized subclades and other information available, indicates that a 37-marker test is the most useful test economically. Based on the Cullen results obtained so far, your results are very likely to be some variety of "R1b" or "I". If your results indicate "R1b" then you will need those extra markers just to distinguish you from every other "R1b" out there, since this is a very common haplogroup. If your results indicate "I" then you'll want to make maximum use of Ken Nordtvedt's information, which almost requires those extra markers to place you in the "I" haplotree. The 12-marker test is fine for low-resolution work on a global scale but, for precise genealogical comparison and geographical distribution information concerning your particular haplogroup, you really do need at least 25 markers.

To read more about the Cullen DNA results we have so far, I've started a page for Cullen DNA Results. There are still Cullen families that have not yet had any DNA representatives. Cardinal Paul Cullen's ancestors are suspected to be the same as the Anglo-Norman family prominent in Cullenstown, Co Wexford. Some Cullen families of Co Leitrim may have descended from the same Cullenstown family. There is also the possibility that the Cullens of Upton, Nottinghamshire are a branch of the Cullen families from nearby Kent. One of my closest matches is a DNA result for another family in Kent, England. There is also a prominent line of Scottish Cullens from Lanarkshire. By comparing DNA results, supposed connections between these various family lines can either be given strong verification or shown to be highly unlikely. Given the poor paper trail on some of these Cullen family lines into the distant past, DNA is one method available to us to gain more information that would otherwise not exist.

Those of you who are descended from the Cullens of Upton, Nottinghamshire will find the following information to be very surprising, very informative, and at the very least - intriguing. I have already joined the Cullen Family DNA Project and submitted my sample to FamilyTreeDNA for analysis. It's a simple matter of scraping the inside of your cheek with a device that looks very much like a paper toothbrush. Getting the sample really is fast and painless - the seven week wait for the results to come back is not so fast - or painless! The results returned were quite surprising.

According to our results so far, the Cullens of Upton are members of the "I" haplogroup. A haplogroup is simply a group of individuals having DNA with similar characteristics or shared key values in their results that indicate a common ancestor. Only about seventeen percent of the European population falls into the "I" haplogroup and we are NOT one of the Big Three 'subgroups' in this haplogroup, which account for 95 percent of haplogroup "I", so we are very much a minority; about six tenths of one percent of all lineages in the world today. Upon further inspection it was found that our particular DNA type has no cursory matches whatsoever in the database of tested individuals, meaning that our DNA type is indeed extremely rare. It is not unusual for DNA types to have hundreds of 12-marker matches and possibly a dozen or more 25-marker matches with varying degrees of relation. At a similar resolution of searching, our Cullen DNA reveals zero matches.

Our closest 25-marker matches in the YSearch database are at a genetic distance of seven, meaning that the most recent common ancestor is likely to be at least a couple thousand years ago. There is now a temporary page, Search Results, showing the closest matches to my haplotype. There isn't a whole lot of information there besides the tables themselves but it's there for those of you who are interested in the search results. Some of our closest genetic kin have very old roots in England, with a few scattered in Scotland, Germany, Ireland, and France. There will be another page soon also for other Cullens, now that I've gotten the method of transferring search results to HTML table format automated. The other Cullen results are absolutely fascinating and we're lucky to have such outstanding and interesting results for the few Cullens that have tested so far!

After some research and a discussion with Bernie Cullen, it is highly likely that our Y-STR DNA type is a specific and recently (May 2005) uncovered branch of the "I" haplogroup known as I-L38 or "I2b*" (per the 2008 ISOGG SNP tree). We were previously designated "I1b2*" but, with the rearrangement of the subclade tree due to the discovery of new SNPs, our designation has changed though our genetic type as referenced to our SNP status is still the same. As it is currently understood, the I-L38 haplogroup is divided into two varieties, I-L38-A and I-L38-B. Several values in our DNA are indicators that we may be of the I-L38-A subclade. For more on the "I" haplogroup, see Ken Nordtvedt's excellent web resource, Population Varieties within Y-Haplogroup I. Ken's work is right there on the edge, pushing into uncharted territory. He's very confident and very good at what he does. Ken's page is definitely THE place to look for breaking news on the "I" haplotree.

Prior to our I2b2 designation, our "I2b*" designation of 2008, and our "I1b2*" designation of 2006, our subclade was identified and given the unofficial name "I(x) Cluster" or simply "I(x)". The 'X' represented, of course, the unknown. The 'X' was also to differentiate us from a similar cluster within Y-Haplogroup 'I' known as the "I(y) Cluster". The fame of I(x) began with Glen Todd, who was himself I(x) and was the first to SNP test as I(x). Study of the X-Cluster, soon to be identified as "I1b2*", was headed up by Glen Todd, Ken Nordtvedt, and Andrew Lancaster, among others.

So what does it all mean? I'll get to that in a minute... first let's have a look at the results as they are usually presented. DYS values are basically indicators of locations on the Y-chromosome where there is a test point. And the value associated with a DYS location is the number of repeats of the short section of DNA code at that location - thus the term "Short Tandem Repeats" or STR's for short. The collection of STR's at all the DYS locations tested is your haplotype - your genetic signature. If you're already familiar with YSearch, you can find my profile with the code DXF2E. There are three sets of results below - my STR's are indicated by JC and Chris' by CC. The third set of 12-markers, indicated by DC, is from an Australian-Tasmanian branch of the family:

Marker     JC   CC   DC
DYS393     14   13   14
DYS390     25   25   25
DYS19      17   17   17
DYS391     11   11   11
DYS385a    13   13   13
DYS385b    18   18   18
DYS426     11   11   11
DYS388     13   13   13
DYS439     11   11   11
DYS389-1   12   12   12
DYS392     11   11   11
DYS389-2   29   29   29
DYS458     17   17
DYS459a     8    8
DYS459b    10   10
DYS455     10   10
DYS454     12   12
DYS447     24   24
DYS437     14   14
DYS448     19   19
DYS449     28   28
DYS464a    14   14
DYS464b    15   15
DYS464c    15   15
DYS464d    15   15

Marker      JC   CC
DYS460      10   10
GATA H4     10   10
YCAIIa      19   19
YCAIIb      19   19
DYS456      14   14
DYS607      13   13
DYS576      17   17
DYS570      18   18
CDYa        35   35
CDYb        37   36
DYS442      12   12
DYS438      10   10
DYS531      11
DYS578       8
DYS395S1a   15
DYS395S1b   16
DYS590       8
DYS537      11
DYS641      10
DYS472       8
DYS406S1    11
DYS511       9

Marker    JC   CC
DYS425    12
DYS413a   21
DYS413b   22
DYS557    15
DYS594    11
DYS436    12
DYS490    12
DYS534    14
DYS450     8
DYS444    13
DYS481    23
DYS520    21
DYS446    12
DYS617    13
DYS568    12
DYS487    14
DYS572    11
DYS640    12
DYS492    12
DYS565    11


Some of the markers in the above table are color-coded. Red indicates markers that Chris and I do NOT match on (DYS393 and CDYb). Chris and I do not run into a common ancestor until about three generations after Richard Cullen who died in Upton in 1579, making our common ancestor right around the mid to latter 1600's. It is a natural thing that harmless mutations will accumulate at average rates over time in our respective lines. Given the time our family lines have been separate, the above number of mismatches is well within the expected range for natural mutations. This is actually good since our lines now have identifying mismatches. Any Cullen who tests in the future can look to our distinguishing markers to determine which line may represent closer relatives.
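
As a rough illustration of how the mismatches between two haplotypes are counted, here is a short Python sketch using the 37-marker values from the tables above. It uses a simple step-wise count (the sum of the differences in repeats), which is an assumption on my part; testing companies may treat some multi-copy markers differently, so take it as a sketch rather than an official genetic distance.

```python
# The 37 markers in FTDNA order, with my (JC) values from the tables above.
MARKERS = [
    "DYS393", "DYS390", "DYS19", "DYS391", "DYS385a", "DYS385b", "DYS426",
    "DYS388", "DYS439", "DYS389-1", "DYS392", "DYS389-2", "DYS458",
    "DYS459a", "DYS459b", "DYS455", "DYS454", "DYS447", "DYS437", "DYS448",
    "DYS449", "DYS464a", "DYS464b", "DYS464c", "DYS464d", "DYS460",
    "GATA H4", "YCAIIa", "YCAIIb", "DYS456", "DYS607", "DYS576", "DYS570",
    "CDYa", "CDYb", "DYS442", "DYS438",
]
JC = [14, 25, 17, 11, 13, 18, 11, 13, 11, 12, 11, 29, 17,
      8, 10, 10, 12, 24, 14, 19, 28, 14, 15, 15, 15, 10,
      10, 19, 19, 14, 13, 17, 18, 35, 37, 12, 10]

# Chris (CC) has the same values except at two markers.
CC = list(JC)
CC[MARKERS.index("DYS393")] = 13
CC[MARKERS.index("CDYb")] = 36


def mismatches(a, b):
    """List (marker, value_a, value_b) for every marker where the two differ."""
    return [(m, x, y) for m, x, y in zip(MARKERS, a, b) if x != y]


def genetic_distance(a, b):
    """Step-wise distance: total difference in repeats over all markers."""
    return sum(abs(x - y) for x, y in zip(a, b))


print(mismatches(JC, CC))
print(genetic_distance(JC, CC))
```

The two mismatches found this way are one repeat each, giving a distance of two between Chris and me at 37 markers.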

The markers in blue indicate those markers that help to identify our subclade as I-L38. The first twelve STR's or markers can be used to determine that Haplogroup "I" is the proper designation for our values. Further analysis indicates that we also have some distinctive marker values that indicate Hg I-L38. At DYS455,454 we match the unique modal 10,12 for those markers. DYS454=11 and DYS455=8 or 11 for every other "I" haplogroup except I-L38. In the marker order that is standard for FTDNA, the distinctive signature "8,10,10,12" at DYS459a,b,455,454 is a sure sign of I-L38 status. There are other markers that can be helpful in determining if a haplotype is indeed I-L38. The most useful of these indicators is the YCAIIa,b combination of 19,19 which, as you can see in the table, is exactly what our repeats are for those markers.
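
The two checks described above are simple enough to write down as a small Python sketch. The function and variable names are my own, and the second haplotype is made up purely to show a non-matching case; a real classification would of course look at more than these six markers.

```python
# Distinctive I-L38 values as described in the text above.
I_L38_SIGNATURE = (8, 10, 10, 12)   # DYS459a, DYS459b, DYS455, DYS454
I_L38_YCAII = (19, 19)              # YCAIIa, YCAIIb


def check_i_l38(haplotype):
    """Report whether a haplotype (dict of marker -> repeats) shows the I-L38 indicators."""
    signature = tuple(haplotype[m] for m in ("DYS459a", "DYS459b", "DYS455", "DYS454"))
    ycaii = (haplotype["YCAIIa"], haplotype["YCAIIb"])
    return {"signature": signature == I_L38_SIGNATURE, "ycaii": ycaii == I_L38_YCAII}


# My own values from the tables above.
jc = {"DYS459a": 8, "DYS459b": 10, "DYS455": 10, "DYS454": 12,
      "YCAIIa": 19, "YCAIIb": 19}

# HYPOTHETICAL haplotype with DYS455=11 and DYS454=11, values the text
# describes as typical of the other "I" haplogroups.
other = dict(jc, DYS455=11, DYS454=11)

print(check_i_l38(jc))
print(check_i_l38(other))
```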

There are three underlined markers; DYS19, DYS447, and DYS448. These markers help to identify which of the three subgroups of I-L38 Chris and I are members of. Since we are almost perfect matches we are of course members of the same subgroup which in this case is I-L38-A. I will explain more about these subgroups shortly.

The markers in green, which both Chris and I share, are almost unique within I-L38. Without Chris' markers to compare with mine I would almost have suspected an error in FTDNA's labs. Since Chris and I received the same results, a nearly perfect match, we can be sure that our unique mutations are not due to lab error. The rare 18 repeats at DYS385b are found in only three I-L38 families: Cullen of Upton, Adam in Scotland, and Miranda in Mexico. DYS385b is normally 16 with a few 15's and 17's on either side, so 18 repeats here is not too common at all. Even more distinguishing is our shared 14 repeats on marker DYS437. This marker is almost universally 15 repeats within I-L38 and there are fewer than a handful of haplotypes with any other value. Only one other known I-L38 haplotype shares our 14 repeats at DYS437 and this is a Steinmetz of Maryland, USA. This Steinmetz however does NOT share our 18 repeats at DYS385b. His matching value of 14 at DYS437 is due to random chance and not close relation. Our shared mutation at DYS607 is also a rare 13 repeats while the rest of I-L38 is almost always found with 14 repeats. These three rare mutations are each found in 5% or less of the I-L38 population. No other person within I-L38 has any two of our unique mutations in combination. The Cullens of Upton have all three with odds, going strictly by the probability of these mutations appearing, of about 1 in 33,000 within the I-L38 subclade. With I-L38 at 0.6% of the world's population, that places our odd marker values as probably appearing in only one out of about 5.5 million people. There are two further uncommon marker values that Chris and I share. Our 15 repeats at DYS464b are found in only about 12% of the I-L38 population, and our shared 29 repeats at DYS389-2 are likewise found in only 12% of I-L38. Additionally, these rare marker mutations are present in every single panel so far tested at FTDNA. There are two rare mutations in the first panel, two more in the second panel, and another in the third panel - this totals five rare marker values in FTDNA's first three marker panels. These odd marker combinations explain why we have zero matches at any level with anyone else in the world at YSearch; we have extremely rare marker mutations on an already rare Y-STR DNA type.
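
The arithmetic behind the "one in 5.5 million" figure can be checked with a few lines of Python. This is only a sketch of the calculation as quoted above; it assumes the three rare values turn up independently of one another, which is a simplification.

```python
# Figures quoted in the text above.
within_subclade = 1 / 33_000   # odds of all three rare values within I-L38
subclade_share = 0.006         # I-L38 as a share of the world's lineages (0.6%)

# Chance of a random person being I-L38 AND carrying all three values.
worldwide = within_subclade * subclade_share
one_in_worldwide = round(1 / worldwide)
print(f"about 1 in {one_in_worldwide:,} worldwide")

# Upper bound if each value sat right at the 5% ceiling (ASSUMES independence).
ceiling = 0.05 ** 3
one_in_ceiling = round(1 / ceiling)
print(f"at 5% each: about 1 in {one_in_ceiling:,} within I-L38")
```

At the 5% ceiling the three values together would appear in about 1 in 8,000 of I-L38, so the quoted 1 in 33,000 corresponds to individual frequencies somewhat below 5%.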

Chris' family and mine share a set of Cullen relatives in Radford, Nottinghamshire, meaning that our common ancestor was a Cullen from that village or was an ancestor of the branch of the Cullens that relocated there from the nearby village of Upton. Seeing how closely Chris and I match on our DNA signature and knowing how rare this signature is, we may also state confidently that there were no NPE's in either of our lines back to the point of our common ancestor. 'NPE' is an abbreviation for 'Non-Paternity Event', which is the polite way of saying that there was a 'milkman' in the family tree. At an accepted rate of about five percent chance per generation for an NPE, the 25 plus generations that separate us is an indication that we've dodged the bullet. This is not to say that there is no chance that Chris and I happen to carry the genetic signature of some remote 'milkman' on the shared portion of our family trees, or that such an event never occurred ANYWHERE in the Upton Cullen family tree. This only indicates that it would be extremely unlikely that a 'milkman' exists in either of our family lines back to our common ancestor. The possibility of an NPE elsewhere in the Upton Cullen family tree still exists and is another reason that further DNA testing on related branches of the Cullen family tree is still necessary.
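
To put a number on "dodging the bullet", here is a small Python sketch using the five percent rate and the 25 generations quoted above. It assumes the chance is the same and independent in every generation, which is my simplification.

```python
def prob_no_npe(generations, rate=0.05):
    """Chance that a line passes through every generation without an NPE."""
    return (1 - rate) ** generations


p = prob_no_npe(25)
print(f"{p:.1%} chance of no NPE over 25 generations")
```

Under these assumptions, only a little over a quarter of lines separated by 25 generations would come through clean, so a near-perfect match like ours is not something to take for granted.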

Chris and I represent two separate Cullen lines from early in the family's history in Upton yet we share some very unique markers. If we take our uncommon I-L38 subclade markers in conjunction with our rare distinguishing markers within the subclade, then we can in theory, based on current knowledge, compose a set of markers to uniquely identify the Cullens of Upton. We have of course the distinctive signature "8,10,10,12" at DYS459a,b,455,454. Within this group of I-L38 we add the uncommon 13,18 combination at DYS385a,b and the very rare 14 repeats on marker DYS437 along with the rare 13 repeats at DYS607. Until our knowledge of marker differences within the Upton Cullen family tree changes, we can call this the DNA signature of the Cullen Family of Upton, Nottinghamshire:

Y-DNA STR Signature of Cullen of Upton, Notts
DYS385b   DYS459a   DYS459b   DYS455   DYS454   DYS437   DYS607
18        8         10        10       12       14       13


I find it amazing that one particular family can be identified on the basis of just seven markers. What's even more amazing is that we could probably drop DYS459a and DYS459b from the above signature and we would probably STILL have a unique set of only five markers that would identify our family specifically from all other known lineages in the world today with an extremely high probability. I say probably because we don't have a complete genetic map of the Upton, Nottinghamshire Cullen family tree. It's possible, for instance, that DYS437 may be 14 repeats for half of the descendants and 15 repeats for the other half. Our unique 18 repeats at DYS385b may not apply to ALL descendants. There may be any number of combinations of the above. There may also be other unique mutations that have not shown up yet because Chris and I are the only two so far to have been tested and we may not have these supposed unique mutations. We're pushing the limits here though, since 16 generations is just not enough time to allow that many mutations to occur. We can only hope that, as more Cullens in this extended family are tested, we are able to identify unique combinations of marker repeats for each branch of the family.

There is some indication that these unique mutational characteristics are found only in very closely related haplotypes - if not only in the Cullens of Upton, Nottinghamshire. The closest genetic relatives to the Cullens of Upton can be found in a search for genetic matches at YSearch comparing 37-marker haplotypes. Representative DNA samples from a family by the name of Brooks, also haplotypes belonging to I-L38-A, are separated from the Cullens by a genetic distance of seven. Actually this is not all that far if you compare this to Chris Cullen's genetic distance of two from my own haplotype. The Brooks, as close as they are genetically, do not have the unique mutational characteristics of the Cullens. In the Brooks samples, DYS385a,b is found to be a perfectly normal 13,16 and their DYS437 is also the very common value of 15 repeats. The next closest genetic group is a family by the name of Chewning/Chowning from Co Kent in England. They are at a genetic distance of twelve from the Cullens of Upton. The DYS385a,b results for this family are 13,17 repeats and their DYS437 results are an expected 15 repeats. For the few haplotypes that share one or the other of our two unique mutations, we find that their overall genetic distance is quite large - meaning that they have acquired a mutation resembling ours but this was by chance and not by close genetic relation. They are located several branches away as compared to the Chewning/Chowning and Brooks families. The unique mutations that Chris and I share must then be confined within somewhere between two and seven genetic mutations away from us.

There is one exception. Our rare 13 repeats at DYS607 is a marker value that has appeared several times within the I-L38 subclade. It seems that we likely share this mutation with our closest genetic relatives, the Brooks family. What this means is that the mutation to 13 repeats probably happened prior to the time when the male line of ancestors split into the male lines that would go their separate ways to the Cullen and Brooks ancestors. At one or two other times in the distant past, probably two thousand years or more ago, DYS607 mutated to 13 repeats also in the cousin lines of our ancestors. We can use DYS607 to identify close genetic relatives if they are at a close genetic distance to us but we have to remember that DYS607 is shared with a cluster of closely related family lines. We would have to inspect also our other rare marker mutations to confirm a closer genetic relationship to the Cullens of Upton, Nottinghamshire.

Continuing with the analysis of the subclade designation for the Cullens of Upton, Nottinghamshire: there are two varieties of I-L38 at present, the 'A' and 'B' varieties found by Ken Nordtvedt. Whether a given haplotype is the 'A' or 'B' variety is determined mainly by inspection of DYS448. Refer again to the table where our Cullen DNA results are shown with these markers underlined. At DYS448 you can see that we have 19 repeats, indicating we are likely to be the more specific I-L38-A. If DYS448 were 21 instead, then we would be I-L38-B. I note also that there are a couple of people out there with DYS448=20, so I'd expect to see a possible third variety, I-L38-C, sometime in the future. Whether or not this happens depends on determining if this variation is a simple mutation or an actual division of the variety. The data as it stands right now seems to indicate that these examples of DYS448=20 are due to diversity or spread through random mutations over time. There do seem to be weak correlations with other markers, so the I-L38-C will stay for now.

As of 31Mar2007, a third variety of I-L38 is official. Ken Nordtvedt has added this variety to his Haplogroup I Population Varieties spreadsheet ( FounderHaps.xls ) at http://knordtvedt.home.bresnan.net. I-L38-C is characterized by the usual 8,10,10,12 signature at DYS459a,b,455,454 and by 20 repeats at DYS448. Along with DYS448, I-L38-A,B,C can be identified by 17,16, or 15 repeats at DYS19/394.

There are other markers that help separate the 'A' and 'B' varieties of I-L38. I prefer to stick to the combination of DYS447,448 since they are close to each other in the FTDNA marker order and they are fairly reliable. A 24,19 combination here indicates 'A' variety while a 25,21 combination indicates 'B' variety. Low values indicate 'A' and high values indicate 'B', so those of you who have 18 repeats at DYS448 will almost certainly find that you have 24 repeats at DYS447, indicating 'A' variety. Likewise, those with 22 repeats at DYS448 will find elevated values at DYS447 as well, indicating 'B' variety. The following graphic illustrates the difference between the two varieties of I-L38. The chart is based on weighted genetic distance, according to mutation rate, interpolated between two sets of modal haplotypes, one set for the X-axis and one set for the Y-axis. Two things of importance can be learned from the chart. One is that DYS448 separates the two varieties of I-L38 quite cleanly. The vertical line at about X=0.5 separates 'A' from 'B'. The other thing to learn is that, if a given haplotype has 20 repeats at DYS448, then there is a good chance of that haplotype being of the 'A' variety. Note that the green dots mark those haplotypes with DYS448=20. There are only two dots on the right or 'B' side while the rest are on the left or 'A' side. Only one sits nearly on the dividing line. The graph also shows that, not surprisingly, I-L38 resembles I1c haplotypes more than it does I1b type haplotypes, since all of the I-L38 data points lie above 0.5 on the Y-axis. My own haplotype is on the 'A' side of the graph. My dot is the top dot in the red triangle of dots at the bottom right of the 'A' side of the chart. Chris Cullen's dot would be found here also. We're somewhat separated from the majority of the other I-L38-A members, which is to be expected - some of our markers are somewhat off of the modals. This just illustrates that our haplotypes, though clearly I-L38-A, lie near the edge of the cluster of haplotypes identified as I-L38-A.
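
The rule of thumb for telling the varieties apart can be sketched in a few lines of Python. This is my own simplification using DYS448 alone; Ken's actual criteria look at other markers too, and a haplotype with DYS448=20 may still sit on the 'A' side of the chart, so treat the result as a first guess only.

```python
def i_l38_variety(dys448):
    """First-guess variety of I-L38 from the DYS448 repeat count alone."""
    if dys448 <= 19:
        return "A"    # low values (18, 19) indicate the 'A' variety
    if dys448 == 20:
        return "C"    # the in-between value described above
    return "B"        # high values (21, 22) indicate the 'B' variety


for repeats in (18, 19, 20, 21, 22):
    print(repeats, i_l38_variety(repeats))
```

With our 19 repeats at DYS448, both Chris and I come out as the 'A' variety.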



For the details on a more in-depth inspection of apparent clustering within the I-L38, see the page on I(x) Variants. The page gives some analysis of three apparent variants of I-L38. I would not define them as actual clusters of I-L38 since they do not fall clearly within I-L38 based on DYS459a,b,455,454 repeats alone. I have used the term 'variants' since the defining modals are different. Here I will only give a brief description. I-L38 as you already know is defined as a 8,10,10,12 combination at DYS459a,b,455,454. There are other groups of haplotypes readily identified as I-L38 though their repeats at DYS459a,b,455,454 are not 8,10,10,12. I have found three groups of these variant I-L38 haplotypes. The largest variant group is distinguished by an 8,9,10,12 combination at the defining markers and is a clear variant of the I-L38-A subclade. After an inspection of this group, Ken Nordtvedt has come to believe that DYS459b mutated from "10" to "9" in I-L38-A after it arrived in England. The other two variant groups are smaller and distinguished by an 8,10,10,11 or a 10,10,10,12 variant combination on the defining markers. See the I(x) Variants page for more analysis of these variant groups.

To verify for myself that I-L38-A is the proper subclade for my DNA results, I've written a spreadsheet to measure the differences we have as compared to the modal or most frequently observed repeats for the other haplogroups closely related to ours. I've done a simple distance-squared calculation that's skewed (linearly before squaring) according to the mutation rates of the markers. The result is then scaled to a convenient size result. I did not take into account the distinctive modals observed for the various haplogroups since this is, for the most part, handled pretty well by the calculation itself. In the table below, the yellow boxes indicate the distinctive modal values used to identify various haplogroups. Clear boxes are additional markers used to further define the results or help in identification. Red indicates fast-mutating markers, and slow markers are in blue. The scores along the bottom of the sheet indicate how close my DNA matches the various haplogroups; lower scores being a closer match and zero being a perfect match.
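
For the curious, the kind of calculation described above might look something like the following Python sketch. It is not the spreadsheet itself: the modal values for DYS447 and DYS448 are the 'A' and 'B' combinations quoted earlier on this page, but the relative mutation rates are made-up placeholders, and dividing each difference by the rate (so that slow markers count for more) is my assumption about the direction of the skew.

```python
# Modal values for the 'A' and 'B' varieties as quoted on this page.
MODALS = {
    "I-L38-A": {"DYS447": 24, "DYS448": 19},
    "I-L38-B": {"DYS447": 25, "DYS448": 21},
}

# HYPOTHETICAL relative mutation rates (1.0 = an average marker).
# These are placeholders for illustration, not measured values.
RELATIVE_RATE = {"DYS447": 1.5, "DYS448": 0.5}

# My own repeats at the two markers.
MINE = {"DYS447": 24, "DYS448": 19}


def score(haplotype, modal, rates, scale=1.0):
    """Rate-weighted squared distance from a modal; zero is a perfect match."""
    total = 0.0
    for marker, modal_value in modal.items():
        # Skew the difference linearly by the mutation rate, then square it.
        weighted = (haplotype[marker] - modal_value) / rates[marker]
        total += weighted ** 2
    return scale * total


scores = {name: score(MINE, modal, RELATIVE_RATE) for name, modal in MODALS.items()}
for name, value in scores.items():
    print(name, round(value, 2))
```

Even with only two markers and placeholder rates, the 'A' modal scores zero and the 'B' modal scores well above it, which is the same pattern the full table shows.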



As you can see in the results, I-L38-A and I-L38-B are by far the closest matches, with the A variety having an edge over B. My 19 repeats at DYS448 also match the modal for that variety. The results are in green boxes at the bottom of the table. Recent upgrades on my marker results and more clearly defined modals for I-L38-A and I-L38-B have confirmed that the Cullens of Upton are most certainly I-L38-A. According to Ken Nordtvedt, DYS448 is not the only criterion but it is the main one and a very good indicator. Based on the scores in the above table, there is a very large gap between I-L38 and every other haplogroup in the table. There's little doubt about our I-L38 classification, especially when the above table was calculated without regard to unique modal features of the haplogroups. To have on hand the required markers to work at this level of detail would require the additional twelve markers available in a 37-marker test. I wholly expect that these extra markers will see extensive use in genealogical work, especially as the world haplotree grows in complexity. I would venture a guess that 37 markers will be about the most needed to place one in the world haplotree; even recent branches have been described in detail with these same 37 markers.

As you may have gathered by now, the mutation rates mean that the number of repeats for any given DYS location may change randomly over periods of generations. The mutations of course are natural and harmless as they occur in the junk regions of the Y-chromosome and so are currently understood to really have no function. For those of you concerned with privacy issues, there is no other usable information to be gained from the knowledge of your particular mutations besides what is applicable to genealogical matters. The mutation rates are quoted per generation, meaning that for any given DYS value, there can be a changed value expected every 400 generations or so. This rate varies according to the DYS location but there are other factors not fully understood that can affect the mutation rates. In effect, the values wiggle back and forth and slowly spread over thousands of years, accounting for the differences we observe in the repeat values for various groups around the world. It's the mutations that have made it possible to trace, classify, and arrange the haplogroups into the world family tree. In this view, all humans alive today are the descendants of just one man, known as "Genetic Adam". Those of you who are not great fans of the Bible, fear not; "Genetic Adam" will occur naturally in a population whether there is a Divine Creator or not.
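
As a back-of-the-envelope illustration of what such a mutation rate implies, here is a short Python sketch. It applies the "every 400 generations or so" figure uniformly to all 37 markers, which is a simplification since the rate varies by marker, and it treats the "25 plus generations" that separate Chris and me as the total number of father-to-son transmissions between us, which is my assumption.

```python
MUTATION_RATE = 1 / 400      # one change per marker every 400 generations or so
MARKERS_COMPARED = 37        # size of the shared 37-marker haplotype
TRANSMISSIONS = 25           # ASSUMPTION: total generations separating two testers


def expected_mismatches(markers, transmissions, rate=MUTATION_RATE):
    """Average number of marker changes expected between two related lines."""
    return markers * transmissions * rate


print(round(expected_mismatches(MARKERS_COMPARED, TRANSMISSIONS), 1))
```

Under these assumptions we would expect a little over two changes on average, which sits comfortably alongside the two mismatches actually observed between Chris and me.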

Given that STRs wiggle and spread over time, it is possible to estimate an approximate age for a haplogroup and assign a geographical point of origin to it. As people migrated across the globe they took the mutations in their DNA with them and, theoretically, we should find a concentration of that haplogroup in the present day at the place of origin and a declining percentage of individuals in that haplogroup as we move away from the point of origin. Using the same kind of mathematics, it's possible to calculate approximately how many generations back we can expect to find the MRCA, or "Most Recent Common Ancestor", of two individuals.

This is one of the little mysteries I find with "I2b*". The amount of wiggle and spread indicates a fairly young haplogroup, several thousand years or so before present, according to my own figuring. However, there is no real geographical point of origin. We are found spread very thinly, according to Ken Nordtvedt, "well-dispersed in continental Europe from Italy and Iberia, in France and Germany, and up through Denmark". This would seem to indicate an older haplogroup, or possibly one that just migrated faster than usual. At any rate, we do seem to be looking at a founder for our haplogroup being located, at least several thousand years ago, somewhere in the area of Denmark, Northern Germany, and the Netherlands. As more data comes in we will likely see that the origin is actually further south. We would likely have migrated westwards as Danish invaders or Vikings, who began to arrive in Britain in earnest in the latter half of the 9th century. Even today we find traces of German/Danish DNA in Britain due to the influence of these early settlers, especially in areas that once fell under Danish control. Southwell is one such area, where the concentration is three times higher than in other areas. It's no surprise then to find the Cullen family in Upton by Southwell, Nottinghamshire. We know the Cullens were there several generations before Richard Cullen, who died in Upton in 1579/80, but it may be possible that we were there or in neighboring Kent far earlier than we previously thought.


DNA Results from Skeletal Remains in Lichtenstein Cave, Germany


One of the more interesting applications of my WSD method of matching haplotypes to the modal values attributed to subclades involves the 3,000-year-old skeletal remains discovered in a cave in central Germany. Lichtenstein Cave is located in the Harz Mountains in Germany. The region is in what is now known as Lower Saxony, and so the remains are sometimes referred to as 'Saxon', though it is unlikely these 3,000-year-old bones represent a people that we would recognize as historical 'Saxons'. What can be said is that it's unusual that these people saw fit to inter their dead in a cave when it was the practice of the time to cremate the dead and bury the remains in the open. It's possible that this was the only choice they had if they were confined to the mountains for one reason or another. There were about forty individuals interred in the cave and, using genetic evidence, it could be shown that many of the individuals were related as an extended family over four or possibly five generations. This is the first prehistoric family to have been identified through DNA analysis. The skeletal remains contained enough viable genetic material to allow sequencing, and the Y-DNA STR data is reported using the same markers we use today.

You can find more information on the Lichtenstein Cave DNA in this Qiagen article. There is also another good article at the ABC News in Science website. You can also find other sources of information using Google or whatever search engine you prefer.

The surprise is that the subclade for the DNA markers sequenced can be identified and that it is very likely that the designation is 'I-L38-B'. I believe I was the first one to identify the DNA signature and give some quantitative measure to back up this belief. I did this by analyzing the markers using the same WSD method I use to identify modern day subclade designations from Y-DNA STR data. I originally posted the results to Rootsweb's DNA list. You can read the thread here, the contents of which have been reproduced below:

Regarding the 3ky old sample of DNA from Lower Saxony. I've run my own 'predictor' on the data given for his markers. The scale is relative and a lower score is better, zero is a perfect match. A good match is a low score separated from all the rest of the scores by a respectable margin.

The data given was:
DYS393=13, DYS390=25, DYS19=15
DYS391=11, DYS385a,b=13,17
DYS439=11, DYS389i=12, DYS392=11
DYS389ii=27, DYS437=15, DYS438=10

Across the world's haplogroups, the sample scores an average of 50.36 with a min/max range of (19.36-100.63). The score that stands out is Haplogroup I, Ix specifically, with a score of 6.86

J scored 23.39 : I* scored 19.36 : G scored 32.76

I1a scored 23.78 : I1b scored 27.68 : I1c scored 66.46

These are the old naming conventions but it's clear that the data prefers the haplogroups closer to the root of Haplogroup I. Inspection of the markers and simple genetic distance supports this.

I then ran the same data through a second 'predictor', specifically for Haplogroup I and its subclades, according to Ken Nordtvedt's naming conventions. This scale is also relative and, since it spans Haplo-I and its subclades specifically, is a separate scale from that given above.

Across Haplo-I subclades, the sample scores an average of 33.3 with a min/max range of (17.04-103.63). The two scores that stand out are I2b*-A with a score of 6.86 and I2b*-B with a score of 5.82

Again the data prefers subclades closer to the root of the haplogroup, scoring in the area of about 17 for them. I1b* scores 17.26 and I1b1*-Isles2 scores 17.17 which was expected since I2b* and I1b1*-Isles2 have some similar mutational characteristics to I1b*.

The lower score for the B variety of I2b* should be taken with a grain of salt. DYS385b is the main reason for the lower score but there just isn't enough data to make that call. The scaling system is weighted so additional markers could cause the final scores to drift but I am satisfied, due to good score separation, that the Haplo-I subclade for this data is I2b*

Jim Cullen


We are currently attempting to contact the researchers who worked with the DNA samples to discuss the possibility of measuring one or two extra markers from any genetic material left in the laboratory samples. Ideally we would like to see the number of repeats at DYS454 and DYS455, which would verify the I2b* subclade designation. For I2b* we should see 12 repeats at DYS454 and 10 repeats at DYS455, a very rare combination for any subclade or haplogroup in the world. Results of the inquiry will be posted here.

Below is a graph of the WSD (Weighted Squared Distance) figures for the Lichtenstein Cave DNA against the currently known subclades of Haplogroup I. Again, lower scores are better matches and zero would be a perfect match. The score is a measure of genetic distance scaled by the mutation rate of the individual markers. This puts all markers on an equal footing when the individual scores are summed up. What I look for in these graphs is the lowest score and how far it is separated from the rest of the scores. An ideal match is a score near zero with the next closest score being much larger. Notice that the lowest points in the graph, by a good margin, are I2b*-A and I2b*-B, with I2b*-B being a slightly better match due mainly to the number of repeats observed at DYS385b. I2b*-A scored 6.86 and I2b*-B scored 5.82 on the Haplogroup I scale.

Graph of WSD figures from Lichtenstein Cave DNA from known subclades of Haplogroup I
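
For readers who want to experiment, the scoring idea described above can be sketched in a few lines of Python. This is only a minimal illustration and not the actual calculation behind the graph: the exact weighting is not given here, so the sketch assumes each squared repeat difference is divided by that marker's mutation rate relative to the average rate. The marker values for the sample are the ones quoted in the post above; the mutation rates and both "modal" haplotypes are invented placeholders and do not represent any real subclade.

```python
# Marker values for the Lichtenstein Cave sample, as quoted in the post above.
SAMPLE = {
    "DYS393": 13, "DYS390": 25, "DYS19": 15, "DYS391": 11,
    "DYS385a": 13, "DYS385b": 17, "DYS439": 11, "DYS389i": 12,
    "DYS392": 11, "DYS389ii": 27, "DYS437": 15, "DYS438": 10,
}

# ASSUMPTION: placeholder mutation rates per generation. 1/500 echoes the
# "500 generations per mutation" average; the faster rates are invented.
RATES = {marker: 1 / 500 for marker in SAMPLE}
RATES["DYS385b"] = 1 / 250
RATES["DYS439"] = 1 / 250

# ASSUMPTION: invented modal haplotypes, NOT real subclade modals.
MODALS = {
    "hypothetical-near": dict(SAMPLE, DYS385b=16),
    "hypothetical-far": dict(SAMPLE, DYS390=23, DYS392=13, DYS19=14),
}


def wsd(sample, modal, rates):
    """Sum of squared repeat differences, each scaled by relative mutation rate."""
    markers = [m for m in sample if m in modal]
    mean_rate = sum(rates[m] for m in markers) / len(markers)
    return sum(
        (sample[m] - modal[m]) ** 2 * mean_rate / rates[m] for m in markers
    )


def rank(sample, modals, rates):
    """Return (name, score) pairs sorted from best (lowest) to worst."""
    scores = {name: wsd(sample, modal, rates) for name, modal in modals.items()}
    return sorted(scores.items(), key=lambda item: item[1])


ranking = rank(SAMPLE, MODALS, RATES)
separation = ranking[1][1] - ranking[0][1]  # gap between best and runner-up
print(ranking)
print("separation:", separation)
```

With real modal values and mutation rates substituted in, the two things to read off are the same as in the graph: the lowest score, and how far it sits from the next one.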


An additional piece of evidence turned up when I ran these markers through the 'Search for Genetic Matches' utility at the Y-Search website. There were a dozen results within a genetic distance of 2 and all of them were I-L38, with a fairly equal mix of the I-L38-A and I-L38-B varieties. One match is an 8,9,10,12 variant of I-L38-A. The geographic origins are about what one would expect: pedigrees in England and Germany (one hit in Wales). Does it make sense that 3,000-year-old remains can be identified with a modern subclade? Sure. Our phylogenetic tree begins to branch out 60,000 to 100,000 years ago. Haplogroup I began splitting prior to the last ice age and continued afterwards as well; this was roughly 15,000 to 25,000 years ago. I-L38 probably branched off midway through the I-Tree. The Lichtenstein Cave remains are dated to roughly 2,700 years ago. Comparatively then, the Lichtenstein Cave DNA is not exactly ancient. Call the 2,700 years roughly 100 generations and then compare this to the average marker mutation period of 500 generations per mutation. In this light I think it can be said that the WSD method should still work fairly well in matching the STR data to the subclades we recognize today. We can be almost certain that the individuals interred in Lichtenstein Cave, whose DNA is identified as I-L38-B, are exactly what they appear to be: relatives of ours from roughly 100 generations ago in central Germany.
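
The generational arithmetic above can be checked with a short Python sketch. It is a back-of-the-envelope calculation only: it assumes every marker mutates independently at the same average rate of one change per 500 generations, which real markers do not, and it takes the marker count from the 12-marker Lichtenstein data set quoted earlier.

```python
YEARS = 2700                     # approximate age of the remains
GENERATIONS = 100                # the rough figure used in the text
GENERATIONS_PER_MUTATION = 500   # average marker mutation period from the text
MARKERS = 12                     # markers in the quoted Lichtenstein data

# Implied length of a generation, in years.
years_per_generation = YEARS / GENERATIONS

# ASSUMPTION: one uniform per-generation rate for every marker.
rate = 1 / GENERATIONS_PER_MUTATION

# Expected number of marker changes across the whole 12-marker haplotype.
expected_changes = MARKERS * GENERATIONS * rate

# Chance that a single marker sees no mutation at all in 100 generations.
p_marker_unchanged = (1 - rate) ** GENERATIONS

print(years_per_generation, expected_changes, round(p_marker_unchanged, 3))
```

Under these simplified assumptions only two or three changes would be expected across the 12 markers, and any single marker has roughly a four in five chance of being untouched, which is why a match to a modern subclade modal is plausible.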

Document In Progress...




Y-DNA Certificate from FamilyTreeDNA




