Jim Cullen


Current "I" Results from the Cullen Family DNA Project


This page is meant to be an extra resource and an extension of The Cullen Family DNA Project, headed by Bernie Cullen and Terry Barton. The Cullen Family DNA Project was made possible by FamilyTreeDNA.com and WorldFamilies.net. Subclade modal values are those determined by Ken Nordtvedt. His work on the Haplogroup I subclade tree and the geographical distribution of the haplotype populations is an invaluable resource. You can find more information at Ken Nordtvedt's website. Here you will find what I hope is useful information and analysis of the Cullen DNA test results obtained so far. Hopefully it will also convince those of you who haven't considered DNA testing yet, or who haven't been able to make up your minds whether or not it's worthwhile, to take that first step.

For reasons of privacy, the individual participants in the Cullen Family DNA Project are identified here only by number and stated region of origin. If required, more information can be found at The Cullen Family DNA Project website. The results so far have been outstanding. Even at first glance, they showed me that the Cullen DNA Project has great potential for revealing inferred connections among the participants and to known genetic groupings in Ireland and in Western Europe generally.

The first observation I made was that, as could be expected, the majority of the results were Cullens of the R1b haplogroup. R1b is the most common haplogroup in all of Western Europe, accounting for almost 70% of all lineages. The sub-haplogroups (I will refer to them as subclades) are branches of the R1b tree, distinct and separate but frustratingly similar in their modal values. For this reason I would highly recommend that you consider only tests of 25 markers or more for the STR Y-chromosome test. FamilyTreeDNA makes this same recommendation for those in the R1b haplogroup. If you've already obtained 12-marker results, it's a simple matter to upgrade when you are ready and able to do so. For tough subclades, or in cases where more accurate timing for suspected genealogical connections is an issue, I would then recommend upgrading to a 37-marker test. A logical, affordable, and understandable approach is to start with a 12-marker test and upgrade when necessary - such advice is found repeatedly in the various forums and websites.

For all the members of the Cullen Family DNA Project, I've taken the results and compared them against modal (normal, or expected) values for haplogroups from the entire world family tree. The results of these comparisons can be found on the R1b Results Page. Most Cullens should test as members of the R1b haplogroup but there will be a few that test as members of the I haplogroup. To date, all Cullens in haplogroup I belong to one of the "I1b" subclades and none belong to any of the "I1a" subclades. The basic determination here can be made by inspecting your counts for DYS455,454, which should be 8,11 for "I1a" subclades. If your counts for DYS455,454 are 11,11 or possibly 10,12 then you are likely a member of an "I1b" subclade. DYS426,388 is also very helpful. For "I1a" subclades, the counts will be 11,14, 11,15, or 11,16. "I1b" subclades have counts for DYS426,388 that are usually 11,13. Since all our Cullens in haplogroup "I" are in some "I1b" subclade, analysis of results will contain no references to any of the "I1a" subclades.


To identify which subclade a particular DNA result belongs to, I calculate what I call the WSD, or weighted squared distance from a modal haplotype. WSD is simple genetic distance weighted according to mutation rate. The final score is simply the sum of the WSD for all the markers in the haplotype. The reasoning for this is that a variation against a slow marker is more powerful than a variation against a fast marker. Also, modal values that indicate subclades within a haplogroup usually (but not always) tend to be the average or slower mutating markers. Using the weighted difference then amplifies your variations as compared to the important modal values. Squaring the variations amplifies your overall score, making it easier to distinguish between close and 'not so close' matches. In the end, close inspection by hand is still important.
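The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the exact weights are not given on this page, so the sketch assumes each marker's squared difference is divided by a relative mutation rate (slow markers get a rate below 1 and therefore count for more). The rates and repeat counts below are made-up placeholders, not published figures.

```python
# Hypothetical relative mutation rates (placeholders, NOT published values).
# A rate below 1.0 means the marker mutates more slowly than average.
RELATIVE_RATE = {"DYS393": 0.5, "DYS390": 2.0, "DYS19": 1.0, "DYS392": 0.25}


def wsd(haplotype, modal, rates=RELATIVE_RATE):
    """Sum over tested markers of (repeat difference squared) / relative rate."""
    total = 0.0
    for marker, modal_value in modal.items():
        if marker not in haplotype:
            continue  # only compare markers that were actually tested
        diff = haplotype[marker] - modal_value
        total += diff * diff / rates[marker]
    return total


# Made-up modal and test result, for illustration only
modal = {"DYS393": 13, "DYS390": 24, "DYS19": 16, "DYS392": 11}
tested = {"DYS393": 14, "DYS390": 25, "DYS19": 16, "DYS392": 11}
print(wsd(tested, modal))  # one step on a slow marker (2.0) plus one on a fast marker (0.5)
```

In this toy case the single step on the slow marker contributes four times as much to the score as the single step on the fast marker, which is the effect the weighting is meant to produce.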

Red background indicates a very clear association with a known subclade and no other close matches. Magenta (pink) background indicates a very probable association with a known subclade. Yellow are the next closest matches. For my own peculiar DNA results, there can be no other determination besides Ix. The scale here is arbitrary: a lower score is a better match and zero is a perfect match, but the actual score depends on the number of markers used and on which specific markers have the variant number of repeats. The key here is to look at the lowest score and see that it is separated from the next lowest score by a wide margin. For the members of the Cullen Family DNA Project, in no particular order, here are the WSD results as compared to the modal haplotypes of the "I1b" subclades:
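The "lowest score, wide margin" check can also be written down as a small helper. This is a sketch only; the function name and the scores are hypothetical and are not taken from the table that follows.

```python
def best_match(scores):
    """Return (closest subclade, its score, margin to the runner-up)."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    (best_name, best_score), (_, second_score) = ranked[0], ranked[1]
    return best_name, best_score, second_score - best_score


# Made-up scores for illustration only
scores = {"Subclade A": 3.0, "Subclade B": 98.5, "Subclade C": 120.0}
name, score, margin = best_match(scores)
print(name, score, margin)
```

How wide the margin needs to be is a judgment call, since the scale depends on how many markers were compared; that is why the final check is still done by hand.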

Cullen Members: I1b Subclade Weighted Squared Distance (WSD)
Subclade                C-3       C-9, C-20   C-5, C-18   C-19
Number of Markers -->   12        25          37          25
I1b1*-Din2              29.67     211.83      734.84      77.26
I1b1*-Din               33.24     212.59      736.24      79.75
I1b1*-West              234.01    442.96      944.69      279.69
I1b1*-Isles             33.63     237.45      733.74      79.74
I1b1*-Isles2            21.59     104.37      583.93      180.84
I1b1a-A                 26.54     243.09      8.93        96.80
I1b1a-B                 33.59     249.11      10.34       98.28
I1b1a-C                 29.38     245.18      13.89       87.91
I1b1a-D                 49.83     259.52      3.22        127.15
I1b2a*-Root3            12.46     212.66      586.31      32.74
I1b2a*-Root2            16.44     207.46      526.13      31.80
I1b2a*-Root1            8.01      218.04      510.96      27.48
I1b2a1-Isles/Sc         19.91     243.25      526.72      8.21
I1b2a1-Isles/E          9.38      222.43      485.02      25.22
I1b2a*-Cont1            31.65     243.42      527.72      37.99
I1b2a*-Cont2            23.39     235.15      502.41      46.26
I1b2a*-Cont3            29.8      222.14      553.08      39.23
I1b2*-A                 26.16     22.73       684.49      248.12
I1b2*-B                 30.46     25.51       686.47      252.86
I1b*                    19.48     255.49      500.18      138.28
I?                      39.6      256.31      721.59      124.34
I??                     20.41     254.26      488.3       115.89


The WSD calculation has made it clear which variety each Cullen result belongs to. The varieties are ordered by observed population. For example, there are three varieties of I1b2a*-Cont, numbered from one to three (this last, I1b2a*-Cont3, was just recently added by Ken Nordtvedt). I1b2a*-Cont1 is the most commonly observed, followed by Cont2 and then by Cont3. In the above results, all of the Cullens scored closest to the most common variety, and the results were also verified by hand.

Cullen C-9 and Cullen C-20 are clear cases of I1b2*-A and match on 35 out of 37 markers. Their pedigrees connect in the mid to late 1600s in Upton near Southwell, Nottinghamshire, England. For more information on this set of DNA results, please refer to Cullens of Upton Nottinghamshire DNA.

Cullen C-3 is an exceptional case here. With just 12 markers to work with, it has been very difficult to determine whether C-3 is haplogroup "I" or haplogroup "G", and so these results are presented here on the assumption that haplogroup "I" is the correct designation. In fact, on the same relative scale as the WSD table above, Cullen C-3 scores 6.54 for haplogroup G. However, when we perform the same WSD analysis and check the results by hand using John McEwan's STR modal set for the entire world's haplogroups, we find that Cullen C-3 does closely match the modal sets in Haplogroup I. His haplotype sits especially close to the I1b2a* and I1b2a1 subclades. For the McEwan STR modal set, Haplogroup G scores 65.42 but there is one subclade of I that scores even closer: I1cSTR3 scores 54.85 for C-3's haplotype. The modal signature of I1cSTR3 was lined up against Ken Nordtvedt's Haplo-I subclade modals and WSD was run again for I1cSTR3. The result is that I1cSTR3 is very close to Ken Nordtvedt's I1b2a1-Isles/E subclade, with a score of 11.73, while the next closest match is I1b2a*-Cont3 with a score of 32.96. It's still hard to shake the uncertainty that comes with having only 12 markers to work with, but I'd say it's now very likely that C-3 is some form of I1b2a, with a clear preference for I1b2a1-Isles/E.

In order to obtain additional verification of C-3's subclade, a search was made on YSearch to find all entries that matched C-3's first twelve markers within a genetic distance of one. There were ten matches but five of them were 12-marker results and so were discarded. There were several results that were obviously related to one another and so were also discarded. This left one 25-marker result and one 37-marker result. Though they are of different surnames, they matched on the first twenty five markers and so are also likely related. Taking a look at the markers that were mismatched, I discarded the obvious DYS19 mismatch which is a slow marker indicating another subclade that C-3 is definitely not a member of. This left but one match and this haplotype was very obviously I1b2a1-Isles/E. The results that were discarded were tossups between I1b2a1-Isles/E and I1b2a* Root or Cont varieties. The results of this search indicate that C-3 should be I1b2a1-Isles/E. Regardless of what method we use to make our best determination of C-3's subclade, there will always be some uncertainty based on 12-marker results alone.
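The screening step described above (keep only entries within a genetic distance of one on the first twelve markers, and set aside those with nothing beyond 12 markers) can be sketched as follows. This is an illustration, not the YSearch interface: the function names are hypothetical, the haplotypes are made up, and genetic distance is assumed here to be the simple step-wise sum of repeat differences.

```python
def genetic_distance(a, b, n=12):
    """Step-wise distance over the first n markers: sum of absolute repeat differences."""
    return sum(abs(x - y) for x, y in zip(a[:n], b[:n]))


def candidates(target, entries, max_distance=1, min_markers=25):
    """Keep entries that are close to the target and have extra markers to inspect."""
    kept = []
    for user_id, markers in entries.items():
        if len(markers) < min_markers:
            continue  # 12-marker entries add no new information
        if genetic_distance(target, markers) <= max_distance:
            kept.append(user_id)
    return kept


# Made-up repeat counts for illustration only
target = [13, 24, 16, 11, 11, 14, 12, 12, 12, 13, 13, 29]
extra = [17, 9, 10, 11, 11, 25, 15, 19, 29, 15, 15, 16, 17]  # markers 13-25
entries = {
    "A": list(target),                              # exact match, but only 12 markers
    "B": [14] + target[1:] + extra,                 # distance 1, 25 markers
    "C": [13, 24, 16, 11, 11, 14, 12, 12, 12, 13, 13, 32] + extra,  # distance 3
}
print(candidates(target, entries))
```

The remaining judgment calls, such as discarding entries that are obviously related to one another or that mismatch on a slow marker like DYS19, are left to inspection by hand.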

It was an easy task to determine that Cullen C-19 is a member of Ken Nordtvedt's I1b2a1-Isles/Scottish subclade. The immediate giveaway is his 12 repeats at DYS392, indicating I1b2a status. Currently, I1b2a is defined by a derived (+) state of M223 and has its origins in northwestern Europe. I1b2a and its branches were what was collectively known as 'I1c' in the old nomenclature. Cullen C-19 also has 15 repeats at DYS393, 9 repeats at DYS459b, 24 repeats at DYS390, and 11 repeats at BOTH DYS464 a & b. This is a very unusual combination and a clear indicator of the I1b2a1-Isles/Scottish subclade. This suggests that the SNP status of C-19 would be derived for M284, or +M284. This SNP is located downstream of M223. The Isles variety of I1b2a is found almost exclusively in the British Isles, with a heavy concentration in Scotland. Both known branches of the Isles variety of I1b2a originated on the continent, with +M284 probably originating somewhere in southwestern Europe.


Y-Haplogroup 'I' SNP and Subclade Tree


The Y-Haplogroup 'I' Subclade Tree presented below is based on the work of ISOGG and Ken Nordtvedt. The Subclade Tree represents the branching of Haplogroup 'I' into its various subclades over tens of thousands of years, as defined by what are known as SNP's or Single Nucleotide Polymorphisms. Y-DNA STR 'markers' are defined by the number of repeats of simple 'words' in specific locations within the Y-DNA junk code. Every male has a given STR marker but the number of repeats may be different for different men and the number of repeats can change over generations. In contrast, the idea of the SNP is that it is a solitary typo; a mistake in one single letter of the genetic code - and it is regarded as a single mutational event in human history. Either you have the typo in your DNA or you do not. If you do not have the SNP then you are 'negative' or 'ancestral' for that SNP, meaning that you are not descended from the historic individual in whose DNA this typo originated. If you have the typo in your DNA then you are 'positive' or 'derived' for the SNP, meaning that you are without a doubt descended from the individual who originated the SNP thousands of years ago.

In the subclade tree, SNP's are indicated by small light-brown circles. SNP codes such as P38, M26, or S23 are named after the university or laboratory that discovered them. Imagine tracing through the tree from left to right; from where the tree begins at the left edge to one of the subclade endpoints on the right side. You will be 'positive' or 'derived' for every SNP on the path you took - and you will be 'negative' or 'ancestral' for every other SNP in the tree. In a like manner, if you pay a testing company to provide you with a battery of SNP tests, it is possible for you to trace your descent through the subclade tree from left to right, following the trail of 'positive' or 'derived' SNP's. In this way haplogroup or subclade status is determined absolutely. As an example, FTDNA recently determined that my haplogroup was 'I' by performing a test on my DNA. I am +P19 and +M170, two redundant SNP's that verified that I am in fact descended from the genetic 'Adam' of Haplogroup 'I'. STR data is also very defining - in some cases just as defining as SNP data - but SNP data is the proof required by genetic professionals.

Y-Haplogroup 'I' SNP and Subclade Tree based on the work of ISOGG and Dr Ken Nordtvedt


The subclade names in bold blue type follow the new naming convention and show the name assigned to all branches downstream of the corresponding SNP. For example, consider the SNP M26 (within Old I1b), which defines I1b1a. If you are derived (positive) for M26 then you are I1b1a-something. You may be I1b1a* or possibly I1b1a1, a subclade defined by M161 which is still a bit uncertain. So M26 indicates, in the new system, that you are some form of I1b1a.

The percentages in red indicate what proportions of Haplogroup 'I' are in each of the subclades. Haplogroup 'I' makes up about 16% of all lineages in the world today, and ALL of Haplogroup 'I' is derived for SNP's P19, M170, and M258 - therefore a figure of 100% enters the tree at the left side, or root, of the Y-Haplo 'I' tree. The percentages of Haplo-I in Old I1a, I1b, and I1c are 71%, 8%, and 16% respectively. So the Old I1a clade outnumbers the rest of Haplo-I roughly two and a half to one. Of the two remaining clades - Old I1b and Old I1c - we can see that Old I1c outnumbers Old I1b two to one. There are two clades that are not in Old I1a, Old I1b, or Old I1c. First is the old I1*(y) cluster, now known as I1b*, at a frequency of 1.1%. Since Haplo-I is 16% of all lineages, this frequency means that I1b* is about 0.18% of all the world's lineages. I1b* is easily identified as Haplo-I but is characterized by an unusual pattern of 10,10 at DYS459a,b. The I1*(x) cluster, now known as I1b2*, is found in about 3.8% of Haplo-I, or in about 0.6% of the world's lineages. I* and I1* in the new system are two groups that have not been observed but are included in the Haplo-I tree as a 'technicality'.
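As a quick check of the arithmetic, a share within Haplo-I is converted to a share of all lineages by multiplying it by the Haplo-I share itself. The short sketch below uses only the figures quoted above; the function name is just for illustration.

```python
HAPLO_I_SHARE = 0.16  # Haplo-I as a fraction of all lineages, as quoted above


def share_of_all_lineages(share_within_haplo_i):
    """Convert a fraction of Haplo-I into a fraction of all lineages."""
    return share_within_haplo_i * HAPLO_I_SHARE


print(f"I1b*:  {share_of_all_lineages(0.011):.3%}")  # 1.1% of Haplo-I
print(f"I1b2*: {share_of_all_lineages(0.038):.3%}")  # 3.8% of Haplo-I
```

This gives roughly 0.176% for I1b* and 0.608% for I1b2*.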

Notice that there are shaded light-blue regions labelled 'Old I1a', 'Old I1b', and 'Old I1c'. These are outdated naming conventions that have since been changed due to the discovery of new SNP's. The discovery of new SNP's changes the names of the subclades but the status of all previously discovered SNP's remains the same. Since the SNP is a go/no-go test, once you are determined to be ancestral or derived, you will always be so. As an example, my own subclade used to be called I(x) because my subclade is negative or ancestral for S24 and M223, two redundant SNP's that determined Old I1c status. I(x) is -P37.2 and thus NOT in Old I1b. I(x) is also -P30, -P40, -M253, and -M307 and thus NOT in Old I1a. I(x) was called the X-cluster because we were in haplogroup 'I' but NOT in I1a, I1b, or I1c. With the discovery of some upstream 'S' series SNP's, I1b and I1c became united and I(x) was found to share +S23, +S30, +S32, and +S33 with the rest of Old I1c. Old I1b and Old I1c also share derived status on S31. I(x) was then formally (and legally) inducted into the new I1b2 hall of fame. I(x) is still a bit of an outcast since we do not share +S24 or +M223 with the rest of our I1b2 brothers. When you test positive for a defining SNP but negative for all other downstream SNP's, you are rewarded with an asterisk at the end of your subclade name - we are thus now known as I1b2* and proud of it!
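The left-to-right trace and the asterisk rule can be sketched as a small tree walk. The tree below is a deliberately simplified one assembled only from SNP's named on this page, with one representative SNP per branch; placing S31 at I1b and P37.2 at I1b1 is an assumption of this sketch, and the function name is hypothetical. It is not a copy of the ISOGG tree.

```python
# Simplified tree: subclade -> (one defining SNP, child subclades).
# Assumed layout for illustration; redundant SNP's are left out.
TREE = {
    "I":      ("M170",  ["I1a", "I1b"]),
    "I1a":    ("M253",  []),
    "I1b":    ("S31",   ["I1b1", "I1b2"]),
    "I1b1":   ("P37.2", ["I1b1a"]),
    "I1b1a":  ("M26",   []),
    "I1b2":   ("S23",   ["I1b2a"]),
    "I1b2a":  ("M223",  ["I1b2a1"]),
    "I1b2a1": ("M284",  []),
}


def subclade(derived, node="I"):
    """Follow derived SNP's as far down the tree as they go."""
    snp, children = TREE[node]
    if snp not in derived:
        return None  # ancestral here, so not on this branch
    for child in children:
        deeper = subclade(derived, child)
        if deeper is not None:
            return deeper
    # Derived here but ancestral for every downstream SNP: add the asterisk
    return node + "*" if children else node


print(subclade({"M170", "S31", "S23"}))                  # the old I(x) cluster
print(subclade({"M170", "S31", "S23", "M223", "M284"}))  # predicted status for C-19
```

The first call comes out as I1b2* and the second as I1b2a1, matching the two cases discussed in the text.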

Though the Haplo-I tree is technically defined by SNP status, Y-STR data can be used to distinguish between the clades of Haplo-I and the rest of the world's Haplo trees. Inspecting the slower markers is an excellent way to identify your haplogroup and your clade within that haplogroup. Old I1a is easily identified by its 8,9,8,11 modal pattern at DYS459a,b,455,454. My own subclade is I1b2* and is easily identified by the 8,10,10,12 modal at these same markers. Old I1b and Old I1c can be identified by the 11,13 modal pattern at DYS426,388 and an 8,9,11,11 or 8,10,11,11 modal pattern at DYS459a,b,455,454. To tell Old I1b apart from Old I1c, markers DYS393,392 come in handy. Old I1b is distinguished by its solid 13,11 modal pattern here, while Old I1c has a 14,12 or 15,12 at these same markers.
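These modal patterns can be turned into a rough first-pass check. The sketch below is only that: the function name is hypothetical, it demands exact matches on the patterns quoted above, and a real haplotype that has drifted by one step on any of these markers would come back as undetermined and need inspection by hand.

```python
def clade_from_strs(h):
    """Rough clade guess from slow markers; h maps marker name to repeat count."""
    p459_455_454 = (h["DYS459a"], h["DYS459b"], h["DYS455"], h["DYS454"])
    if p459_455_454 == (8, 9, 8, 11):
        return "Old I1a"
    if p459_455_454 == (8, 10, 10, 12):
        return "I1b2*"
    shared = {(8, 9, 11, 11), (8, 10, 11, 11)}
    if p459_455_454 in shared and (h["DYS426"], h["DYS388"]) == (11, 13):
        pair = (h["DYS393"], h["DYS392"])
        if pair == (13, 11):
            return "Old I1b"
        if pair in {(14, 12), (15, 12)}:
            return "Old I1c"
    return "undetermined"


# Made-up haplotype for illustration only
example = {"DYS459a": 8, "DYS459b": 9, "DYS455": 11, "DYS454": 11,
           "DYS426": 11, "DYS388": 13, "DYS393": 15, "DYS392": 12}
print(clade_from_strs(example))
```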

Here is the tree updated for 2008. The L38 defining I2b2 is the recently discovered SNP that defines the I(x) subclade. There are three other SNP's that have so far tested to be redundant to L38. These are: L39, L40, and L65. Testing is in progress at the FTDNA labs but it may be some time before the SNP's are available commercially and the FTDNA Y-SNP tree for Haplo-I is updated.

Y-Haplo I SNP and Subclade Tree 2008


Document in progress...


