
Error Estimates of ASD Calculations

Jim Cullen

Results: Each at 50,000 Sets of Trial Runs Observing 100 Ant Steps

Observed Trial Runs | 5 | 10 | 20 | 40 | 80 |
---|---|---|---|---|---|
Avg Estimated Variance | 80.1536 | 90.1197 | 94.9507 | 97.4862 | 98.7468 |
Standard Error of the Estimate | 56.5991 | 42.6275 | 31.0947 | 22.5946 | 16.1409 |

When calculating ASD for a haplotype population using N sample haplotypes, ASD is equal to twice the sum of the squared differences of the marker values, divided by (N-1).
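As a minimal sketch of the calculation just described, the Python below computes ASD for one marker. It assumes that the "squared differences" are taken from the sample mean of the marker values, which the text does not state explicitly; the function names and the repeat values in the usage lines are hypothetical. Under that assumption the result is also the average squared difference over all pairs of distinct haplotypes, which the second function checks by brute force.

```python
from itertools import permutations
from statistics import mean


def asd(marker_values):
    """Twice the sum of squared differences from the mean, divided by (N - 1).

    Assumption: differences are measured from the sample mean.
    """
    n = len(marker_values)
    if n < 2:
        raise ValueError("ASD needs at least two haplotypes")
    centre = mean(marker_values)
    sum_sq = sum((x - centre) ** 2 for x in marker_values)
    return 2 * sum_sq / (n - 1)


def asd_pairwise(marker_values):
    """Average squared difference over all ordered pairs of distinct haplotypes."""
    pairs = list(permutations(marker_values, 2))
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    repeats = [12, 13, 13, 14, 15]  # hypothetical repeat counts for one marker
    print(asd(repeats), asd_pairwise(repeats))
```

Both functions return the same number for any sample, so the shorter mean-based form is the one to use in practice; the pairwise form is there only as a cross-check.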

Second Run of the Ant Experiment Using Sample Statistics

Results: Each at 50,000 Sets of Trial Runs Observing 100 Ant Steps

Observed Trial Runs | 5 | 10 | 20 | 40 | 80 |
---|---|---|---|---|---|
Avg Estimated Variance | 99.6241 | 100.1339 | 99.9722 | 99.9861 | 99.9842 |
Standard Error of the Estimate | 70.0660 | 47.3208 | 32.6990 | 23.1089 | 16.3377 |
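The contrast between the two ant tables can be explored with a small simulation. The sketch below is not the author's program. It assumes that each ant step is +1 or -1 with equal probability, that an "observed trial run" is the final position after 100 steps, and that the first run divided the sum of squares by N while the second divided by N-1. That reading is consistent with averages near 100 x (N-1)/N in the first table and near 100 in the second, but it remains an assumption. All names are hypothetical, and the number of sets is kept far below 50,000 so that it runs quickly.

```python
import random
from statistics import mean, pvariance, variance


def ant_final_position(steps, rng):
    """Final position after `steps` moves of +1 or -1 (assumed equally likely)."""
    # Each of the `steps` random bits is one move; a 1 bit counts as +1.
    forward = bin(rng.getrandbits(steps)).count("1")
    return 2 * forward - steps


def average_estimates(trial_runs, sets, steps=100, seed=0):
    """Average variance estimate over `sets` samples of `trial_runs` ants.

    Returns (average dividing by N, average dividing by N - 1).
    """
    rng = random.Random(seed)
    by_n, by_n_minus_1 = [], []
    for _ in range(sets):
        finals = [ant_final_position(steps, rng) for _ in range(trial_runs)]
        by_n.append(pvariance(finals))         # divides by N
        by_n_minus_1.append(variance(finals))  # divides by N - 1
    return mean(by_n), mean(by_n_minus_1)


if __name__ == "__main__":
    for runs in (5, 10, 20, 40, 80):
        print(runs, average_estimates(runs, sets=2000))
```

With only a few thousand sets the averages are noisier than the tabulated ones, but the divide-by-N figure should sit below the divide-by-(N-1) figure by exactly the factor (N-1)/N.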

(1) The individuals are all descendants of a single haplotype founder and their STR mutations all follow separate paths.

(2) The individuals trace their ancestry back to separate founders, each with identical founding haplotypes. Their STR mutations also all follow separate paths.

The key to the behavior of the Random Walk model as applied to STR mutation simulation is that the STR mutations of each member of the population have followed a separate path from the founding haplotype. This is an advantage: it simplifies and speeds up computer models, because the entire family tree no longer has to be recreated on the computer. However, such a simplification applies only to the simplest population models, such as the one we will be working with.

The table below gives the results of 10,000 computer simulations of each of the two cases listed above. A population of 1024 individual haplotypes is examined in both cases: in the first, all have descended through ten generations from one founding haplotype ('Simulated Population' column); in the second, all have descended through ten generations from separate yet identical founding haplotypes ('Random Walk' column). The 'Theoretical Calculation' column gives the expected distribution under the mathematical model of this experiment. In this example one marker with a mutation rate of 0.5 per generation is examined, and it is assumed to have a purely hypothetical founding value of zero. In ASD calculations the actual values of the markers do not matter; it is the variance of those values that is important. The figures in the table are the observed counts of each final STR repeat value after 10 generations of mutation, averaged over the 10,000 runs of the simulation.

Final Value | Theoretical Calculation | Simulated Population | Random Walk | Final Value | Simulated Population | Random Walk |
---|---|---|---|---|---|---|
0 | 180.426 | 180.346 | 180.415 | | | |
-1 | 164.023 | 163.792 | 164.092 | 1 | 163.975 | 163.991 |
-2 | 123.018 | 122.969 | 123.072 | 2 | 123.010 | 122.949 |
-3 | 75.7031 | 75.9337 | 75.7117 | 3 | 75.7203 | 75.7935 |
-4 | 37.8516 | 37.8337 | 37.8736 | 4 | 37.8846 | 37.8198 |
-5 | 15.1406 | 15.2353 | 15.0702 | 5 | 15.2055 | 15.0915 |
-6 | 4.7314 | 4.7649 | 4.7208 | 6 | 4.7592 | 4.7657 |
-7 | 1.1133 | 1.1403 | 1.1115 | 7 | 1.1108 | 1.1251 |
-8 | 0.1855 | 0.1871 | 0.1786 | 8 | 0.1957 | 0.1833 |
-9 | 0.0195 | 0.0210 | 0.0160 | 9 | 0.0186 | 0.0172 |
-10 | 0.00098 | 0.0003 | 0.0011 | 10 | 0.0060 | 0.0013 |
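The 'Theoretical Calculation' column and the 'Random Walk' case can be sketched in a few lines of Python. The sketch assumes a single-step symmetric mutation model that the text does not spell out: in each generation the marker mutates with probability 0.5, and a mutation is +1 or -1 with equal probability. Under that assumption the expected count for final value k is 1024 x C(20, 10 + k) / 4^10, which agrees with the tabulated theoretical figures. The function names are hypothetical, and the simulation uses far fewer than 10,000 runs.

```python
import random
from math import comb

GENERATIONS = 10
POPULATION = 1024


def theoretical_counts(generations=GENERATIONS, population=POPULATION):
    """Expected number of haplotypes at each final value.

    A per-generation step of -1, 0, +1 with probabilities 1/4, 1/2, 1/4 is
    the sum of two independent half-steps of +1/2 or -1/2, so the final
    value k has probability C(2g, g + k) / 4**g after g generations.
    """
    total = 4 ** generations
    return {
        k: population * comb(2 * generations, generations + k) / total
        for k in range(-generations, generations + 1)
    }


def random_walk_counts(runs, generations=GENERATIONS, population=POPULATION,
                       rate=0.5, seed=0):
    """Average count per final value when every haplotype walks independently."""
    rng = random.Random(seed)
    totals = {k: 0 for k in range(-generations, generations + 1)}
    for _ in range(runs):
        for _ in range(population):
            value = 0
            for _ in range(generations):
                if rng.random() < rate:           # a mutation occurs
                    value += rng.choice((-1, 1))  # assumed single-step, symmetric
            totals[value] += 1
    return {k: count / runs for k, count in totals.items()}


if __name__ == "__main__":
    theory = theoretical_counts()
    walk = random_walk_counts(runs=20)
    for k in range(0, -GENERATIONS - 1, -1):
        print(k, round(theory[k], 4), round(walk[k], 4))
```

This covers only the independent-founder case; reproducing the 'Simulated Population' column would require building the ten-generation family tree so that descendants share their ancestors' mutations.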

250 Generations; 70 Haplotypes; 1/140 Rate per Marker

#Trials | 5000 | 1500 | 1000 | 500 | 500 | 500 |
---|---|---|---|---|---|---|
#Markers | 1 | 2 | 4 | 8 | 16 | 32 |
Age Est | 250.4312 | 250.5489 | 250.6625 | 250.3277 | 250.6205 | 250.5670 |
Std Dev | 46.1748 | 31.9205 | 21.1157 | 15.3092 | 10.4704 | 6.7402 |

Chi-Squared Thumb-rule Formula

In an ASD calculation where

$G_e$ = Estimated age of haplotype population in generations

$M$ = Number of markers used in the calculation

$N$ = Number of haplotypes in the sample

then $E$, the Expected Error in generations at the 68.269% confidence level, is given by:

$$E = \frac{G_e \sqrt{2}}{\sqrt{M}\,\sqrt{N - 1}}$$

such that the 68.269% confidence interval is defined by the value of $G_e$ plus or minus the value calculated for $E$.

250 Generations; 70 Haplotypes; 1/140 Rate per Marker per Generation

#Trials | 5000 | 1500 | 1000 | 500 | 500 | 500 |
---|---|---|---|---|---|---|
#Markers | 1 | 2 | 4 | 8 | 16 | 32 |
Age Est | 250.4312 | 250.5489 | 250.6625 | 250.3277 | 250.6205 | 250.5670 |
Std Dev | 46.1748 | 31.9205 | 21.1157 | 15.3092 | 10.4704 | 6.7402 |
Calculated Expected Error | 42.6362 | 30.1484 | 21.3181 | 15.0742 | 10.6591 | 7.5371 |