Variance Analysis on Simulated Populations
Jim Cullen

Observed distribution of variance in 20,000 trial runs of the simulated phylogenetic tree. The Log-normal distribution, derived directly from the data statistics, is plotted in red. Note the right-skewed shape typical of most simulation outputs. The inset graph shows the same data plotted against the natural logarithm of the variance, which gives a symmetrical Normal (Gaussian) distribution, an expected characteristic of Log-normal data.
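As a rough illustration of how a Log-normal curve can be derived from data statistics, the sketch below assumes the two parameters are taken as the mean and standard deviation of the natural logarithm of the variance values. This is only one plausible reading of "derived directly from the data statistics". The function names `fit_lognormal` and `lognormal_pdf` are hypothetical, and the sample data is synthetic rather than output from the phylogenetic tree simulation.

```python
import math
import random
import statistics

def fit_lognormal(samples):
    """Estimate (mu, sigma) as the mean and sample std dev of ln(x)."""
    logs = [math.log(x) for x in samples]
    return statistics.fmean(logs), statistics.stdev(logs)

def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density at x (zero for x <= 0)."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

# Synthetic stand-in for 20,000 simulated variance values (assumed
# parameters, chosen only for illustration).
rng = random.Random(1)
samples = [rng.lognormvariate(-3.0, 0.5) for _ in range(20000)]

mu, sigma = fit_lognormal(samples)

# For right-skewed data the mean sits above the median.
sample_mean = statistics.fmean(samples)
sample_median = statistics.median(samples)
print(f"mu={mu:.3f} sigma={sigma:.3f}")
print(f"mean={sample_mean:.5f} median={sample_median:.5f}")
```

Plotting `lognormal_pdf(x, mu, sigma)` over a histogram of `samples` gives the kind of overlay described in the caption, and a histogram of `ln(x)` should look roughly symmetrical if the data really are Log-normal.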

Observed distribution of variance in 50,000 trial runs of the simulated phylogenetic tree. The derived Log-normal distribution is plotted in red. The inset graph is a higher-resolution plot of the data points highlighted in turquoise in the larger plot. Note that the main distribution is composed of smaller distributions, and possibly sub-distributions of these. These mini-founder events have distorted the main distribution by dragging probabilities further out on the tail, so that the derived Log-normal distribution no longer matches the data as expected.

Take a look at the plot above. There are two black dots on the x-axis, marking the outer boundaries of the high-resolution plot you see here. To produce useful data at a bin size of 0.0001, the trials had to be increased to 100,000 runs to collect the 8,000-odd data points within the target window at an intended average frequency of 20 counts per bin. The results illustrate the chaotic nature of these variance distributions, especially at small scales, regardless of how smooth the data may appear on a larger scale.
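The figures quoted above imply a window of about 400 bins (8,000 points at 20 per bin), or roughly 0.04 wide at a bin size of 0.0001, with about 8% of the 100,000 trials landing inside it. The sketch below works through that arithmetic and shows one simple way to count values into fixed-width bins. The helper `bin_counts` and the small test values are hypothetical; the actual window boundaries are not given in the text.

```python
def bin_counts(values, lo, hi, bin_size):
    """Count values falling in [lo, hi) using fixed-width bins."""
    n_bins = round((hi - lo) / bin_size)
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            # Clamp guards against floating-point rounding at the top edge.
            idx = min(int((v - lo) / bin_size), n_bins - 1)
            counts[idx] += 1
    return counts

# Arithmetic implied by the figures quoted in the text.
bin_size = 0.0001
points_in_window = 8000      # "8,000-odd" data points
target_per_bin = 20          # intended average counts per bin
total_trials = 100000

n_bins = points_in_window // target_per_bin       # number of bins in window
window_width = n_bins * bin_size                  # width of the window
fraction_in_window = points_in_window / total_trials

print(n_bins, window_width, fraction_in_window)

# Tiny illustrative call with made-up values.
demo = bin_counts([0.00005, 0.00015, 0.00016, 0.5], 0.0, 0.001, bin_size)
print(demo)
```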

Hardware and software support for much of the work on this page includes: Microsoft QuickBasic v4.5 for data simulation, preparation, and some analysis; Texas Instruments' Voyage-200 PLT (12MHz M68000 CPU) for some data simulation, algebraic manipulation, curve-fitting, and statistical analysis using Statistics / List Editor v1.03; Hewlett-Packard's HP50G+ (75MHz ARM CPU) for some data simulation, algebraic operations involving advanced functions, and some statistical analysis; and OWBasic v3.5 on the Casio PV-S400Plus for cross-checking the data simulation random generator. The best stuff, as always, is worked out with pencil and paper (CPU supplied by user).