Measuring the Temporal Structure in Serially-Sampled Phylogenies.
Gray RR., Pybus OG., Salemi M.
Nucleotide sequences sampled at different times (serially-sampled sequences) allow researchers to study the rate of evolutionary change and the demographic history of populations. Some phylogenies inferred from serially-sampled sequences are described as having strong 'temporal clustering', such that sequences from the same sampling time tend to cluster together and to be the direct ancestors of sequences from the following sampling time. The degree to which phylogenies exhibit these properties is thought to reflect interesting biological processes, such as positive selection or deviation from the molecular clock hypothesis.
Here we introduce the Temporal Clustering (TC) statistic, which is the first quantitative measure of the degree of topological 'temporal clustering' in a serially-sampled phylogeny. The TC statistic represents the expected deviation of an observed phylogeny from the null hypothesis of no temporal clustering, as a proportion of the range of possible values, and can therefore be compared among phylogenies of different sizes.
We apply the TC statistic to a range of serially-sampled sequence datasets, which represent both rapidly-evolving viruses and ancient mitochondrial DNA. In addition, we calculate the TC statistic for phylogenies simulated under a neutral coalescent process.
Our results indicate significant temporal clustering in many empirical datasets. However, we also find that such clustering is exhibited by trees simulated under a neutral coalescent process; hence the observation of significant 'temporal clustering' cannot unambiguously indicate the presence of strong positive selection in a population.
Quantifying topological structure in this manner will provide new insights into the evolution of measurably evolving populations.