STRIDE: Species Tree Root Inference From Gene Duplication Events
Emms D., Kelly S.
The correct interpretation of a phylogenetic tree depends on the tree being correctly rooted. A gene duplication event at the base of a clade of species is synapomorphic, and thus excludes the root of the species tree from that clade. We present STRIDE, a fast, effective, and outgroup-free method for species tree root inference from gene duplication events. STRIDE identifies sets of well-supported gene duplication events from cohorts of gene trees, and analyses these events to infer a probability distribution over an unrooted species tree for the location of the true root. We show that STRIDE infers the correct root of the species tree for a wide range of simulated and real species sets. We demonstrate that the novel probability model implemented in STRIDE can accurately represent the ambiguity in species tree root assignment for datasets where information is limited. Furthermore, application of STRIDE to inference of the origin of the eukaryotic tree resulted in a root probability distribution that was consistent with, but unable to distinguish between, leading hypotheses for the origin of the eukaryotes. In summary, STRIDE is a fast, scalable, and effective method for species tree root inference from genome-scale data.
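
The exclusion principle stated above (a duplication at the base of a clade rules out a root inside that clade) can be illustrated with a minimal sketch. This is not the probability model implemented in STRIDE: it applies hard exclusion only, and it assumes every duplication is correctly placed. The function name `candidate_root_branches`, the representation of each branch as one side of a bipartition, and the toy five-species tree are all hypothetical choices made for illustration.

```python
from typing import FrozenSet, Iterable, List


def candidate_root_branches(
    species: Iterable[str],
    branches: Iterable[FrozenSet[str]],
    duplicated_clades: Iterable[FrozenSet[str]],
) -> List[FrozenSet[str]]:
    """Return branches of an unrooted species tree that remain possible roots.

    Each branch is given as one side of the bipartition it induces.
    A branch lies strictly inside a clade when one of its sides is a
    proper subset of that clade; such branches are excluded. The branch
    that separates the clade from the rest of the tree is kept, because
    rooting there still leaves the clade intact.
    (Hypothetical helper: hard exclusion only, not STRIDE's model.)
    """
    all_species = frozenset(species)
    clades = list(duplicated_clades)
    remaining = []
    for side in branches:
        other = all_species - side
        # `<` on frozensets tests for a proper subset.
        inside = any(side < clade or other < clade for clade in clades)
        if not inside:
            remaining.append(side)
    return remaining


# Toy unrooted tree ((A,B),C,(D,E)): five leaf branches, two internal ones.
species = {"A", "B", "C", "D", "E"}
branches = [frozenset(s) for s in ("A", "B", "C", "D", "E")]
branches += [frozenset({"A", "B"}), frozenset({"D", "E"})]

# Assumed observation: a duplication shared by exactly A, B and C.
duplications = [frozenset({"A", "B", "C"})]

roots = candidate_root_branches(species, branches, duplications)
print(sorted(sorted(side) for side in roots))
```

In this toy case the only branches left are those leading to D, to E, and the internal branch separating {D, E} from {A, B, C}. A probabilistic treatment, as the abstract describes, would instead weight branches by the support of the duplication events rather than discard them outright.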