Non-independence in statistical tests for discrete cross-species data.
Grafen A., Ridley M.
This paper describes three previously undetected effects, due to biases and non-independence, that can arise in statistical tests for associations between character states in cross-species data. One kind, which we call the family problem, is general to all known methods. In phylogenetic data, the ancestral character state from which changes occur, or below which variation is found, is likely to be the same for many regions of the tree. The family problem interacts with two kinds of non-independence that arise from the way existing tests reconstruct character states. One kind arises in methods that reconstruct joint character states, the other in methods that reconstruct single character states. Methods that work with joint character states, such as Ridley's (1983), suffer because, under parsimony, a character state cannot change to itself. Methods that work with single character states suffer because, within a locally variable region of the tree, null data are more likely to produce two single changes, one in each character on separate branches, than one double change in both characters on the same branch; associations opposite to the locally ancestral state are therefore likely to be found in more than 50% of the variable regions. In real data sets, the family problem determines how strongly the other kinds of bias show up: if the family problem is large, the biases arising from the way tests reconstruct characters will also be large, whereas if it is small, the local biases tend to cancel out in the aggregate.
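The single-character-state bias rests on a counting argument that a small Monte Carlo sketch can illustrate. The sketch assumes, purely for illustration, a star-shaped variable region of k branches descending from a local ancestor in state (0, 0), and a null model in which each of two binary characters, X and Y, changes independently on each branch with probability p. The names `simulate_region`, `classify` and `fraction_separate`, the star topology and the parameter values are hypothetical and are not taken from the paper. Restricting attention to regions with exactly one change in each character, it counts how often the two changes fall on separate branches (tips (1, 0) and (0, 1), pairing a derived state with an ancestral one) rather than on the same branch (a single (1, 1) tip).

```python
import random


def simulate_region(k, p, rng):
    """Simulate one hypothetical star-shaped region with k branches.

    Assumption (illustrative only): the local ancestor is (0, 0) for
    characters X and Y, and under the null each character flips on each
    branch independently with probability p. Returns tip states (x, y).
    """
    tips = []
    for _ in range(k):
        x = 1 if rng.random() < p else 0
        y = 1 if rng.random() < p else 0
        tips.append((x, y))
    return tips


def classify(tips):
    """Label a region with exactly one change in each character.

    'double'   -> both changes on the same branch, giving a (1, 1) tip.
    'separate' -> changes on different branches, giving (1, 0) and (0, 1).
    None       -> any other number of changes (region ignored).
    """
    x_changes = sum(x for x, _ in tips)
    y_changes = sum(y for _, y in tips)
    if x_changes != 1 or y_changes != 1:
        return None
    return "double" if (1, 1) in tips else "separate"


def fraction_separate(k=4, p=0.1, n_regions=100_000, seed=1):
    """Fraction of qualifying null regions whose two changes lie on separate branches."""
    rng = random.Random(seed)
    counts = {"double": 0, "separate": 0}
    for _ in range(n_regions):
        label = classify(simulate_region(k, p, rng))
        if label is not None:
            counts[label] += 1
    return counts["separate"] / (counts["double"] + counts["separate"])


if __name__ == "__main__":
    for k in (2, 4, 8):
        # Expected under this toy model: (k - 1) / k
        print(k, round(fraction_separate(k=k), 3), round((k - 1) / k, 3))
```

Under this toy model each change is equally likely to land on any of the k branches, so the expected fraction of separate-branch regions is (k - 1)/k: exactly one half only when k = 2, and 0.75 for k = 4. The sketch illustrates only the counting intuition behind the abstract's claim; it is not a reconstruction of the tests the paper analyses.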