DeFries-Fulker analysis of twin data with skewed distributions: cautions and recommendations from a study of children's use of verb inflections.
Bishop DVM.
DeFries-Fulker (DF) analysis is an adaptation of multiple regression that is used to estimate the heritability of extreme scores (group heritability, h²g) on a dimension. Probands are identified as scoring below a cutoff that defines impairment; regression is then used to predict the scores of co-twins from the proband scores and a term that denotes the genetic relationship between twins (1.0 for MZ and 0.5 for DZ twins). This paper reports illustrative data and simulations for the situation where the dimensional variable shows substantial negative skew. Two types of simulation were conducted. In the first, an underlying polygenic liability dimension was normally distributed, and skewing was introduced by transforming or truncating the liability distribution. In the second, skewing arose because an infrequent defective gene impaired scores. In both sets of simulations, DF analysis was robust in the face of severe skewing of the data. DF analysis can provide two pointers to major gene effects on extreme scores on a trait with a skewed distribution: first, group heritability estimates will be higher for the original skewed data than for normalised data; second, estimates of h²g will increase as the cutoff used to identify probands is made more stringent. Both of these features were seen in data from a test of verb inflections given to 174 6-year-old twin pairs, suggesting that a single major gene may be implicated in causing impaired grammatical development.
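
The regression described above can be illustrated with a minimal Python sketch. It is not the code used in the study. The function name `df_group_heritability` and its arguments are hypothetical. The sketch also assumes the commonly described DF transformation, in which each score is expressed as a deviation from the unselected population mean and divided by the proband mean deviation, so that the coefficient on the relationship term can be read as the group heritability estimate. A single proband mean is used for both zygosity groups, which is a simplification.

```python
import numpy as np


def df_group_heritability(proband, cotwin, relatedness, population_mean):
    """Basic DF regression sketch: cotwin ~ intercept + proband + relatedness.

    proband, cotwin  : scores for each selected pair (proband below cutoff)
    relatedness      : 1.0 for MZ pairs, 0.5 for DZ pairs
    population_mean  : mean of the unselected population (assumed known)
    """
    proband = np.asarray(proband, dtype=float)
    cotwin = np.asarray(cotwin, dtype=float)
    relatedness = np.asarray(relatedness, dtype=float)

    # Assumed transformation: deviation from the population mean, scaled by
    # the mean proband deviation (one proband mean for all pairs).
    proband_deviation = proband.mean() - population_mean
    p = (proband - population_mean) / proband_deviation
    c = (cotwin - population_mean) / proband_deviation

    # Ordinary least squares with columns: intercept, proband score, relatedness.
    design = np.column_stack([np.ones_like(p), p, relatedness])
    coef, *_ = np.linalg.lstsq(design, c, rcond=None)

    return {
        "intercept": float(coef[0]),
        "b_proband": float(coef[1]),
        "h2g": float(coef[2]),  # coefficient on the relatedness term
    }
```

The sketch returns point estimates only. Standard errors, double entry of concordant pairs, and the repeated fitting at progressively more stringent cutoffs discussed in the abstract are left out.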