Extra-binomial variation approach for analysis of pooled DNA sequencing data.
Yang X, Todd JA, Clayton D, Wallace C.
MOTIVATION: The invention of next-generation sequencing technology has made it possible to study rare variants, which are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. Pooling induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to model this additional variance appropriately to avoid inflating the false-positive rate.

RESULTS: We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model provides a better fit to the data than either a standard binomial model or a traditional extra-binomial model proposed by Williams, and that it can analyse both rare and common variants at lower or more variable pool depths than the other methods.

AVAILABILITY: The package 'extraBinomial' is available from http://cran.r-project.org/.

CONTACT: chris.wallace@cimr.cam.ac.uk.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
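As an illustration of the over-dispersion described above, the following minimal sketch simulates read counts for pooled samples with and without between-pool variation and compares the observed variance with the variance a plain binomial model would predict. It uses a generic beta-binomial simulation chosen for this illustration; it is not the model proposed in the paper, nor the Williams model, nor the interface of the 'extraBinomial' package. The function names, the pool depth, the allele frequency and the intra-pool correlation `rho` are all assumed values.

```python
import random
from statistics import mean, pvariance


def simulate_pool_counts(n_pools, depth, p, rho, rng):
    """Draw variant-allele read counts for n_pools pools (illustrative only).

    rho = 0 gives plain binomial sampling. rho > 0 draws each pool's
    allele frequency from a Beta distribution with mean p and
    correlation rho, which yields beta-binomial counts.
    """
    counts = []
    for _ in range(n_pools):
        if rho > 0:
            a = p * (1 - rho) / rho
            b = (1 - p) * (1 - rho) / rho
            p_pool = rng.betavariate(a, b)
        else:
            p_pool = p
        # Each of the `depth` reads carries the variant with probability p_pool.
        counts.append(sum(rng.random() < p_pool for _ in range(depth)))
    return counts


def dispersion_ratio(counts, depth):
    """Observed variance of the counts divided by the binomial variance."""
    p_hat = mean(counts) / depth
    return pvariance(counts) / (depth * p_hat * (1 - p_hat))


rng = random.Random(1)  # fixed seed so the sketch is reproducible
depth, p = 200, 0.05    # assumed pool depth and variant frequency

binomial_counts = simulate_pool_counts(2000, depth, p, 0.0, rng)
overdispersed_counts = simulate_pool_counts(2000, depth, p, 0.02, rng)

ratio_binomial = dispersion_ratio(binomial_counts, depth)
ratio_overdispersed = dispersion_ratio(overdispersed_counts, depth)

print(f"binomial dispersion ratio:       {ratio_binomial:.2f}")
print(f"over-dispersed dispersion ratio: {ratio_overdispersed:.2f}")
```

Under the beta-binomial assumption the variance is inflated by a factor of 1 + (depth - 1) * rho relative to the binomial variance, so the first ratio should be close to 1 and the second should be several times larger. A test that assumes binomial variance on such data would understate the standard error, which is the source of the inflated false-positive rate mentioned above.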