Most brain disease-associated and eQTL haplotypes are not located within transcription factor DNase-seq footprints in brain.
Handel AE., Gallone G., Zameel Cader M., Ponting CP.
Dense genotyping approaches have revealed much about the genetic architecture of both gene expression and disease susceptibility. However, assigning causality to genetic variants associated with a transcriptomic or phenotypic trait presents a far greater challenge. The development of epigenomic resources by ENCODE, the Epigenomic Roadmap and others has led to strategies that seek to infer the likely functional variants underlying these genome-wide association signals. It is known, for example, that such variants tend to be located within areas of open chromatin, as detected by techniques such as DNase-seq and FAIRE-seq. We aimed to assess what proportion of variants associated with phenotypic or transcriptomic traits in the human brain are located within transcription factor binding sites. The bioinformatic tools Wellington and HINT were used to infer transcription factor footprints at high spatial resolution from existing DNase-seq data derived from central nervous system tissues. This dataset was then used to assess the likely contribution of altered transcription factor binding to both expression quantitative trait loci (eQTL) and genome-wide association study (GWAS) signals. Surprisingly, we show that most haplotypes associated with GWAS or eQTL phenotypes are located outside of DNase-seq footprints. This could imply that DNase-seq footprinting is too insensitive an approach to identify a large proportion of true transcription factor binding sites. Importantly, it also suggests that efforts to prioritise variants for genome engineering studies that establish causality will continue to be frustrated by the inability of footprinting to identify the causative variant within a haplotype.
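
The central measurement described above, the proportion of trait-associated haplotypes that contain at least one variant inside a footprint, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the half-open BED-style coordinate convention, and the toy footprints and haplotypes are all assumptions made for illustration, and the footprint calls themselves (for example from Wellington or HINT) are taken as given input.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(footprints):
    """Index half-open (chrom, start, end) intervals by chromosome.

    Overlapping or touching intervals are merged so that a single
    binary search is enough to test whether a position is covered.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in footprints:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        index[chrom] = ([s for s, _ in merged], merged)
    return index


def in_footprint(index, chrom, pos):
    """Return True if pos lies in [start, end) of any footprint on chrom."""
    if chrom not in index:
        return False
    starts, merged = index[chrom]
    i = bisect_right(starts, pos) - 1
    return i >= 0 and merged[i][0] <= pos < merged[i][1]


def fraction_haplotypes_in_footprints(index, haplotypes):
    """Fraction of haplotypes with at least one variant inside a footprint."""
    if not haplotypes:
        return 0.0
    hits = sum(
        any(in_footprint(index, chrom, pos) for chrom, pos in variants)
        for variants in haplotypes.values()
    )
    return hits / len(haplotypes)


# Toy data, invented for illustration only (not taken from the study).
toy_footprints = [("chr1", 100, 120), ("chr1", 115, 130), ("chr2", 50, 60)]
toy_haplotypes = {
    "hapA": [("chr1", 99), ("chr1", 125)],
    "hapB": [("chr1", 130), ("chr2", 10)],
    "hapC": [("chr3", 5)],
    "hapD": [("chr2", 50)],
}

toy_index = build_index(toy_footprints)
toy_fraction = fraction_haplotypes_in_footprints(toy_index, toy_haplotypes)
print(f"Fraction of haplotypes overlapping a footprint: {toy_fraction:.2f}")
```

Counting at the haplotype level rather than per variant reflects the framing of the abstract: a haplotype is scored as overlapping if any of its linked variants falls in a footprint, so a low fraction means that footprints fail to nominate even a single candidate variant for most associated haplotypes.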