Multiple reference genomes and transcriptomes for Arabidopsis thaliana.
Gan X., Stegle O., Behr J., Steffen JG., Drewe P., Hildebrand KL., Lyngsoe R., Schultheiss SJ., Osborne EJ., Sreedharan VT., Kahles A., Bohnert R., Jean G., Derwent P., Kersey P., Belfield EJ., Harberd NP., Kemen E., Toomajian C., Kover PX., Clark RM., Rätsch G., Mott R.
Genetic differences between Arabidopsis thaliana accessions underlie the plant's extensive phenotypic variation, and until now these have been interpreted largely in the context of the annotated reference accession Col-0. Here we report the sequencing, assembly and annotation of the genomes of 18 natural A. thaliana accessions, and their transcriptomes. When assessed on the basis of the reference annotation, one-third of protein-coding genes are predicted to be disrupted in at least one accession. However, re-annotation of each genome revealed that alternative gene models often restore coding potential. Gene expression in seedlings differed for nearly half of expressed genes and was frequently associated with cis variants within 5 kilobases, as were intron-retention alternative splicing events. Sequence and expression variation is most pronounced in genes that respond to the biotic environment. Our data further promote evolutionary and functional studies in A. thaliana, especially in the MAGIC genetic reference population descended from these accessions.