A white spruce gene catalog for conifer genome analyses.
Rigault P., Boyle B., Lepage P., Cooke JEK., Bousquet J., MacKay JJ.
Several angiosperm plant genomes, including Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), poplar (Populus trichocarpa), and grapevine (Vitis vinifera), have been sequenced, but the lack of reference genomes in gymnosperm phyla reduces our understanding of plant evolution and restricts the potential impacts of genomics research. A gene catalog was developed for the conifer tree Picea glauca (white spruce) through large-scale expressed sequence tag sequencing and full-length cDNA sequencing to facilitate genome characterizations, comparative genomics, and gene mapping. The resource incorporates new and publicly available sequences into 27,720 cDNA clusters, 23,589 of which are represented by full-length insert cDNAs. Expressed sequence tags, mate-pair cDNA clone analysis, and custom sequencing were integrated through an iterative process to improve the accuracy of clustering outcomes. The entire catalog spans 30 Mb of unique transcribed sequence. We estimated that the P. glauca nuclear genome contains up to 32,520 transcribed genes owing to incomplete, partially sequenced, and unsampled transcripts and that its transcriptome could span up to 47 Mb. These estimates are in the same range as the Arabidopsis and rice transcriptomes. Next-generation methods confirmed and enhanced the catalog by providing deeper coverage for rare transcripts, by extending many incomplete clusters, and by augmenting the overall transcriptome coverage to 38 Mb of unique sequence. Genomic sample sequencing at 8.5% of the 19.8-Gb P. glauca genome identified 1,495 clusters representing highly repeated sequences among the cDNA clusters. With a conifer transcriptome in full view, functional and protein domain annotations clearly highlighted the divergences between conifers and angiosperms, likely reflecting their respective evolutionary paths.