Multi-locus sequence typing and the gene-by-gene approach to bacterial classification and analysis of population variation
Cody AJ., Bennett JS., Maiden MCJ.
© 2014 Elsevier Ltd.
For nearly 30 years, 16S rRNA gene sequencing has been a fundamental tool for identifying and cataloguing bacterial diversity, but the diversity at this locus lacks the resolution to distinguish closely related bacteria. Multi-locus sequence typing (MLST) established the utility of a portable, gene-by-gene approach to population analyses, using both allelic and nucleotide sequence data that catalogue variation at seven housekeeping loci; however, it did not provide sufficient discrimination to define all variants of all bacteria. Recent advances in high-throughput next-generation sequencing technologies have permitted whole-genome sequencing of a wide variety of bacterial species and facilitated the development of genome-wide expanded MLST schemes. This chapter describes a flexible, scalable and hierarchical gene-by-gene approach to bacterial classification and population analyses, based on the concept of seven-locus MLST. Furthermore, the approach is both backwards and forwards compatible, since 16S rRNA and seven-locus MLST information can be extracted and compared with original Sanger sequence data, and the databases employed can accommodate sequence data from any source. The gene-by-gene approach is illustrated with four analyses: (i) speciation of Neisseria using nucleotide sequence from a single ribosomal protein locus (rplF); (ii) use of variation at the 53 ribosomal protein gene loci to identify and differentiate among bacterial species accurately and unambiguously; (iii) identification of core genes and their comparison within a bacterial population; and (iv) the use of whole-genome population analysis for high-resolution studies, such as the identification of potential disease outbreaks.
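The allele-numbering idea behind MLST and its genome-wide extensions can be sketched in a few lines. This is a minimal, hypothetical illustration rather than the database software used in the chapter. Each distinct sequence at a locus receives the next integer allele number, an isolate's allelic profile is the tuple of its allele numbers, each distinct complete profile receives a sequence type (ST), and isolates are compared by counting loci with different alleles. The names `AlleleRegistry` and `allelic_distance`, the toy locus names and sequences, and the convention of skipping loci missing from either isolate are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple

Profile = Tuple[Optional[int], ...]


class AlleleRegistry:
    """Hypothetical gene-by-gene registry: sequences -> allele numbers -> STs."""

    def __init__(self, loci: Iterable[str]) -> None:
        self.loci = tuple(loci)
        # Per locus: sequence -> allele number (assigned in order of discovery).
        self._alleles: Dict[str, Dict[str, int]] = {locus: {} for locus in self.loci}
        # Complete allelic profile -> sequence type number.
        self._sts: Dict[Profile, int] = {}

    def allele(self, locus: str, seq: str) -> int:
        """Return the allele number for seq, assigning a new one if unseen.

        Any nucleotide difference, however small, yields a new allele number.
        """
        table = self._alleles[locus]
        seq = seq.upper()
        if seq not in table:
            table[seq] = len(table) + 1
        return table[seq]

    def profile(self, isolate: Dict[str, str]) -> Profile:
        """Allelic profile in locus order; None where the isolate lacks a locus."""
        return tuple(
            self.allele(locus, isolate[locus]) if isolate.get(locus) else None
            for locus in self.loci
        )

    def sequence_type(self, isolate: Dict[str, str]) -> int:
        """Assign an ST to a complete profile (assumed: STs need all loci)."""
        prof = self.profile(isolate)
        if None in prof:
            raise ValueError("incomplete profile: an ST needs every locus")
        if prof not in self._sts:
            self._sts[prof] = len(self._sts) + 1
        return self._sts[prof]


def allelic_distance(p: Profile, q: Profile) -> int:
    """Count loci with different alleles, skipping loci missing in either profile."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same loci")
    return sum(a != b for a, b in zip(p, q) if a is not None and b is not None)


if __name__ == "__main__":
    loci = [f"locus{i}" for i in range(1, 8)]  # hypothetical seven-locus scheme
    reg = AlleleRegistry(loci)
    iso_a = {locus: "ACGT" for locus in loci}  # toy sequences
    iso_b = dict(iso_a, locus3="ACGA")  # one SNP at locus3
    iso_c = dict(iso_a)  # identical to iso_a
    for name, iso in [("a", iso_a), ("b", iso_b), ("c", iso_c)]:
        print(name, reg.profile(iso), "ST", reg.sequence_type(iso))
    print(allelic_distance(reg.profile(iso_a), reg.profile(iso_b)))
```

Because any nucleotide difference yields a new allele number, a single point mutation and a multi-site recombination event at one locus both count as one allelic difference. The same counting applies unchanged whether the scheme has seven loci, the 53 ribosomal protein loci, or a whole-genome set of loci.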