by Kostas Konstantinidis and Roberto Kolter
Fig. 1. A graphical representation of the species problem in bacteria. The phylogeny of a hypothetical collection of organisms that resolves into two clear groupings (a + b on the left) is shown. To designate such groupings as separate species (or genera, etc.), however, it is important to determine whether or not they include ecologically and phenotypically uniform organisms. For instance, when evolutionarily distant organisms are compared, groupings will almost certainly emerge because of the long time since divergence, but such groupings are not necessarily informative about the homogeneity of the organisms they encompass. Another important issue is how well the collection of organisms analyzed represents the total natural diversity; that is, with saturating sampling of natural diversity, previously recovered groupings might disappear (tree on the right). Prescreening of isolates, whether due to the selectivity of the isolation method or to the requirement that an isolate conform to specific 'species standards' before it is processed further, could make collections unrepresentative of the naturally occurring diversity and hence render any groupings recovered meaningless. These issues are relevant to all diversity surveys, particularly in cases where only a small fraction of the natural population has been sampled. Source
The second part of the question in the title of this post is relatively easy to answer; the first part is much more challenging. Defining bacterial species is not only an important academic exercise but also has major practical consequences. For example, infectious disease diagnoses, regulations involving the transport of bacteria, and educating the public about bacteria that are beneficial to humans, animals or plants are all deeply rooted in naming bacterial species. It is important to know if a patient is infected with Bacillus anthracis (the nasty pathogen that causes anthrax) or, instead, with its much less virulent close relative, Bacillus cereus. Having different names for these two species is key because the diseases they cause are very different and are treated very differently. We cannot afford to mistake B. anthracis for B. cereus. Thus, the long-held practice in microbiology of naming species serves important functions. But do those named species really reflect biologically discrete species?
How might we be able to determine if bacterial species really exist? It turns out this is a particularly challenging question. Ideally, a species is a collection of individuals that are much more uniform among themselves, genetically and/or phenotypically, than they are when compared to other collections of individuals. For humans and higher eukaryotes, species are relatively easy to define morphologically. However, morphological traits are not useful as criteria to distinguish bacteria. Many distinct types of bacteria look the same under the microscope. Of course, these days there is a much better way to type bacteria: comparing whole genome sequences. This is a much more sensitive approach for defining similarities and differences among organisms in general, since their entire phenotypic potential is encoded in that DNA sequence. Indeed, humans can be distinguished from their closest primate relatives, the chimpanzees, by a ~1.5% difference in the average nucleotide identity (or ANI) of the genes they have in common, i.e., the genes are 98.5% identical at the sequence level, on average. Importantly, the genome sequences of humans do not vary anywhere near as much among themselves, and neither do those of chimpanzees. Thus, human and chimpanzee genomic sequences distribute into two discrete and well-separated populations. The distribution of the genome sequences can thus be used to define the species. Whether populations of bacteria distribute into similar discrete groups based on their genome sequences remains unclear.
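To make the ANI idea concrete, here is a minimal Python sketch. It assumes the shared genes have already been aligned pairwise (equal-length strings, with "-" marking gaps) and simply averages the per-gene percent identities. The function names and the toy sequences are made up for illustration; real ANI tools work on whole genomes and differ in how they fragment, align and weight sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of aligned, non-gap positions at which two sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def average_nucleotide_identity(shared_genes) -> float:
    """Unweighted mean percent identity over pairs of aligned shared genes."""
    identities = [percent_identity(a, b) for a, b in shared_genes]
    if not identities:
        raise ValueError("no shared genes to compare")
    return sum(identities) / len(identities)


# Hypothetical toy data: two shared genes, 20 nucleotides each.
shared_genes = [
    ("ATGGCTAAAGGTCTGACCGA", "ATGGCTAAAGGTCTGACCGT"),  # 1 mismatch in 20
    ("ATGACCGAAATTCGTCTGAA", "ATGACCGAAATTCGTCTGAA"),  # identical
]

print(f"ANI = {average_nucleotide_identity(shared_genes):.1f}%")  # ANI = 97.5%
```

In this toy case the first gene pair is 95% identical and the second 100% identical, so the unweighted average is 97.5%.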
The recent dramatic increases in the speed of DNA sequencing have resulted in several thousand complete bacterial genome sequences. When these genome sequences are compared, discrete groups of closely related genomes that might be reflective of species are not always apparent. Instead, bacterial genome sequence diversity often seems to form a continuum, with no clear species boundaries (Figure 1). For example, the available genomes of isolates of B. anthracis and B. cereus can, for the most part, be separated by a 2–3% difference in ANI, similar to the human-chimpanzee pair. However, when we sequenced hundreds of environmental isolates, we obtained several genomes that showed intermediate levels of nucleotide sequence identity, falling somewhere in between B. anthracis and B. cereus and blurring the species boundary. Several similar examples from other named bacterial species have recently been reported as well (Figure 2).
Fig. 2. A real example of the species problem in bacteria. Where do we draw the line to define E. coli based on this phylogenetic tree? Note that the (named) E. coli clade becomes less distinct, i.e., there is no clear separation from its close relatives Escherichia albertii, Escherichia fergusonii and Salmonella, as more genomes are sequenced. The studies that made the genome sequences available are indicated on the right. Source: K. Konstantinidis.
There is another confounding factor that makes defining bacterial species boundaries difficult: bacteria are very promiscuous in exchanging DNA. Such horizontal gene transfer can result from uptake of bare DNA (transformation), transfer of DNA from cell to cell using specialized machinery (conjugation), or infection by bacterial viruses (transduction). The genes exchanged frequently confer key phenotypic traits such as antibiotic resistance or virulence (e.g., anthrax toxin). Thus, defining species based on important diagnostic phenotypes is often not reliable because these traits can be present in isolates with different species names.
So, how are bacterial species defined in practice? Due to the complications described above, there is currently no natural, widely accepted definition for bacterial species. Instead, species are defined based on some rather arbitrary thresholds for phenotypic and/or genetic similarity that attempt to capture as much as possible the phenotypes that are relevant for humans, such as causing a specific disease in humans or animals. A frequently used threshold is that strains of a species should show ≥95% ANI among themselves and <95% ANI when compared to strains from a different species. (A technique used before whole genome sequencing became widespread was DNA-DNA hybridization, and the species cutoff was arbitrarily set at 70% hybridization; 70% DNA-DNA hybridization corresponds to roughly 95% ANI in genomic sequence data.) By performing hundreds of comparisons during the last three decades, it was found that this threshold groups together strains that are sufficiently similar to each other phenotypically (e.g., causing the same disease in animals or carrying out the same function in an ecosystem) to merit being named as a distinct species. Therefore, the current definition for species is operational, and it should be noted that it has served the scientific and clinical communities reasonably well. Nonetheless, this approach has two big problems. First, the threshold is clearly arbitrary. Why use 95% as the cutoff? Why not 96%, or 99% (as in the human vs. chimpanzee example) or even 90%? Second, applying the 95% ANI threshold to Primates would result in a single species that includes humans, chimpanzees and all monkeys. The 95% threshold may be too liberal. As a result, inaccuracies in infectious disease diagnosis and other misunderstandings and miscommunications are not rare. And there is another big problem. The current species definition does not answer a fundamental question: Do biologically relevant bacterial species exist?
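The arbitrariness of the cutoff is easy to see in a small Python sketch. It assumes one simple reading of the rule, namely single-linkage grouping: any two genomes whose ANI meets the threshold end up in the same group, directly or through a chain of intermediates. The genome labels and ANI values below are hypothetical, and this is not meant to represent how any particular taxonomy tool applies the threshold.

```python
from itertools import combinations


def group_by_ani(genomes, ani, threshold=95.0):
    """Single-linkage grouping: link every pair with ANI >= threshold."""
    parent = {g: g for g in genomes}

    def find(g):
        # Follow parent pointers to the group representative (union-find).
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b in combinations(genomes, 2):
        if ani[frozenset((a, b))] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for g in genomes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical pairwise ANI values (percent) for four genomes.
genomes = ["A1", "A2", "C1", "C2"]
pairwise = {
    ("A1", "A2"): 99.0,
    ("C1", "C2"): 98.5,
    ("A1", "C1"): 93.0,
    ("A1", "C2"): 93.2,
    ("A2", "C1"): 92.8,
    ("A2", "C2"): 93.1,
}
ani = {frozenset(pair): value for pair, value in pairwise.items()}

for cutoff in (90.0, 95.0, 99.5):
    print(cutoff, group_by_ani(genomes, ani, cutoff))
```

With these made-up numbers, a 95% cutoff gives two "species", a 90% cutoff lumps all four genomes into one, and a 99.5% cutoff splits them into four. The same data support three different taxonomies depending only on where the line is drawn.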
Fig. 3. Schematic of the metagenomic pipeline to identify sequence-discrete populations. Reads from metagenomic sequencing of microbial community DNA can be assembled into consensus genomic sequences of cells belonging to the same population. Contigs originating from the same population can be identified based on their sequence characteristics and then grouped into nearly closed draft population genomes (binning). When the original reads of the metagenome are mapped against the contigs of a reference population (recruitment analysis; bottom), it becomes apparent that each population is sequence-discrete compared to its co-occurring populations. In this hypothetical example, reads originating from members of the reference population (red) evenly match the assembled contigs that represent the population with high nucleotide sequence identities (>97%). In contrast, reads from other populations (other colors) match the reference contigs at lower sequence identities, forming a sequence discontinuity (“gap”) in the recruitment plot. Areas that deviate from this pattern are limited to highly conserved regions of the genome (e.g., rRNA operons), where reads from related but distinct populations are recruited due to their high sequence identity to the reference sequences, or to regions characterized by intrapopulation heterogeneity, which typically show lower coverage. Source
Perhaps in looking for bacterial species, we've been "barking up the wrong phylogenetic tree!" One key limitation in deriving the currently used thresholds for species is that the results were based on comparisons of strains that almost invariably were isolated from a variety of habitats and sources. For example, the available Escherichia coli strains whose genomes have been compared (e.g., Figure 2) originated from diseased and healthy hosts as well as diverse environmental sources such as freshwater and soil. It is likely that different E. coli strains occupy different ecological niches and therefore may contain genes that confer selective advantages for survival in the particular microenvironments from which they were isolated. Comparing such different organisms to derive the currently used species thresholds may have been a misguided approach that hid natural, ecologically driven species boundaries.
The advent of metagenomics, i.e., high-throughput DNA (and RNA) sequencing in the absence of cultivation, offers the capacity to fully characterize natural bacterial populations. In this way the total genetic and functional diversity in a sample can be determined. The workflow involved in such an analysis is described in Figure 3. Such complete characterization of natural populations over time and space has already provided insights into how populations evolve and speciate and thus has helped advance the definition of bacterial species.
A very striking population pattern emerges from numerous metagenomic studies. Within natural microbial communities, bacteria and archaea distribute into discrete populations with high genomic sequence relatedness. Typically, members of such a population share 95–100% ANI. Importantly, these populations are indeed discrete because they usually have <90% ANI when compared to other populations within the same community. We refer to these highly related populations as being "sequence-discrete". Interestingly, the genotypes of members of a sequence-discrete population are typically more uniform in terms of evolutionary relatedness, functional gene content and gene expression patterns than the genotypes of members of species that have been identified and named by traditional microbiological criteria.
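The pattern described above can be written down as a simple check. This Python sketch assumes we already have ANI values among members of a candidate population and between that population and its neighbors in the same community; the 95% and 90% defaults are just the typical figures quoted above, not fixed rules, and the function name and example values are hypothetical.

```python
def is_sequence_discrete(within_ani, between_ani,
                         within_min=95.0, between_max=90.0):
    """True if every within-population ANI is >= within_min and every
    between-population ANI is < between_max, i.e., a gap separates them."""
    if not within_ani or not between_ani:
        raise ValueError("need at least one value in each list")
    return min(within_ani) >= within_min and max(between_ani) < between_max


# Hypothetical ANI values (percent).
within = [97.8, 99.1, 96.4]   # among members of the candidate population
between = [84.2, 88.9, 79.5]  # against co-occurring populations

print(is_sequence_discrete(within, between))           # True
print(is_sequence_discrete(within, between + [93.0]))  # False
```

The second call adds a single intermediate genome at 93% ANI, which is enough to close the gap. This is the situation described earlier for collections of isolates drawn from many different habitats.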
These sequence-discrete populations appear to be the closest we've come yet to defining ecologically and evolutionarily relevant bacterial and archaeal species. These results may constitute a paradigm shift since many scientists hold on to the idea that microbes do not form distinct species due to their extensive genetic exchange. The metagenomic findings are preliminary and certainly more datasets need to be analyzed for more robust conclusions to emerge. Nonetheless, these findings have brought a lot of excitement to microbiology because the discovery of sequence-discrete populations in natural microbial communities represents a foundation for future studies on how individual populations evolve and respond to environmental perturbations. Maybe bacteria do form distinct species after all and we just needed to change the "lens" we had been using to detect and track them over time and space. Now we may be barking up the right tree!
Recommended reading: Caro-Quintero A, Konstantinidis KT. 2012. Bacterial species may exist, metagenomics reveal. Environ Microbiol 14(2): 347–355. PMID 22151572
Dr. Kostas Konstantinidis is an Associate Professor at the Georgia Institute of Technology. He received his BS in Agricultural Sciences from the Aristotle University of Thessaloniki (Greece) and his PhD in Microbial Ecology from Michigan State University, working with James M. Tiedje, and did post-doctoral work at MIT with Edward DeLong. His research interests are at the interface of genomics and computational biology in the context of microbial ecology, with the overarching goal of broadening understanding of the genetic and metabolic potential of the microbial world.