by Merry Youle
Hic sunt dracones
Figure 1. The marine macrobial dark matter. Hic sunt dracones. Source.
We live in a world run by microbes, the vast majority of which we have yet to identify or name. We can only refer to them collectively as the microbial dark matter (MDM). However you define a prokaryotic species, and however you tally them once identified, there is a huge gap between the 12,000 or so validly named species and the total number on our planet, currently estimated to be in the millions. The only evidence we have for the existence of that uncultured mob is either a small subunit ribosomal RNA (SSU rRNA) sequence or some hazily classifiable metagenomic reads. As the speed of sequencing goes up and the cost goes down, this sort of evidence accrues ever more rapidly, further widening the gap. The challenge at hand is to find out more about the organisms that make up that dark matter.
Our Past Bias
What do we know so far? By building trees from SSU rRNA sequences, we now know that we share the planet with at least 60 prokaryote phyla. Half of these are dubbed candidate phyla and thus they will remain, by decree, until someone cultivates one of their members. The other half, the phyla in good standing, have been sampled in a highly biased manner. More than 88% of microbial isolates are from only four bacterial phyla (Proteobacteria, Firmicutes, Actinobacteria, and Bacteroidetes). Cultured members of the other phyla are getting more attention now, deservedly, but this sheds no light on the as yet uncultured MDM.
Figure 2. Statistical information from the Genomes OnLine Database (GOLD) as of September 2011. Phylogenetic distribution of the 8,448 bacterial genome projects. It is time to cut the pie differently. Source.
One by One
A recent paper by Rinke and colleagues reports on work probing this MDM in a painstaking way, sequencing one cell at a time. They do not denigrate the value of the community-level insights afforded by metagenomics; likewise they acknowledge that deep sequencing in low diversity environments has sometimes yielded complete genome sequences of the more abundant species. However, they put their effort into a project demonstrating the productivity of a complementary approach: single cell genomics.
In this study they sampled nine sites representing marine and freshwater environments, hydrothermal vents, sediment, and a terephthalate-degrading bioreactor. They collected individual cells without an intervening culturing step by single-cell flow sorting, then amplified their genomes and screened them based on their SSU rRNA genes. This netted them a collection of 201 genomes of the more abundant representatives of 21 bacterial and eight archaeal lineages. Sequencing yielded 201 draft genomes, 40% complete on average. For comparison, as of September 2011, a total of 2,907 microbial genome projects had been completed. Although the single cell genomes assembled in this study are not complete, they nevertheless make a significant jump in the genomic totals as well as the diversity explored.
Figure 3. A historical tally of completed microbial genome projects recorded in the Genomes OnLine Database (GOLD). Source.
The Rewards?
What insights were gained? With the first detailed genomic data on many of the candidate phyla now in hand, the researchers could confirm their location within the Tree of Life and their relationships to other phyla. For example, a new phylum was added to the Planctomycetes–Verrucomicrobia–Chlamydiae (PVC) superphylum. Similarly, they proposed a new superphylum to accommodate the Nanoarchaeota (featured on this blog here and here) and four candidate archaeal phyla that also have small genomes and very small cell size.
Genes of known function provided a few glimpses of the metabolic capabilities associated with various novel lineages: hydrogen metabolism is widespread; genes for carbon fixation are common among the Archaea. In numerous instances genes or pathways previously associated with only one domain were found to be present in a novel lineage of another domain. First prize here goes to a Nanoarchaeon with a gene from the slime mold Dictyostelium, the first known instance of horizontal gene transfer from a eukaryote to an Archaeon. The list of bacterial genes acquired by Archaea has also lengthened to include complete bacterial sigma factors and multi-domain alarmones. Although the Archaea don't make peptidoglycan, two Nanoarchaeota are apparently making some use of a bacterial lytic murein transglycosylase, perhaps as a weapon or to facilitate friendly cell-cell interactions with bacteria. The list goes on and on.
Including these genomes in their analyses of 893 publicly available metagenomes provided new or improved classification of 340 million reads. Is 340 million a lot or a little? It is less than a percent on average, more than 2% of the total in 19 metagenomes, and up to 20% in metagenomes from the same environments as were sampled in this study.
The gains in sequencing methodologies come with a cost. It is one thing to accumulate sequence data, quite another to reveal the activities of the microbial world. The bottleneck is shifting from data acquisition to analysis. One response has been the development of 'clustering' approaches that let computer analyses finish within a reasonable amount of time. Extensive metadata about sampled environments adds complexity to the storage and interpretation of metagenomic data. Meanwhile, a growing number of conserved hypothetical proteins of unknown function await genetic and physiological investigation.
Syntrophy
Metagenomics, genomics, culturing—each of these feeds the others. Single cell genomes enable phylogenetic classification of previously unclassified metagenomic reads. Metagenomics reveals population structure, diversity, and the metabolic capabilities of the community as a whole. Culturing is a prerequisite for assigning specific functions to the gene sequences generated by the other two. Much remains to be done on all fronts. The majority of the microbial metagenomic reads still can't be classified beyond the domain level. Key community players may be overlooked because they have not been cultured. And the microbial dark matter still offers a vast terra incognita beckoning to eager explorers.
Reference
Rinke C, Schwientek P, Sczyrba A, Ivanova NN, Anderson IJ, Cheng JF, Darling A, Malfatti S, Swan BK, Gies EA, Dodsworth JA, Hedlund BP, Tsiamis G, Sievert SM, Liu WT, Eisen JA, Hallam SJ, Kyrpides NC, Stepanauskas R, Rubin EM, Hugenholtz P, Woyke T (2013). Insights into the phylogeny and coding potential of microbial dark matter. Nature, 499 (7459), 431−437. PMID 23851394
Great article -- really fascinating, especially the prevalence of shared genes among seemingly distant relatives. I'm curious -- is direction of transfer easily inferred? e.g. from Dictyostelium to the Nanoarchaeon?
Merry replies: Easily? Maybe. Sometimes. If the transfer is recent, the transferred genes will still display the codon usage and GC pattern of the source genome, which could be distinctly different from that of the recipient. Another tactic would be to look for the genes in relatives of the putative source and recipient organisms. One would predict that the transferred genes would be widely distributed through the source genus or family or even a higher taxonomic grouping, but limited to one recipient and its close relatives. There are likely other approaches one could use, but those two come to mind first.
Posted by: Hollis | November 26, 2013 at 10:04 AM
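For readers who want to try the first tactic in the reply above (comparing the GC pattern and codon usage of a suspect gene with the rest of the recipient genome), here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used by Rinke and colleagues: the function names are hypothetical, the sequences in the usage lines are invented toy strings, and a real analysis would need proper statistics and whole-genome data.

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    counts = Counter(seq.upper())
    acgt = sum(counts[base] for base in "ACGT")
    if acgt == 0:
        return 0.0
    return (counts["G"] + counts["C"]) / acgt


def codon_usage(cds: str) -> dict[str, float]:
    """Relative frequency of each codon in an in-frame coding sequence.

    Any trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}


def gc_deviation(gene: str, host_genes: list[str]) -> float:
    """GC content of one gene minus the mean GC content of the host's genes.

    A large absolute value hints (no more than that) at a foreign origin.
    """
    background = [gc_content(g) for g in host_genes]
    return gc_content(gene) - sum(background) / len(background)


# Toy usage with invented sequences (assumption: not real genes).
suspect = "ATATATTTAAAT"
host = ["GGCGCCGGATCC", "GCGCGGCCATGC"]
print(round(gc_deviation(suspect, host), 3))
print(codon_usage("ATGATGAAA"))
```

A strongly negative or positive deviation only flags a candidate; as the reply notes, the signal fades as a transferred gene adapts to its new genome, so checking the gene's distribution among relatives of the putative source and recipient remains the complementary test.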