by Christoph
When I was drafting part 1 of the weevil–Nardonella story, the symbiosis aspect, I could hardly wait to take a deeper look at the endosymbiont genomes, literally 'reading' the annotated genomes. Here now I give you a summary of what I read. Yet do not take this as peer-reviewed scientific work; it's just a finger exercise.
Some more Nardonella genomics...
Figure 2 in the Anbutsu et al. paper (Figure 1.2 in part 1, or here) is so stuffed with information about the weevils' Nardonella endosymbiont genomes that I had better tear it apart a little to show you that it is actually possible to observe genome reduction 'at work'. As said in part 1, the genomes are tiny (0.2‒0.23 Mb) and contain ~200 open reading frames. Earlier, Moran and Mira had partially reconstructed the ~4 Mb genome of a hypothetical enterobacterial ancestor of the small Buchnera genomes (0.45‒0.65 Mb) by what could be aptly called 'in silico molecular archaeology'. A similar process of losing more than 90% of the genome of an enterobacterial ancestor was ‒ and probably still is ‒ underway in Nardonella. Figure 2.1 shows that today the four genomes are still almost completely syntenic, that is, they have much the same gene order, interrupted only by very few indels (=insertions/deletions), inversions and transpositions (see legend to Fig. 2.1). This high degree of synteny makes it intuitively clear that the four Nardonellas are isolates ‒ or if you wish: strains ‒ of one single bacterial species, irrespective of how you prefer to define a 'bacterial species' (see our post on 'bacterial species'). If you then consider that these Nardonella strains co-evolved with their present weevil hosts, that is, 'survived' the speciation of the latter from their common ancestor, it is mind-blowing how stable these genomes have been over millions of years of constant 'editing' despite ongoing genome reduction.
Figure 2.1. Dot plot comparison of four Nardonella sp. genomes. The DNA sequences of Nardonella isolates NARSGI1 [AP018162.1], NAREPO1 [AP018159.1], NARRFE1 [AP018161.1], and NARPIN1 [AP018160.1] were "blasted" (NCBI BLASTn) against the Nardonella sp. NARSGI1 sequence ("discontiguous megablast" option). Self‑comparison (upper left) results in a straight diagonal while offset diagonal stretches indicate duplications. In the hetero-comparisons, inverted diagonals indicate inversions and horizontally offset diagonals/inverted diagonals indicate transpositions of homologous regions. The direction of transcription of rRNA operons is indicated by red arrowheads; positions of rRNA operons are indicated by red dotted lines (± 1 kb). Author's own work. Frontispiece: transmission electron microscopic image of Nardonella cells in the larval bacteriocyte. Source
The ten-or-so gaps in the NARSGI1 (0.23 Mb) × NARRFE1 (0.2 Mb) dotplot in Figure 2.1 add up, roughly, to the ~30 kb difference in chromosome size between the two isolates. Micro-rearrangements (<1 kb) are not visible at the low resolution of the dotplot, but it is apparent that an inversion of ~83% of the chromosome occurred in NARRFE1 (Fig. 2.1, NARSGI1 × NARRFE1). This inversion took place between the two rRNA operons in NARSGI1 that are transcribed in opposite directions, and was followed by loss of one rRNA operon in NARRFE1, the smaller genome of these two (red arrows/dotted lines in Figure 2.1). Genomic inversions between rRNA operons are not that uncommon; a well-known case is the inversion in the E. coli lab strains MG1655 and W3110 (see here). The same recombination has occurred in the pair NARPIN1 and NAREPO1, which appear as mirror images when projected onto NARSGI1: after inversion of ~88% of the genome between the two oppositely oriented rRNA operons in NARPIN1, one rRNA operon was lost in NAREPO1, the smaller genome (see here). Already from such a 'quick 'n dirty' analysis of genome dotplots it is possible to infer that the ancestor of these four Nardonella isolates had at least two rRNA operons (free-living enterobacteria usually have ~5), while genomic inversions in two separate lineages with subsequent loss of one rRNA operon ‒ in both cases 'the other one' ‒ led to the isolates studied by Anbutsu et al. This tentatively inferred branching order is also seen in a monogenic phylogenetic tree based on a single protein (see Figure 2.3 below). It would be tricky but in principle possible to resolve the limited number of smaller inversions and transpositions along the evolutionary trajectory of the Nardonella quartet. Yet I stop right here because untangling these inversions/transpositions is beyond the scope of a blog post.
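For readers who wonder why an inversion turns up as an inverted diagonal, here is a minimal Python sketch of a word-match dot plot. It is not the BLASTn procedure behind Figure 2.1; the function names and the word size `k` are assumptions chosen for illustration only. Matches on the same strand fall on a diagonal, while matches to the reverse complement fall on an anti-diagonal where the sum of the two coordinates stays constant.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmer_index(seq, k):
    """Map every word of length k to the list of its start positions."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def dot_plot(query, subject, k=8):
    """Return (forward, reverse) lists of (subject_pos, query_pos) dots.

    forward: the query word occurs as such in the subject.
    reverse: the reverse complement of the query word occurs in the
             subject, as expected inside an inverted region.
    """
    index = kmer_index(subject, k)
    forward, reverse = [], []
    for j in range(len(query) - k + 1):
        word = query[j:j + k]
        for i in index.get(word, ()):
            forward.append((i, j))
        for i in index.get(reverse_complement(word), ()):
            reverse.append((i, j))
    return forward, reverse
```

Plotting the `forward` dots in one color and the `reverse` dots in another gives a crude version of the panels in Figure 2.1; real genomes need a seeding and extension heuristic such as BLAST to stay tractable and to tolerate mismatches.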
Figure 2.2. Codon frequency for selected amino acids in the GyrB proteins of E. coli and the four Nardonella isolates (grey background highlights significant differences). Protein accession numbers are given in the legend to Fig. 2.3. Author's own work.
But there is one more thing (to quote Peter Falk's 'Columbo'). The extraordinary AT‑richness of the Nardonella genomes is, of course, reflected in the narrowly spaced coding regions of their ~200 genes. As an aside, a very recent paper in PNAS reports on small (15.5 kb) plastid genomes of Balanophora plants with 88% AT, and up to 98% AT in one of their 19 ORFs, which apparently requires some tweaking of the genetic code to obtain useful proteins. In Nardonella and other endosymbionts with reduced and AT-rich genomes, AT-rich codons should thus be preferred over GC-rich codons, as pointed out by Nancy Moran earlier. This is actually the case, as you can see in Figure 2.2, where I took the evolutionarily well-conserved GyrB protein (beta subunit of topoisomerase II) as an example. Note that the strongest tendency towards AT-rich codons is found in NARRFE1 GyrB, that is, in the Nardonella genome with the highest AT-content (84.7%). It's surprising ‒ if not outright shocking ‒ that the amino acid composition resulting from such a strongly skewed codon choice does not completely mess with protein identity. Yet, the GyrB proteins of the Nardonella isolates and E. coli still share 40‒50% identity (=identical amino acids at corresponding positions). Also striking, the four Nardonella GyrB proteins share only 50‒60% identity among each other, while identity exceeds 80% among GyrB proteins from different species of the Enterobacterales, as expected for homologous proteins of closely related species.
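A codon tally like the one in Figure 2.2 is easy to reproduce. The following Python sketch is only an illustration and not the script used for the figure; the function names are made up here, and it assumes an in-frame coding sequence translated with the standard codon assignments (which the bacterial code shares).

```python
from collections import Counter, defaultdict

# Standard codon assignments, built from the usual TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def at_content(seq):
    """Fraction of A and T among all characters of a DNA string."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def codon_usage(cds):
    """Count, for each amino acid, how often each of its codons is used."""
    cds = cds.upper()
    usage = defaultdict(Counter)
    for pos in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[pos:pos + 3]
        amino_acid = CODON_TABLE.get(codon)
        if amino_acid is not None:  # skip codons with ambiguous bases
            usage[amino_acid][codon] += 1
    return usage


def codon_fractions(cds, amino_acid):
    """Relative use of the synonymous codons of one amino acid."""
    counts = codon_usage(cds)[amino_acid]
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}
```

Calling `codon_fractions(gyrB_cds, "K")` on the gyrB coding sequences of E. coli and of a Nardonella isolate (`gyrB_cds` being a placeholder for a sequence you load yourself) would show how strongly AAA is favored over AAG in each genome.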
Figure 2.3. Radial phylogenetic tree for selected GyrB proteins (topoisomerase II beta subunit, ~800 res.) from selected Enterobacterales and the four Nardonella sp. (E. coli MG1655 [YP_026241.1], E. cloacae ECNIH3 [AIN20788.1], Pantoea sp. At-9b [WP_013507203.1], D. chrysanthemi Ech1591 [ACT04898.1], S. marcescens [ALE98575.1], Y. pestis CO92 [CAL22663.1], E. tarda EIB202 [ACY82851.1], P. laumondii [CAE12299.1], P. stuartii MRSN 2154 [AFH93954.1], S. glossinidius [WP_011409869.1], S. pierantonius [WP_025246980.1], Sodalis endosymbiont of H. halophilus [WP_097032598.1], G. endobia [WP_067498040.1], P. carbekii US [WP_022564019.1], Nardonella sp. NARRFE1 [BBA84965.1], Nardonella sp. NARSGI1 [BBA85161.1], Nardonella sp. NARPIN1 [BBA84965.1], Nardonella sp. NAREPO1 [BBA84543.1]). Several small gaps from their ClustalW alignment were removed manually. The maximum-likelihood tree (neighbor joining, Jukes-Cantor distances, 1,000 replicates) was drawn using CLC sequence viewer software. Scale bar unit: expected changes per amino acid per position. Author's own work
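To connect the identity values quoted above with the Jukes-Cantor distances named in the legend, here is a small Python sketch. It assumes the common 20-state form of the correction for proteins, d = -(19/20) ln(1 - (20/19) p), where p is the fraction of differing residues; whether CLC sequence viewer applies exactly this form is an assumption on my part, and the function names are hypothetical.

```python
import math


def fraction_different(aligned_a, aligned_b, gap="-"):
    """Fraction of differing residues over the gap-free alignment columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # ignore columns with a gap in either sequence
        compared += 1
        mismatches += x != y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return mismatches / compared


def jukes_cantor_protein(p, states=20):
    """Jukes-Cantor corrected distance for an observed difference p.

    Assumes equal frequencies and equal exchange rates for all states.
    Returns infinity once p reaches the saturation level (states-1)/states.
    """
    b = (states - 1) / states
    if p >= b:
        return math.inf
    return -b * math.log(1.0 - p / b)
```

The correction grows quickly as identity drops: two proteins that are only 40‒50% identical are separated by a corrected distance clearly larger than the 0.5‒0.6 observed differences per site, which is one reason why the Nardonella branches come out so long.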
When these numbers are visualized, it becomes immediately clear that single-protein phylogenies are, well, useless for determining the phylogenetic position of Nardonella within the Enterobacterales (Figure 2.3, with representatives of 6 of the 8 families in this taxonomic order). Also, concatenated (conserved) proteins do not easily yield more robust phylogenies. And phylogenies based on nucleotide sequences, including 16S rRNA phylogenies, tend to fall short when genomes with greatly varying AT‑contents are included (a well-known phenomenon termed long branch attraction). This is a methodological impasse that has also not yet been convincingly tackled for other endosymbionts with reduced and highly AT-rich genomes, say Buchnera aphidicola (~74% AT). Some progress was made in recent years on gene synteny-based phylogenies (see here, for example, and here) but there is no commonly accepted approach on how to reliably rate various synteny features to obtain distance matrices. You see in Figure 2.3 that GyrB of NAREPO1 and NARPIN1 branch off together, as do NARSGI1 and NARRFE1 GyrB. You can deduce from Figure 2.1 that the genomes of these 'pairs' are also more closely related to each other than to those of the other pair because they have fewer synteny breaks by inversions or indels. So, for phylogeny, synteny analysis is a straightforward qualitative assessment but it does not immediately suggest a feasible quantitative approach.
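To make the notion of a 'synteny break' concrete, here is a deliberately crude Python sketch that counts breakpoints between two circular gene orders. It assumes both genomes carry exactly the same genes, written as signed integers (the sign gives the orientation), so it ignores indels altogether; that limitation is part of the rating problem described above. The function names are hypothetical and this is not a method taken from the papers linked above.

```python
def adjacencies(order):
    """Return all neighbor pairs of a circular, signed gene order.

    Each adjacency is stored in both reading directions, so that reading
    the chromosome from the other strand gives the same set.
    """
    n = len(order)
    pairs = set()
    for i in range(n):
        a, b = order[i], order[(i + 1) % n]
        pairs.add((a, b))
        pairs.add((-b, -a))
    return pairs


def breakpoint_count(order_a, order_b):
    """Count adjacencies of genome A that are not preserved in genome B."""
    if sorted(abs(g) for g in order_a) != sorted(abs(g) for g in order_b):
        raise ValueError("both genomes must contain the same genes")
    preserved = adjacencies(order_b)
    n = len(order_a)
    return sum(
        (order_a[i], order_a[(i + 1) % n]) not in preserved for i in range(n)
    )
```

A single inversion creates at most two breakpoints no matter how large the inverted segment is, which is why the huge inversions in Figure 2.1 would count for very little in such a measure while a handful of small rearrangements could dominate it.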
A critical note
Anbutsu et al. state in their paper that "...the tiny genomes encode minimal but complete gene sets for bacterial replication, transcription, and translation." That's clearly not the case for the replication genes if one literally 'reads' these genomes carefully. Since Nardonella is a member of the 'endosymbiont branches' of the Enterobacterales, such a "...minimal but complete gene set for bacterial replication" should be a close match to what is known for E. coli. However...
- The four Nardonella genomes should have genes for the replication initiator protein, DnaA, which they all lack. In fact, the dnaA genes appear as if 'punched out' of the well conserved context rnpA·rpmH·dnaA·dnaN·gyrB (well conserved in the Gammaproteobacteria, and completely conserved in the Enterobacterales). At first sight, this appears as a surgically precise 'genome reduction' in Nardonella but deletions extending into any of the neighboring genes would probably have left them nonviable (dnaN and gyrB encode essential replication factors, rpmH encodes L34, an indispensable ribosomal protein, and rnpA encodes the protein moiety of the essential RNase P).
- They have genes encoding a topoisomerase II (gyrA, gyrB), a DnaG-type primase and a DnaB-type replicative helicase, but they lack genes for either of the two known helicase loaders, DnaC or DciA (the 'textbook helicase loader' DnaC was probably acquired via HGT from a phage by a number of Enterobacterales and replaced the ancestral helicase loader DciA, which has a much wider distribution in the Gammaproteobacteria).
- Their dnaX genes allow for synthesis of the gamma subunit of DNA polymerase III holoenzyme but are too short to also encode the tau subunit (an evolutionarily conserved programmed frameshift triggers the balanced synthesis of gamma and tau from the same transcript of the 'long form' dnaX gene). Both gamma and tau subunits are part of the gamma complex that tethers two DNA Pol III alpha subunits together for coupled leading- and lagging-strand synthesis.
- Almost appearing as a reflection of the missing tau subunit, the Nardonella dnaE genes encoding the DNA Pol III alpha subunit, the actual polymerase, are unusually short and drive the synthesis of a C‑terminally 'cropped' alpha subunit that lacks the domain responsible for interaction with the tau subunit.
- The 'deficiency' of the gamma complex goes on: holA genes encoding the delta subunit (δ) are missing in all four Nardonella genomes, and only one isolate, NARPIN1, has a holB gene for the delta prime (δ') subunit. The delta, delta prime, and gamma/tau subunits form not only the 'gamma complex' but also the 'clamp loader' (γ3δδ') responsible for loading the beta subunit dimer (DnaN) onto DNA.
- The Nardonella genomes have dnaQ genes encoding the Pol III epsilon subunit responsible for proofreading during replication but they lack holE genes encoding the theta subunit that stimulates proofreading.
- Lastly, all four Nardonella genomes have dnaN genes but it is not known whether gamma alone can perform clamp loading.
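The list above can be condensed into a small presence/absence table. The Python sketch below merely transcribes what is stated in the bullet points; the data layout and the function name are my own choices for illustration, not part of any annotation pipeline.

```python
ISOLATES = ("NARSGI1", "NAREPO1", "NARRFE1", "NARPIN1")
ALL = frozenset(ISOLATES)
NONE = frozenset()

# gene -> (isolates carrying the gene, remark), as read from the list above
REPLICATION_GENES = {
    "dnaA": (NONE, "replication initiator"),
    "gyrA": (ALL, "topoisomerase II subunit"),
    "gyrB": (ALL, "topoisomerase II subunit"),
    "dnaG": (ALL, "primase"),
    "dnaB": (ALL, "replicative helicase"),
    "dnaC": (NONE, "helicase loader"),
    "dciA": (NONE, "ancestral helicase loader"),
    "dnaX": (ALL, "short form: gamma only, no tau"),
    "dnaE": (ALL, "Pol III alpha, C-terminally truncated"),
    "holA": (NONE, "Pol III delta subunit"),
    "holB": (frozenset({"NARPIN1"}), "Pol III delta prime subunit"),
    "dnaQ": (ALL, "Pol III epsilon subunit (proofreading)"),
    "holE": (NONE, "Pol III theta subunit"),
    "dnaN": (ALL, "beta clamp"),
}


def missing_genes(isolate):
    """Return the sorted replication genes not found in the given isolate."""
    if isolate not in ALL:
        raise ValueError(f"unknown isolate: {isolate}")
    return sorted(
        gene
        for gene, (present_in, _remark) in REPLICATION_GENES.items()
        if isolate not in present_in
    )
```

For NARPIN1, `missing_genes` returns dciA, dnaA, dnaC, holA and holE; the other three isolates lack holB as well.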
Taken together, this reduced set of replication genes in the Nardonellas leaves the question wide open of how exactly they manage the replication of their small chromosomes. It could be well worth studying whether a diminished proofreading efficiency ‒ think of the missing theta subunit ‒ contributes to the strong shift towards AT-richness in the Nardonella genomes. But to make it clear, this criticism does not call the main findings of the study by Anbutsu et al. into question.
Closing remark
When you sometimes experience a sudden "inordinate fondness for beetles" (attributed to J.B.S. Haldane) you may enjoy the impressive picture collections here or here. A fair warning though: there are more than 400,000 described species of beetles, among them ~70,000 weevil species (Curculionidae). In case you eventually get tired of looking at beetle portraits, take a deep breath and consider the many symbioses between bacteria and beetles that still wait to be discovered! Surprises guaranteed, take my word.