The cicada Diceroprocta semicincta (male). Size: 3/4 inch. Source.
The endosymbionts of insects provide an unending parade of novelty. They seem to knock on our door, one after the other, each purveying some new twist of microbial evolution. They break the rules, remodel the genetic code, and blur the line between organism and organelle. Here is one more that seems to particularly delight in defying convention.
The average GC content of bacterial genomes varies widely, from 17% to 75%. Moreover, the variation is not random: percent GC correlates roughly with genome size, so the larger the genome, the higher the GC content tends to be. The pattern shows up most dramatically at the low end of the scale, among the insect endosymbionts with greatly reduced genomes, which have the lowest GC percentages of all. One can't help but wonder why. The usual explanation has gone like so. All DNA is subject to continual chemical alteration. Two common changes are the spontaneous deamination of cytosine and the oxidation of guanine by reactive oxygen species. If not repaired, either one can replace a GC pair with an AT pair. Endosymbionts with reduced genomes have fewer DNA repair genes. Ergo, over time their DNA would accumulate unrepaired GC-to-AT changes, steadily lowering the GC% of their genomes.
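The mutational argument above can be made concrete with a toy model. This is a hypothetical two-state sketch for illustration, not a model taken from the paper: every site is either a GC or an AT pair, a fraction u of GC pairs turns into AT pairs each generation (standing in for unrepaired deamination or oxidation damage), and a fraction v of AT pairs turns back into GC pairs. The function names and the rates used in the example are made up.

```python
# Toy two-state model of GC-to-AT mutational pressure (hypothetical, for
# illustration only; the rates are made-up parameters, not measurements).

def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return (seq.count("G") + seq.count("C")) / len(seq)


def simulate_gc(gc0: float, u: float, v: float, generations: int) -> float:
    """Iterate the genome-wide GC fraction forward in time.

    u: per-site, per-generation chance that a GC pair becomes an AT pair
       (e.g. unrepaired cytosine deamination or guanine oxidation).
    v: per-site, per-generation chance that an AT pair becomes a GC pair.
    """
    gc = gc0
    for _ in range(generations):
        # GC pairs that escape mutation, plus AT pairs that mutate to GC.
        gc = gc * (1 - u) + (1 - gc) * v
    return gc


def equilibrium_gc(u: float, v: float) -> float:
    """GC fraction at which GC->AT losses exactly balance AT->GC gains."""
    return v / (u + v)


if __name__ == "__main__":
    print(gc_content("ATGCGC"))                           # 4 of 6 bases are G or C
    print(equilibrium_gc(1e-8, 1e-8))                     # 0.5: no bias
    print(round(equilibrium_gc(4e-8, 1e-8), 3))           # 0.2: GC->AT bias wins
    print(round(simulate_gc(0.5, 0.04, 0.01, 1000), 3))   # converges toward 0.2
```

In this model the GC fraction settles at v/(u + v) regardless of where it starts, so anything that raises u relative to v, such as losing the genes that repair damaged C and G, drags the equilibrium toward AT.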
This made a tidy story.
When Carsonella ruddii came along, it broke the old records. Weighing in with a genome of only 160 kbp and but 182 protein-coding genes, it hardly seemed to have enough genes to sustain its own life. Its GC content of 16.5% was also the lowest yet, so it still fit the pattern (see figure below). Now here comes another insect endosymbiont, this one from the Arizona cicada Diceroprocta semicincta, named Candidatus Hodgkinia cicadicola. Its genome is even smaller, a mere 143,795 bp. Its GC content? 58.4%. That puts Hodgkinia way out in left field on the graph.
Relationship between genome size and GC content for sequenced bacterial and archaeal genomes ≤ 10 Mb. Red circles = obligate insect endosymbionts; dark blue = obligate α-Proteobacteria endosymbionts; purple = red + blue = Hodgkinia; light blue = other α-Proteobacteria; gray = other Bacteria and Archaea. Source.
As if that weren't enough, Hodgkinia has another oddity. It uses a modified form of the "universal" genetic code, a version that has turned up before in some mitochondrial lineages and in Mycoplasma, all of which also exhibit genome reduction and low GC content. Here, UGA no longer functions as a stop codon but instead codes for tryptophan. This switch had been thought to be a consequence of the drop in GC% in these small genomes. When the G in a UGA stop codon mutates to an A, translation is unaffected, because the resulting codon, UAA, is also a stop codon. Over time, more and more UGAs would change to UAAs until eventually, it was imagined, no UGA codons would remain. UGA would then be free to take on a new function, i.e., encoding tryptophan (which is otherwise encoded only by UGG).
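The claim that a G-to-A change at a UGA stop codon is silent, and that UGA becomes free for reassignment once no UGA stops remain, can be checked with a tiny translator. This is only a sketch: the variant table simply reassigns UGA to tryptophan as described above and is not a full model of Hodgkinia's translation machinery, and the two gene sequences are hypothetical.

```python
# Build the standard genetic code as an RNA codon -> amino-acid map.
# '*' marks a stop codon. Codons are enumerated in the conventional
# U, C, A, G order for the first, second, and third positions.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AMINO_ACIDS))

# Simplified variant code: identical except that UGA reads as tryptophan (W).
UGA_TRP = {**STANDARD, "UGA": "W"}


def translate(mrna: str, code: dict) -> str:
    """Translate an mRNA from its first codon up to the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = code[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "AUGUGGUGAAAAUAA"     # hypothetical: AUG UGG UGA AAA UAA
    mutant = "AUGUGGUAAAAAUAA"   # same gene after UGA -> UAA (G to A)
    print(translate(gene, STANDARD))    # MW   (UGA acts as a stop)
    print(translate(gene, UGA_TRP))     # MWWK (UGA read as Trp)
    print(translate(mutant, STANDARD))  # MW   (the mutation is silent)
    print(translate(mutant, UGA_TRP))   # MW   (no UGA left; codes agree)
```

The last two lines carry the hypothesis: once UGA has vanished from stop positions, switching to the variant code no longer changes any protein.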
Hodgkinia, with its GC-rich genome, tells us instead that it is genome reduction, not low GC%, that is driving this shift in its genetic code. One of the genes lost by Hodgkinia encodes release factor 2 (RF2), which recognizes UGA (as well as UAA) as a stop codon and terminates translation at that point. Hodgkinia gets by perfectly well without it because it still encodes release factor 1 (RF1), which recognizes the other two stop codons, UAA and UAG. Mycoplasmas that use the same modified genetic code also lack RF2.
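Why losing RF2 strands UGA but not UAA follows from the stop-codon specificities of the two bacterial release factors, as characterized in E. coli: RF1 reads UAA and UAG, while RF2 reads UAA and UGA. The sketch below assumes those specificities carry over to Hodgkinia; the helper names are hypothetical. Given the release factors a genome encodes, it reports which standard stop codons are left with no factor to terminate at them.

```python
# Stop-codon specificities of the two bacterial class I release factors,
# as characterized in E. coli (assumed here to apply to Hodgkinia as well).
RF_SPECIFICITY = {
    "RF1": frozenset({"UAA", "UAG"}),
    "RF2": frozenset({"UAA", "UGA"}),
}
STOP_CODONS = frozenset({"UAA", "UAG", "UGA"})


def terminated_codons(release_factors) -> set:
    """Stop codons recognized by at least one of the given release factors."""
    covered = set()
    for rf in release_factors:
        covered |= RF_SPECIFICITY[rf]
    return covered


def orphan_stop_codons(release_factors) -> frozenset:
    """Standard stop codons that no encoded release factor can read."""
    return STOP_CODONS - terminated_codons(release_factors)


if __name__ == "__main__":
    print(sorted(orphan_stop_codons(["RF1", "RF2"])))  # []
    print(sorted(orphan_stop_codons(["RF1"])))         # ['UGA']
```

With RF1 alone, UAA and UAG still end translation and only UGA is left unclaimed, which is the opening a tryptophan tRNA can take over.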
So, back to the original question: why is GC% reduced in every endosymbiont examined so far except Hodgkinia? The answer seems not to be simply the loss of DNA repair enzymes; if it were, a genome as pared down as Hodgkinia's should be AT-rich too. Other evidence has suggested that GC% is somehow linked to variation in a particular subunit of DNA polymerase. Since Hodgkinia has but two genes encoding DNA polymerase subunits, it might serve as a simplified model system for investigating this. The researchers will be questioning Hodgkinia on this matter, and more papers are likely forthcoming.
McCutcheon JP, McDonald BR, & Moran NA (2009). Origin of an alternative genetic code in the extremely small and GC-rich genome of a bacterial symbiont. PLoS Genetics, 5(7). PMID: 19609354