by Christoph
One of the most convincing arguments for a common origin of all known life forms on this planet is that they all use the same genetic code. In my understanding, the argument is strengthened, perhaps somewhat counter-intuitively, by the numerous findings that many life forms modify the canonical genetic code in ways that suit them, and display astonishing creativity in doing so. One recent example of how the genetic code can be tinkered with made it onto the cover of the January 26 issue of Nature (Figure 1).
Figure 1. Nature cover from Jan. 26, 2023, Vol. 613, Issue 7945. Frontispiece: Blastocrithidia nonstop. F, flagellum; P, promastigote stage; C, cyst-like straphanger stage (length ~3 µm). Photo: Jan Votýpka. Source
In a screen for trypanosomatid protists − a family of plant, insect, and vertebrate parasites of clinical and economic concern that includes Trypanosoma cruzi, the causative agent of Chagas disease − Záhonová et al. (2023) isolated, cultivated, and sequenced Blastocrithidia nonstop (see frontispiece). In the words of Kachale et al. (2023), authors of a follow-up study, Trypanosomatidae "are known for a wide range of oddities," and a particularly striking "oddity" had caught Elio's attention earlier, in Pictures Considered #44: The Utter Magic of kDNA. Briefly, kinetoplast DNA (kDNA) of T. brucei is the mitochondrial DNA of this protist, which is actually a tangled DNA meshwork of a few maxi‑circles and hundreds of mini‑circles. From a review by Jensen & Englund (2005): "Maxicircle transcripts in T. brucei are heavily edited, an amazing reaction in which uridylate residues are incorporated into or removed from specific internal sites within the transcript to create an open reading frame. Minicircles encode most of the small guide RNAs that are templates for editing specificity." If you're thinking of CRISPR now, you're not wrong.
Blastocrithidia nonstop had another "oddity" in store for the researchers. Záhonová et al. (2023) found that the open reading frames of its genes are virtually awash with in-frame stop codons (Figure 2). Out of 7,259 predicted protein-coding genes only 228 lack any in-frame UGA, UAA, or UAG stop codons. These are mainly genes that are highly expressed and encode cytosolic ribosomal proteins, translation factors, histones, mitochondrial electron transport chain subunits, and a few others. From the comparison with homologous genes of other trypanosomatids, the authors concluded that UGA has been reassigned to encode tryptophan (Trp, W), while UAG and UAA (UAR) encode glutamate (Glu, E). Strikingly, UAA and, less frequently, UAG also seem to serve as bona fide stop codons. Because periods within words are so unusual, here is a "smell test":
I .ould b. surpris.d if you could r.ad this t.xt h.r. .ithout assistanc., but th. ribosom.s of Blastocrithidia nonstop manag. to translat. such "dott.d" mRNAs .ffortl.ssly. Here's how they achieve this, in some more detail than explained in Figure 1.
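Before turning to the mechanism, the "dots" can be made concrete with a small sketch that lists every stop codon sitting inside a reading frame, that is, before the final codon. The function name and the toy sequence are invented for illustration; this is not a real B. nonstop gene and not the pipeline the authors used.

```python
# Minimal sketch: find stop codons inside an open reading frame.
# The toy ORF below is hypothetical and only serves as an illustration.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def in_frame_stops(orf: str) -> list[tuple[int, str]]:
    """Return (codon_index, codon) for each stop codon before the last codon."""
    seq = orf.upper().replace("U", "T")  # accept RNA or DNA spelling
    usable = len(seq) - len(seq) % 3     # ignore an incomplete trailing codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # The last codon is the expected terminator, so it is not reported.
    return [(i, c) for i, c in enumerate(codons[:-1]) if c in STOP_CODONS]


toy_orf = "ATGGCTTGAGAATAAAAGTAGCTTTAA"
print(in_frame_stops(toy_orf))  # [(2, 'TGA'), (4, 'TAA'), (6, 'TAG')]
```

By this measure, an ORF from one of the 228 "clean" genes would return an empty list.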
UAA/UAG re‑assignment The nuclear genome of B. nonstop contains 70 tRNA genes, including tRNAGluCUA and tRNAGluUUA, which are cognate to the two UAR stop codons (R = G or A). A thorough phylogenetic analysis by Kachale et al. (2023) supports an origin of these stop-codon-recognizing tRNAs from standard tRNAsGlu. Northern blot analysis confirmed that both tRNAGluCUA and tRNAGluUUA are expressed and charged in B. nonstop, and that they are absent from the trypanosomatid model species T. brucei.
UAA is the preferred stop codon Thus far, these authors have no real clue how translational termination at UAA stop codons at the 3'‑ends of open reading frames (ORFs) can overcome readthrough by tRNAGluUUA. However, unlike in other trypanosomatids, ORFs in B. nonstop exhibit a significant enrichment of A in the coding strand for the first ~40 bases downstream of a stop codon (followed by a T-stretch). This indicates a possible involvement of poly(A)-binding protein (PABP), known to interact with A+U-rich sequences in other species.
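The downstream A enrichment can be pictured with a short sketch that reports, for every in-frame TAA, the fraction of A in the following 40 bases. The window size follows the "~40 bases" mentioned above; the function name and the toy sequence are assumptions made for illustration, not the authors' analysis.

```python
# Minimal sketch: A content downstream of each in-frame TAA codon.
# The toy sequence is hypothetical; real ORFs and their 3' regions would be
# taken from the annotated genome.


def taa_contexts(seq: str, window: int = 40) -> list[tuple[int, float]]:
    """For each in-frame TAA, return (codon_index, A fraction downstream)."""
    seq = seq.upper().replace("U", "T")
    results = []
    for start in range(0, len(seq) - 2, 3):
        if seq[start:start + 3] != "TAA":
            continue
        region = seq[start + 3:start + 3 + window]
        if len(region) < window:
            continue  # too close to the end for a full window
        results.append((start // 3, region.count("A") / window))
    return results


# One internal TAA in a GC-rich stretch, one terminal TAA before an A-rich tail.
toy = "ATG" + "TAA" + "GCTGGC" * 5 + "TAA" + "A" * 36 + "T" * 12
for codon_index, a_fraction in taa_contexts(toy):
    print(codon_index, a_fraction)
```

In this toy case the internal TAA is followed by 22.5% A and the terminal one by 90% A, which is the kind of contrast a termination signal could exploit.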
Figure 2. Scheme illustrating possible consequences of AT mutational shift in Blastocrithidia ancestor. Frequent GC-to-AT substitutions in Blastocrithidia ancestor could have led to the following consequences that we observe in B. nonstop genome: overall AT-rich genome; extreme AT-richness of intergenic regions with frequent TAA codons; appearance of in-frame stop codons. Possible UAG-to-UAA and UGA-to-UAA substitutions are evolutionary neutral, but allowed TGG-to-TGA, GAG-to-TAG, and GAA-to-TAA substitutions. Source
UGA re‑assignment A tRNA cognate to UGA encoding tryptophan is missing from the B. nonstop genome, and Kachale et al. (2023) found that the tRNATrpCCA does not undergo CCA-to-UCA anticodon editing in the cytosol as it does in other trypanosomatids. Secondary structure predictions suggested that the anticodon stem (AS) of B. nonstop tRNATrpCCA is only 4 bp long, whereas the closely related T. brucei and other trypanosomatids possess an AS of the canonical 5‑bp length (see Figure 3). A battery of in vitro and in vivo tests in the cognate and in heterologous systems confirmed that the "shorter" tRNATrpCCA does in fact lead to a significant increase in readthrough (=Trp incorporation) at in‑frame UGA codons.
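Which codon an anticodon is "cognate" to follows from antiparallel base pairing, and a few lines of code show it for the four anticodons mentioned here. This sketch assumes strict Watson-Crick pairing and deliberately ignores wobble and base modifications; the function name is made up for illustration.

```python
# Minimal sketch: the codon that pairs with an anticodon (both written 5'->3'),
# assuming strict Watson-Crick pairing only (no wobble, no modified bases).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def cognate_codon(anticodon: str) -> str:
    """Reverse-complement an RNA anticodon to obtain its cognate codon."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


for name, anticodon in [
    ("tRNA-Glu", "CUA"),
    ("tRNA-Glu", "UUA"),
    ("tRNA-Trp", "CCA"),
    ("tRNA-Trp (edited)", "UCA"),
]:
    print(name, anticodon, "->", cognate_codon(anticodon))
```

The two tRNAsGlu come out as cognate to UAG and UAA, the unedited tRNATrpCCA to UGG, and only the edited UCA anticodon to UGA. Reading UGA with an unedited CCA anticodon therefore requires a mismatch at the third codon position, which is where the altered tRNA structure comes in.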
Re‑assigning three codons, with the UGA re‑assignment relying on a modified tRNA structure, does not seem to have been sufficient for Blastocrithidia nonstop to evolve its "updated code" and then cope with it seamlessly. Two further changes appear to be involved.
First, in the preceding study, Záhonová et al. (2023) had found an unusual Ser74Gly substitution at a highly conserved position in the translational termination factor eRF1. When Kachale et al. (2023) introduced a similar Ser67Ala substitution into the yeast or human eRF1 homologs and expressed them in S. cerevisiae, they found significantly increased readthrough at UGA codons but none at UAA/UAG codons. This demonstrates that Ser67Ala specifically restricts the decoding of UGA as a stop codon in vivo, which is clearly a desirable effect in B. nonstop.
Second, in eukaryotes, the widespread nonsense-mediated decay (NMD) pathway is responsible for the degradation of mRNAs with premature stop codons, such as those found throughout open reading frames in B. nonstop (Figure 2). An aside: bacteria have solved the related issue of degrading mRNAs lacking a stop codon differently, by trans-translation (see here in STC). Early in its evolution, one branch of the Trypanosomatidae lost Upf1 and Upf2, the key components of the NMD pathway, and Kachale et al. (2023) assume that this loss might have been one of the prerequisites for stop codon re‑assignment in B. nonstop.
Figure 3. (A) Modified nucleosides, tRNA structure and the strength of codon–anticodon interaction influence on codon decoding accuracy. tRNA decoding fidelity is mainly modulated by the strength of the interaction between anticodon and codon bases, tRNA abundance, modified nucleosides, and tRNA structure. At the third codon position (first anticodon position), there is a decoding flexibility that enables a single tRNA species to decode more than one codon. This means that the 61 sense codons of the genetic code can be decoded by <61 different tRNA species. This decoding flexibility (wobble rule) is modulated by the nature of the first base of the anticodon, particularly its modification and also modification of other bases in the anticodon loop. Modification of base 37 has a strong influence on the fidelity of decoding because it modulates the interaction of the third base of the anticodon with the first codon base. The overall structure of the tRNA and, in particular, the structure of the anticodon stem, also play an important role in maintaining decoding accuracy. Source. (B) tRNA-Phe from yeast showing modified bases in blue. m2G: 2-methyl-guanosine; D: 5,6-dihydrouridine; m22G: N2-dimethylguanosine; Cm: O2'-methyl-cytidine; Gm: O2'-methyl-guanosine; T: 5-methyluridine (ribothymidine); Y: wybutosine (Y-base); Ψ: pseudouridine; m5C: 5-methyl-cytidine; m7G: 7-methyl-guanosine; m1A: 1-methyl-adenosine. CC BY-SA 3.0 Yikrazuul
A bit of history The long line of studies of "alternative genetic codes" began in the late 1970s when it became known that the genetic code of human and yeast mitochondria deviates from the canonical genetic code. Santos et al. (2004) said in their review 20 years ago: "These studies have revealed that the genetic code is still evolving despite strong negative forces working against the fixation of mutations that result in codon reassignment. Recent data from in vitro, in vivo and in silico comparative genomics studies are revealing significant, previously overlooked links between modified nucleosides in tRNAs, genetic code ambiguity, genome base composition, codon usage and codon reassignment." And they predicted that "...the study of genetic code variation will probably provide important new insights into how mRNA decoding fidelity (controlled by the translational machinery) shapes genome base composition, codon usage and reassignment, and will ultimately reveal novel molecular mechanisms that link the environment to the evolution of genomes." Variations of the genetic code that involve re‑assignments of one or up to all three stop codons are known from dinoflagellates like Amoebophrya sp., and ciliates like Blepharisma sp., Parduczia sp., and Condylostoma magnum. The study by Kachale et al. (2023) has now added another piece to this great puzzle (see Figure 3 for details).
Before I drift into the metaphysical, here is a practical example (a little exercise, if you like). Suppose you stumble upon the DNA sequence of the rpoA gene encoding 'RNA Polymerase subunit alpha' of Serratia symbiotica CWBI-2.3, and you ask yourself how similar the RpoA proteins of this Serratia and the E. coli K‑12 reference strain MG1655 may be. In general, and without going into the details here, it makes more sense to calculate similarity/homology at the protein level than at the DNA sequence level. You find the Serratia RpoA protein sequence in the NCBI protein database with the accession number QLH63909.1. Before you now BLAST the two proteins against each other − and you will come up with "98% identity," no surprise as Serratia and E. coli are first cousins in the order Enterobacterales − you would read the "FEATURES" section in the header of the file QLH63909.1 and find therein important information:
1. the line /coded_by="complement(CP050855.1:2924630..2925619)" tells you that the protein sequence was translated from the DNA sequence and not determined by protein sequencing. This is the rule in genome sequencing projects almost without exception. A more recent exception I'm aware of is Su'etsugu et al. (2008), who determined the correct translational initiation site of the E. coli hda gene at the rare start codon CUG by protein sequencing.
2. the line /transl_table=11 tells you which NCBI translation table (that is, which genetic code) was applied; click on the "11" to see it. That's the one for bacteria, archaea, and plant plastids, and thus also for Enterobacterales. The same translation table was used for the RpoA sequence of E. coli, I checked that for you. To see, for example, the NCBI translation table for Blastocrithidia, scroll down to "31" (or click "31" here, for convenience).
So, you can actually compare pears with apples if you take protein sequences that were translated with the correct translation table, but not the DNA sequences. Learning/Exercise goals met 🗹
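For those who like to see the exercise in code, here is a minimal sketch of why the table number matters. It builds the amino acid assignments of table 11 (identical to the standard code) and of table 31 (UGA = Trp, UAA/UAG = Glu) and translates the same toy sequence with both. The toy sequence is invented, and treating the last codon as the terminator is a simplifying assumption: how B. nonstop tells a terminating UAA from a Glu-encoding one is not modeled here.

```python
from itertools import product

# Codons in NCBI order (T, C, A, G at each position) and the standard
# amino acid string; table 11 uses the same amino acid assignments.
CODONS = ["".join(c) for c in product("TCAG", repeat=3)]
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE_11 = dict(zip(CODONS, STANDARD_AAS))

# Table 31 (Blastocrithidia nuclear code): TGA -> W, TAA and TAG -> E.
TABLE_31 = {**TABLE_11, "TGA": "W", "TAA": "E", "TAG": "E"}


def translate(dna: str, table: dict[str, str], terminators: set[str]) -> str:
    """Translate frame 0; the final codon becomes '*' if it is a terminator."""
    seq = dna.upper().replace("U", "T")
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons:
        return ""
    protein = "".join(table[c] for c in codons[:-1])
    last = codons[-1]
    return protein + ("*" if last in terminators else table[last])


toy = "ATGGCTTGAGAATAAAAGTAGCTTTAA"  # hypothetical, not a real gene
print(translate(toy, TABLE_11, {"TAA", "TAG", "TGA"}))  # MA*E*K*L*
print(translate(toy, TABLE_31, {"TAA", "TAG"}))         # MAWEEKEL*
```

Read with table 11, the toy sequence falls apart after two residues; read with table 31, it yields one continuous peptide.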
Stop Making Sense is a music film from 1984 featuring the US rock band Talking Heads (see here the famous release poster). It had nothing to do with them at all, but researchers investigating translational termination related it to themselves, and it first appeared as a title in PubMed in 1988: Stop making sense: or Regulation at the level of termination in eukaryotic protein synthesis (Valle & Morch (1988)). Since then, this title has come up in PubMed every other year or so. It attracted Leoš Shivaya Valášek, co-corresponding author of the Kachale et al. (2023) paper, and Kelly Krause, who jointly designed the cover for Nature shown in Figure 1. Apparently, they could not do without it. I couldn't resist either, and have gone along with Leoš and Kelly in their choice of the plural of "stop" since all three stop codons are re‑assigned in the protist Blastocrithidia nonstop. By the way, the species name is not properly latinized according to the rules, but is that not fitting?