by Christoph
To continue with the comparison of apples and pears from the first part of this Of Terms in Biology two-parter, here is an example of a synonymous substitution that can rightly be called a silent mutation.
I said in part 1 that I could easily provide watertight definitions for both terms that make it clear from which perspective each of the two terms makes sense, where their meanings overlap and where they do not. Let me digress for a moment and share with you the answer that the chatbot Bing came up with when I asked it: "why are synonymous substitutions in protein-coding genes sometimes called silent mutations?"
Synonymous substitutions in protein-coding genes are sometimes referred to as "silent mutations" because they do not change the amino acid sequence of the encoded proteins¹,². This is possible due to the degeneracy of the genetic code, where different codons can code for the same amino acid¹,².
Historically, it was believed that synonymous mutations, which do not alter the amino acid sequence, would have no effect on the organism and would be 'neutral' during evolution¹,². Hence, they were referred to as 'silent'¹,².
However, it's important to note that while synonymous mutations do not alter the amino acid sequence, they can still cause changes in protein expression, conformation, and function¹,². They can affect transcription, splicing, mRNA transport, and translation, any of which could alter phenotype, rendering the synonymous mutation non-silent¹,². Therefore, the term "silent mutation" is not always accurate¹,².
That's pretty good and would score full marks in a student exam. Does that make us bloggers superfluous? You can decide that for yourself, but I'd better get back to the second example now.
Twin synonymous substitutions that remain silent

That synonymous substitutions in protein-coding genes are not necessarily phenotypically silent is old hat and has been known to molecular geneticists since the 1970s. Back then, the findings, presented in seminars but rarely in papers, were outright spectacular because they were unexpected: various heterologous genes could not be expressed in full length in E. coli if their coding regions contained AGG or AGA codons for arginine, whereas CGN codons were not a problem (look in a codon table for the six codons for arginine). We learned at that time to take species-specific codon bias seriously, that is, that it matters how the frequency of particular codons for an amino acid in open reading frames (ORFs) correlates with the availability of the cognate tRNAs. But only for the "rare" arginine codons AGA and AGG in E. coli did a specific "grammar" have to be developed, as they are apparently involved in regulating the expression levels of various housekeeping proteins (see Napolitano et al. (2016)).
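For readers without a codon table at hand, here is a minimal Python sketch that builds the standard genetic code and lists the six arginine codons. The names CODON_TABLE and codons_for are made up for this illustration; the sketch says nothing about codon usage or tRNA abundance in E. coli, it only shows the degeneracy.

```python
from itertools import product

# Standard genetic code, with codons enumerated in the order
# UUU, UUC, UUA, UUG, UCU, ... (third base varies fastest).
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # first base U
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)

# Hypothetical helper names, chosen for this sketch only.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codons_for(amino_acid):
    """Return all codons for a one-letter amino acid code, sorted."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


print(codons_for("R"))  # ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGU']
```

The output separates into the two AGR codons that caused the expression problems and the four CGN codons that did not.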
Also in the seventies, molecular geneticists became more and more aware that all transcripts − not only "stable" rRNAs, tRNAs plus a few others, but also "unstable" mRNAs − are not linear strings of nucleotides, but tangles with alternating single-stranded (ss) and double-stranded (ds) regions, with hairpins, bulges, and pseudoknots. These secondary structures basically determine the "shelf life" of RNA transcripts in vivo by modulating the accessibility for ss- or ds-specific RNases. In the case of mRNAs, the secondary structures also influence their translatability. Native RNA secondary structures are notoriously difficult if not impossible to determine experimentally, and already in 1995, Olke Uhlenbeck quipped in his essay Keeping RNA happy that isolating native RNA comes close to purifying active enzymes from SDS gels.
In the eighties, however, we already had the "Zuker algorithm," the program Mfold, for secondary structure predictions that came in handy during my PhD for taking a closer look at several synonymous substitutions that I had found when sequencing the ftsI gene encoding PBP3 of E. coli B strain REL606 and comparing it with the known ftsI sequence of E. coli K‑12 (Nakamura et al. (1983), identical to that of reference strain MG1655). An aside: Since both the E. coli B and K‑12 lab strains are derived from different wild‑type isolates, it is better to speak of single nucleotide polymorphisms (SNPs) here because it cannot be deduced in which strain some codons were "substituted" (see below).
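To give a flavor of what such a prediction program does, here is a deliberately simplified sketch. It is not the Zuker algorithm and not Mfold: those minimize a free energy computed from measured thermodynamic parameters. The sketch uses the older and much cruder Nussinov approach, which merely maximizes the number of base pairs by dynamic programming. The function name, the allowed pairs, and the minimum hairpin loop of three nucleotides are assumptions made for this illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    Simplified Nussinov-style dynamic programming: counts pairs only,
    ignores stacking energies, and forbids hairpin loops shorter than
    min_loop unpaired nucleotides.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some base k to its left
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


# A toy hairpin: three G-C pairs closing a four-nucleotide loop.
print(nussinov_max_pairs("GGGAAAUCCC"))  # 3
```

The recursion has the same overall shape as the energy-based programs, in that every subsequence is solved once and reused, but its output should not be mistaken for a realistic structure prediction.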
A threonine codon in the ftsI mRNA reads ACU in one strain and ACC in the other (Figure 2). According to Comer (1982), both these codons are recognized by tRNA1Thr (thrV) and tRNA3Thr (thrT) that together make up about two thirds of the tRNAThr isoacceptors in E. coli. Thus, both codons should be equally well translatable. A proline codon eleven codons downstream of this threonine codon reads CCA in one strain and CCG in the other. According to Dong et al. (1996), tRNAUGGPro (proM) recognizing CCA is about half as abundant as tRNACGGPro (proK) recognizing CCG in E. coli. However, since the CCA codon is not part of a proline codon pair here, it should not cause any of the known problems during translation, such as ribosome stalling or frameshifts.
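That both polymorphisms are synonymous can be checked mechanically against the standard genetic code. The following Python sketch does just that; the helper name is_synonymous is hypothetical, and the check covers only the encoded amino acid, not tRNA abundance or translatability.

```python
from itertools import product

# Standard genetic code; codons enumerated UUU, UUC, UUA, UUG, UCU, ...
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def is_synonymous(codon_a, codon_b):
    """True if both codons encode the same amino acid (or both are stops)."""
    return CODON_TABLE[codon_a] == CODON_TABLE[codon_b]


# The two polymorphic codons described in the text.
for a, b in [("ACU", "ACC"), ("CCA", "CCG")]:
    print(a, b, CODON_TABLE[a], is_synonymous(a, b))
# ACU ACC T True
# CCA CCG P True
```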
The interesting point is that the third bases of both codons are perfectly juxtaposed in virtually identical (predicted) secondary structures, that is, basepaired in their respective mRNAs (Figure 2). None of the codons is expected to cause problems in translation, and their positive selection apparently serves to maintain the same advantageous mRNA secondary structure. At the time, I skipped investigating any possible phenotypic differences between the two ftsI alleles and concluded that it should remain as one of the many unpublished examples of "silent mutations" that do not even whisper. But it was fun to find out. And it was fun again to remember this old story.
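The pairing argument can be illustrated with a few lines of Python. One assumption must be stated loudly: the text above does not spell out which threonine codon occurs together with which proline codon. The sketch assumes the combinations ACU with CCA in one strain and ACC with CCG in the other, since these are the two that give canonical base pairs between the third bases. The function name pair_type is made up for this illustration.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(base_1, base_2):
    """Classify two RNA bases facing each other in a helix."""
    if (base_1, base_2) in WATSON_CRICK:
        return "Watson-Crick"
    if (base_1, base_2) in WOBBLE:
        return "wobble"
    return "mismatch"


# Third bases of the threonine codon (ACU or ACC) and of the
# proline codon (CCA or CCG), in all four combinations.
# ASSUMPTION: the first and last combinations are the two observed alleles.
for thr, pro in [("ACU", "CCA"), ("ACU", "CCG"), ("ACC", "CCA"), ("ACC", "CCG")]:
    print(thr, pro, pair_type(thr[2], pro[2]))
# ACU CCA Watson-Crick
# ACU CCG wobble
# ACC CCA mismatch
# ACC CCG Watson-Crick
```

Under this assumption, both alleles keep a Watson-Crick pair at the position, while the two mixed combinations would give either a G·U wobble pair or a C·A mismatch. Whether such an intermediate would change the stability of the predicted structure is a question for a folding program, not for this sketch.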
A footnote

Aficionadas y aficionados of E. coli may have noticed that both examples of "silent mutations" were found in E. coli B strain REL606. As detailed in part 1, McGrath et al. (2023) had obtained this strain from Richard Lenski, who employed it in the long-term evolution experiment (LTEE) celebrated here in STC. E. coli B strains have a long and sometimes murky (mutational) history in their journey through various labs since the parental strain was (presumably) first isolated by Felix d'Herelle; you can get an inkling of this by peeking at Figure 1 from the paper by Daegelen et al. (2009). As you navigate through this Figure 1, you will notice that E. coli K-12 strain MG1655, whose genome (accession NC_000913.3) is commonly regarded as the "reference genome" for E. coli, derives from a completely different lineage than strain REL606. Also, you can see that strain BL21(DE3), which for many practitioners is the preferred strain for expression of heterologous proteins in E. coli, is an E. coli B strain, a cousin of REL606.
It is mere coincidence that the example of twin synonymous substitutions detailed here also concerns E. coli B strain REL606, although that is not completely true. In fact, I had cloned and sequenced the ftsI gene from E. coli B strain L44, a derivative of E. coli B strain B40 that Luigi Gorini had derived from the Paris E. coli B strain (see again Figure 1). The complete genome sequence of REL606 was not deposited in GenBank until 2014 (accession CP000819.1), but its ftsI sequence is identical to the one I determined about three decades earlier, so I consider this inaccuracy reasonable.