by Christoph
Figure 1. Size and position of the 9 cryptic prophages in the E. coli K-12 BW25113 genome. Source
When bacteria add new genes to their 'flexible genome' by integration of a temperate phage, they find themselves in the precarious situation of 'shared ownership', because at any time, or under stressful conditions, the phage can enter the lytic cycle (spontaneously or induced). The phage then 'reclaims ownership' of its genes by expressing them, and uses the gene products to excise from the host chromosome, to replicate, to pack its multiplied genomes into pre-formed capsids (phage heads), to burst the cell, and... to leave, as a bunch of progeny virions (= phage particles; the original 'mother phage' is usually left behind). The 'ownership issue' can be solved in favor of the bacterial cell by any mutation in the prophage genome that prevents it from producing progeny (way too many possibilities exist for phage-inactivating mutations to list them here). Such 'incapacitated' prophages have earned the names 'cryptic prophages' or 'defective prophages', and there are 9 cryptic prophages known to reside in the contemporary version of the E. coli K-12 genome (Figure 1). Theory predicts that selection pressure for chromosome compaction would lead to successive deletions of those parts of cryptic prophages that do not carry genes whose products contribute to the overall fitness of the cells (a process appropriately termed 'prophage decay'). Indeed, the cryptic prophages of E. coli K-12 total only 166 kbp, considerably less than one would expect for 9 phages of the size of phages from the λ (48.5 kbp) or P2/P4 (33.6 kbp, 11.6 kbp) families, their likely ancestors.
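The size comparison above can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the text (166 kbp in total, 9 prophages, and the λ, P2 and P4 genome sizes); the function and variable names are made up for illustration, and the assumption that all 9 ancestors were of one single size is a deliberate oversimplification.

```python
# Back-of-the-envelope check of the 'prophage decay' numbers quoted in the text.
CRYPTIC_TOTAL_KBP = 166.0  # combined size of the 9 cryptic prophages
N_PROPHAGES = 9

# Genome sizes (kbp) of the likely ancestor families, as quoted in the text.
ANCESTOR_KBP = {"lambda": 48.5, "P2": 33.6, "P4": 11.6}


def expected_total(ancestor_kbp, n=N_PROPHAGES):
    """Total size if all n prophages were intact copies of one ancestor size."""
    return ancestor_kbp * n


def fraction_retained(ancestor_kbp):
    """Observed total divided by the total expected for intact ancestors."""
    return CRYPTIC_TOTAL_KBP / expected_total(ancestor_kbp)


# Average size of one cryptic prophage today.
mean_size = CRYPTIC_TOTAL_KBP / N_PROPHAGES

print(f"mean cryptic prophage size: {mean_size:.1f} kbp")
for name, size in ANCESTOR_KBP.items():
    print(
        f"{name}: expected {expected_total(size):.1f} kbp, "
        f"retained fraction {fraction_retained(size):.2f}"
    )
```

By this rough count, the cryptic prophages retain less than 40% of what nine λ-sized genomes would occupy and a bit more than half of what nine P2-sized genomes would; only against the small P4 genome (11.6 kbp) does the comparison not hold.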
Figure 2. Efficiency of colony formation for the Δ9 and the wild-type strain with sublethal concentrations of nalidixic acid (2 μg/ml) and azlocillin (5 μg/ml) as well as with 6% NaCl. Powers of 10 indicate the dilution factor. Scale bar represents 10 mm. Source
Theory predicts, in addition, that genes exempt from 'prophage decay' are likely to contribute to the overall fitness of the cells, and this has been studied experimentally by a reverse engineering approach. Tom Wood and his coworkers succeeded in deleting the 9 cryptic prophages with single-base precision, that is, in re-creating exactly the chromosomal context of the 'pre-prophage genome' of their strain E. coli K-12 BW25113. Together with the Δ9 strain lacking all cryptic prophages, they constructed single-deletion strains to assay the effects of individual cryptic prophages. The growth rates and growth yields determined for Δ9 and BW25113 at 37°C in rich medium (LB) and at 30°C in amino acid-supplemented minimal medium indicated that BW25113 (with the 9 prophages) fared noticeably better than Δ9 (lacking the 9 prophages). The single-deletion strains gave intermediate values, but the ΔCP4-57 strain came close to the deficiency of Δ9. These results alone made it clear that the ensemble of E. coli cryptic prophages contributes to the overall fitness of the cells under 'normal' growth conditions. Under a number of stress conditions the differences became more pronounced. When challenged with sublethal doses of nalidixic acid (a gyrase inhibitor), sublethal doses of the β-lactam antibiotic azlocillin (affecting the 'support stockings' of the cell, the peptidoglycan), or 6% NaCl (osmotic stress), the proficiency for colony formation was reduced by orders of magnitude in the prophage-depleted strain Δ9 as compared to its 'wild-type' parent BW25113 (Figure 2). A transcriptome analysis performed in parallel showed that, in BW25113, 17 prophage genes were induced >2.5-fold with nalidixic acid and 43 prophage genes >2.5-fold with azlocillin.
While genes of the rac and Qin prophages were mainly responsible for the protection from azlocillin in BW25113, the yfdK, yfdO and yfdS gene products of CPS-53 were found to be responsible for the protection of BW25113 from oxidative stress (treatment of cells with 30 mM H2O2 for 15 min). Taken together – and omitting here many more of their detailed results – this study by Tom Wood's lab shows that the 'remnants' of the former prophages in the E. coli K-12 genome contribute significantly to the cells' overall fitness.
Figure 3. Protein interaction assay in vitro. Biotinylated oriC DNA (bio-oriC) binds strongly to streptavidin beads and thus 'pulls down' proteins that bind to oriC from the assay mix. Proteins DnaA, DnaB/C, and YfdR bind only weakly to the beads (lane 1). DnaA binds strongly to bio-oriC (lane 3); YfdR interacts with DnaA and is pulled down but does not quench DnaA-binding to bio-oriC (lanes 7+8). DnaB/C does not bind to bio-oriC (lane 2); DnaB/C is tethered to bio-oriC by interaction with DnaA (lane 4). YfdR quenches the DnaB/C interaction with DnaA (lanes 5+6). Source
One other protein encoded by the CPS-53 prophage came into focus very recently. In a screen for multicopy suppressors (a mutation in one gene is suppressed by overexpression of a second gene present on a high-copy-number plasmid), Yasunori Noguchi and Tsutomu Katayama found that the yfdQRST genes of CPS-53 suppress the cold-sensitive growth phenotype of the hda-185 ΔsfiA (sulA) double mutant E. coli strain (I know it's confusing, but briefly: Hda is a negative regulator of the initiator DnaA. The hda-185 mutation causes severe overinitiation at oriC. The ΔsfiA mutation affects cell septation and, in combination with hda-185, prevents colony formation at the non-permissive temperature). By testing the effects of the yfdQRST genes individually, the authors discovered that the YfdR protein (178 residues) negatively affects the initiation of chromosome replication by interfering with DnaA functions. One of the functions of oriC-bound DnaA during initiation is the 'loading' of the helicase, DnaBC (6 DnaC monomers attached to the DnaB hexamer), onto the temporarily single-stranded part of the replication origin, oriC, via direct protein-protein interaction. They showed that in vitro YfdR out-competes DnaBC for binding to DnaA and thus prevents helicase loading (Figure 3). How, then, could YfdR as an inhibitor of replication initiation possibly provide a useful function to wild-type cells? Inhibition of cell division and of new rounds of chromosome replication are among the first reactions of cells to stress (think SOS response). Therefore, an accessory protein factor that is expressed under stress – as most of the cryptic-prophage genes are, see above – and promotes the shutdown of chromosome replication can come in handy. The E. coli YfdR protein and its Salmonella ortholog YfbR are 98% identical (a high value even among these closely related species; their DnaA proteins are 96% identical), which points to a strong selection for sequence conservation.
But YfdR and the Shigella phage SfIV gp32 protein are also 98% identical. This suggests that no further mutational adaptation occurred in yfdR when E. coli 'adopted' it as a new gene after the CPS-53 prophage became a cryptic prophage (does the word 'adoptation' exist?). It also suggests that YfdR already had the same function long ago as a proprietary phage protein (and still has it in phage SfIV): to shut down host chromosome replication, after infection and expression of the phage genes, by 'sequestering' DnaA, which is dispensable for phage replication. This would allow the phage to 'recruit' the complete set of host replication proteins (helicase, primase, DNA polymerase III) for its own replication. Call this the perks and perils of 'shared ownership' (if you're all for literary expressionism), but it should be clear by now that my differentiation of new genes from novel genes here isn't just nitpicking.
After arguing at length why prokaryotes rely mostly on HGT for the acquisition of new genes, I now make a final attempt to come to a conclusive answer as to whether they make novel genes from scratch, or not at all. They do, but only a few cases are known so far.
Daubin and Ochman found a small ORFan gene, sra (rpsV), in the E. coli genome that is located in between the cryptic prophages Qin and rac but outside the known 'array' of their genes (Figure 1). The sra gene product, SRA (S22), had been identified earlier as a protein that associates with the small ribosomal subunit preferentially after cells enter the stationary phase. Whether SRA plays a role in ribosome-dimer formation (translationally inactive 100S ribosomes that can be rapidly re-activated) during early stationary phase is not known. It seems to be a 'lineage-specific' ORFan, as homologs of the sra gene are not found by BLAST searches in the known phage genomes, and 'outside' the Enterobacteriaceae only in the genomes of three bacteria: the Betaproteobacterium Achromobacter sp. ATCC13047, one Vibrio parahaemolyticus isolate (Vibrionales), and Pedobacter himalayensis (Bacteroidetes). It would need decent detective work to find out whether these three apparent HGT events occurred in the wild or in the sequencing labs! The sra gene, although not novel in the strict sense today (the divergence of Salmonella and E. coli occurred >100 million years ago), is a good candidate for a gene made from scratch. Strong support for this hypothesis could come from detecting the E. coli gene order adhP·maeA·sra·bdm·osmC in another bacterial genome with the sra open reading frame missing (this approach was successful in the case of the yeast BSC4 gene, see the previous post). However, the complete gene context is not conserved outside Escherichia, Shigella, and Salmonella, which makes this straightforward search approach unfeasible. The 'core' context maeA·sra·bdm is still present in most but not all Citrobacter and Enterobacter species, and maeA is separated from sra and bdm in most Klebsiella species (my own quick & dirty BLAST search effort). Therefore, sra remains a candidate novel gene for now.
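The gene-order test described above (find the E. coli context in another genome, but with the sra open reading frame missing) can be sketched in a few lines of Python. This is a minimal illustration, not the method used in any of the cited studies: it assumes genomes are already given as ordered lists of gene names, it ignores gene orientation, and the example gene orders other than the E. coli context quoted in the text (including the placeholder names x1 and x2) are entirely hypothetical.

```python
def find_sublist(seq, sub):
    """Return the index where `sub` occurs as a contiguous run in `seq`, or -1."""
    n, m = len(seq), len(sub)
    for i in range(n - m + 1):
        if seq[i:i + m] == sub:
            return i
    return -1


def classify_context(genome, context, candidate):
    """Classify a genome by how the gene context around `candidate` is conserved.

    `genome` and `context` are ordered lists of gene names. Because a
    chromosome can be read from either strand, the reversed gene order
    is checked as well.
    """
    orders = (genome, genome[::-1])
    # The same context with the candidate gene taken out.
    without = [gene for gene in context if gene != candidate]
    if any(find_sublist(order, context) != -1 for order in orders):
        return "context with candidate"
    if any(find_sublist(order, without) != -1 for order in orders):
        return "context without candidate"
    return "context not conserved"


# Gene order around sra in E. coli, as quoted in the text.
ECOLI_CONTEXT = ["adhP", "maeA", "sra", "bdm", "osmC"]

# Hypothetical gene orders, for illustration only.
examples = {
    "E. coli-like": ["x1", "adhP", "maeA", "sra", "bdm", "osmC", "x2"],
    "informative": ["x1", "adhP", "maeA", "bdm", "osmC", "x2"],
    "rearranged": ["maeA", "x1", "sra", "bdm", "x2", "osmC"],
}

for name, genome in examples.items():
    print(name, "->", classify_context(genome, ECOLI_CONTEXT, "sra"))
```

Only the 'context without candidate' outcome would support the made-from-scratch hypothesis; with real genomes, gene names would have to be replaced by homology assignments (for example from BLAST hits), which this sketch does not attempt.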
Figure 4. Overlapping yaaW and htgA genes in E. coli. Numbers indicate nucleotide residues of the yaaW gene. The initiation codon of htgA has been predicted to be at nucleotide 632 [14], and more recently at 527 [23]. A gray box indicates the region where we have detected a lowering in the rate of evolution of yaaW sequences with an htgA overlapping gene. Source
If prokaryotes are restricted from making novel genes in intergenic regions, they could resort to evolving them within existing genes, a process that has been dubbed 'overprinting' and was proposed in 1977 by Grassé and later substantiated by Ohno. Overlapping genes, the outcome of 'overprinting', are fairly common in virus genomes, which are constrained in genome size due to the volume limits set by their capsids (8 of the 11 genes overlap in the first-ever sequenced genome, the 5,386 nucleotides-long circular single-stranded DNA of phage ΦX174).
More recently, Delaye et al. found one case of fully overlapping genes in E. coli, the yaaW and htgA genes, encoded by complementary DNA strands (Figure 4). The htgA gene encodes a positive regulator of the σ32 heat-shock promoter; the function of the yaaW gene product is not known. In this case, the yaaW gene appears to be the 'older' gene as it is evolutionarily conserved, with homologs not only within the Gammaproteobacteria but also in Anabaena (Cyanobacteria), Fusobacterium (Fusobacteria), and several Epsilonproteobacteria. The htgA gene, on the other hand, appears to be a novel gene, restricted to several but not all Escherichia coli and Shigella strains; it is not present in Salmonella.
But there's more in the microbial world than just E. coli, and an intricate case of 'overprinting' was revealed by studies of a small regulatory RNA of Bacillus subtilis. The 205 nt long SR1 sRNA had been shown earlier to regulate, by base-pairing, the 'translatability' of the ahrC mRNA (AhrC is a transcriptional activator of the rocABC and rocDEF arginine catabolic operons). SR1 does not act like the conventional antisense RNAs by blocking (through base-pairing) the 5'-end of mRNAs to prevent translation, but binds ~100 nt downstream of the 5'-end of ahrC mRNA and inhibits translation by inducing structural changes downstream of the ahrC ribosome-binding site (RBS). Now it was found that the SR1 sRNA is also translated to yield the 39-residue SR1P protein that binds to the glycolytic enzyme GapA and stabilizes, by an as yet unknown molecular mechanism, the gapA mRNA. This 'dual-function' gene SR1, with the translated protein being the novel part, is conserved among Bacillus, Geobacillus, Anoxybacillus, and Brevibacillus species and therefore not really 'brand new'.
To conclude: Yes, cells still make novel genes from scratch, even after >2 billion years of evolution. Neither eukaryotic nor prokaryotic cells succeed in doing that very often, although at least eukaryotic cells may try more often than generally assumed. If cells succeed in making a novel gene, its establishment in the genome must take a long time, easily several millennia, and in any case more than 'just a few generations'. The known examples demonstrate that such novel genes – and also new genes acquired via HGT – are positively selected if their gene products confer an adaptive advantage, even when only incremental, and if they 'fit' into existing regulatory networks of the cells. These latter constraints make it pretty clear that attempts, as tempting as they are, to learn how the first genes were made at the very beginning by studying the way novel genes are made today are of no avail. That's somehow disappointing, isn't it?
Frontpage picture by Buddhini Samarasinghe, from her blog jargonwall.