There is now convincing evidence that yes, Eukaryotes make novel genes from scratch (see our previous post). But what about the Prokaryotes? It turns out that the answer to this question is not so straightforward. Let me try anyway...
When the first prokaryotic genomes were sequenced – among them those of the bacterium Haemophilus influenzae (published in 1995) and the archaeon Methanococcus jannaschii (1996) – microbiologists were startled by the high number of annotated genes whose products were predicted to be proteins of unknown function. Ever since, for every newly sequenced genome one can safely expect about 1/3 of its genes to fall into this category of previously completely unknown genes (annotated as "hypothetical protein"). Another ~1/3 of the genes have no known function but do have homologs in other sequenced genomes (annotated as "conserved hypothetical protein"). And ~1/3 belong to the "known" genes, that is, those for which homologous proteins have been studied biochemically (rRNA and tRNA genes belong to this category although they do not code for proteins). Often, though not always, this holds true even for genome sequences of different isolates/strains from a single bacterial species! This proved to be another big surprise, which led microbiologists to develop the concept of the pan-genome (see here in STC). In a nutshell, the pan-genome of a species is the sum of its 'core genome', which contains the genes shared by almost all isolates (the housekeeping genes), plus a plethora of 'accessory' or 'variable' genes that are found in only one or a few isolates. Digging for differences in the 'variable genomes' is a promising way to detect, for example, tissue-targeted virulence genes in pathogenic strains of bacteria such as Listeria monocytogenes (Figure 1). In the case of E. coli the 'variable genes' currently outnumber the 'core genome' genes by more than a factor of 10, and a finish line is not in sight. But are all these 'variable genes' in a species' pan-genome really novel genes, made from scratch?
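As a brief aside before answering that question: the core/accessory split described above can be sketched in a few lines of Python. This is only a toy illustration, not how pan-genome pipelines actually work (those first have to cluster homologous genes across genomes). The isolate names, the gene labels, and the `core_fraction` parameter are all made up for the example.

```python
from collections import Counter


def split_pan_genome(isolates, core_fraction=1.0):
    """Split a pan-genome into core and accessory gene sets.

    isolates: dict mapping isolate name -> set of gene (family) labels.
    core_fraction: hypothetical cutoff; a gene counts as 'core' if it is
    present in at least this fraction of the isolates.
    """
    n = len(isolates)
    # Count in how many isolates each gene occurs
    counts = Counter(gene for genes in isolates.values() for gene in genes)
    pan = set(counts)                       # every gene seen in any isolate
    core = {g for g, c in counts.items() if c >= core_fraction * n}
    accessory = pan - core                  # the 'variable' genes
    return pan, core, accessory


# Toy data: three invented isolates sharing three housekeeping genes
toy_isolates = {
    "isolate_A": {"dnaA", "rpoB", "gyrB", "toxin1"},
    "isolate_B": {"dnaA", "rpoB", "gyrB", "phageX", "phageY"},
    "isolate_C": {"dnaA", "rpoB", "gyrB", "toxin1", "islandZ"},
}

pan, core, accessory = split_pan_genome(toy_isolates)
print("pan-genome size:", len(pan))
print("core genes:", sorted(core))
print("accessory genes:", sorted(accessory))
```

Note that in this sketch every newly added isolate can only enlarge the accessory set or shrink the core, which is the pattern behind the "finish line is not in sight" remark for E. coli.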
Most certainly not, and here is why. It was observed early on that bacterial toxins are often expressed (= produced) by pathogens from genes of resident prophages rather than from 'core genome' genes. Examples are Shiga toxin, encoded by the Stx-1 and Stx-2 prophages of E. coli O157:H7, cholera toxin, encoded by the CTXφ prophage of Vibrio cholerae, and diphtheria toxin, encoded by the B-type lysogenic phage of Corynebacterium diphtheriae. Formally, genes that are 'added' to the chromosome of a bacterium by integration of a temperate bacteriophage can be considered – from the viewpoint of the bacterium – new genes within its 'flexible genome', but they are not novel genes. They have their own long evolutionary history, passing through a virtually unfathomable succession of temporary host cells and using various routes of HGT to move from host to host (for example 'autotransduction', see our earlier post here). It is now widely accepted that phages play a key role in 'shaping' the flexible genomes of bacteria, and not only those of pathogens. But, and here's the catch, the bacteria also 'respond' to this continuous influx of new genes. Some do so by organizing their genomes such that the greater part of the flexible genome is confined to dedicated 'islands.' These 'islands' appear to be involved in a rapid turnover of genes (that is, gain and loss rates for new genes are roughly balanced, with a slight bias towards loss).
This was found in two studies from Penny Chisholm's lab describing the detection of five particularly 'HGT-prone' regions in the genomes of several marine Synechococcus and twelve Prochlorococcus isolates (addressed in an earlier post by Merry), which could be sort-of 'calibrated' against each other, and against metagenomic data from ocean samples (Figure 2). Their analysis indicated that many of these 'flexible genes' do not fit the phylogeny of the 'core genes.' They vary greatly in type and number between the various ecotypes (= subtypes) of the isolates and, most importantly, potentially confer an adaptive advantage in the specific niche of their 'hosts'. The idea that genes that potentially help cells to adapt to a novel niche are not brought along all the way by the 'immigrants' but are already present in that niche, in the local virome and ready to be picked up via HGT, is still somewhat... unfamiliar, to put it mildly (maybe less so for hardcore marine microbiologists). The Prochlorococci (Cyanobacteria) are not alone in expertly 'organizing' their flexible genomes: a comparably high plasticity confined to genomic 'islands' was also found by analyzing inter‑strain genome variation in Helicobacter pylori (Epsilonproteobacteria).
All cells can quickly adapt to changing environmental conditions by adjusting the overall expression pattern of their genome (see, for example, Fig. 1 in VanBogelen and Neidhardt (1990) for the heat-shock response in E. coli). There's no need for new genes here. Prokaryotic cells also adapt – though not as quickly – by gene duplications, which can actually be seen as a rudimentary, recombination‑dependent form of regulation of gene expression. Riehle et al. cultivated an E. coli B strain for 2,000 generations under standard conditions (37°C), followed, for six separate lineages, by another 2,000 generations under stress at a near‑nonpermissive temperature (41.5°C). After selection, three of the six 'stressed' lineages turned out to have acquired a duplication of >24 kbp (kbp = 10³ base pairs) around and including rpoS, which encodes the stationary phase-specific sigma factor. Coinciding with the occurrence of the duplication at different time points during growth at 41.5°C, the strains gained a 20 – 65% increase in fitness compared to their ancestors. Obviously, gene duplications are an important mechanism for adaptation, but a duplicated gene is certainly not a new gene. One usually thinks of mutations in the context of adaptation, and yes, cells adapt – sometimes surprisingly fast – by 'tweaking' their genes. From the continuously arising random mutations, selection picks those that increase the functionality of the affected gene products, or that silence genes whose products are detrimental (measured as the survival rate of the mutant progeny cells; 'survival' represents the cumulative and often incremental effects of favorable and detrimental mutations). Finally, cells adapt by the acquisition of new genes via HGT (see above) or by making novel genes from scratch (see further below).
Genome size and genome compactness are, in general, not so much an issue in Eukaryotes. But one reason why Prokaryotes rely heavily on HGT for the acquisition of new genes is their compact genome structure, which is apparently under strong selective pressure (size less so, since we know single-chromosome genomes of free-living bacteria ranging in size from ~1.5 Mbp to >10 Mbp; Mbp = 10⁶ base pairs). The genome of Schizosaccharomyces pombe (Ascomycota), for example, has a size of 12.5 Mbp and encodes ~4,900 genes. In comparison, roughly the same number of genes (~4,400) is encoded by the approximately 3-fold smaller genome of E. coli K-12 (4.6 Mbp). Since the proteins of S. pombe and E. coli do not differ significantly in their size distribution, the non-coding 'intergenic regions' must be, and in fact are, shorter in the E. coli genome. Of these intergenic regions, 1,155 are really short with lengths of <50 bp, 1,889 range in length between 50 and 300 bp, just 478 are 300 – 900 bp long, and only 33 are >900 bp (note that the numbers of intergenic regions do not add up to the number of genes because ~30% of the ORFs overlap each other, a hallmark of compact genomes). I mention these numbers only to emphasize that in the compact E. coli genome the 'open space', that is, non-functional DNA that could serve as a 'playground' for making genes from scratch, is fairly limited. And it is even more limited if one takes into account that most intergenic regions are anything but 'non-functional', as they accommodate transcription signals (promoters, terminators, transcription factor binding sites) or genes for a plethora of sRNAs that have come into focus only more recently. Thus Prokaryotes are – unlike Eukaryotes, see the previous post – very limited in evolving novel genes by accumulating mutational changes in intergenic regions. HGT by phages, on the other hand, allows the cells to acquire new genes 'en bloc' to adapt to changing environmental conditions.
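For readers who like to check the arithmetic, here is a minimal Python sketch that uses only the approximate figures quoted above. The variable and function names are mine, and the results are back-of-the-envelope values, not a genome analysis.

```python
# Approximate figures quoted in the text above
GENOMES = {
    "S. pombe": {"size_mbp": 12.5, "genes": 4900},
    "E. coli K-12": {"size_mbp": 4.6, "genes": 4400},
}

# E. coli K-12 intergenic regions by length class, as quoted in the text
INTERGENIC_COUNTS = {
    "<50 bp": 1155,
    "50-300 bp": 1889,
    "300-900 bp": 478,
    ">900 bp": 33,
}


def genes_per_mbp(size_mbp, genes):
    """Gene density: number of genes per million base pairs."""
    return genes / size_mbp


# Gene density for each genome
densities = {name: genes_per_mbp(**g) for name, g in GENOMES.items()}

# How many times larger the S. pombe genome is than that of E. coli K-12
size_ratio = GENOMES["S. pombe"]["size_mbp"] / GENOMES["E. coli K-12"]["size_mbp"]

# Total number of intergenic regions, and the share of very short ones
total_intergenic = sum(INTERGENIC_COUNTS.values())
short_share = INTERGENIC_COUNTS["<50 bp"] / total_intergenic

for name, density in densities.items():
    print(f"{name}: {density:.0f} genes per Mbp")
print(f"genome size ratio (S. pombe / E. coli): {size_ratio:.1f}")
print(f"intergenic regions in total: {total_intergenic}")
print(f"share shorter than 50 bp: {short_share:.0%}")
```

On these numbers E. coli packs well over twice as many genes into each Mbp as S. pombe, and roughly a third of its intergenic regions are shorter than 50 bp.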
An analogy pops up that is, although anthropomorphizing, too tempting to put aside. When you work hard on your old iPad 1 (with its very limited memory space of 16 GB, you know it!) to develop a program in R, for example, you probably find it way more appealing to download an existing code snippet 'from the cloud' than to code that part yourself from scratch, including all those unnerving bug-fixing efforts. Clearly less appealing, however, is the prospect that whenever you download snippets from the web you're inevitably exposed to viruses... But one intricate way in which bacteria eventually turn phage genes into new genes of their own will be explored in the final sequel of this »making genes from scratch« series.
Frontpage picture by Buddhini Samarasinghe, from her blog jargonwall.