The Rise of Genomic Superspreaders
by Steven Quistad
The common ancestor of placental mammals probably looked like Eomaia scansoria, the earliest known placental mammal, shown here in an artist's reconstruction based on a 125-million-year-old fossil skeleton found in China in 2002. Source.
One hundred million years ago, the earth’s climate was much warmer than today and vast inland seas stretched across entire continents. The land was dominated by charismatic megafauna that would one day serve as inspiration for Sir Arthur Conan Doyle’s novel The Lost World. This period is commonly referred to as the age of reptiles, as our placental ancestors were barely visible. Yet it was during this period that something significant happened to them, something that would become a major part of who we are today. One hundred million years ago, retroviruses infected our ancestors’ germline and hitched a ride through evolution into the present day, where their DNA still exists in all of our genomes. In fact, such retrovirus infections occurred ~31 separate times in our evolution, and these endogenous retroviruses (ERVs) expanded and now make up an astounding 8% of our entire genome. Given a human genome of roughly 3 billion bp, this means that we owe ~240,000,000 bp of our DNA to these retroviruses!
Phylogeny of mammals with ERV megafamilies shown as colored circles (area is proportional to the percentage of the ERV loci in the genome represented by that family). The placing of megafamilies on the tree shows relative age but not origin (which may be considerably earlier). Scale bar shows approximate dates in host phylogeny. Asterisked taxa are treated as duplicates and excluded from the authors’ analysis of all ERV families. Name color shows how many IAP loci were found in each species. Source.
Retroviruses usually infect somatic cells; therefore, when the infected cell's lineage stops dividing, the integrated provirus vanishes with the last cell of the clone. However, a retrovirus occasionally infects a cell belonging to the germline. Any offspring that develop from this infected germline cell will carry the provirus and pass it on to their descendants. The establishment of an ERV lineage begins with an exogenous “founder provirus.” In humans, each of the 31 ERV families represents a separate integration event that occurred during our evolution. These ERV families are able to expand through reinfection, retrotransposition, and piggy-backing off co-infecting viruses; rarely, they are also copied through duplication of the chromosomal segment where they reside. The total number of copies, or loci, can range from just a few to thousands in different families. If the function of a particular viral protein is subject to little selective pressure, random mutations will eventually result in a total loss of expression and replication ability. Most of our ERVs are at least 30 million years old, so it is not surprising that many human ERVs have lost the ability to replicate and reinfect neighboring cells due to the accumulation of substitutions, deletions, and insertions. Thus our genome has become a graveyard of formerly active ERVs.
All retroviruses encode envelope proteins (the products of the env gene), which are required for infectivity. Recent work by Magiorkinis et al. revealed that when ERVs lose their env gene, their proliferation within a genome is boosted by a factor of ~30. Using an in silico approach, the authors recovered ERV loci from 38 mammalian genomes. They found that expansion of an ERV within a genome is negatively correlated with env integrity but not with the integrity of other ERV genes. This suggests that loss of env integrity provides the virus with some type of selective advantage. Interestingly, the distribution of ERV megafamilies within the 38 genomes closely followed the 20/80 rule, also known as the Pareto principle. This rule is associated with power-law distributions and, when applied to infectious diseases for example, states that a small percentage of individuals within a population are responsible for most of the transmission events. In this study, 22% of the megafamilies accounted for 80% of all the ERVs. The 20/80 rule has been demonstrated for HIV and SARS transmission, and now for ERV proliferation.
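To make the 20/80 rule concrete, here is a minimal Python sketch of how one might check it on a list of family sizes. The function names (share_of_top, fraction_needed) and the loci counts are hypothetical and chosen only for illustration; they are not the data or the method used by Magiorkinis et al.

```python
def share_of_top(sizes, top_fraction):
    """Fraction of all loci held by the largest `top_fraction` of families."""
    ordered = sorted(sizes, reverse=True)
    # Number of families in the top group (at least one).
    k = max(1, round(top_fraction * len(ordered)))
    return sum(ordered[:k]) / sum(ordered)


def fraction_needed(sizes, target_share):
    """Smallest fraction of families, taken largest first, whose loci
    add up to at least `target_share` of the total."""
    ordered = sorted(sizes, reverse=True)
    total = sum(ordered)
    running = 0
    for count, size in enumerate(ordered, start=1):
        running += size
        if running >= target_share * total:
            return count / len(ordered)
    return 1.0


# Hypothetical loci counts for ten ERV families (made-up numbers,
# NOT data from the study).
family_sizes = [5000, 3000, 400, 300, 250, 200, 150, 100, 60, 40]

print(share_of_top(family_sizes, 0.20))     # share of loci in the top 20% of families
print(fraction_needed(family_sizes, 0.80))  # fraction of families covering 80% of loci
```

With these made-up numbers, the two largest families (20% of the ten) hold about 84% of all loci, which is the kind of skew the 20/80 rule describes.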
Histograms showing (A) how common ERV families of different sizes are, (B) how many loci in total are in these families, and (C) env integrity (relative to gag) for megafamilies and randomly selected smaller families. Source.
So why would the loss of the env gene increase the proliferation of an ERV? After all, it seems counterintuitive that losing a functional envelope protein, which the virus needs for infectivity, would increase its copy number. From the host’s perspective, active ERV replication, which occurs most often in somatic cells, risks insertional mutagenesis. The transmembrane domain of the Env protein is also known to have immunosuppressive properties; both of these factors would reduce host fitness. From the virus’s perspective, replication through the formation of complete virions requires evading the host innate immune system. Selection would therefore favor ERVs that have lost env and replicate solely at the genomic level, avoiding the host immune defenses.
More generally, the significant evolutionary success of endogenous retroviruses raises many questions for future research. How was evolution of the host shaped by ERVs? How do ERVs affect host gene expression? Are ERVs ubiquitous in other organisms beyond mammals? The high prevalence of ERVs within our own genome provides yet another example that we live in a world that has been intimately shaped by the most abundant biological entities on the planet, the viruses.
Steven is a student in the University of California at San Diego/San Diego State University Integrative Microbiology graduate course.
Magiorkinis G, Gifford RJ, Katzourakis A, De Ranter J, & Belshaw R (2012). Env-less endogenous retroviruses are genomic superspreaders. Proceedings of the National Academy of Sciences of the United States of America, 109 (19), 7385-90 PMID: 22529376