by Welkin Johnson
Stromatolites from the Proterozoic (2.3 billion years ago) found in the Andes of Bolivia. Source
How does one even begin to investigate the natural history of viruses? The dinosaurs bequeathed a motley assortment of bones, teeth, footprints striding 'cross ancient riverbeds, fossilized eggs, the occasional coprolite. The tiny trilobite left lasting and ubiquitous impressions, finding its way into textbooks and museum gift shops. Even prehistoric cyanobacteria, minuscule and boneless, are abundantly memorialized by gatherings of statue-like stromatolites.
As a scientist fascinated with the evolutionary interplay between viruses and their hosts, I admit to considerable professional envy. The paleontologists have it good. What, if anything, does a virus leave behind? My study subjects are utterly lacking in bony, fossilizable material, are too tiny to leave informative impressions in stone, and, unlike bacteria, produce no telltale geochemical signatures. By necessity, viral prehistory is traditionally inferred indirectly from phylogenetic reconstruction, typically based on aligned sequences of highly conserved subdomains shared by many viral polymerases. But these are genetic sequences obtained from modern viral species, and the inferred ancestors aren't "real"; they are simply averages, each one a best-guess consensus. More importantly, this approach is limited to viruses with living modern descendants; it tells us nothing about extinct viral lineages. (Most likely T. rex had its own contingent of obligate intracellular parasites.)
In this regard, the retroviruses are the notable exception.
Reverse transcriptase converts the retroviral RNA genome into double-stranded DNA, which is then integrated irreversibly into the genomic DNA of the infected cell to form the DNA provirus. This process has occasionally resulted in the deposition of proviruses in the germlines of their animal hosts. Over the expanse of evolutionary time, the genomes of virtually every animal species have become riddled with these proviral sequences, the so-called endogenous retroviruses (ERVs). Most ERV sequences have been degraded by the accumulation of mutations but are still recognizable as retroviral in origin. The human genome alone contains hundreds of thousands of HERVs (Human ERVs), outnumbering our genes. Extrapolate these numbers across the entirety of the animal kingdom, and collectively ERV loci may well comprise a “fossil” collection numbering in the hundreds of millions of specimens. (Now who’s jealous?)
“Resurrected” virions budding from a transfected cell. Bar = 500 nm (left) and 100 nm (right). Image courtesy of Paul Bieniasz. Source
ERVs have attracted their own dedicated “fossil” hunters. Jonas Blomberg’s lab at Uppsala University, and their collaborators, have even developed a nifty tool, called RetroTector, for digitally sifting through billions of base-pairs of genomic DNA the way a Leakey might sift through sediment for a tooth or a bone chip. Like something out of a Michael Crichton novel, the labs of Thierry Heidmann and Paul Bieniasz have gone so far as to use consensus sequences to reconstruct ancient retroviral genomes that then produced particles in transfected cells, thus essentially completing a viral replication cycle initiated more than a hundred thousand years ago.
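For readers curious what "using a consensus sequence" means in practice, here is a minimal sketch of the idea, assuming the degraded copies have already been aligned to the same length. It is not the procedure the Heidmann or Bieniasz labs used; the function name `consensus` and the toy sequences are made up for illustration. Each copy carries its own random mutations, so a simple majority vote at every position tends to recover the base that was there before the copies decayed.

```python
from collections import Counter

def consensus(aligned_seqs, gap="-"):
    """Majority-rule consensus of equal-length, pre-aligned sequences."""
    if not aligned_seqs:
        raise ValueError("need at least one sequence")
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")

    result = []
    for column in zip(*aligned_seqs):
        # Count the bases in this alignment column, ignoring gaps.
        counts = Counter(base for base in column if base != gap)
        if not counts:
            result.append(gap)  # every copy has a gap here
        else:
            # Ties go to the base seen first in the column.
            result.append(counts.most_common(1)[0][0])
    return "".join(result)

# Hypothetical, made-up "fossil" copies, each with one private mutation.
copies = [
    "ATGGCATTAC",
    "ATGGCGTTAC",
    "ATGACATTAC",
    "ATGGCATTGC",
]
ancestor_guess = consensus(copies)
print(ancestor_guess)
```

Because no two toy copies share a mutation at the same position, the vote discards every private change. Real reconstructions have to contend with alignment uncertainty, insertions and deletions, and shared mutations, none of which this sketch handles.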
The lentiviruses, a genus of the modern retroviruses, include two groups: the primate lentiviruses, including the human immunodeficiency viruses (HIV-1 and HIV-2) and the myriad Simian Immunodeficiency Viruses (SIV) of African primates, and the non-primate lentiviruses, including viruses of goats, sheep, cows, cats, horses, and others. Remarkably, one team of ERV hunters recently unearthed lentiviral ERVs from both of these modern groups.
One was discovered in the genome of the European brown rabbit, and for this the discoverers coined the name RELIK (Rabbit Endogenous Lentivirus type-K). RELIK sequences are most closely related to the non-primate lentiviruses, sharing similar genome structures and significant sequence similarity. They have now been found in other leporid species, indicating that the virus (and by extension the Lentivirus genus) is at least 12 million years old.
Gray Mouse Lemur. Source
Primate lentiviral ERVs were found in the genomes of Gray Mouse Lemurs (Microcebus murinus), pint-sized primates found exclusively on the island of Madagascar. The discoverers called these SIVgml (for Simian Immunodeficiency Virus of Gray Mouse Lemurs), signifying their close similarity to the extant SIVs endemic among modern apes and monkeys in Africa. SIV-like sequences have since been found in several additional species of lemur. Given that the Madagascar primates have been geographically separated from mainland relatives for the past 75 million years, explaining the presence of extant SIV on the mainland and the SIVgml ERVs on Madagascar presents a genuine puzzle to paleovirologists.
SIVgml is also something rare that paleontologists are often hard pressed to come by: a transitional form. Primate and non-primate lentiviruses differ by, among other things, the presence of a dUTPase gene in the non-primate version. While SIVgml has several genes unique to the modern primate lentiviruses, it also has a dUTPase, thus establishing an evolutionary link between the two branches. Where the lentiviruses originated is a mystery, but if we accept that RELIK and SIVgml must have shared a common ancestor, then at some point at least one ancestral lentivirus must have migrated across much of the African continent.
Thirteen-lined ground squirrel. “What? Viruses in MY genome?” Source
In 2008, while attending a meeting on Awaji Island, Japan, I ran into my former colleague, Dr. Keizo Tomonaga. Keizo is a virologist in the Research Institute for Microbial Diseases, Osaka University, where he and his colleagues study Borna Disease Virus (BDV). As we sampled hors d'oeuvres, Keizo casually mentioned that they had found Bornavirus sequences in the human genome. Seeking clues to the function of Bornaviral proteins, they had done a BLAST search of the human genome looking for cellular proteins that were structurally similar to Bornavirus proteins. What they actually found were (at least at one time) bona fide Bornaviral genes. Since these elements were derived from one particular segment of the Bornaviral genome, that being the gene for the viral N protein (N for nucleoprotein), they named them “Endogenous Borna-Like N” elements, or EBLNs. Keizo and his colleagues ultimately unearthed a trove of EBLNs sandwiched into the genomes of multiple primate species, as well as other mammals including elephants and thirteen-lined ground squirrels. With the publication of these findings in Nature, retroviruses no longer hold the distinction of being the only animal viruses with a rich fossil record.
This finding was unexpected since, unlike retroviruses, Bornaviruses replicate without using a DNA intermediate. So how did their N-gene sequences wind up being part of human genomic DNA? The Bornavirus genome is a single negative-sense RNA molecule used as the template for both transcription of viral messenger RNAs and for production of the positive-sense antigenomes (used in turn to produce more negative-sense genomes for the progeny virions). Uniquely among the RNA viruses, much of this process occurs in the nucleus and nucleolus of the infected host cell, where it takes full advantage of host-cell RNA processing machinery. This puts the Bornaviral sequences in the right place to access the cellular genome, but still does not explain how the information gets converted to double-stranded, integrated DNA.
Notably, some of the EBLNs contain poly-A runs, consistent with an mRNA template, and some of the integrations are flanked by short duplications of the site in genomic DNA where insertion occurred, a hallmark of retrotransposon-mediated insertion. Putting it all together, a likely scenario for formation of EBLNs is that the viral mRNA is reverse-transcribed and integrated by a cellular retrotransposon, such as the long interspersed nuclear elements (LINEs). In the same paper, the Osaka University team also reports detecting newly inserted Bornavirus DNA sequences in chronically infected, cultured cells, and in the brains of persistently infected mice, providing proof of principle that this can occur as a byproduct of Bornaviral replication. It remains a mystery as to why only the viral N protein coding sequences are found in EBLNs, but the authors suggest that a chance affinity between the N-protein mRNA and the retrotransposon replication machinery probably plays a role.
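The two hallmarks mentioned above, a poly-A run and a short duplication flanking the insertion, are simple enough to check for in a toy example. The sketch below is only an illustration under stated assumptions: the function names, the length thresholds, and the sequences are all hypothetical, the insert boundaries are assumed to be known already, and this is not the analysis performed in the Nature paper.

```python
import re

def longest_poly_a(seq, min_len=8):
    """Length of the longest run of at least min_len A's in seq, or 0."""
    runs = re.findall(r"A{%d,}" % min_len, seq.upper())
    return max((len(run) for run in runs), default=0)

def target_site_duplication(genome, start, end, min_len=4, max_len=20):
    """Longest identical sequence directly flanking genome[start:end].

    Compares the k bases just upstream of the insert with the k bases
    just downstream, trying the longest k first. Returns '' if none match.
    """
    genome = genome.upper()
    for k in range(max_len, min_len - 1, -1):
        if start - k < 0 or end + k > len(genome):
            continue  # flank would run off the end of the sequence
        left = genome[start - k:start]
        right = genome[end:end + k]
        if left == right:
            return left
    return ""

# Made-up example: an insert ending in a poly-A tail, flanked on both
# sides by the same 8-base stretch of "host" sequence.
tsd = "GATTCAGT"
insert = "CCGGTTCCGGTACG" + "A" * 12
genome = "TTGCC" + tsd + insert + tsd + "CGTAA"
start = len("TTGCC") + len(tsd)
end = start + len(insert)

found_tsd = target_site_duplication(genome, start, end)
poly_a_len = longest_poly_a(genome[start:end])
print(found_tsd, poly_a_len)
```

In real genomes the duplicated flanks and the poly-A tail accumulate mutations over millions of years, so an exact-match test like this one would miss most ancient insertions; a practical search would need to tolerate mismatches.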
Despite their abundance, the ERVs and EBLNs are unlikely to be a representative sampling of their contemporaries. To begin with, the infectious event has to occur within a germline cell. Thus, tropism plays a large role; viruses infecting the appropriate tissues have the greatest probability of leaving their mark. Moreover, the rare newly inserted viral sequence is unlikely to persist in the population and, if detrimental to the host, may be culled by negative selection from the host gene pool. The process of molecular fossil formation thus unfolds over the course of many host generations, with those ERVs and EBLNs found in modern genomes representing the improbable few that squeaked through. Incredibly, the hundreds of thousands of ERVs present in the human genome may represent but a tiny fraction of the total impact of retroviral epidemics on primate evolution.
That ancient endogenous elements like RELIK, SIVgml and the EBLNs are easily recognizable as relatives of modern RNA viruses is more than a little surprising. We are used to hearing about the consequences of error-prone transcription and lack of proof-reading associated with polymerases of the RNA viruses, and how these properties facilitate enormous variation and adaptability. Yet modern Bornaviruses still resemble EBLNs, in some cases after more than 40 million years! Such long-term stability, for which these fossils provide the first direct evidence, highlights the pitfalls of applying sequence divergence and mutation rates to estimate rates of RNA virus evolution.
“Don’t I know you from somewhere?” Source
From the perspective of host evolution, these elements (ERVs and EBLNs) can be sources of genetic variation, providing additional fodder for natural selection. There are several well-documented examples of ERVs contributing to the formation of new cellular functions. In mice, two genes that formerly encoded retroviral proteins (Fv1 and Fv4) were conscripted by evolution into the service of the host and now protect the cell against exogenous infection by inhibiting retroviral replication. Human Syncytin, a cellular protein involved in placental morphogenesis, evolved from the fusion protein of a long-extinct retrovirus. It remains to be seen whether any of the EBLNs, too, have taken on new roles in their hosts.
All of this leaves one wondering what other viral fossils may lurk in all that genomic DNA spread across the mighty Tree of Life. It’s a good bet they are there, but how do we find them? And more importantly, will we know them when we see them? Show someone a trilobite fossil for the first time, and they will immediately recognize what was once a living creature; most will even guess accurately that it was some sort of arthropod. Like trilobite fossils, ERVs and EBLNs are recognizable for what they are, based on sequence similarity with modern retroviruses and Bornaviruses, respectively. But what about extinct viruses, viruses snuffed out long before humans walked the earth? What about viruses with novel genome structures, or never-before-seen replication strategies? In such cases, where nothing remains but a bit of DNA sequence, might we be staring right at one without recognizing it for what it is (or was)?
Reference
Horie M, Honda T, Suzuki Y, Kobayashi Y, Daito T, Oshida T, Ikuta K, Jern P, Gojobori T, Coffin JM, & Tomonaga K (2010). Endogenous non-retroviral RNA virus elements in mammalian genomes. Nature, 463 (7277), 84-7 PMID: 20054395
Welkin is Assistant Professor of Microbiology and Molecular Genetics at Harvard Medical School, and an Associate Blogger for Small Things Considered.
As I write this, I'm in the process of reading the nominations for the 3QD Prize in Science. I think this article is one of the better ones, but I haven't yet read them all.
Where I have reservations is that the article seems inconsistent as to who its target readership is. The first half of the article is well targeted at intellectually curious non-scientists like me, and got my attention. By contrast, the second half seems targeted at trained biologists with a rich biological vocabulary, and was hard to follow.
Very interesting topic, no reservations about that.
Posted by: Adrian Morgan | June 03, 2010 at 05:53 PM