by Zachary Williams
The past as key to the present
Photo by Kent MacElwee. Source
When doctors in the 1980s noticed that people were dying from severe immune deficiency, nobody had ever seen anything like it before; it was a completely novel and terrifying disease. Once we figured out it was caused by a virus, the natural question was: where did it come from? It turns out that HIV was the result of zoonosis: the transmission of a pathogen to humans from other animals (in this case, chimpanzees and other African primates). In fact, many human diseases originated from such cross-species transmissions: viral diseases like measles and smallpox, as well as other infectious diseases like malaria and plague. At one point, each of those diseases was just as new and terrifying to humans as HIV. And if it has happened before, we can expect it to happen again, which is why a lot of research today focuses on understanding how and why zoonosis occurs. One way to investigate this is by looking at past cross-species events. For example, what if we could trace the evolutionary history of ancient viruses, tracking their spread over millions of years? This is the core idea behind paleovirology: by studying the past evolution of diseases, we can better understand what’s going on today, and what could happen in the future.
Of course, viruses don’t leave fossils, so you might worry that paleovirologists have nothing to study. Fortunately, as previously covered on this blog, some viruses leave a 'fossil record' in the genome of their host species, in the form of viral sequences that get inserted into the host DNA. Retroviruses are by far the most common perpetrators, as they must irreversibly integrate their viral genome, or 'provirus,' into the host DNA before they can replicate. Sometimes this happens in a germline cell, and that cell survives and gives rise to offspring. This is probably a rare event, but over evolutionary time these endogenous retrovirus (ERV) sequences build up; the genomes of most animals contain a significant percentage of ERV DNA. For example, approximately 8% of the human genome originated from retroviruses.
This trove of ancient viruses can teach us a lot about how viruses work, as shown in a study led by William 'Ted' Diehl in Welkin Johnson’s lab. The story begins as a fairly straightforward study of the evolution of a specific group of ERVs, but turns into a somewhat startling tale of rampant cross-species transmission and repeated genetic recombination between different retroviruses.
Hot on the trail of an ancient retrovirus
The ERV-Fc group is one of some 39 ERV clades found in the human genome. Its name follows a convention among ERVologists of naming ERV groups after the tRNA that primes their reverse transcription, in this case phenylalanine (F). ERV-Fc was first identified in 2003 but has received little attention since; the most notable follow-up was a paper that identified some ERV-Fcs in dogs.
Figure 2. The genomes of most Eutherian mammals harbor ERV-Fc. A mammalian phylogeny including species whose genomes were examined for the presence of ERV-Fc. Species lacking ERV-Fc are depicted in red, while those found to harbor ERV-Fc are depicted in green. Source
Diehl and coworkers were curious about whether sequences related to ERV-Fc could be found in other mammals. Mining various sequence databases, they in fact identified ERV-Fcs in the majority of mammalian species they looked at, including members from every superorder of Eutherian (placental) mammals except for Xenarthra (anteaters, sloths, and their relatives). In total, they found ERV-Fc in 28/48 mammalian species screened. Interestingly, there was no sign of ERV-Fc in the Metatherian (marsupial) genomes they searched (figure 2).
They then asked: how did ERV-Fc manage to get into so many different species? The most straightforward explanation is that ERV-Fc was actually present in the common ancestor of all Eutherians, and simply came along for the ride as their hosts diversified over the past 100 million years or so. The viruses could have been transferred vertically as endogenous sequences, or as actively replicating viruses that evolved along with their hosts, occasionally leaving ERV fossils in their hosts’ genomes. An alternative explanation is that each lineage was infected independently at some point in its evolution, through cross-species transmissions much like the zoonotic event that gave rise to HIV.
Oh, what a tangled...gram...we weave
One way to test these hypotheses is to compare the phylogenetic trees of the viruses and their hosts; if each ERV-Fc and its host species have been associated since their divergence from a common ancestor, their phylogenies should match, e.g., dog ERV-Fcs should be more closely related to other carnivore ERV-Fcs than to, say, human ERV-Fcs, and human ERV-Fcs should be more closely related to other primate ERV-Fcs than they are to their relatives from carnivore hosts. This 'co-speciation' pattern is seen with many viruses, as well as with other symbiotic relationships, like aphids and their endosymbiotic bacteria. When the authors looked at ERV-Fc, they did see some patterns like this; in trees for the viral genes gag and pol, a clear carnivore clade pops out, as well as a clade containing great ape and rhesus macaque ERV-Fcs, but the tree also reveals all sorts of discrepancies. For example, ape and rhesus ERV-Fcs are closely related to each other, but there are other primate ERV-Fcs scattered throughout the tree, with no apparent connection to evolutionary relatedness. Some species also contained two distinct ERV-Fc lineages. These discrepancies suggest that multiple cross-species transmissions of ERV-Fc viruses occurred in the past.
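To make the co-speciation test a bit more concrete, here is a toy sketch in Python. It swaps in a standard textbook technique, Fitch parsimony, which the post does not say the authors used: treat each virus’s host group as a character on the virus tree and count the minimum number of host changes needed to explain the leaves. If every host group’s viruses formed their own clade, you would need exactly (number of groups - 1) changes; anything beyond that flags viruses sitting in the "wrong" part of the tree. The trees below are hypothetical nested tuples, not the paper’s data.

```python
from typing import FrozenSet, Set, Tuple, Union

# A rooted binary tree: a leaf is a host-group label (str),
# an internal node is a pair of subtrees. Hypothetical representation.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree) -> Tuple[FrozenSet[str], int]:
    """Fitch parsimony: (candidate ancestral hosts, minimum host changes) for a subtree."""
    if isinstance(tree, str):
        return frozenset([tree]), 0
    left_set, left_cost = fitch(tree[0])
    right_set, right_cost = fitch(tree[1])
    shared = left_set & right_set
    if shared:
        # Children can agree on an ancestral host: no extra change needed here.
        return shared, left_cost + right_cost
    # No common host: one change must occur on one of the two branches.
    return left_set | right_set, left_cost + right_cost + 1


def host_labels(tree: Tree) -> Set[str]:
    if isinstance(tree, str):
        return {tree}
    return host_labels(tree[0]) | host_labels(tree[1])


def min_host_switches(tree: Tree) -> int:
    return fitch(tree)[1]


def excess_switches(tree: Tree) -> int:
    """Changes beyond the (groups - 1) expected if each host group were one clade."""
    return min_host_switches(tree) - (len(host_labels(tree)) - 1)


# Hypothetical virus trees, leaves labeled by host group.
congruent = (("carnivore", "carnivore"), (("primate", "primate"), "rodent"))
tangled = (("carnivore", "primate"), (("primate", "rodent"), "carnivore"))

print(excess_switches(congruent))  # 0
print(excess_switches(tangled))    # 1
```

This only gives a lower bound on discordance, and it assumes the virus tree itself is correct; the estimate of at least 26 transmissions comes from the authors’ own analysis of the real trees.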
A quick and dirty way to visualize this is with a figure called a tanglegram (figure 3): simply line up the virus and host trees, and draw lines between each virus and its host; lines crossing each other point to a cross-species transmission somewhere in that viral lineage. As you can see, there’s a whole lot of line-crossing going on. From this tangled mess, the authors estimate that a minimum of 26 distinct transmission events occurred. Nor were transmissions limited to closely related species; for example, the closest relatives of the dolphin proviruses seem to be in rodents and rabbits! The authors don’t speculate on when or how such a dramatic cross-over could have occurred, though it’s worth remembering that dolphins evolved from terrestrial ancestors, so the transmission may have happened at a time when the ancestors of rabbits and dolphins were far more likely to encounter each other than their descendants are.
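If you want a number rather than an eyeball impression of a tanglegram, one simple option (an illustration, not something the paper reports) is to count how many pairs of connecting lines cross. With both trees drawn as vertical columns of leaves, two straight lines cross exactly when their endpoints appear in opposite orders on the two sides. The leaf orders and virus names below are hypothetical, chosen to mimic the dolphin viruses grouping with a rabbit virus.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def count_crossings(host_order: List[str],
                    virus_order: List[str],
                    links: List[Tuple[str, str]]) -> int:
    """Count pairs of virus-host lines that cross in a tanglegram.

    host_order / virus_order: top-to-bottom leaf order of each drawn tree.
    links: (virus_leaf, host_leaf) pairs, one line per virus.
    """
    host_pos: Dict[str, int] = {h: i for i, h in enumerate(host_order)}
    virus_pos: Dict[str, int] = {v: i for i, v in enumerate(virus_order)}
    crossings = 0
    for (v1, h1), (v2, h2) in combinations(links, 2):
        # Opposite ordering on the two sides means the two lines must cross.
        if (host_pos[h1] - host_pos[h2]) * (virus_pos[v1] - virus_pos[v2]) < 0:
            crossings += 1
    return crossings


# Hypothetical example: primates + rabbit on one side of the host tree,
# carnivores + dolphin on the other.
hosts = ["human", "macaque", "rabbit", "dog", "ferret", "dolphin"]
links = [("Fc_" + h, h) for h in hosts]
congruent_viruses = ["Fc_" + h for h in hosts]
tangled_viruses = ["Fc_human", "Fc_macaque", "Fc_rabbit", "Fc_dolphin", "Fc_dog", "Fc_ferret"]

print(count_crossings(hosts, congruent_viruses, links))  # 0
print(count_crossings(hosts, tangled_viruses, links))    # 2
```

One caveat: the count depends on how each tree is drawn (you can rotate the children of any node without changing the tree), so crossings are a visual cue, not a tally of transmission events.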
A retroviral chop shop
Figure 3. Tanglegram comparison of host (left) and ERV-Fc phylogenies (right); dashed lines match species and the ERV-Fc found within their genome. Source
One of my favorite aspects of this paper is its thoroughness; most importantly, the authors built phylogenies based on multiple viral genes instead of just one. Why does this matter? Well, you may have noticed that not only do the viral trees not match the host tree, but the trees for different viral genes don’t always match each other either. Most notably, some of the env genes are more closely related to the human endogenous retrovirus HERV-W than they are to the other ERV-Fcs! How is that possible? It turns out that retroviruses are rather prone to recombination, sometimes with quite distantly related retroviruses. Presumably these viruses picked up a new env gene from a relative of HERV-W. A discerning eye may note, however, that it rather looks as if HERV-W may have stolen its env from an ERV-Fc in the first place, a possibility the authors don’t discuss.
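Here is a deliberately crude Python sketch of why looking gene by gene catches recombinants. The authors compared full phylogenies; as a stand-in, this just asks, for each gene of a provirus, which reference virus has the most similar copy (by simple p-distance). If different genes point to different relatives, the provirus is flagged. The reference names echo the text, but the 12-base "genes" are made-up toy sequences, not real ERV-Fc or HERV-W data.

```python
from typing import Dict


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_by_gene(query: Dict[str, str],
                    references: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """For each gene in the query, name the reference whose copy is most similar."""
    return {
        gene: min(references, key=lambda ref: p_distance(seq, references[ref][gene]))
        for gene, seq in query.items()
    }


def looks_recombinant(query: Dict[str, str],
                      references: Dict[str, Dict[str, str]]) -> bool:
    # Different genes pointing to different relatives is the discordance pattern.
    return len(set(closest_by_gene(query, references).values())) > 1


# Toy, made-up aligned sequences (hypothetical).
references = {
    "ERV-Fc": {"gag": "ATGGCCTTAGCA", "env": "GGTACCTTGACA"},
    "HERV-W": {"gag": "ATGACCTCAGTA", "env": "CCTAGGATGTCA"},
}
query = {"gag": "ATGGCCTTAGCT", "env": "CCTAGGATGACA"}

print(closest_by_gene(query, references))  # {'gag': 'ERV-Fc', 'env': 'HERV-W'}
print(looks_recombinant(query, references))  # True
```

Real recombination analyses rely on per-gene trees or dedicated breakpoint-detection tools; nearest-reference distance is only meant to show the logic.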
The authors then dug a little deeper by building trees from the gag genes of individual proviruses, uncovering an even more remarkable evolutionary story. First, their pol and env trees showed that the Fc1 viruses from dogs and ferrets form a monophyletic clade along with the giant panda ERV-Fc. We also know that this Fc1 clade acquired its env from a HERV-W-related virus. But the gag tree is a little more confusing, because there the Fc1 genes are not monophyletic. Instead, there is a nice little Fc1 clade of dog, ferret, and panda viruses, but a big group of ferret Fc1 gags groups with ferret Fc2! The simplest explanation is a second recombination event, in which a ferret Fc1 virus borrowed a gag gene from a ferret Fc2 virus. This is all summarized in a nice flow chart showing how the viruses may have spread into different carnivore species, and when the recombinations occurred (figure 4).
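"Monophyletic" has a precise meaning that is easy to check in code: a set of sequences is monophyletic if it is exactly the set of leaves below some node of the tree. The Python sketch below shows how the same Fc1 proviruses can pass that test on one gene’s tree and fail it on another’s. Both trees and all leaf names are hypothetical toy topologies that mimic the pattern described, not the paper’s trees.

```python
from typing import FrozenSet, Set, Tuple, Union

# Rooted binary tree as nested tuples; leaves are provirus names (hypothetical).
Tree = Union[str, Tuple["Tree", "Tree"]]


def collect_clades(tree: Tree, out: Set[FrozenSet[str]]) -> FrozenSet[str]:
    """Add the leaf set of every subtree to `out`; return this subtree's leaves."""
    if isinstance(tree, str):
        leaves = frozenset([tree])
    else:
        leaves = collect_clades(tree[0], out) | collect_clades(tree[1], out)
    out.add(leaves)
    return leaves


def is_monophyletic(tree: Tree, taxa: Set[str]) -> bool:
    clades: Set[FrozenSet[str]] = set()
    collect_clades(tree, clades)
    return frozenset(taxa) in clades


fc1 = {"dog_Fc1_1", "ferret_Fc1_1", "panda_Fc1_1", "ferret_Fc1_2", "ferret_Fc1_3"}

# Toy pol-like tree: all Fc1 proviruses sit together.
pol_tree = (
    ((("dog_Fc1_1", "ferret_Fc1_1"), "panda_Fc1_1"), ("ferret_Fc1_2", "ferret_Fc1_3")),
    ("ferret_Fc2_1", "ferret_Fc2_2"),
)
# Toy gag-like tree: some ferret Fc1 gags group with ferret Fc2.
gag_tree = (
    (("dog_Fc1_1", "ferret_Fc1_1"), "panda_Fc1_1"),
    (("ferret_Fc1_2", "ferret_Fc1_3"), ("ferret_Fc2_1", "ferret_Fc2_2")),
)

print(is_monophyletic(pol_tree, fc1))  # True
print(is_monophyletic(gag_tree, fc1))  # False
```

Libraries such as Biopython’s Bio.Phylo offer a built-in monophyly check on parsed trees; the hand-rolled version here just keeps the example dependency-free.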
It's been a while
Figure 4. Proposed recombination and transmission sequence involving carnivore ERV-Fc1. ERV-Fc sequences are depicted in blue, while ERV-W sequences are depicted in orange. Source
How do we put all of this species-crossing into historical perspective? Conveniently, the unique structure of retroviral genomes allows us to make rough estimates of the date of any intact integration. The integrated provirus is flanked by long terminal repeats, or LTRs, and the two LTRs of any given provirus are identical in sequence at the time of integration. Over long periods of time, each LTR accumulates random mutations, so the pair slowly diverges in sequence. If we know the average mutation rate of the host species, we can use this divergence as a ‘molecular clock.’ Diehl et al. did this for every provirus they found with two intact LTRs. The oldest insertions they found were in ferrets and dogs, at a little over 30 million years old. While this is pretty dang old, it’s nowhere near as old as Eutherian mammals as a whole, which originated around 100 million years ago, corroborating the hypothesis that these viruses didn’t originate in the common ancestor of Eutherians. The age estimates also show that different ERV-Fc groups colonized their hosts at different times. For example, the ERV-Fc2 lineages in both ferrets and dogs were active between 35 and 20 million years ago, while the ERV-Fc1 groups in ferrets and dogs are much younger; in fact, the canine ERV-Fc1 lineage is the youngest of all the ERV-Fcs, with an average age of ~6 million years, and some loci much younger than that.
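The LTR clock boils down to one line of arithmetic. Because the two LTRs mutate independently after integration, their divergence K grows at twice the per-site rate r, so the age is roughly T = K / (2r). The Python sketch below assumes aligned LTRs, an optional Jukes-Cantor correction for multiple hits at the same site, and a placeholder rate of 2.2e-9 substitutions per site per year; that rate is an illustrative assumption, not the per-species rates Diehl et al. used, and the toy LTRs are made up.

```python
import math


def ltr_divergence(ltr5: str, ltr3: str) -> float:
    """Proportion of aligned sites that differ between a provirus's two LTRs (gaps skipped)."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        differences += a != b
    return differences / compared


def jukes_cantor(p: float) -> float:
    """Correct a raw proportion of differences for repeated hits (Jukes-Cantor model)."""
    return -0.75 * math.log(1 - 4 * p / 3)


def integration_age(ltr5: str, ltr3: str, rate_per_site_per_year: float,
                    correct: bool = True) -> float:
    """Estimated years since integration: T = K / (2r)."""
    k = ltr_divergence(ltr5, ltr3)
    if correct:
        k = jukes_cantor(k)
    # Both LTR copies accumulate mutations, so divergence grows at twice the rate.
    return k / (2 * rate_per_site_per_year)


# Toy 100-base LTR pair differing at 2 sites (hypothetical sequences).
ltr5 = "ACGT" * 25
ltr3 = "C" + ltr5[1:99] + "A"
ASSUMED_RATE = 2.2e-9  # placeholder substitutions/site/year

age = integration_age(ltr5, ltr3, ASSUMED_RATE, correct=False)
print(f"{age / 1e6:.1f} million years")  # about 4.5 million years (uncorrected)
```

Real estimates carry wide error bars: host mutation rates are uncertain, and short LTRs give few informative sites, which is why these ages are best read as rough.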
So, what have we learned?
First, as we’ve already said, cross-species transmissions appear to be the rule rather than the exception in ERV-Fc; the closer we look, the more we find. Second, ERV-Fc is a talented genetic thief, frequently acquiring new genes from other viruses (at least twice according to the data in this study, but I strongly suspect that we’d see a similar story if we looked as closely at the rest of this group as the paper did at carnivores). It’s tempting to speculate that these two facts may be related: did acquiring a new env gene give the carnivore Fc1 viruses the ability to bind to and enter different cell types, allowing them to spread more widely? We can’t know for sure, but we do know that virus receptor tropism can be a barrier to cross-species transmission; for example, chickens appear to have evolved to be resistant to the mouse virus MLV by accumulating mutations in the receptors used by MLV’s env.
Lastly, what does this tell us about today, and the viruses that could threaten us in the future? Not a lot directly, to be honest, but I think it does raise an interesting question: just how unusual is ERV-Fc’s penchant for cross-species transmission? If it is unusually prone to cross the species barrier, perhaps we can figure out what characteristics make it so, and keep a watchful eye out for other viruses that may have similar abilities.
References
- Diehl WE, Patel N, Halm K, Johnson WE. 2016. Tracking interspecies transmission and long-term evolution of an ancient retrovirus using the genomes of modern mammals. eLife 5:e12704.
Zachary is a graduate student in the lab of John Coffin at Tufts Medical School, Boston.