by Jamie Henzy
A pile of macaroni
Readers of STC are likely aware that sequences from viruses infecting a host can sometimes wind up inserted into the host chromosomal DNA, even being passed on to offspring if the insertion occurs in an egg or sperm cell. This happens most frequently with retroviruses, which insert into host DNA as part of their replication strategy; consequently "endogenous" retrovirus (ERV) sequences comprise some 8% of the human genome (for perspective, compare this with ~2% of the genome that codes for proteins!). Other types of viruses also occasionally wind up inserted, most likely by consorting with one of the several copy-and-paste elements that are considered genomic parasites, such as LINE-1 elements. A LINE-1 element is a retrotransposon capable of copying viral RNA into DNA and inserting it into the host genome. These inserted viral sequences provide some of the scraps used in the process of "bricolage" described by François Jacob, whereby natural selection tinkers around with whatever is available to create something new, like a kid pasting macaroni around the edges of a cardboard rectangle and, voilà! . . . a picture frame!
Recently we recounted the story of wasps that have made use of their inserted viral genes to protect their eggs from the immune system of the caterpillar hosts into which they deposit them. Before that, we described how mammals depend on inserted retroviral genes to help form the placenta. Now we present another tale of virus bricolage, once again starring retroviruses. The "Part Deux" in the title relates to the fact that this tale, like that of the wasps, involves the host's use of viral sequences in functions that protect it from pathogens. Isn't that ironic!
See you LTR
In this case, the useful scraps of virus sequence are the promoter and enhancer regions that normally serve to initiate transcription of the virus when host transcription factors (TFs) bind to them. These regions are in the long terminal repeats (LTRs) found at both ends of the retroviral genome (Figure 2). Because the LTRs are identical in sequence when the virus first inserts into the host DNA, they frequently recombine such that the sequence between the LTRs is lost in the shuffle, leaving a so-called solo LTR. The vast majority of retroviral sequences making up that 8% of the human genome, in fact, consist of solo LTRs.
Now, imagine what can occur if one of these LTRs – solo or otherwise – happens to be near a host gene. The promoter and enhancer regions could contain binding sites recognized by host TFs and thereby alter expression of the gene. And there's another feature of retrovirus insertions that needs to be taken into consideration – the insertions can retrotranspose to various other regions of the genome; that is, they can be copied and pasted multiple times by the same mechanism described for LINE-1 elements. This means that LTRs with identical TF binding sites may spread around to various locations, some within the vicinity of genes whose expression they can change. In other words, multiple host genes may all have their transcription altered in a coordinated manner by ERV LTRs.
The art of networking
Coordinated control of a network of genes can be very useful. Consider what happens when a pathogen is detected in a cell. Host signaling molecules known as interferons (IFNs) are released, which induce TFs to bind to promoter regions of various genes involved in innate immunity, collectively known as IFN-stimulated genes (ISGs). Expression of all these ISGs allows the host cell to launch a multi-pronged attack against the invader. Although this type of coordinated response is conserved across mammals, the specifics, such as how much of which gene is expressed, differ among species, most likely because different pathogens have put different selective pressures on the gene networks involved. However, the origins, maintenance, and adaptability of such coordinated networks present some thorny problems. How do all of the genes in the network happen to evolve the same regulatory sequences? And what underlies the variation in these networks among different species? If you were paying attention at the end of the previous section, you've arrived at the hypothesis tested by Chuong et al. – that ERV LTRs in theory could provide the regulatory coordination required to evolve these networks.
The group focused on IFN-gamma (IFNG), which is associated with a couple of TFs, one of which is known as signal transducer and activator of transcription-1 (STAT1). When the presence of a pathogen causes IFNG to kick in, STAT1 binds to special sites in the promoter regions of a set of ISGs, resulting in the expression of a whole slew of antiviral factors to protect the cell. To learn whether ERVs supply any of these TF binding sites, they analyzed ChIP data for human cells treated with IFNG. For the uninitiated, ChIP, for chromatin immunoprecipitation, is a technique for finding where on DNA specific proteins (such as TFs) bind. Proteins that have bound to the DNA are cross-linked so that they stay put, and the DNA is broken up into fragments. Then an antibody to the bound protein is used to pull out only those fragments with the bound protein. After reversing the cross-link to remove the protein, the DNA fragments can be sequenced and, ta-da!, you have your binding sites. They found that these STAT1 binding sites abound with LTR sequences from families of ERVs. Some of these families integrated and expanded in the genome 150 million years ago (mya), while others were acquired more recently. Most importantly, a large portion of these ERV sequences were found near genes associated with immune functions, raising the intriguing possibility that erstwhile viral sequences have been "turned" by the host to help fight pathogens: the ghosts of past viruses haunting the genome and rattling their chains at their descendants!
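For readers who like to see the bookkeeping: once ChIP has handed you a list of protein-bound regions, asking whether they "abound with LTR sequences" comes down to intersecting two sets of genomic intervals. Here is a minimal Python sketch of that intersection. It is not the pipeline used by Chuong et al.; the function names, the family labels, and every coordinate are made up for illustration, and intervals are assumed to be half-open (start included, end excluded).

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def peaks_in_ltrs(peaks, ltrs):
    """Count, per LTR family, the peaks that overlap at least one element of that family.

    peaks: iterable of (chrom, start, end)
    ltrs:  iterable of (chrom, start, end, family)
    """
    # Group LTR annotations by chromosome so each peak is only compared
    # against elements on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end, family in ltrs:
        by_chrom[chrom].append((start, end, family))

    counts = defaultdict(int)
    for chrom, p_start, p_end in peaks:
        # A set ensures a peak is counted once per family, even if it
        # touches several elements of the same family.
        hit_families = {
            family
            for start, end, family in by_chrom[chrom]
            if overlaps(p_start, p_end, start, end)
        }
        for family in hit_families:
            counts[family] += 1
    return dict(counts)


# Toy data: hypothetical peak calls and hypothetical LTR annotations.
peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
ltrs = [
    ("chr1", 150, 400, "LTR_family_X"),
    ("chr1", 590, 900, "LTR_family_X"),
    ("chr2", 300, 400, "LTR_family_Y"),
]
result = peaks_in_ltrs(peaks, ltrs)
print(result)
```

A raw count like this is only the first step. To call a family enriched, you would still need to compare the count against what is expected by chance given how much of the genome that family occupies.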
Consider one large primate-specific ERV family that is present at many of the STAT1 binding sites. This family, MER41, originated from a retrovirus that invaded the ancestral genome ~45 to ~60 million years ago, and includes several subfamilies. One particular subfamily, MER41B, has tandem STAT1 binding sites in its LTR, and is abundant near genes known to be stimulated by IFNG. Another subfamily, MER41A, carries a 43-base-pair (bp) deletion where the STAT1 binding sites would have been, and is not enriched near IFNG-stimulated genes. The consensus sequence generated from comparing all MER41 copies gives a rough portrait of what the ancestral sequence looked like, and it appears to have arrived in the genome with the STAT1 binding sites intact. One can imagine that those MER41 elements that altered nearby genes in a harmful way would be selected against, sometimes leaving the copies with the 43-bp deletion behind. However, others altered gene expression in a way that helped the host cell survive and were selected for and fixed in the population. And because they all responded to the same TFs, the genes they affected were roped into a coordinated network.
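The idea of a consensus sequence as a "rough portrait" of the ancestor is easy to sketch in code: line up the copies and take a majority vote at each position. The Python below is a toy illustration under stated assumptions, not the method used in the paper. It assumes the copies are already aligned to equal length with "-" marking gaps, the sequences are invented, and the function name is hypothetical.

```python
from collections import Counter


def consensus(aligned_copies, gap="-"):
    """Majority-rule consensus of equal-length, pre-aligned sequences.

    Columns where the gap character wins the vote are dropped from the output.
    Ties are not handled specially; the toy data below avoids them.
    """
    if not aligned_copies:
        raise ValueError("need at least one sequence")
    length = len(aligned_copies[0])
    if any(len(seq) != length for seq in aligned_copies):
        raise ValueError("sequences must be aligned to the same length")

    out = []
    for column in zip(*aligned_copies):
        # Most frequent character in this alignment column.
        base, _count = Counter(column).most_common(1)[0]
        if base != gap:
            out.append(base)
    return "".join(out)


# Invented copies: one carries a deletion, one carries a point mutation.
copies = [
    "TTCCAGGAA",
    "TTCCAGGAA",
    "TTC---GAA",
    "TTACAGGAA",
]
ancestor_guess = consensus(copies)
print(ancestor_guess)
```

In this toy case the copy with the deletion and the copy with the point mutation are both outvoted, so the reconstructed sequence keeps the intact stretch. That is the same logic by which the MER41 consensus points to an ancestor that still had its STAT1 binding sites.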
Of course the group did all the requisite work to show that these MER41 elements actually enhance expression of downstream genes in response to IFNG, even using the recently-blogged-about CRISPR-Cas system to delete a MER41 element in human cells. As you might guess, the nearby ISG failed to express in the mutants, proving the necessity of MER41 in its regulation.
A menagerie of MERs
ERVs have likely contributed to the wiring of IFNG-inducible networks in other mammals, as well. Sometime between 75 and 50 mya, different MER41-like elements – some of which carry STAT1 binding sites – were independently acquired by numerous mammalian lineages. Indeed, the group found that lemurs, bats, and cats all have MER41-like elements that respond to IFNG in vitro, driving expression of luciferase reporter constructs.
If ERV LTRs contribute to wiring of immune regulatory networks in a way that allows them to adapt to the particular panel of pathogens a species typically encounters, then there should be some regulatory differences seen between species, no? Take a particular ISG called Absent in Melanoma 2 (AIM2), present in both humans and mice. The protein product of AIM2 acts as a pathogen sensor when it encounters foreign DNA in the host cell cytosol, leading to an inflammatory response. In humans, AIM2 is an IFNG-stimulated gene driven by a MER41 element, whereas in mice (a species lacking MER41), AIM2 is constitutively active, i.e. always turned on. So the presence of MER41 ERVs in a primate ancestor allowed AIM2 to be wired into the IFNG-stimulated network of genes, whereas this gene falls outside the network in mice. The group also found that this particular MER41 element is conserved near AIM2 in chimps, rhesus macaques, and marmosets as well, and cells from these species express AIM2 when stimulated by IFNG.
You may be asking, why were the STAT1 binding sites present in the original infecting virus? The authors speculate that they may have allowed the virus to exploit features of the host immune response that aided in its replication, for example allowing it to escape gene silencing in specific cell types. In any event, we have here another tale of bricolage whereby vertebrate genomes make the most of the flotsam and jetsam known as genomic parasites, or junk DNA. In this case, the propagation in primates of a family of viral sequences contributed to the wiring of a gene network for fighting pathogens. Not only does this abound in irony, but it also deepens the relationship between our ERVs and our evolution. Considering how widespread ERVs are among vertebrates (only the lowly hagfish and lampreys appear to be short of them), we've likely only scratched the surface in terms of the various roles played by these sequences-formerly-known-as-junk-DNA.
In addition to being an Associate Blogger for STC, Jamie is a postdoctoral researcher and part-time teaching faculty at Boston College.
Chuong EB, Elde NC, Feschotte C. 2016. Regulatory evolution of innate immunity through co-option of endogenous retroviruses. Science 351:1083–1087. doi: 10.1126/science.aad5497