Toppling Another Evolutionary Icon, ENCODE Data Suggests Endogenous Retroviruses Are Functional
Recently a reader emailed us to ask about how to respond to the argument that some endogenous retroviruses (ERVs) seem to match the standard phylogeny of higher primates, supporting common descent. Endogenous retroviruses are thought to be parasitic junk sequences in our genome that were derived from viral DNA. Evolutionists often cite them as supposed evidence of common ancestry. The reader asked how many ERVs are shared between humans and apes.
These are good questions. ERVs are a favorite with Darwin defenders -- so much so that one particularly uncivil activist, Abbie Smith, even named her popular science blog after them, "ERV."
In response, I suggested that, when analyzing common descent, the number of ERVs shared by humans and chimps is beginning to look like it's not a very important question. An assumption in the question (and in the arguments from supporters of common ancestry) is that ERVs are a type of functionless "junk" DNA. Thus if apes and humans share ERVs in the same position in our genomes, that would seemingly count as evidence for common descent. But what if ERVs aren't junk? What if they are a type of functional DNA? If that's the case, then shared ERVs could easily be explained by common design rather than common descent, and they would certainly no longer be some kind of special argument for common ancestry.
In fact there is good evidence that ERVs as a class of DNA are functional. A 2013 paper in PLOS Genetics using ENCODE data reported that very high percentages of endogenous retroviruses in the human genome are associated with open chromatin -- strong evidence of transcription -- and that they are transcribed non-randomly, suggesting some functional role:
Although emerging evidence suggests that transposable elements (TEs) have contributed novel regulatory elements to the human genome, their global impact on transcriptional networks remains largely uncharacterized. Here we show that TEs have contributed to the human genome nearly half of its active elements. Using DNase I hypersensitivity data sets from ENCODE in normal, embryonic, and cancer cells, we found that 44% of open chromatin regions were in TEs and that this proportion reached 63% for primate-specific regions. We also showed that distinct subfamilies of endogenous retroviruses (ERVs) contributed significantly more accessible regions than expected by chance, with up to 80% of their instances in open chromatin. Based on these results, we further characterized 2,150 TE subfamily-transcription factor pairs that were bound in vivo or enriched for specific binding motifs, and observed that TEs contributing to open chromatin had higher levels of sequence conservation. We also showed that thousands of ERV-derived sequences were activated in a cell type-specific manner, especially in embryonic and cancer cells, and we demonstrated that this activity was associated with cell type-specific expression of neighboring genes. Taken together, these results demonstrate that TEs, and in particular ERVs, have contributed hundreds of thousands of novel regulatory elements to the primate lineage and reshaped the human transcriptional landscape.

Again, we see from this that a great number of ERVs are found within open chromatin, strongly suggesting they are transcribed into RNA. Junk-DNA advocates respond that this isn't enough to show function because the transcribed DNA might just be producing "junk RNA." That's a weak rebuttal. As the paper states, ERV-based RNA was produced "in a cell type-specific manner" and was "associated with cell type-specific expression of neighboring genes."
They further report "Thousands of LTR/ERV sequences are activated in a cell type-specific manner" and ERV transcription is highly enriched compared to other parts of the genome:

For example, we observed that 1237 of the 2337 (52.9%) LTR7 repeat instances (a subfamily of the LTR/ERV class) were contributing to open chromatin in the human embryonic stem cell (ESC) line H7 when we would have only expected 60.5 (2.6%). This corresponds to a 20-fold enrichment and is highly significant (p<1.0E-100). We call such repeat subfamilies DHS-associated repeats (DARs) ... although LTR/ERV repeats constitute 13.5% of the repeat instances in the genome, they represent 25.0%, 54.6%, and 33.0% of the DAR instances in normal, embryonic, and cancer cells, respectively. ... LTR/ERV repeats have contributed a disproportionate fraction of cell type-specific accessible chromatin regions especially in embryonic and cancer cell lines. This is interesting given that network rewiring using ERV elements has already been described in ESCs [embryonic stem cells] and that it has been shown that stem cell potency fluctuates with endogenous retrovirus activity in mouse. ... Finally, we also reported that repeat subfamilies activated in a cell type-specific manner were also frequently associated with higher expression of neighboring genes.

(Pierre-Étienne Jacques, Justin Jeyakani, Guillaume Bourque, "The Majority of Primate-Specific Regulatory Sequences Are Derived from Transposable Elements," PLOS Genetics (May 9, 2013) (emphases added).)

In other words, ERVs are being transcribed in a highly non-random manner that correlates with embryological patterns in association with other functional genetic elements. This decidedly points towards functionality.
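The arithmetic behind the LTR7 figures in that passage can be checked directly. The short sketch below uses only the three numbers quoted from the paper (2337 instances, 1237 observed in open chromatin, 60.5 expected); the variable names are illustrative, and it does not attempt to reproduce the paper's significance test.

```python
# Figures quoted from Jacques et al. (2013) for the LTR7 subfamily
# in the H7 embryonic stem cell line. Variable names are illustrative.
total_instances = 2337
observed_in_open_chromatin = 1237
expected_in_open_chromatin = 60.5

# Share of LTR7 instances in open chromatin, observed vs. expected by chance.
observed_fraction = observed_in_open_chromatin / total_instances
expected_fraction = expected_in_open_chromatin / total_instances

# Fold enrichment is the ratio of observed to expected counts.
fold_enrichment = observed_in_open_chromatin / expected_in_open_chromatin

print(f"observed: {observed_fraction:.1%}")
print(f"expected: {expected_fraction:.1%}")
print(f"fold enrichment: {fold_enrichment:.1f}")
```

The ratio comes out a little above 20, which matches the "20-fold enrichment" the authors report, and the two fractions round to the quoted 52.9% and 2.6%.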
Junk-DNA defenders rarely address such evidence showing that transcription is not random. For example, we see this in a recent paper in Genome Biology and Evolution by Brunet and Doolittle which critiques the PLOS Genetics paper. They write: "Jacques et al. (2013) for instance used ENCODE DNase I hypersensitivity data to place 80% of ERVs within open chromatin, and TEs in 63% of primate-specific regions." However, because specific functions for each ERV haven't yet been discovered, they argue "It seems premature to imagine that a function for TEs as a class has been found!"
Their guilty-until-proven-innocent argument assumes that ERVs are by default non-functional. This reminds me of ENCODE critic Dan Graur, who has stated, "If you don't know a function assume as a null hypothesis that it doesn't have function."
Worse, their approach ignores the specific empirical evidence raised in the PLOS Genetics paper, which shows ERVs aren't merely transcribed -- they are transcribed in a non-random, enriched, cell type-specific manner that correlates with the transcription of other functional genetic elements. This is the opposite of random or stochastic transcription and it strongly suggests function.
Dan Graur shares Brunet and Doolittle's dubious, assumption-filled approach. He claims that "transcription is fundamentally a stochastic [random] process," but as I explained previously in response to the ENCODE critics, ENCODE shows transcription isn't random:
ENCODE didn't merely study the genome to determine which DNA elements are biochemically active and making RNA. It also studied patterns of biochemical activity, uncovering highly non-random patterns of RNA production--patterns which indicate that these vast quantities of RNA transcripts aren't junk.... ENCODE's results suggest that a cell's type and functional role in an organism are critically influenced by complex and carefully orchestrated patterns of expression of RNAs inside that cell. As Stamatoyannopoulos observes, ENCODE found that "the majority of regulatory DNA regions are highly cell type-selective," and "the genomic landscape rapidly becomes crowded with regulatory DNA as the number of cell types" studied increases. Thus, as two pro-ENCODE biochemists explain, "Assertions that the observed transcription represents random noise . . . is more opinion than fact and difficult to reconcile with the exquisite precision of differential cell- and tissue-specific transcription in human cells."

ERVs fit exactly the same pattern, strongly suggesting they are functional. If they are functional, then they don't provide any kind of special evidence for common ancestry, even if they do fit into a nested hierarchy. Here's my view of how evidence can count for or against (or neither for nor against) common ancestry:
- (1) Shared functional biological similarity that fits a nested hierarchy (i.e., a treelike pattern): Explained equally well by common design or common descent.
- (2) Shared non-functional biological similarity that fits a nested hierarchy: All things being equal, best explained by common descent.
- (3) Shared functional similarity that doesn't fit a nested hierarchy: Best explained by common design.
Image: © stephanvorster / Dollar Photo Club.