Revisiting an Old Chestnut: Retroviruses and Common Descent (Updated)
One argument for common descent which one hears very frequently in the evolutionary literature concerns the placement of endogenous retroviruses (ERVs) at orthologous loci in primate genomes. By far the clearest and most succinct presentation of the strongest version of this argument which I have encountered can be found in this popular-level internet article. The article sets out three layers of evidence, based on ERV elements, which, it proposes, confirm the evolutionary model of common descent. According to the article, these layers of evidence are:
1. The presence of ERVs at orthologous loci among species of various degrees of taxonomic separation, and the nested hierarchies they fall into
2. The comparative degrees of LTR-LTR discontinuity among orthologous full-length ERVs
3. Shared mutations among orthologous ERVs and the corroboratory nested hierarchies they fall into
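To make the "nested hierarchy" idea in the first layer concrete: a set of insertion patterns forms a nested hierarchy when, for any two orthologous insertions, the groups of species carrying them are either nested (one contains the other) or disjoint. The short Python sketch below checks that condition. It is only an illustration of the logic, not the article's method; the function name, locus names and taxon labels are hypothetical placeholders, not real data.

```python
from collections.abc import Mapping
from itertools import combinations


def is_nested_hierarchy(insertions: Mapping[str, frozenset[str]]) -> bool:
    """Return True if every pair of species-sets is nested or disjoint.

    `insertions` maps a (hypothetical) locus name to the set of species
    carrying an ERV at that orthologous locus.
    """
    for a, b in combinations(insertions.values(), 2):
        # Two groups fit a single branching tree only if one contains
        # the other or they share no members at all.
        if (a & b) and not (a <= b or b <= a):
            return False
    return True


# Hypothetical toy data: taxa A-D, loci locus1-locus3.
nested = {
    "locus1": frozenset({"A", "B"}),
    "locus2": frozenset({"A", "B", "C"}),
    "locus3": frozenset({"A", "B", "C", "D"}),
}
conflicting = {
    "locus1": frozenset({"A", "B"}),
    "locus2": frozenset({"B", "C"}),  # overlaps locus1 without containing it
}
print(is_nested_hierarchy(nested))       # True
print(is_nested_hierarchy(conflicting))  # False
```

Checking pairs is sufficient here: a collection of groups that are pairwise nested-or-disjoint can always be arranged as a single rooted tree.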
I want to take some time to examine those three layers of evidence in a series of three blog posts. But, first, let us take a step back and explore some background.
Take a look at the following video:
When an element integrates, it leaves a short single-strand gap at each end of the inserted sequence. As the host cell's DNA repair proteins fill in these gaps, the target site is duplicated on either side of the insert, so each integration is marked and characterised by a target-site duplication. This is how we establish that these sequences are, indeed, inserts.
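To illustrate how a target-site duplication flags an insert, here is a minimal Python sketch. It assumes the insert's coordinates in the host sequence are already known, and that the duplication is a short direct repeat of 4 to 6 bp immediately flanking the insert (a typical range for retroviral integrations, used here as an assumed default). The function name and the toy sequence are hypothetical.

```python
from typing import Optional


def find_tsd(seq: str, insert_start: int, insert_end: int,
             min_len: int = 4, max_len: int = 6) -> Optional[str]:
    """Return the longest direct repeat flanking seq[insert_start:insert_end].

    The insert is assumed to occupy the half-open interval
    [insert_start, insert_end); a duplication, if present, lies
    immediately outside it on both sides. Returns None if none is found.
    """
    for k in range(max_len, min_len - 1, -1):
        if insert_start - k < 0 or insert_end + k > len(seq):
            continue  # not enough flanking sequence for a repeat of length k
        left = seq[insert_start - k:insert_start]
        right = seq[insert_end:insert_end + k]
        if left == right:
            return left
    return None


# Hypothetical host sequence: the 5-bp target site "AATGC" appears on both
# sides of a toy 10-bp insert occupying positions 9-18.
host = "CCCC" + "AATGC" + "TGTTGAAACA" + "AATGC" + "GGGG"
print(find_tsd(host, 9, 19))  # AATGC
```

An uninterrupted copy of the same host locus (in a species lacking the insert) would carry the target site only once, which is what distinguishes an insertion from a sequence that was always present.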
It is clear that, for an ERV insert to be retained by progeny, the retrovirus must infect the germ cells. And, indeed, it has been documented that the germline is a target for the retroviruses SIV and SHIV in juvenile macaques. Moreover, these elements certainly look strikingly like retroviral elements. For example, they possess three markedly retroviral genes called gag, pol and env (which code for viral proteins including the viral matrix, capsid, nucleoproteins, reverse transcriptase, integrase, and the envelope protein).
Some ID proponents have argued that, in spite of this striking appearance, the structure of the germ-cell production system makes it extremely unlikely that the inserts were accomplished by viruses, and thus that these hitherto-thought-to-be ERV sequences are not, in fact, viral in origin at all. This is an interesting argument. It does seem somewhat unlikely that retroviral elements would be able to insert themselves literally hundreds of thousands of times into the germ cells without causing fatal damage to the host organism. Moreover, it seems far more likely than not that a virally infected sperm cell would be substantially less fit and hence out-competed in the race to the egg. And apoptosis (programmed cell death) would be likely to eliminate virally infected sex cells. While a successful germline insertion of this kind might be inherited under very rare circumstances, its occurrence literally hundreds of thousands of times seems highly unlikely.
Over the last decade or two, a myriad of functions have been identified for components of ERVs. For example, the long terminal repeats (LTRs), which occur at the 5' and 3' ends of the retroviral sequence, are known to serve as promoters for host genes (Dunn et al., 2005). Conley et al. (2008) also report that:
Our analysis revealed that retroviral sequences in the human genome encode tens-of-thousands of active promoters; transcribed ERV sequences correspond to 1.16% of the human genome sequence and PET tags that capture transcripts initiated from ERVs cover 22.4% of the genome. These data suggest that ERVs may regulate human transcription on a large scale.
Indeed, ERV LTRs have even helped to shape the transcriptional network of the tumour-suppressor protein p53 (aka the "guardian of the genome") by contributing p53 binding sites, as documented by Wang et al. (2007).
Kigami et al. (2003) also report that MuERV-L is one of the earliest transcribed genes in mouse one-cell embryos. In fact, suppressing the sequence's expression causes embryogenesis to grind to a halt at the four-cell stage.
There is also now a growing body of literature documenting the role of ERVs in protecting their hosts against infection by exogenous retroviruses (see, for example, Malik and Henikoff, 2005).
One particularly remarkable instance of functionality with regard to these sequences is the role of the highly fusogenic retroviral envelope proteins (the syncytins), which are crucially involved in forming the placental syncytiotrophoblast layer generated by trophoblast cell fusion. These proteins are absolutely critical for placental development in humans and mice. The different kinds of syncytin protein are called "syncytin-A" and "syncytin-B" (found in mice) and "syncytin-1" and "syncytin-2" (found in humans). But here's the remarkable thing: although they serve exactly the same function, syncytin-A and syncytin-B are not related to syncytin-1 and syncytin-2. Syncytin protein also performs the same function in rabbits (syncytin-Ory1). But rabbit syncytin is related to neither the mouse nor the human forms. These ERVs are not even on the same chromosome: syncytin-1 is on human chromosome 7; syncytin-2 on human chromosome 6; syncytin-A on mouse chromosome 5; and syncytin-B on mouse chromosome 14.
Indeed, Dupressoir et al. (2005) report that
Together, these data strongly argue for a critical role of syncytin-A and -B in murine syncytiotrophoblast formation, thus unraveling a rather unique situation where two pairs of endogenous retroviruses, independently acquired by the primate and rodent lineages, would have been positively selected for a convergent physiological role. [emphasis added]
This is a remarkable case of convergent evolution, of a kind which is highly unlikely to have occurred by Darwinian means.
But be that as it may. Briefly, the argument made in the cited article is this: in addition to the placement of ERV sequences at orthologous loci (and the nested hierarchical pattern that placement exhibits), the article also takes into consideration the shared mutations among orthologous ERVs, which, we are told, fall into very similar nested hierarchies. Since mutation and ERV placement are independent factors, it is argued that this congruence only makes sense when viewed in the light of common descent. Moreover, the comparative degrees of LTR-LTR discontinuity among orthologous ERVs are argued to be indicative of the common descent model.
The two long terminal repeats of an ERV must be identical upon insertion, because both are generated by copying during reverse transcription. The retroviral promoter lies in the 3' unique region (U3), which is not itself transcribed into the viral RNA; without this copying step, the integrated ERV would lose its promoter. The retroviral RNA genome carries an identical short repeat (R) at each terminus. During reverse transcription, the RNA template at the 5' end is degraded, releasing the newly synthesised DNA copy of the 5' repeat, which then hybridises with the RNA repeat at the genome's 3' terminus. The two repeats need to be identical, otherwise they cannot hybridise. Copies of the 3' unique (U3) and 5' unique (U5) sections are then polymerised onto the opposite termini. This is what forms the LTRs, which are therefore identical at the time of integration. Since the two LTRs are identical upon reverse transcription and integration, greater mutational divergence between them ought, if common ancestry is true, to correlate with an older insertion.
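The reasoning that LTR divergence tracks insertion age can be sketched in Python under explicit assumptions: the two LTRs are already aligned; divergence d is the uncorrected proportion of differing sites, with gap columns skipped; and both LTRs, identical at integration, are assumed to accumulate substitutions independently at the same neutral rate r per site per year, giving an estimated age of T = d / (2r). The function names, the toy sequences and the rate value are hypothetical placeholders, not measured data.

```python
def p_distance(ltr5: str, ltr3: str) -> float:
    """Fraction of differing sites between two pre-aligned LTRs, skipping gaps."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    compared = diffs = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # gap columns are excluded rather than scored
        compared += 1
        if a != b:
            diffs += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return diffs / compared


def insertion_age(divergence: float, rate_per_site_per_year: float) -> float:
    """Estimate years since integration as T = d / (2r).

    The factor of 2 reflects the assumption that both LTRs mutate
    independently, each at rate r, after the insertion event.
    """
    return divergence / (2 * rate_per_site_per_year)


ltr5 = "TGAAGGCATTACGTTCAACA"  # hypothetical 20-bp toy "LTR"
ltr3 = "TGAAGGCATTGCGTTCAACA"  # differs at one site
d = p_distance(ltr5, ltr3)
age = insertion_age(d, rate_per_site_per_year=2.5e-9)  # assumed rate
print(f"divergence={d:.3f}, estimated age={age:.2e} years")
# divergence=0.050, estimated age=1.00e+07 years
```

Because the uncorrected proportion underestimates divergence once multiple substitutions hit the same site, a model-based correction such as Jukes-Cantor would normally be applied for older insertions.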
In my next blog post, I will seek to address the first of the three arguments for the shared ancestry of primates offered in the article.