Treasure in the Genetic Goldmine: PZ Myers Fails on “Junk DNA”

Readers may recall my encounter with developmental biology professor PZ Myers earlier this year. In that brief interaction, I came to appreciate Myers’s ability to charm his adoring fans and followers irrespective of the scientific robustness of his claims, or the accuracy with which he represents the views of those with whom he disagrees.

At the recent Skepticon 4 conference, Myers presented a lecture titled “Rummaging About in the Genetic Junkyard.” He argued for the view — which he also regularly espouses on his blog — that the preponderance of our genome is non-functional. This claim is advocated by many Darwinian thinkers (including Francis Collins, Ken Miller, Francisco Ayala, Darrel Falk, Richard Dawkins and Jerry Coyne) to provide a case against the notion of design, since it is to be expected that any competent engineer would not leave our chromosomes replete with nonsensical sequences. Jonathan Wells recently wrote an excellent book on the topic called The Myth of Junk DNA. At the risk of flogging a dead horse, let us turn to the substance of Myers’s presentation.

What is the Function of Introns?

Myers’s argument begins with a description of the structure of a gene and mRNA processing. As he notes, a eukaryotic pre-mRNA transcript possesses both protein-coding regions (called “exons”) and non-protein-coding regions (called “introns”). The introns of higher eukaryotes can often be very long indeed, in many cases spanning hundreds of thousands or even millions of bases. Lower eukaryotes (e.g. yeast) tend to have shorter and fewer introns. An RNA-protein complex called the “spliceosome” is responsible for removing the introns from the pre-mRNA transcript and joining the exons together. Introns are marked for removal by specific consensus sequences (typically GT at the 5′ end and AG at the 3′ end). In addition, the 5′ cap and the 3′ poly-A tail are added to form the processed mRNA. Myers thinks that this is “wasteful” since introns can typically account for 90% of the length of the pre-mRNA transcript.
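To make the cut-and-join step concrete, here is a minimal Python sketch of splicing on a toy transcript. It assumes the standard convention that an intron begins with GT and ends with AG; the sequences, coordinates and the function name `splice` are invented for illustration and do not come from Myers’s lecture or from any real gene.

```python
def splice(pre_mrna: str, introns: list[tuple[int, int]]) -> str:
    """Remove annotated introns (0-based, end-exclusive) and join the exons."""
    exons = []
    cursor = 0
    for start, end in sorted(introns):
        intron = pre_mrna[start:end]
        # Canonical introns start with GT (5' splice site) and end with AG (3' splice site).
        if not (intron.startswith("GT") and intron.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GT...AG boundaries")
        exons.append(pre_mrna[cursor:start])
        cursor = end
    exons.append(pre_mrna[cursor:])  # the final exon runs to the end of the transcript
    return "".join(exons)


# Hypothetical toy sequences: exon, intron, exon, intron, exon.
EXON1, INTRON1, EXON2, INTRON2, EXON3 = (
    "ATGGCC", "GTAAGTCCTTAG", "TTCGAA", "GTGAGTTTTCAG", "GGCTAA",
)
PRE_MRNA = EXON1 + INTRON1 + EXON2 + INTRON2 + EXON3
INTRONS = [(6, 18), (24, 36)]

MATURE = splice(PRE_MRNA, INTRONS)
# Share of the transcript that is removed as intron.
INTRON_FRACTION = (len(PRE_MRNA) - len(MATURE)) / len(PRE_MRNA)
```

In this toy case a little over half of the transcript is intron; the 90% figure Myers cites would simply correspond to much longer introns relative to the exons.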

But is Myers correct on this point? Are introns useless, without functional significance? The answer is no and again no. For one thing, at least some introns exhibit high levels of sequence conservation, which suggests that they possess some kind of function (e.g. Sun et al., 2000; Sorek and Ast, 2003). We also now know that introns play an essential role in alternative splicing. Alternative splicing allows the exons of a pre-mRNA transcript to be spliced into a number of different isoforms to produce multiple proteins from the same transcript. As evolutionary biologist Richard Sternberg observes,

It is the presence of introns that makes this permutative expansion of messenger RNAs possible in the first place.
[…]
Someone could argue that the sequences directing alternative splicing are in the protein-coding regions of the RNA. That is, one could argue that while introns do indeed make splicing possible, they are merely “junk” fillers, with the exons indicating where and how the spliceosome is to do its cutting and pasting. Yet such an argument would be false. In order for alternative splicing to work properly, it is necessary not only that exons be demarcated from introns, but also that the splicing process be correctly modulated. And introns contribute significantly to such modulation.

Indeed, Sorek and Ast (2003) document that the majority of alternatively spliced exons in humans and mice were flanked by introns, the sequences of which were extremely highly conserved, implying an important role in the process of alternative splicing. Minovitsky et al. (2005) have also documented that a six-nucleotide intronic sequence is “frequently located adjacent to tissue-specific alternative exons in the human genome.” Across taxa as widely divergent as dogs, rats, chickens, mice and humans, this sequence exhibits conservation. These features, according to the paper, “mark it as a critical component of splicing switch mechanism(s) designed to activate a limited repertoire of splicing events in cell type-specific patterns.” In addition to this, direct evidence has been documented showing that introns contain codes involved in the regulation of alternative splicing (Ladd and Cooper, 2002; Hui et al., 2005; Helder et al., 2007; Hastings et al., 2001; Wagner et al., 2005).

Furthermore, as documented by Hoeppner et al. (2009), the genes for the majority of non-translated microRNAs (which are required for the expression of mRNAs during development) and small nucleolar RNAs (which play an important role in the processing of ribosomal RNAs) occur in introns. Berezikov et al. (2007) note that,

Mirtrons are alternative precursors for microRNA biogenesis that were recently described in invertebrates. These short hairpin introns use splicing to bypass Drosha cleavage, which is otherwise essential for the generation of canonical animal microRNAs. Using computational and experimental strategies, we now establish that mammals have mirtrons as well.

Qi et al. (2010) also report that,

314 miRNAs (51%) were located in introns of protein-coding genes, 231 miRNAs (38%) were located in intergenic regions, while only 43 miRNAs (7%) were located in exons of noncoding RNAs or UTR of protein coding genes. Interestingly, 23 miRNAs (4%) were located in either an exon or an intron depending on alternative splicing of the host transcript.
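As a quick sanity check on the figures quoted above, the following Python sketch recomputes the percentages from the raw counts. The category labels are my own shorthand; only the four counts are taken from the quotation.

```python
# Counts as quoted from Qi et al. (2010); the dictionary keys are shorthand labels.
counts = {
    "intronic": 314,
    "intergenic": 231,
    "exonic (noncoding RNA exons or UTRs)": 43,
    "exon or intron, depending on splicing": 23,
}

total = sum(counts.values())
# Percentage of all miRNAs in each category, rounded to the nearest whole number.
shares = {label: round(100 * n / total) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]} of {total} ({pct}%)")
```

The four counts sum to 611, and the rounded shares come out at 51%, 38%, 7% and 4%, matching the quoted percentages.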

It has also been established that the promoter sequences for those RNA genes occur in introns (Monteys et al., 2010).

Furthermore, Mondai et al. (2007) show that as many as 52% of transcripts for chromatin-associated RNA, which plays an important role in the structural organization of nuclear chromatin (Rodríguez-Campos and Azorín, 2007), are found in introns.

But there’s more. Swinburne and Silver (2007) show that the length of introns (and, consequently, the time taken to transcribe them) can contribute to timing mechanisms during development.

I could continue in much the same vein for a long time. But enough has been said to indicate that Myers’s dismissal of intronic sequences as functionless junk, without reviewing the literature to the contrary, is disingenuous. For further information on the functional importance of introns, I refer readers to chapter 4 of The Myth of Junk DNA. Let us press on to PZ Myers’s next argument.

Telomeric Repetitive Elements

Myers progresses into a discussion of chromosome structure and, in particular, the presence of telomeric repetitive elements at the end of chromosomes. As Myers correctly notes, in many organisms, chromosomal telomeres possess the six base-pair sequence TTAGGG repeated over and over approximately fifty to one hundred times in tandem. In some organisms (such as Drosophila), however, the situation is quite different.
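The tandem-repeat structure just described is easy to picture with a short Python sketch that counts consecutive copies of TTAGGG at the end of a sequence. The function name and the example sequence are hypothetical, chosen purely for illustration.

```python
import re

TELOMERE_UNIT = "TTAGGG"


def count_terminal_repeats(seq: str, unit: str = TELOMERE_UNIT) -> int:
    """Count consecutive copies of `unit` at the very end of `seq`."""
    # Match one or more back-to-back copies of the unit, anchored to the end.
    match = re.search(f"(?:{re.escape(unit)})+$", seq)
    return len(match.group(0)) // len(unit) if match else 0


# Made-up chromosome end: a short unique stretch followed by 50 tandem repeats.
toy_chromosome_end = "ACGT" + TELOMERE_UNIT * 50
repeat_count = count_terminal_repeats(toy_chromosome_end)
```

A sequence that does not end in the repeat unit returns zero, which is the sense in which organisms with a different telomere organization would not fit this simple pattern.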

He also correctly observes that, in linear chromosomes, the DNA polymerase enzymes are unable to replicate right to the end of the chromosome. This is because the enzymes are unable to replace the lagging strand’s terminal RNA primer. The enzyme telomerase thus sticks on an additional repetitive DNA motif to the ends of the telomeres to prevent them from becoming shorter with each round of replication.

Myers departs from the facts, however, when he asserts that these telomeric repetitive elements are non-functional. As evolutionary biologist Richard Sternberg explains,

How, precisely, are miles and miles of TTAGGG of significance? From the standpoint of chromosome architecture, the repetitive elements en masse have the propensity to form complicated topologies such as quadruplex DNA. These sequences or, rather, topographies are also bound by a host of chromatin proteins and particular RNAs to generate a unique “suborganelle” — for lack of a better term — at each end. As a matter of fact, the chromatin organization of telomeres can silence genes and has been linked to epigenetic modes of inheritance in yeast and fruit flies. Furthermore, different classes of transcripts emanate from telomeres and their flanking repetitive DNA regions, which are involved in various and sundry cellular and developmental operations.

One interesting paper in this regard (Mirabella and Gartenberg, 1996) discusses the role of telomeric sequences in yeast as chromosomal anchorage points. The authors suggest “that telomeric DNA supports the formation of a SIR-independent macromolecular protein-DNA assembly that hinders the motion of DNA because of its linkage to an insoluble nuclear structure.”

Indeed, there is now evidence to suggest that these telomeric repetitive elements are also synthesized, by the telomerase enzyme, in the middle of chromosomes (where they are called interstitial telomeric sequences), and serve functional roles such as recombination hotspots (Farré et al., 2009).

Functions of Transposable Elements

Myers goes on to talk about transposable elements. He claims that the copy-and-paste elements known as the long and short interspersed nuclear elements (LINEs and SINEs respectively), both of which are retroposons, are nothing more than selfish genetic elements. He also mentions endogenous retroviruses (ERVs) and genetic (“cut-and-paste”) transposons. But Myers doesn’t mention the many important functions that have been ascribed to these elements.

Over the last decade or two, a myriad of different functions have been identified for components of ERVs. For example, the long terminal repeats (LTRs), which occur at the 5′ and 3′ ends of the retroviral sequence, are known to contribute to the host organism’s promoters (Dunn et al., 2005). Conley et al. (2008) also report that,

Our analysis revealed that retroviral sequences in the human genome encode tens-of-thousands of active promoters; transcribed ERV sequences correspond to 1.16% of the human genome sequence and PET tags that capture transcripts initiated from ERVs cover 22.4% of the genome. These data suggest that ERVs may regulate human transcription on a large scale.

Indeed, ERV LTRs have even helped to shape the tumor-suppressor protein p53 (aka the “guardian-of-the-genome”), as documented by Wang et al. (2007).

Kigami et al. (2003) also report that MuERV-L is one of the earliest transcribed genes in mouse one-cell embryos. In fact, knocking out the sequence’s expression causes embryogenesis to grind to a halt at the four-cell stage.

There is also now a growing wealth of literature documenting the role of ERVs in conferring immunity to their host from infection by exogenous retroviruses (see, for example, Malik and Henikoff, 2005).

One particularly remarkable instance of functionality with regard to these sequences is the involvement of the highly fusogenic retroviral envelope proteins (the syncytins), which are known to be crucially involved in the formation of the placental syncytiotrophoblast layer generated by trophoblast cell fusion. These proteins are absolutely critical for placental development in humans and mice. The different kinds of syncytin protein are called “syncytin-A” and “syncytin-B” (found in mice) and “syncytin-1” and “syncytin-2” (found in humans). But here’s the remarkable thing: Although serving exactly the same function, syncytin-A and syncytin-B are not related to syncytin-1 and syncytin-2. Syncytin protein also serves the same function in rabbits (syncytin-ory1). But rabbit syncytin is not related to either the mouse or the human form. These ERVs are not even on the same chromosome. Syncytin-1 is on chromosome 7; syncytin-2 is on chromosome 6; syncytin-A is on chromosome 5; and syncytin-B is on chromosome 14.

Indeed, Dupressoir et al. (2005) report that

Together, these data strongly argue for a critical role of syncytin-A and -B in murine syncytiotrophoblast formation, thus unraveling a rather unique situation where two pairs of endogenous retroviruses, independently acquired by the primate and rodent lineages, would have been positively selected for a convergent physiological role. [emphasis added]

This is a remarkable case of convergent evolution, of a kind that is highly unlikely to have occurred by Darwinian means.

In addition, a plethora of functions have been identified for various classes of transposable element (e.g. Britten et al., 2004; Shapiro and Sternberg, 2005). Indeed, one recent Genome Research paper documented a global genomic function for the mouse B1 SINE — the analogue of Alus — noting that B1 SINEs have “potent intrinsic insulator activity in cultured cells and live animals,” (Román et al., 2011).

Moreover, as Stephen C. Meyer notes in a response to Francisco Ayala on this topic,

In general, SINEs (and thus Alus) allow genetic information to be retrieved in multiple different ways from the same DNA data files depending on the specific needs of different cell types or tissues (in different species-specific contexts). In particular, Alu sequences perform many taxon-specific lower-level genomic formatting functions such as: (1) providing alternative start sites for promoter modules in gene expression, somewhat like sectoring on a hard drive (Faulkner et al., 2009; Faulkner and Carninci, 2009); (2) suppressing or “silencing” RNA transcription (Trujillo et al., 2006); (3) dynamically partitioning one gene file from another on the chromosome (Lunyak et al., 2007); (4) providing DNA nodes for signal transduction pathways or binding sites for hormone receptors (Jacobsen et al., 2009; Laperriere et al., 2004); (5) encoding RNAs that modulate transcription (Allen et al., 2004; Espinoza et al., 2004; Walters et al., 2009); and (6) encoding or regulating microRNAs (Gu et al., 2009; Lehnert et al., 2009).

In addition to these lower-level genomic formatting functions, SINEs (including Alus) also perform species-specific higher-level genomic formatting functions such as: (1) modulating the chromatin of classes of GC-rich housekeeping and signal transduction genes (Grover et al., 2003, 2004; Oei et al., 2004; see also Eller et al., 2007); (2) “bar coding” particular segments for chromatin looping between promoter and enhancer elements (Ford and Thanos, 2010); (3) augmenting recombination in sequences where Alus occur (Witherspoon et al., 2009); and (4) assisting in the formation of three-dimensional chromosome territories or “compartments” in the nucleus (Kaplan et al., 1993; see also Pai and Engelke, 2010).

Moreover, Alu sequences also specify many species-specific RNA codes. In particular, they provide: (1) signals for alternative RNA splicing (i.e., they generate multiple messenger RNAs from the same type of precursor transcript) (Gal-Mark et al., 2008; Lei and Vorechovsky, 2005; Lev-Maor et al., 2008) and (2) alternative open-reading frames (exons) (Lev-Maor et al., 2007; Lin et al., 2008; Schwartz et al., 2009). Alu sequences also (3) specify the retention of select RNAs in the nucleus to silence expression (Chen et al., 2008; Walters et al., 2009); (4) regulate the RNA polymerase II machinery during transcription (Mariner et al., 2008; Yakovchuk et al., 2009; Walters et al., 2009); and (5) provide sites for Adenine-to-Inosine RNA editing, a function that is essential for both human development and species-specific brain development (Walters et al., 2009).

Contrary to Ayala’s claim, Alu sequences (and other mammalian SINEs) are not distributed randomly but instead manifest a similar “bar-code” distribution pattern along their chromosomes (Chen and Manuelidis, 1989; Gibbs et al., 2004; Korenberg and Rykowski, 1988). Rather like the distribution of the backslashes, semi-colons and spaces involved in the formatting of software code, the “bar-code” distribution of Alu sequences (and other SINEs) reflects a clear functional logic, not sloppy editing or random mutational insertions. For example, Alu sequences are preferentially located in and around protein-coding genes as befits their role in regulating gene expression (Tsirigos and Rigoutsos, 2009). They occur mainly in promoter regions (the start sites for RNA production) and in introns, the segments that break up the protein-coding stretches. Outside of these areas, the numbers of Alu sequences sharply decline. Further, we now know that Alu sequences are directed to (or spliced into) certain preferential hotspots in the genome by the protein complexes or the “integrative machinery” of the cell’s information processing system (Levy et al., 2010). This directed distribution of Alu sequences enhances the semantic and syntactical organization of human DNA. It appears to have little to do with the occurrence of random insertional mutations, contrary to the implication of Ayala’s “sloppy editor” illustration and argument.

A full listing of references can, of course, be found in the original article.

Summary and Conclusion

PZ Myers’s one additional point is his reference to T. Ryan Gregory’s “onion test.” But since I have already responded in detail to the “onion test,” I see no need to do so again here, and I instead refer readers to my previous article for my thoughts on it. What was of particular concern for me was to see so many people in the audience applauding PZ, apparently lapping up his every word. I doubt that many will bother to check his facts for themselves by close inspection of the primary literature. Myers began his presentation by comparing his lecture to the pounding of a “creationist mouse” with a sledgehammer. However, as we have seen from this brief literature review, on the issue of “Junk DNA,” it is PZ Myers’s own mouse that has been subjected to a good pounding.