Unitary Pseudogenes and RNA Editing
Broken genes that appear to have been disabled by mutations have long been the flagship of many a critique of intelligent design. Why would an intelligent designer litter our genomes with nonsensical "junk"? The presence of some vestigial DNA is of course not at all incompatible with classical conceptions of design (genetic mutation is, after all, rather good at breaking things). It seems somewhat unlikely to ID theorists, however, that the great majority of our DNA is without function. Given the rapid expansion in our understanding of the non-coding regions of DNA over the last decade or so, the argument made against ID by the likes of Francis Collins and Karl Giberson has been considerably weakened.
In a previous article, I argued that pseudogenes could be rendered functional by post-transcriptional RNA editing. As a specific example I used GULO, the gene required for vitamin C synthesis, and suggested that the human GULO pseudogene may be functional in utero but subsequently turned off. Such a hypothesis requires that GULO produce an mRNA transcript. I had consulted the Ensembl Genome Database regarding the GULO pseudogene in humans, and that database reported that it produces a transcript but no known protein product.
Upon further investigation, however, I've discovered that the Ensembl database appears to be inaccurate on that point, and it's not confirmed that the GULO pseudogene produces a transcript (indeed, clicking on "Supporting evidence," one finds that there is "No Transcript supporting evidence for this transcript"). Part of the reason for this is that the GULO pseudogene lacks a canonical promoter. However, that doesn't necessarily mean this pseudogene produces no RNA transcript. Many metazoan loci possess non-canonical promoters that, moreover, can be millions of base pairs upstream of annotated exons (e.g., see Manak et al., 2006). A further complication with the proposed hypothesis is that some exons are absent from the GULO pseudogene, and it's not entirely clear to me how they could be created by RNA editing. While my original hypothesis is probably incorrect with respect to this particular pseudogene, it remains possible that the human GULO pseudogene yields RNAs that perform some other function in the cell.
Moreover, there is a much more important point, which I hope is not lost in this discussion: my original hypothesis could be more generally applicable. What I proposed might be happening in the GULO pseudogene could very well be happening in other unitary pseudogenes. Unitary pseudogenes are formerly functional genes, with no functional duplicate elsewhere in the genome, that have been disabled by mutations such as frameshift mutations, deletions, or point mutations resulting in premature stop codons.
RNA editing is able to expunge premature stop codons from an mRNA transcript (e.g., Grewe et al., 2011), and it is also able to correct frameshifts created by indel mutations. Probably the most dramatic example is found in the mitochondrial transcripts of the sleeping sickness parasite Trypanosoma brucei (Simpson et al., 2003). In the case of cytochrome c oxidase subunit III, the gene is completely scrambled, and no less than 58 percent of the mRNA transcript is "written" by RNA editing, involving the addition of 347 uridines to 121 sites and the removal of 16 uridines from 7 sites (Feagin et al., 1988). Without this extensive editing, the transcript cannot be translated into a functioning protein.
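To make the stop-codon case concrete, here is a minimal Python sketch. It is a toy, not a model of any real gene: the transcript, the edit position, and the helper names (translate, edit_base) are invented for illustration, and the standard genetic code is assumed even though some mitochondria use variant codes. It shows a single U-to-C substitution turning a premature UAA stop into a CAA (glutamine) codon, so that translation reads through to the downstream stop.

```python
from itertools import product

# Standard genetic code, built in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna):
    """Translate from position 0 until the first stop codon or the end."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":  # stop codon: the ribosome releases the peptide
            break
        protein.append(aa)
    return "".join(protein)


def edit_base(mrna, pos, new_base):
    """Return a copy of the transcript with one base substituted."""
    return mrna[:pos] + new_base + mrna[pos + 1:]


# Hypothetical transcript: AUG GCU UAA GGU GAA UGA
# The third codon (UAA) is a premature stop.
unedited = "AUGGCUUAAGGUGAAUGA"

# Assumed U-to-C edit at index 6: UAA (stop) becomes CAA (Gln).
edited = edit_base(unedited, 6, "C")

print(translate(unedited))  # MA
print(translate(edited))    # MAQGE
```

The genomic sequence alone predicts the truncated product; only the edited transcript reveals the longer one, which is why a stop codon in the DNA is not by itself proof that no full-length protein is made.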
The addition and deletion of uridine nucleotides are directed by molecules called guide RNAs (approximately 20-50 nucleotides in length), which are complementary to the fully edited transcript. A guide RNA base pairs with part of the mRNA transcript to be edited. A large protein complex called an editosome then envelops the newly formed duplex RNA, opens the transcript at the first mismatched base pair, and inserts or removes uridines as the guide dictates. Often multiple guide RNAs and editosomes are required to edit a complete transcript. The mechanism by which this process takes place is truly remarkable, but that's a topic for another day. For readers seeking more detail, I recommend Golas et al. (2009) and Leung and Koslowsky (2001).
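The bookkeeping behind uridine insertion and deletion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two short sequences are made up, the function names (u_runs, editing_plan, summarize) are hypothetical, and the fully edited sequence is simply given as input rather than derived from guide RNAs as it is in the cell. Because this form of editing only adds or removes U's, the non-U bases of the two sequences must match in order, and the edit at each position is the difference in the number of U's between them.

```python
import re


def u_runs(rna):
    """Split a transcript into its non-U skeleton and the U count in each gap."""
    skeleton = rna.replace("U", "")
    # Splitting on A, C, or G leaves the (possibly empty) runs of U between them.
    runs = [len(run) for run in re.split("[ACG]", rna)]
    return skeleton, runs


def editing_plan(pre_edited, edited):
    """List (gap_index, change) pairs; positive = U's inserted, negative = U's deleted."""
    skel_pre, runs_pre = u_runs(pre_edited)
    skel_ed, runs_ed = u_runs(edited)
    if skel_pre != skel_ed:
        raise ValueError("sequences differ by more than U insertion/deletion")
    return [
        (gap, after - before)
        for gap, (before, after) in enumerate(zip(runs_pre, runs_ed))
        if after != before
    ]


def summarize(plan):
    """Return (U's inserted, U's deleted) for a plan."""
    inserted = sum(change for _, change in plan if change > 0)
    deleted = -sum(change for _, change in plan if change < 0)
    return inserted, deleted


# Made-up sequences for illustration only.
pre_edited = "AGGUUUACG"
edited = "AUGGUACUUG"

plan = editing_plan(pre_edited, edited)
print(plan)             # [(1, 1), (3, -2), (5, 2)]
print(summarize(plan))  # (3, 2)
```

Here gap 0 is the position before the first non-U base, gap 1 lies between the first and second non-U bases, and so on. Tallies such as "347 uridines added at 121 sites" are this kind of summary, taken over a real transcript.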
This form of editing involving uridine insertion and deletion is restricted almost entirely to the mitochondrial RNAs of single-celled eukaryotes. Likewise, cytidine insertion has so far only been found in the mitochondrial RNAs of slime molds (e.g., Physarum polycephalum) (Mahendran et al., 1991).
Altered reading frames caused by indels, however, can also be corrected by programmed translational frameshifting, as has been documented for instance in turtle mitochondrial RNAs, and the mitochondrial RNAs of the shellfish pathogen genus Perkinsus (Zhang et al., 2011; Masuda et al., 2010; Russell and Beckenbach, 2008). This process is mechanistically diverse and quite widely distributed, and is induced by sequences known as "programmed frameshift sites" (e.g., see Farabaugh, 1996). In yeast, for example, the transcript encoding EST3 (one of the subunits of the yeast telomerase) undergoes +1 ribosomal frameshifting. This is essential for telomere replication, as indicated by the fact that mutating the gene such that it can no longer frameshift results in "a strain with the same phenotype as an est3 null mutant" (Morris and Lundblad, 1997).
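As a rough illustration of why a +1 frameshift can rescue a reading frame, consider the following Python sketch. It is a deliberately simplified toy: the transcript is invented, the shift_at parameter is a hypothetical stand-in for a programmed frameshift site, and the standard genetic code is assumed. Real frameshift sites depend on sequence context and decoding kinetics, none of which is modeled here.

```python
from itertools import product

# Standard genetic code, built in the conventional U, C, A, G order.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna, shift_at=None):
    """Translate from position 0 to the first stop codon.

    If shift_at is given, the ribosome skips one nucleotide (+1 frameshift)
    when it arrives at that index.
    """
    protein = []
    i = 0
    while i + 3 <= len(mrna):
        if i == shift_at:
            i += 1  # slip forward one nucleotide into the +1 frame
            continue
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
        i += 3
    return "".join(protein)


# Hypothetical transcript with one extra U (index 6) that disrupts the frame:
# AUG GCU [U] GGU GAA UGA
transcript = "AUGGCUUGGUGAAUGA"

print(translate(transcript))              # MAW  (runs into an early UGA stop)
print(translate(transcript, shift_at=6))  # MAGE (frame restored past the extra U)
```

Read naively in frame 0, the sequence looks like a truncated gene; with the programmed slip, the downstream codons are decoded as intended.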
RNA editing is not limited to single-celled protozoans. Indeed, deamination of adenosine to form inosine, which is read as guanosine (A-to-I editing), and deamination of cytidine to form uridine (C-to-U editing) are found in retroviruses, the organelles of various eukaryotes and the nuclei of metazoans (Zhou et al., 2013; Chateigner-Boutin and Small, 2010; Cho et al., 2007; Bishop et al., 2004; Yu and Schuster, 1995). These deaminations are catalyzed by adenosine deaminases and cytidine deaminases, respectively.
As you can see, RNA editing is particularly prevalent in mitochondrial RNAs and especially among protozoans. But just how widespread is the phenomenon of RNA editing in the mammalian transcriptome and what are its implications as far as unitary pseudogenes are concerned? One might object that I am making an unjustified extrapolation in arguing that this phenomenon likely has important significance regarding pseudogenes. In fact, however, it is becoming increasingly evident that even mammalian transcripts cannot be regarded simply as mirrors of the DNA sequence. An interesting paper was published a few years back in PLoS Genetics entitled "Pseudo-Messenger RNA: Phantoms of the Transcriptome" (Frith et al., 2006). A survey of data from the FANTOM (Functional Annotation Of Mammalian genome) project, comprising over 100,000 full-length RNA sequences produced from the mouse genome, identified many transcripts that "do not correspond cleanly to any identifiable object in the genome, implying fundamental limits to the goal of annotating all functional elements at the genome sequence level." The paper concluded:
[T]he transcriptome can no longer be regarded as a simple mirror of the genome, or a redundant layer between genome and proteome. This was already indicated by the high incidence of alternative splicing, but the non-standard transcripts surveyed here provide even more compelling examples. The recently developed field of systems biology stresses the complex emergent behaviour that lies between genotype and phenotype: this complexity begins with the transcriptome.
Of course, this paper was not without its critics: a response to the work was published, along with a rebuttal from the authors.
RNA editing is now recognized to be of particular importance in the brain (e.g., Dillman et al., 2013; Savva et al., 2011; da Silva et al., 2010). In humans, for instance, two alternatively spliced variants of the mRNA encoding tryptophan hydroxylase 2 (TPH2), expressed primarily in the serotonergic neurons of the brain, undergo extensive and mutually exclusive editing. In fact, "imbalanced RNA editing might be involved in the pathogenesis of psychiatric disorders" (Grohmann et al., 2010).
I could go on in a similar vein for some time. The bottom line is that RNA editing can determine how genes are expressed in a time- and tissue-specific manner. These data challenge the common assumption that frameshift mutations, premature stop codons, and other supposed hallmarks of pseudogenes are necessarily diagnostic of a lack of protein-coding capacity. Of course, pseudogenes can also serve manifold functions besides coding for proteins (e.g., see Chan et al., 2013; Wen et al., 2012; Pei et al., 2012; Pink et al., 2011; Franco-Zorrilla et al., 2007; Balakirev and Ayala, 2003).
For readers interested in further detail on the subject of RNA editing, I recommend Harold C. Smith's 2008 book, RNA and DNA Editing: Molecular Mechanisms and Their Integration into Biological Systems. For an overview of the history of RNA editing research, I recommend a recent paper by Volker Knoop in the journal Cellular and Molecular Life Sciences, "When you can't trust the DNA: RNA editing changes transcript sequences."