Plethora of Recent Papers Finds Function for Non-Coding DNA
One of the greatest movies ever made (in my humble opinion) was Three Amigos, which introduced pop culture to the word "plethora."
Well, when it comes to recent papers that have found function for non-coding DNA, a plethora is exactly what we have. Back in September the ENCODE project released its important findings, through a set of 30 papers, that the vast majority of the genome is transcribed, strongly suggesting function. But ENCODE is by no means the only recent research project to have discovered function for non-coding DNA. An article last month at Medical Daily, "Researchers Show How Junk DNA Influences Face Development," reports that "Researchers from Stanford School of medicine say that 'enhancers' present in the junk DNA follow an origami-like design while constructing the face, using simple instructions to make an intricate object." Another news article on the Y chromosome stated, "Despite the claim that this male sex chromosome is mostly junk, new research suggests it's actually a lean, mean, highly evolved machine for producing the fittest males possible." And yet another article at Medical Xpress, titled "'Junk DNA' drives embryonic development," states:
MicroRNAs are small pieces of genetic material similar to the messenger RNA that carries protein-encoding recipes from a cell's genome out to the protein-building machinery in the cytoplasm. Only microRNAs don't encode proteins. So, for many years, scientists dismissed the regions of the genome that encode these small, non-protein coding RNAs as "junk." We now know that microRNAs are far from junk. They may not encode their own proteins, but they do bind messenger RNA, preventing their encoded proteins from being constructed. In this way, microRNAs play important roles in determining which proteins are produced (or not produced) at a given time.
According to the article, "microRNAs are powerful regulators of embryonic cell fate."
Jonathan M. recently reported on a new paper in Genome Biology which studied large RNA molecules produced by stretches of the genome between genes, called long intergenic noncoding RNAs (lincRNAs). According to the paper, these lincRNAs "constitute an important layer of genome regulation across a wide spectrum of species." The researchers studied over 9000 human lincRNAs and found that over 80% contain transposable elements (TEs) -- the kind of repetitive DNA that is typically thought to be selfish "junk" DNA. The paper characterized the typical view of TEs as follows:
One important method by which the genome, including lncRNA sequence, evolves is transposable element (TE) insertions. TEs are nucleic acid sequences capable of inserting into genomic DNA that are typically considered "selfish" genomic parasites and have conquered 45-65% of the human genome.
Rather than being mere "selfish genomic parasites," TEs play an "extensive" role in gene regulation:
In a few known cases, TE proteins required for transposition have seeded novel genes in the host genome. More often, TEs influence transcriptional regulatory networks. For example, TE promoters, particularly the long terminal repeats (LTRs) of endogenous retroviruses (ERVs), initiate transcription at some protein coding genes, typically as alternative promoters. Further, TEs have shaped gene regulation by distributing transcription factor binding sites, spawning enhancers, and possibly by composing highly conserved noncoding regions. In addition to proteins, ncRNA genes, particularly microRNAs, can be derived from TEs. Post-transcriptionally, Alu elements (and potentially other TEs) harbor splicing signals, and insertions in protein coding genes have created new splice sites and exons. Taken together, these studies demonstrate extensive shaping of gene regulatory networks by TE insertions.
The paper reported that TEs also play a major role in forming the sequence of lincRNAs and in governing their transcription: "There are many thousands of lincRNA transcripts encoded in the human genome that play critical functional roles across a spectrum of cellular processes." It further reports on the role of TEs in forming lincRNAs:
(David R. Kelley and John L. Rinn, "Transposable elements reveal a stem cell specific class of long noncoding RNAs," Genome Biology, 13:R107 (2012).)
Here we comprehensively characterize the TE composition of long intergenic noncoding RNAs (lincRNAs) and their functional relationships in the human genome. We find that lincRNAs contain TEs at a far greater rate than protein coding genes and are highly enriched for ERVs and depleted of LINEs and SINEs. TEs have position and orientation preferences in lincRNAs, including a frequent association of LTRs with lincRNA transcription start sites (TSSs) that suggests a role in the genes' origins. In a number of intriguing cases, TE content correlated with lincRNA expression properties. Strikingly, lincRNAs containing HERVH elements exhibit a stem cell specific expression pattern. These results demonstrate that lincRNAs have nonrandom composition of TEs that strongly correlates with their functional and regulatory properties, suggesting a mechanism for malleable evolution of lincRNAs.
Of course, when these authors say "evolution," they simply mean that they assume that these functional lincRNAs arose by random mutation, and that TEs played a role in those mutations. But this is inference based upon evolutionary assumptions, not hard data. The only hard data they have actually found are that (1) long intergenic noncoding RNAs are prevalent throughout the genome, and (2) common DNA motifs they call "transposable elements" are common in those lincRNAs and play an important role in their function. In other words, the hard data point towards purpose and function. The notion that these genetic elements have an evolutionary origin via, as the paper puts it, "stochastic" transposable elements, is, as I said, mere inference.
More Function for Pseudogenes
But perhaps the most interesting recent non-ENCODE paper published on "junk" DNA came out in September in the journal Science Signaling, as it recounts the extensive evidence for functions of pseudogenes that are being discovered. The paper reports that "Because they are generally noncoding and thus considered nonfunctional and unimportant, pseudogenes have long been neglected," but then notes:
Recent advances have established that the DNA of a pseudogene, the RNA transcribed from a pseudogene, or the protein translated from a pseudogene can have multiple, diverse functions and that these functions can affect not only their parental genes but also unrelated genes. Therefore, pseudogenes have emerged as a previously unappreciated class of sophisticated modulators of gene expression, with a multifaceted involvement in the pathogenesis of human cancer.
(Laura Poliseno, "Pseudogenes: Newly Discovered Players in Human Cancer," Science Signaling, 5 (242) (September 18, 2012).)
The paper presents three main lines of evidence for pseudogene function:
- First, "The number of pseudogenes present in the genomes of multicellular organisms is much higher than that present in the genomes of unicellular organisms, such as bacteria, and contributes to the larger size of the genomes of multicellular organisms as compared to those of unicellular organisms. The human genome represents an extreme case in this regard; indeed, the number of human pseudogenes (~18,000) is close to that of the protein-coding genes (20,000 to 30,000). Thus, the noncoding genome, to which pseudogenes belong, may serve as a repository of at least a portion of the information underlying highly complex systems."
- Second, pseudogene sequences are conserved, suggesting "pseudogene identity is preserved by selective pressure." According to the paper, "The existence of conserved processed pseudogenes that are transcribed irrespective of their position in the genome suggests that they are maintained to exert a specific role."
- Third, "Consistent with the notion that they exert biological functions, the expression of pseudogenes is a regulated process." Indeed, "various pseudogenes show a spatiotemporal expression pattern distinct from that of their coding counterparts," suggesting they are playing functional roles.
After reviewing the evidence for pseudogene function, the paper offers a striking conclusion:
They might be "pseudo" genes because they do not encode a protein or because they encode a protein that does not function in the same way as that encoded by their cognate genes. Nonetheless, they are not functionally disabled but can perform different functions than their parental gene counterparts.
If they perform important functions, why haven't we discovered them? There are two reasons. First, "pseudogenes have long been dismissed as junk DNA," meaning they were ignored. Second, because of this dismissal, we have not developed technologies to study them. Nonetheless, the authors of the paper are hopeful that the development of new techniques will lead to discoveries of more functional pseudogenes:
Discovered in 1977, pseudogenes were dismissed as "junk DNA." They were not formally studied; indeed, extensive effort went into developing strategies to avoid their unintended detection ... Not having technologies designed with pseudogenes in mind is one of the reasons why the exact fraction of transcribed human pseudogenes is still unknown. ... The fact that pseudogenes have long been dismissed as junk DNA has profoundly affected the study of their parental genes, as well as of pseudogenes themselves. As far as parental genes are concerned, in some cases the presence of pseudogenes has been acknowledged and their inadvertent detection has been avoided. ... side, the development of specific tools for the detection of pseudogenic RNAs and proteins, as well as for the study of the epigenetic modifications of their promoters, has been greatly delayed. ... The examples discussed in this Review indicate that pseudogenes can play a pivotal and multifaceted role in human cancer and, indeed, offer a strong rationale for the further development of techniques suitable for their systematic study.
While there's still much we don't know about the genome (including pseudogenes and TEs), the trend in current research points clearly in one direction: the more we study the genome, the more we detect function for non-coding DNA. Yet the now-dubious "junk-DNA" paradigm was born and bred inside the evolutionary paradigm based upon the idea that our genome was built through random "stochastic" mutations. As time goes on, that paradigm is being forced to retreat into the gaps in our knowledge.