Simple Logic (and the Data) Refute PZ Myers on 'Junk DNA' (Updated)
A few weeks ago, PZ Myers commented on so-called 'Junk DNA'. Under the headline, 'Junk DNA is still junk', Myers wrote:
The ENCODE project made a big splash a couple of years ago -- it is a huge project to not only ask what the sequence of a strand of human DNA was, but to analyze and annotate and try to figure out what it was doing. One of the very surprising results was that in the sections of DNA analyzed, almost all of the DNA was transcribed into RNA, which sent the creationists and the popular press into unwarranted flutters of excitement that maybe all that junk DNA wasn't junk at all, if enzymes were busy copying it into RNA. This was an erroneous assumption; as John Timmer pointed out, the genome is a noisy place, and coupled with the observations that the transcripts were not evolutionarily conserved, it suggested that these were non-functional transcripts.
But is it accurate to say that apparently non-constrained elements are always functionless? Evolutionary conservation seems to be a legitimate reason for predicting functionality of genetic elements. As the reasoning goes, if a sequence of DNA performed no useful function, it would accumulate neutral mutations at random. If its sequence remains constant over time, however, this implies that some sort of stabilizing selection is occurring to preserve its function. Conservation of sequence implies function. This is sound reasoning.
However, why must non-conservation imply non-function? The simple rules of logic say the inverse need not be true. Consider the following:
If it's raining outside then the front lawn will be wet.
Now consider the inverse:
If it's not raining outside then the front lawn will not be wet.
Of course, this need not be true. For example, the lawn could be wet due to causes other than rain, such as sprinklers or a water fight on a hot summer day. The fact that it's not raining does not imply that the lawn must be dry.
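The same point can be checked mechanically. The following is a minimal sketch, not part of the original argument: it enumerates every combination of "raining" and "lawn wet" and looks for cases where the original statement holds but its inverse fails. The helper name `implies` and the variable names are illustrative assumptions.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


# All four combinations of (raining, lawn_wet).
worlds = list(product([True, False], repeat=2))

# Cases where "rain -> wet" is true but the inverse "not rain -> not wet" is false.
counterexamples = [
    (raining, lawn_wet)
    for raining, lawn_wet in worlds
    if implies(raining, lawn_wet) and not implies(not raining, not lawn_wet)
]

# The contrapositive "not wet -> not rain" agrees with the original in every case.
contrapositive_agrees = all(
    implies(raining, lawn_wet) == implies(not lawn_wet, not raining)
    for raining, lawn_wet in worlds
)

print(counterexamples)        # [(False, True)]
print(contrapositive_agrees)  # True
```

The single counterexample is the sprinkler case: no rain, yet a wet lawn. Only the contrapositive (a dry lawn means it is not raining) follows from the original statement.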
So simple logic implies that PZ's reasoning could be flawed. But the most important question is, what do the data say? Should the supposition linking non-conserved elements with non-functionality be considered a watertight inference?
It turns out that leading researchers would also disagree with PZ.
A report on research by Kunarso et al. in Nature suggests: "Although sequence conservation has proven useful as a predictor of functional regulatory elements in the genome the observations by Kunarso et al. are a reminder that it is not justified to assume in turn that all functional regulatory elements show evidence of sequence constraint."
Nor are these isolated results. A 2007 paper by the ENCODE Project Consortium, the public research consortium that has been seeking all the functional elements in the human genome, detailed the results of the project's pilot phase.
This research similarly reported,
At the outset of the ENCODE Project, many believed that the broad collection of experimental data would nicely dovetail with the detailed evolutionary information derived from comparing multiple mammalian sequences to provide a neat 'dictionary' of conserved genomic elements, each with a growing annotation about their biochemical function(s). In one sense, this was achieved; the majority of constrained bases in the ENCODE regions are now associated with at least some experimentally derived information about function. However, we have also encountered a remarkable excess of experimentally identified functional elements lacking evolutionary constraint, and these cannot be dismissed for technical reasons. This is perhaps the biggest surprise of the pilot phase of the ENCODE Project, and suggests that we take a more "neutral" view of many of the functions conferred by the genome. (emphasis added)
And so, it seems, while evolutionary constraint may be used to infer function, there is no reason to think that the inverse is true: that non-constraint implies non-function.
Commenting on what seems to be the almost universal transcription of DNA into RNA (according to the research of the ENCODE project), Myers further remarks,
Creationists thought it was wonderful. They detest the idea of junk DNA -- that the gods would scatter wasteful garbage throughout our precious genome by intent was unthinkable, so any hint that it might actually do something useful is enthusiastically seized upon as evidence of purposeful design.
By "creationist," Myers presumably means anyone who's skeptical of the traditional neo-Darwinian dysteleological understanding of the mechanics of evolution. I find this labeling to be misleading at best. But let's leave that aside.
Myers fails to accurately represent the position and predictions of ID (and Darwinism) regarding so-called "junk DNA." It is a prediction of Darwinian (and most dysteleological) thought that the genome should exhibit islands of meaning in a vast ocean of non-meaning. After all, if the genome has been cobbled together by a process of undirected, mindless chance and necessity, is it not reasonable to expect the presence of a large amount of noise in the genome? Moreover, it is the homologous distribution of supposed "junk DNA" which is often presented as one of the leading arguments for common ancestry. Every time function is ascribed to these elements, the Darwinian argument has to retreat, taking refuge in the ever-shrinking gaps in our knowledge of the genome.
Myers follows this up by citing a paper by van Bakel et al. that appeared just over a month ago in PLoS Biology. The abstract reads,
The human genome was sequenced a decade ago, but its exact gene composition remains a subject of debate. The number of protein-coding genes is much lower than initially expected and the number of distinct transcripts is much larger than the number of protein-coding genes. Moreover, the proportion of the genome that is transcribed in any given cell type remains an open question: results from "tiling" microarray analyses suggest that transcription is pervasive and that most of the genome is transcribed, whereas new deep-sequencing based methods suggest that most transcripts originate from known genes. We have addressed this discrepancy by comparing samples from the same tissues using both technologies. Our analyses indicate that RNA sequencing appears more reliable for transcripts with low expression levels, that most transcripts correspond to known genes or are near known genes, that many transcripts may represent new exons or aberrant products of the transcription process. We also identify several thousand small transcripts that map outside known genes; their sequences are often conserved and are often encoded in regions of open chromatin. We propose that most of these transcripts may be by-products of the activity of enhancers, which associate with promoters as part of their role as long-range gene regulatory sites. Overall, however, we find that most of the genome is not appreciably transcribed.
This article has already drawn a fairly thorough response. The methodology of the PLoS Biology article is fatally flawed, for the authors use a program called "RepeatMasker", which screens out all the repetitive DNA. But given that about 50% of our genome is composed of repetitive DNA, the conclusions drawn by the authors seem a little disingenuous, to say the least! In fact, the official description of RepeatMasker itself states that "On average, almost 50% of a human genomic DNA sequence currently will be masked by the program."
As if that weren't bad enough, the researchers then base their results "primarily on analysis of PolyA+ enriched RNA." But we've known since 2005 that, in humans, PolyA- sequences are twice as abundant as PolyA+ transcripts. So the authors not only exclude half the genome from their research, but also completely ignore two-thirds of the RNA in what remains!
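For readers wondering where "two-thirds" comes from, here is a small sketch of the arithmetic. It only restates the 2:1 abundance ratio quoted above; it is not an analysis of any data, and the function name `share_of_total` is an illustrative assumption.

```python
from fractions import Fraction


def share_of_total(ratio_a_to_b: Fraction) -> Fraction:
    """Given the ratio a/b, return a's share of the combined total a + b."""
    return ratio_a_to_b / (ratio_a_to_b + 1)


# "Twice as abundant" is taken here as a 2:1 ratio of PolyA- to PolyA+.
polya_minus_share = share_of_total(Fraction(2, 1))
polya_plus_share = 1 - polya_minus_share

print(polya_minus_share)  # 2/3
print(polya_plus_share)   # 1/3
```

On that ratio, a study built primarily on PolyA+ enriched RNA would be sampling roughly one-third of the transcripts.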
In any case, even if we grant Myers his premise (that the majority of DNA is not transcribed into mRNA), it still does not follow that the non-transcribed elements are "junk". Indeed, functions have been ascribed to non-transcribed regions of the genome.
The bottom line is this: Caution is warranted before making the sweeping claim that a DNA element is never transcribed into mRNA. It cannot be ruled out, of course, that a region is transcribed only in a few select circumstances or cell types.
Functions for DNA that was once thought to be junk are being uncovered on a daily basis. Wouldn't it be more scientific to take a wait-and-see approach on the "junk DNA" question?