Why the "Onion Test" Fails as an Argument for "Junk DNA"
Briefly stated, the often-cited "onion test" observes that onion cells have many times more DNA than human cells do. Since the onion is considered relatively simple compared to us, this discrepancy -- it is argued -- can only be accounted for if the preponderance of its DNA is, in fact, junk or non-functional. Let's see whether the argument really holds any water.
The term "onion test" was coined in April 2007 by the Canadian evolutionary biologist and genome biologist T. Ryan Gregory. In a post on his blog, Gregory wrote,
I am not sure how official this is, but here is a term I would like to coin right here on my blog: "The onion test." The onion test is a simple reality check for anyone who thinks they have come up with a universal function for non-coding DNA. Whatever your proposed function, ask yourself this question: Can I explain why an onion needs about five times more non-coding DNA for this function than a human?
The onion, Allium cepa, is a diploid (2n = 16) plant with a haploid genome size of about 17 pg. Human, Homo sapiens, is a diploid (2n = 46) animal with a haploid genome size of about 3.5 pg. This comparison is chosen more or less arbitrarily (there are far bigger genomes than onion, and far smaller ones than human), but it makes the problem of universal function for non-coding DNA clear.
Further, if you think perhaps onions are somehow special, consider that members of the genus Allium range in genome size from 7 pg to 31.5 pg. So why can A. altyncolicum make do with one fifth as much regulation, structural maintenance, protection against mutagens, or [insert preferred universal function] as A. ursinum?
There you have it. The onion test. To be applied to any ambitious claims that a universal function has been found for non-coding DNA.
This phenomenon is but one example of a larger issue typically known as the "C-value enigma." The C-value enigma describes the lack of correlation (among eukaryotes) between genome size and organismal complexity, a pattern which, curiously, does not extend to prokaryotes. The human genome comprises about 3 billion base pairs of DNA; compare this to the genome size of Amoeba dubia (670,000,000,000 bp). Indeed, the human genome contains only marginally more genes than those of C. elegans and D. melanogaster. Among amphibians, the smallest genomes are just shy of 10 billion base pairs, while the largest are nearly 10^11 base pairs.
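The size comparisons quoted so far mix picograms and base pairs, so a small conversion helps put them on one scale. The sketch below is only arithmetic on the figures given above. It assumes the commonly used factor of roughly 978 million base pairs per picogram of DNA, and the names `BP_PER_PG`, `pg_to_bp` and `genome_pg` are made up for illustration.

```python
# Assumed conversion factor: roughly 978 million base pairs per picogram of DNA.
BP_PER_PG = 978e6


def pg_to_bp(picograms: float) -> float:
    """Convert a haploid genome size in picograms to base pairs."""
    return picograms * BP_PER_PG


# Haploid genome sizes quoted above, in picograms.
genome_pg = {
    "human": 3.5,
    "onion (Allium cepa)": 17.0,
    "smallest Allium cited": 7.0,
    "largest Allium cited": 31.5,
}

human_bp = pg_to_bp(genome_pg["human"])
for name, pg in genome_pg.items():
    bp = pg_to_bp(pg)
    print(f"{name}: {bp / 1e9:.1f} Gbp, {bp / human_bp:.1f}x human")

# Amoeba dubia is quoted directly in base pairs.
amoeba_bp = 670e9
print(f"Amoeba dubia: {amoeba_bp / human_bp:.0f}x human")
```

On these figures the onion genome comes out at a little under five times the size of the human one, which matches the "about five times" in Gregory's post.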
When I wrote on this subject at Uncommon Descent, Larry Moran protested that "[t]he Onion test is not about proving the existence of junk DNA and its not about explaining the C-Value Paradox," but, rather, it's "a test for those people -- like Jonathan Wells -- who think that most of our genome has a function." He further wrote,
I hate to break it to you Jonathan M, you being such a great scientist and all, but the Onion Test is not an argument for junk DNA. It's a test of possible functional explanations. (Didn't I mention that already?) Jonathan M has completely misunderstood the Onion Test. Would it surprise you, dear readers, to learn that Jonathan Wells also flunks the test... This isn't rocket science.
This is a curious claim, not least because T. Ryan Gregory clearly wrote his original blog post with the intent of arguing that much of our DNA is junk. Indeed, he writes,
I do not endorse the use of the term "junk DNA," which I think has deviated far too much from its original meaning and is now little more than a loaded buzzword; the descriptive term "non-coding DNA" is what I use to refer to the majority of eukaryotic sequences (of various types) that do not encode protein products... Some non-coding DNA certainly has a function at the organismal level, but this does not justify a huge leap from "this bit of non-coding DNA [usually less than 5% of the genome] is functional" to "ergo, all non-coding DNA is functional."
While this may be true, as Jonathan Wells has noted, "it is the trend, more than the current total, that should worry any defender of junk DNA." Moran also claims that the "onion test" is not synonymous with the C-value enigma, but he doesn't make the difference very clear.
As I noted previously, the argument is only valid if one can justify the presumption that genome size has no bearing on organismal physiology. The problem is that this assumption can be shown to be false.
Transcription Delays and Developmental Timing Mechanisms
Genes that are highly expressed tend to have short introns (Castillo-Davis et al., 2002), presumably reflecting selective pressure for transcriptional economy. Other genes are rich in introns, such as the 2400 kb human dystrophin gene, 99% of which consists of introns. The time taken to transcribe this gene into mRNA adds up to about 16 hours (Tennyson et al., 1995). To take another example, consider the Y chromosomal loci of Drosophila, which are very long (spanning millions of bases and consisting largely of introns). A locus such as DhDhc7(Y), for instance, is transcribed over the course of two to three days to give rise to a ~5,100,000 nucleotide pre-mRNA (see Reugels et al., 2000; Piergentili et al., 2007; and Redhouse et al., 2011). The time taken to transcribe such stretches of DNA is not inconsequential to physiological fitness. Indeed, Swinburne and Silver (2010) report that "transcriptional delays have been shown to contribute to developmental timing."
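The lengths and times quoted above imply a rough average elongation rate, which is easy to check. The sketch below is back-of-the-envelope arithmetic only. It assumes transcription proceeds at a constant rate with no pausing, and the function name is hypothetical.

```python
def elongation_rate_kb_per_min(length_kb: float, hours: float) -> float:
    """Average elongation rate implied by a transcript length and a transcription time."""
    return length_kb / (hours * 60)


# Human dystrophin: ~2,400 kb transcribed in ~16 hours (Tennyson et al., 1995).
dystrophin_rate = elongation_rate_kb_per_min(2400, 16)

# Drosophila DhDhc7(Y): ~5,100 kb pre-mRNA transcribed over two to three days.
dhc7_rate_fast = elongation_rate_kb_per_min(5100, 48)
dhc7_rate_slow = elongation_rate_kb_per_min(5100, 72)

print(f"dystrophin: {dystrophin_rate:.2f} kb/min")
print(f"DhDhc7(Y): {dhc7_rate_slow:.2f} to {dhc7_rate_fast:.2f} kb/min")
```

On these numbers both loci imply rates of roughly 1 to 2.5 kb per minute, so under the constant-rate assumption intron length translates fairly directly into hours or days of delay.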
On this, Larry Moran was curiously silent. Responding to the same article, Nick Matzke remarked that "if most of the genome has merely 'sequence independent' function, then Stephen Meyer's statement that the genome is 'chock-full of information' is unsupportable." While there is no doubt that SOME of the genome has a "sequence independent function," it is doubtful that most of the genome falls into such a category. Matzke is mistaken because, if the genetic elements possess a function, the design inference is not negated, even if that function is independent of exact sequence. Even in the case of repetitive DNA, the sequence can be highly relevant (see this paper by Richard Sternberg and James Shapiro on "Why Repetitive DNA is Essential to Genome Function"). Moreover, even if the function is sequence independent, there still might be specified complexity, triggering an explicit design inference.
Alternative Splicing and Genome Size
A second point I made was that the phenomena of alternative splicing and alternative polyadenylation may have some explanatory power when it comes to accounting for the C-value enigma. Alternative splicing allows the exons of a pre-mRNA transcript to be spliced into a number of different isoforms to produce multiple proteins from the same transcript. I stated that the level of alternative splicing exhibited in humans (more than 90%, with an average of 2 or 3 transcripts per gene) is much higher than that for C. elegans (about 22%, with less than 2 transcripts per gene), and argued that this may, in part, explain why humans have only marginally more genes than C. elegans, which is otherwise seemingly paradoxical given the complexity of humans as compared to the roundworm. On this point, Moran wrote,
I don't agree with the facts. I don't think it's true that most human genes produce multiple functional copies of mRNA by alternative splicing.
But let's assume that it's true. Let's assume that a large fraction of putative junk DNA is there because it promotes alternative splicing. Why do onion cells need so much more alternative splicing and why do related species of onion differ so much in their need for alternative splicing? That's what the Onion Test asks.
On this point, Moran is not just in disagreement with me, he is also out-of-date with the literature by about ten years! As this 2008 article from Science Daily reports,
Nearly all human genes, about 94 percent, generate more than one form of their protein products, the team reports in the Nov. 2 online edition of Nature. Scientists' previous estimates ranged from a few percent 10 years ago to 50-plus percent more recently.
"A decade ago, alternative splicing of a gene was considered unusual, exotic ... but it turns out that's not true at all -- it's a nearly universal feature of human genes," said Christopher Burge, senior author of the paper and the Whitehead Career Development Associate Professor of Biology and Biological Engineering at MIT.
Wang et al. (2008), moreover, report in Nature,
Through alternative processing of pre-messenger RNAs, individual mammalian genes often produce multiple mRNA and protein isoforms that may have related, distinct or even opposing functions. Here we report an in-depth analysis of 15 diverse human tissue and cell line transcriptomes on the basis of deep sequencing of complementary DNA fragments, yielding a digital inventory of gene and mRNA isoform expression. Analyses in which sequence reads are mapped to exon-exon junctions indicated that 92-94% of human genes undergo alternative splicing, ~86% with a minor isoform frequency of 15% or more.
Moran also misunderstands the point of the argument, since it is not onion cells which would exhibit more alternative splicing but human ones (hence explaining why their genome size need not be as big as it might otherwise be). At any rate, this phenomenon on its own is not sufficient to account for the C-value enigma. Indeed, it is only one of a plethora of factors that make the issue far more complex than it is often portrayed.
I also noted that approximately 10% of human genes code for transcription factors (a special class of protein that binds to specific DNA sequences, namely the enhancers or promoters adjacent to the genes whose expression they regulate). In contrast, only about 5% of yeast genes code for transcription factors. When coupled with a much larger network of transcriptional enhancers and promoters, such a difference could result in a much larger set of gene expression patterns. This could lead to a non-linear increase in organismal complexity (see Levine and Tjian, 2003). Larry Moran conceded that "[he doesn't] even understand this point." Clearly, he didn't even bother to look up the cited paper. For if he had, he would have read the following:
The yeast genome encodes a total of 300 transcription factors; this includes both sequence-specific DNA-binding proteins and subunits of general transcription complexes such as TFIID. In contrast, the genome sequences of C. elegans and Drosophila reveal at least 1,000 transcription factors in each organism. There may be as many as 3,000 transcription factors in humans. It would appear that organismal complexity correlates with an increase in both the ratio and absolute number of transcription factors per genome. Yeast contains an average of one transcription factor per 20 genes, while humans appear to contain one factor for every ten genes. Given the combinatorial nature of transcription regulation, even this twofold increase in the number of factors could produce a dramatic expansion in regulatory complexity. [emphasis added]
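To see why a modest increase in factor count could matter, here is a toy calculation. It assumes, purely for illustration, that a regulatory state is defined by an unordered set of k distinct factors acting together. Real regulation is far more constrained, and the choice of k = 2 or 3 is hypothetical. The factor counts are those quoted above, and the function name is made up for this sketch.

```python
from math import comb

# Transcription factor counts quoted from Levine and Tjian (2003).
factor_counts = {
    "yeast": 300,
    "C. elegans / Drosophila": 1000,
    "human (upper estimate)": 3000,
}


def combinations_of_factors(n_factors: int, k: int) -> int:
    """Number of unordered sets of k distinct factors drawn from n_factors."""
    return comb(n_factors, k)


for organism, n in factor_counts.items():
    pairs = combinations_of_factors(n, 2)
    triples = combinations_of_factors(n, 3)
    print(f"{organism}: {pairs:,} pairs, {triples:,} triples")
```

Under this simplified model, going from 300 to 3,000 factors multiplies the number of possible pairs roughly a hundredfold, which is the kind of non-linear growth the quoted passage has in mind.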
Another point missed by Moran and Matzke is that there are limiting factors on an organism's genome size. For example, in 2002, Andrew George published a paper in Trends in Immunology titled, "Is the number of genes we possess limited by the presence of an adaptive immune system?" In the paper, he argued that gene count is limited by the presence of an adaptive immune system because of the potential for self-recognition. As the paper explains,
The factors that are important in limiting the number of functional genes contained within the genome of an organism are presently unknown. Here, it is suggested that in organisms that contain an adaptive immune response, the number of genes in the genome might be limited by the need to delete autoreactive T cells, thus preventing autoimmunity. The more genes an organism has, the more autoantigens are generated, necessitating an increase in the proportion of T cells that are deleted. Is human complexity limited by the presence of an immune system? Although immunity is vital for health, the need to be tolerant to all "self" molecules could restrict the number of genes in our genome.
In a further correlation that has been established, organisms with rapid development typically have lower C-values, presumably because they don't have time to replicate lots of DNA between cell divisions.
In The Myth of Junk DNA, Jonathan Wells observes,
There is a strong positive correlation, however, between the amount of DNA and the volume of a cell and its nucleus -- which affects the rate of cell growth and division. Furthermore, in mammals there is a negative correlation between genome size and the rate of metabolism. Bats have very high metabolic rates and relatively small genomes. In birds, there is a negative correlation between C-value and resting metabolic rate. In salamanders, there is also a negative correlation between genome size and the rate of limb regeneration.
On this point, the detractors have also remained oddly silent.
Bacteria, which have a single replicon per chromosome, face selective pressure to limit the accumulation of non-genic DNA, which would lengthen replication times and thus slow rates of reproduction. This means that their genome size is correlated with gene number, and thus increases in proportion to structural and metabolic complexity.
Genome Size and Cell Volume
A final point: For creatures as diverse as vertebrates, plants and protozoa (unicellular eukaryotes), there is a positive correlation between genome size and cell volume. Cavalier-Smith and others have suggested that DNA possesses a "structural role" in controlling nuclear volume, cell size and cell-cycle length.
Moran and Matzke want to know whether this phenomenon is a cause or an effect. Cavalier-Smith's view of "skeletal DNA," advocated in the cited paper, would suggest that "larger cells need more DNA," and that "smaller ones need less." Indeed, he explains that "If cell size declines, the nucleus will be too large compared with the cytoplasm for optimal efficiency and the drain on the cell's resources for replicating all that extra DNA and making all the histones it needs will be uneconomic." So, deletions are selectively favored over duplications. In the case of larger cells, the reverse is the case. When the cell size increases, Cavalier-Smith explains, "there is positive selection for a corresponding increase in nuclear volume; it is generally easier to achieve this by increasing the amount of DNA rather than by altering its folding parameters." Cavalier-Smith argues that the abundance of transposons (arising by duplication) is an effect (and not a cause) of the larger genomes. The larger amount of DNA thus provides a selective advantage by contributing to the skeleton and volume of the nucleus.
The so-called onion test, or indeed the "C-value enigma," is predicated on unsupportable assumptions about the physiological effects of -- and/or requirements for -- larger genomes, many of which are contradicted by the scientific evidence. As we learn ever more about the nature and functional inter-dependency of the genome, as the extent of genomic "dark matter" continues to shrink, those who continue to advocate the view that the preponderance of our genome is non-functional should find these facts disconcerting.