Coming to Grips with ENCODE: More on the Question of "Function" - Evolution News & Views

Since the ENCODE Project asserted that most of the human genome shows activity, some Darwinists have countered that activity is not the same thing as function. A substantial part of that activity could be "biological noise," the output of transcription machinery that leads nowhere. Aware of the debate, thirty geneticists, many from the ENCODE Project, wrote a Perspective piece in the Proceedings of the National Academy of Sciences evaluating what is currently known about genomic activity and biological function. We noted it here the other day, but this is important stuff and bears further elucidation. To present a robust argument for design, readers need to be aware of the arguments on both sides, as well as the limitations of detection methods.

The Debate

The case for junk DNA is more nuanced than the simple presupposition that evolution generates junk. It's partly based on observations like the C-value paradox (the fact that genome size appears unrelated to phenotypic complexity), the high number of repetitive elements, and theoretical considerations, such as the likely inability of selection to maintain the entire genome of a slowly reproducing organism against the mutational burden.

The case for abundant function, on the other hand, builds on ENCODE's discovery of "pervasive activity over an unexpectedly large fraction of the genome, including noncoding and nonconserved regions and repeat elements." (Emphasis added.) ENCODE researchers found many human genomic regions teeming with biological activity. The argument here is that the body would not invest so much energy in all this activity if it were not functional. These and other observations "raise the possibility that functional sequences encompass a larger proportion of the human genome than previously thought."

So how should we assess the degree of function in the genome? In short, the authors state that more work needs to be done, but the use of multiple approaches is most likely to lead to the correct interpretation. Because "there is no universal definition of what constitutes function," care must be taken before making dogmatic claims about the extent of function in the human genome. Nevertheless, the ENCODE project may well represent the best estimate to date.

To evaluate the evidence fairly, the authors examined three approaches to identifying function: (1) biochemical studies (like ENCODE) that measure the activity of DNA regions in various cell types, (2) evolutionary approaches, which use comparative genomics to identify conserved regions as a measure of natural selection, and (3) genetic approaches, which look for consequences of mutations in the phenotype. While each approach has its strengths and weaknesses, their conclusions often overlap, leading to better estimates of function:

Limitations of the genetic, evolutionary, and biochemical approaches conspire to make this seemingly simple question difficult to answer. In general, each approach can be used to lend support to candidate elements identified by other methods, although focusing exclusively on the simple intersection set would be much too restrictive to capture all functional elements. However, by probing quantitative relationships in data from the different approaches, we can begin to gain a more sophisticated picture of the nature, identity, and extent of functional elements in the human genome.
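
To see why the "simple intersection set" is so restrictive, here is a minimal Python sketch. The element labels and the membership of each set are invented purely for illustration; they are not data from ENCODE or from the paper. The sketch compares three ways of combining candidates: requiring all three approaches to agree, accepting any single approach, and requiring support from at least two.

```python
from collections import Counter

# Hypothetical candidate elements flagged by each approach.
# These labels are made up for illustration, not real genomic data.
genetic = {"elem_A", "elem_B"}
evolutionary = {"elem_B", "elem_C", "elem_D"}
biochemical = {"elem_B", "elem_C", "elem_E", "elem_F"}

approaches = [genetic, evolutionary, biochemical]

# Strict intersection: only elements that every approach flags.
strict = set.intersection(*approaches)

# Union: elements flagged by at least one approach.
any_support = set.union(*approaches)

# Middle ground: elements supported by two or more approaches.
support = Counter(elem for flagged in approaches for elem in flagged)
at_least_two = {elem for elem, count in support.items() if count >= 2}

print("all three agree:", sorted(strict))
print("any approach:   ", sorted(any_support))
print("two or more:    ", sorted(at_least_two))
```

In this toy case only one of six candidates survives the strict intersection, which is the kind of undercounting the authors warn about. Where to draw the line between the intersection and the union is exactly the judgment call the paper says needs quantitative work.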

What are the strengths and weaknesses of the different approaches?

Genetic Approach

The genetic approach can identify some functions with certainty: if you mutate a gene and the organism dies, it needs the gene to function! Beyond the obvious, though, it is possible to miss functional elements. The gene (or noncoding region) might be able to absorb mutations and still function. The DNA might only function in certain cell types, or in certain stages of development. And just because the organism doesn't die or weaken doesn't mean the DNA is useless:

The approach may also miss elements whose phenotypes occur only in rare cells or specific environmental contexts, or whose effects are too subtle to detect with current assays. Loss-of-function tests can also be buffered by functional redundancy, such that double or triple disruptions are required for a phenotypic consequence. Consistent with redundant, contextual, or subtle functions, the deletion of large and highly conserved genomic segments sometimes has no discernible organismal phenotype and seemingly debilitating mutations in genes thought to be indispensable have been found in the human population.

Functions determined with the genetic approach, therefore, should be considered a lower limit to the true level of function.

Evolutionary Approach

By comparing genomes, evolutionists look for DNA regions that are conserved. The idea is that purifying selection will eliminate mutations in functional regions, while allowing nonfunctional regions to mutate by genetic drift or neutral evolution. This approach doesn't actually identify what the function is. It just assumes that function will be conserved.
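
As a rough illustration of what "looking for conserved regions" means in practice, here is a toy Python sketch. It assumes the sequences are already aligned and of equal length, and it scores each column by the fraction of sequences that share the most common character. The function name `column_conservation` and the four short sequences are hypothetical; real comparative genomics tools use far more sophisticated statistical models and must also solve the alignment problem first.

```python
from collections import Counter

def column_conservation(alignment):
    """Score each column of a pre-aligned set of sequences.

    The score is the fraction of sequences sharing the most common
    character in that column (1.0 means fully conserved). Gap
    characters, if present, are counted like any other character.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        most_common_count = Counter(column).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores

# Invented toy alignment: four "species", six positions.
toy_alignment = [
    "ACGTAC",
    "ACGTTC",
    "ACGAGC",
    "ACGCCC",
]

scores = column_conservation(toy_alignment)
conserved_positions = [i for i, s in enumerate(scores) if s == 1.0]

print(scores)
print("fully conserved columns:", conserved_positions)
```

Note what the score does and does not say: a column scoring 1.0 is flagged as conserved, but nothing in the calculation reveals what the sequence does, and a low-scoring column is not thereby shown to be useless.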

Evolutionists also try to find evidence of positive selection by looking for accelerated evolution across species. Once again, though, the mere comparison of nucleotides cannot identify the function or lack of it. The authors' discussion of the drawbacks of this approach is noteworthy:

Although powerful, the evolutionary approach also has limitations. Identification of conserved regions depends on accurate multispecies sequence alignments, which remain a substantial challenge. Alignments are generally less effective for distal-acting regulatory regions, where they may be impeded by regulatory motif turnover, varying spacing constraints, and sequence composition biases. Analyzing aligned regions for conservation can be similarly challenging. First, most transcription factor-binding sequences are short and highly degenerate, making them difficult to identify. Second, because detection of neutrally evolving elements requires sufficient phylogenetic distance, the approach is well suited for detecting mammalian-conserved elements, but it is less effective for primate-specific elements and essentially blind to human-specific elements. Third, certain types of functional elements such as immunity genes may be prone to rapid evolutionary turnover even among closely related species. More generally, alignment methods are not well suited to capture substitutions that preserve function, such as compensatory changes preserving RNA structure, affinity-preserving substitutions within regulatory motifs, or mutations whose effect is buffered by redundancy or epistatic effects. Thus, absence of conservation cannot be interpreted as evidence for the lack of function.

Finally, although the evolutionary approach has the advantage that it does not require a priori knowledge of what a DNA element does or when it is used, it is unlikely to reveal the molecular mechanisms under selection or the relevant cell types or physiological processes. Thus, comparative genomics requires complementary studies.

Once again, this approach is likely to severely underestimate the true measure of biological function.

Biochemical Approach

The ENCODE Project measured function of genomic regions by their biological activity: i.e., which ones are transcribed, which ones are regulated, and which ones show epigenetic enhancement. The surprise of the first ENCODE announcement was that they found "pervasive transcription" across most of the genome. As with all approaches, though, this one too has limitations:

An advantage of such functional genomics evidence is that it reveals the biochemical processes involved at each site in a given cell type and activity state. However, biochemical signatures are often a consequence of function, rather than causal. They are also not always deterministic evidence of function, but can occur stochastically.... In short, although biochemical signatures are valuable for identifying candidate regulatory elements in the biological context of the cell type examined, they cannot be interpreted as definitive proof of function on their own.

By stochastically, they mean that it's possible the transcription machinery generates some RNAs in regions that have no function, producing a kind of "biological noise" that gets cleaned up downstream by other processes. Advocates of "junk DNA" could, therefore, argue that the mere presence of transcribed RNAs does not prove function. The authors agree that we shouldn't expect a priori that every RNA transcript is functional. "Zero tolerance for errant transcripts would come at high cost in the proofreading machinery needed to perfectly gate RNA polymerase and splicing activities, or to instantly eliminate spurious transcripts."

State of the Art

So how should we assess the degree of function in the human genome? At this stage, we cannot accurately say. The resolution of lab techniques is too crude in some cases to catch everything. The activity of genomic regions varies by orders of magnitude. It's going to take more work to flesh out the best interpretation of ENCODE, but by using the strengths of all three approaches, the authors say, we can get a glimpse of the likely measure of biological function. Since the genetic and evolutionary approaches tend to underestimate function, and the biochemical approach could overestimate it, we must be careful not to be dogmatic. Even so, there are multiple points where measures of function could be substantially underestimated.

One thing we can say is that ENCODE provided a whopping reality check for Darwinian evolutionists who had assumed most of the genome is junk:

The major contribution of ENCODE to date has been high-resolution, highly-reproducible maps of DNA segments with biochemical signatures associated with diverse molecular functions. We believe that this public resource is far more important than any interim estimate of the fraction of the human genome that is functional.

The burden of proof is now on the junk-DNA advocates to explain why this pervasive activity is nonfunctional. ID advocates, by contrast, received a strong boost from the ENCODE news. It matched their assumption that "if something works, it's not happening by accident."

Photo credit: Duncan Hull/Flickr.