Evolution News and Views (ENV) provides original reporting and analysis about the debate over intelligent design and evolution, including breaking news about scientific research.


Jonathan Wells Hits an Evolutionary Nerve

When intelligent design (ID) proponents press neo-Darwinian evolutionists on the inability of Darwinian evolution to produce new functional genetic information, a common response is anger and name-calling. That is what happened when Michael Egnor asked, "How does evolution produce new functional genetic information?", and it seems to have happened again after Jonathan Wells bravely observed that "duplicating a gene doesn't increase information content any more than photocopying a paper increases its information content." Mathematician and ID critic Jeffrey Shallit responded by calling Wells a "buffoon." Dr. Shallit then proceeded to offer an irrelevant definition of information that supposedly showed Wells was wrong. William Dembski has responded to Shallit here, but Shallit's tactics and arguments are worth exploring further.

Shallit defines information as "Kolmogorov complexity," which can be thought of this way: the Kolmogorov complexity of a string X is the length of the shortest computer program that generates X. The more commands necessary to generate the string, the greater its Kolmogorov complexity.
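To make the idea concrete, here is a small sketch of my own (not from Shallit or Wells). Kolmogorov complexity is uncomputable in general, so comparing the lengths of two hand-written generating programs only gives an intuition, not a measurement:

```python
# Two Python programs (held here as strings) that print the same
# 2,000-character output. Their lengths illustrate "description length";
# they are NOT a measurement of true Kolmogorov complexity, which is
# uncomputable in general.
prog_short = "print('AB' * 1000)"            # exploits the repetitive pattern
prog_long = "print('" + "AB" * 1000 + "')"   # spells the output out literally

print(len(prog_short))  # 18
print(len(prog_long))   # 2009
```

Both programs produce the same string, but the patterned description is vastly shorter; that gap is the intuition behind Kolmogorov complexity.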

Returning to Wells's analogy, let's call the information in the paper being photocopied X. Using a little pseudocode, the algorithm that, on Shallit's logic, generates the original paper would probably look like this:

1. Write X

Next, to photocopy the paper (i.e. to produce XX), the necessary commands might be as follows:

1. Write X
2. Repeat step 1

So, in Shallit's view there is "more" Kolmogorov complexity in the photocopied paper (XX) than in the single sheet of paper (X), because more commands are required to describe the photocopied information. But the information on the two sheets of paper is identical.
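The point can be checked, loosely, with a real compressor. Compressed length is only a rough stand-in for description length, and this sketch is my illustration rather than anything from the exchange, but it shows that a duplicated string compresses to barely more than the original, since the second copy reduces to a "repeat" instruction:

```python
import random
import zlib

# A reproducible pseudo-random "paper" X: 5,000 bytes with no exploitable
# pattern, so X is essentially incompressible on its own.
random.seed(0)
x = bytes(random.randrange(256) for _ in range(5000))

c_single = len(zlib.compress(x, 9))      # description length of X
c_double = len(zlib.compress(x + x, 9))  # description length of the "photocopy" XX

# The second X is encoded as back-references to the first, so XX does not
# need anywhere near twice the description.
print(c_single, c_double)
```

Running this, the compressed size of XX comes out only slightly above that of X alone, nowhere near double.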

Does Shallit's explanation have any bearing upon the question Wells is asking, i.e. how to explain how new functional genetic information arises? Of course not. What would be far more interesting is an explanation of how you could get a paper full of new information -- call it Y -- instead of two copies of X. Thus, Wells would like to see how we can get XY out of X, instead of merely XX.

Since biology is based upon functional information, Jonathan Wells is interested in the far more important question: does neo-Darwinism explain how new functional biological information arises? Shallit seems interested primarily in addressing simplistic, trivial questions, like how one might duplicate a string, without regard for the all-important question of whether those additional characters convey some new functional message.

Under Kolmogorov complexity, a stretch of completely functionless junk DNA that has been utterly garbled by random, neutral mutations might have more Kolmogorov complexity than a functional gene of the same sequence length. For example, consider the following two strings:

String A:
KOLMOGOROVINFORMATIONISAPOORMEASUREOFBIOLOGICALCOMPLEXITY

String B:
JLNNUKFPDARKSWUVWEYTYKARRBVCLTLOPDOUUMUEVCRLQTSFFWKJDXSOB

Both String A and String B are composed of exactly 57 characters. String A spells a sentence in English, and String B was generated using a random string generator. Since many of its characters can be predicted from the regularities of English, String A actually has less Kolmogorov complexity than String B (for example, a data compression algorithm could shorten String A dramatically). Yet String A clearly conveys much more functional information than String B.
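The compression point can be illustrated with a sketch of my own. A 57-character string is too short for a real compressor to show the effect (header overhead swamps the signal), so the example below uses longer stand-ins: a patterned English-like string (here made long by repetition, which admittedly supplies much of the redundancy) versus a random string of exactly equal length:

```python
import random
import string
import zlib

# Longer stand-ins for String A and String B. Same length, very different
# internal structure.
structured = ("SPECIFIEDCOMPLEXITYREQUIRESBOTHCOMPLEXITYANDSPECIFICATION" * 30).encode()

random.seed(0)  # fixed seed so the demo is reproducible
rand = "".join(
    random.choice(string.ascii_uppercase) for _ in range(len(structured))
).encode()

assert len(structured) == len(rand)  # identical lengths, like Strings A and B

c_structured = len(zlib.compress(structured, 9))
c_random = len(zlib.compress(rand, 9))

# The patterned string needs a far shorter description than the random one.
print(c_structured, c_random)
```

The ranking, not the exact byte counts, is the point: equal length, but the structured string has a much shorter description, i.e. lower (approximate) Kolmogorov complexity.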

For obvious reasons, Kolmogorov complexity is not always a helpful metric of functional biological information. After all, biological information is finely tuned to perform a specific function, whereas random strings are not. A useful measure of biological information must account for the function of the information, and Kolmogorov information does not necessarily take function into account.

In fact, Kolmogorov information is very similar to Shannon information, where "In both cases, the amount of information in an object may be interpreted as the length of a description of the object." But the length of the description says nothing about whether there is function, or how much fine-tuning is necessary for function. Thus you could have a very long random string that requires a long description yet has no function. As any ID novice knows, we infer design when we find both complexity and specification. In rough terms, Shannon information and Kolmogorov information measure complexity, but not specification. Such measures of information are therefore not useful for measuring functional biological information. As a paper in the journal Theoretical Biology and Medical Modelling observes:

[N]either RSC [Random Sequence Complexity] nor OSC [Ordered Sequence Complexity], or any combination of the two, is sufficient to describe the functional complexity observed in living organisms, for neither includes the additional dimension of functionality, which is essential for life. FSC [Functional Sequence Complexity] includes the dimension of functionality. Szostak argued that neither Shannon's original measure of uncertainty nor the measure of algorithmic complexity are sufficient. Shannon's classical information theory does not consider the meaning, or function, of a message. Algorithmic complexity fails to account for the observation that "different molecular structures may be functionally equivalent." For this reason, Szostak suggested that a new measure of information -- functional information -- is required.

(Kirk K. Durston, David K. Y. Chiu, David L. Abel, Jack T. Trevors, "Measuring the functional sequence complexity of proteins," Theoretical Biology and Medical Modelling, Vol. 4:47 (2007) (internal citations removed).)

Likewise, Stephen C. Meyer writes in a peer-reviewed scientific paper that it is useful to adopt "'complex specified information' (CSI) as a synonym for 'specified complexity' to help distinguish functional biological information from mere Shannon information -- that is, specified complexity from mere complexity." Meyer's suggested definition of "specified complexity" is highly useful in describing functional biological information. Specified complexity is a notion derived from mainstream scientific literature and is not an invention of critics of neo-Darwinism. In 1973, origin-of-life theorist Leslie Orgel identified specified complexity as the hallmark of biological complexity:

[L]iving organisms are distinguished by their specified complexity. Crystals are usually taken as the prototypes of simple, well-specified structures, because they consist of a very large number of identical molecules packed together in a uniform way. Lumps of granite or random mixtures of polymers are examples of structures which are complex but not specified. The crystals fail to qualify as living because they lack complexity; the mixtures of polymers fail to qualify because they lack specificity.

(Leslie E. Orgel, The Origins of Life: Molecules and Natural Selection, pg. 189 (Chapman & Hall: London, 1973).)

Orgel captures the fact that specified complexity, or CSI, requires both an unlikely sequence and a specific functional arrangement. In fact, Orgel's "random mixture of polymers" might have extremely high Kolmogorov complexity, even though it would not be sufficiently specified to encode a functional biological life-form. Specified complexity is a much better measure of biological complexity than Shannon complexity or Kolmogorov complexity because it recognizes the highly specified nature of biological complexity. This is a point that Shallit must resist recognizing because it's much harder to generate specified complexity via Darwinian processes than mere Shannon complexity or Kolmogorov complexity.

Shallit would most likely count as "new" information mere copies or duplicates of some pre-existing stretch of DNA, even if the new copy doesn't actually do anything new, or perhaps even when the new DNA doesn't do anything at all. In contrast, ID proponents define "new" genetic information as a new stretch of DNA that actually performs some different, useful, and new function. For example, consider the following 42-character string:

DUPLICATINGTHISSTRINGDOESNOTGENERATENEWCSI

Now consider the following duplicate string:

DUPLICATINGTHISSTRINGDOESNOTGENERATENEWCSIDUPLICATINGTHISSTRINGDOESNOTGENERATENEWCSI

Whether or not we have increased the Kolmogorov complexity, we have not created any new meaning in the duplicated string. We have not increased the CSI in any meaningful sense.
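One way to see that duplication adds no new statistical structure is to check that the per-character Shannon entropy of the duplicated string is exactly that of the original, since every character frequency simply doubles. This is an illustration of my own, and per-character entropy is of course not the same thing as CSI:

```python
import math
from collections import Counter

def char_entropy(s: str) -> float:
    """Empirical Shannon entropy (bits per character) of a string."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

original = "DUPLICATINGTHISSTRINGDOESNOTGENERATENEWCSI"
duplicated = original * 2

# Doubling every character count leaves every per-character probability,
# and hence the per-character entropy, unchanged.
print(char_entropy(original), char_entropy(duplicated))
```

The duplicated string is twice as long, but character for character it carries exactly the same statistical profile as the original.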

The above example is of course analogous to the commonly cited evolutionary mechanism of gene duplication. New functional information is not generated by a process of duplication until mutations change the gene enough to generate a new function -- which may or may not be possible. As Professor of Neurosurgery Michael Egnor insightfully commented in a response to P.Z. Myers:

[G]ene duplication is, presumably, not to be taken too seriously. If you count copies as new information, you must have a hard time with plagiarism in your classes. All that the miscreant students would have to say is "It's just like gene duplication. Plagiarism is new information -- you said so on your blog!"