"Monkeys Typing Shakespeare" Simulation Illustrates Combinatorial Inflation Problem
We've all heard the old line about how it would take a virtual eternity for monkeys sitting at typewriters to produce all the works of Shakespeare. Well, according to a BBC News article, someone is actually making the attempt -- not with real monkeys but with virtual ones. But there's a catch: the simulation isn't trying to produce all the works of Shakespeare in one fell swoop. It's cheating by only trying to generate small pieces of Shakespeare's writings, and then splicing them together in the right order:
Mr Anderson's virtual monkeys are small computer programs uploaded to Amazon servers. These coded apes regularly pump out random sequences of text.
Each sequence is nine characters long and each is checked to see if that string of characters appears anywhere in the works of Shakespeare. If not, it is discarded. If it does match then progress has been made towards re-creating the works of the Bard.
So the "target sequence" here isn't the entire works of Shakespeare, but rather a body of much shorter 9-character strings. According to the article, "there are about 5.5 trillion different combinations of any nine characters from the English alphabet." The simulation uses cloud computing, which can run at teraflop and even petaflop speeds. With that much computing power, it isn't surprising that the simulation can pump out 9-character strings of original Shakespeare so quickly.
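The generate-and-check procedure the article describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Mr Anderson's actual program: the 26-letter lowercase alphabet, the stripping of spaces and punctuation, the one-line sample text, and the helper names `normalize`, `chunks_of`, and `run_monkeys` are all hypothetical choices made here.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # assumption: 26 letters, no spaces or punctuation
CHUNK = 9                          # chunk length quoted in the article


def normalize(text):
    """Lowercase the text and keep letters only."""
    return "".join(ch for ch in text.lower() if ch in ALPHABET)


def chunks_of(text, size=CHUNK):
    """Return every contiguous run of `size` letters that occurs in the text."""
    letters = normalize(text)
    return {letters[i:i + size] for i in range(len(letters) - size + 1)}


def run_monkeys(targets, attempts, rng):
    """Generate random chunks and keep only those that appear among the targets."""
    found = set()
    for _ in range(attempts):
        guess = "".join(rng.choices(ALPHABET, k=CHUNK))
        if guess in targets:   # a match counts as progress
            found.add(guess)   # anything else is simply discarded
    return found


# Number of possible nine-letter strings: 26**9 = 5,429,503,678,976
search_space = len(ALPHABET) ** CHUNK

sample = "To be, or not to be, that is the question"  # stand-in for the full works
targets = chunks_of(sample)
found = run_monkeys(targets, attempts=100_000, rng=random.Random(0))
print(f"{search_space:,} possible chunks, {len(targets)} targets, {len(found)} found")
```

Under these assumptions the search space comes to roughly 5.4 trillion strings, close to the figure the article quotes. The full works contain far more distinct nine-letter chunks than this one-line sample, which is part of why the real simulation registers matches so readily.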
But what would happen if the simulation were forced to produce all the works of Shakespeare in one fell swoop? Could our fastest computers do it? According to the BBC article, not a chance:
"If he's running an evolutionary approach, holding on to successful guesses, then he'll get there," said Tim Harford, popular science writer and presenter of the BBC's radio show about numbers "More or Less."
And without those constraints?
"Not a chance," said Dr Ian Stewart, emeritus professor of mathematics at the University of Warwick.
His calculations suggest it would take far, far longer than the age of the Universe for monkeys to completely randomly produce a flawless copy of the 3,695,990 or so characters in the works.
"Along the way there would be untold numbers of attempts with one character wrong; even more with two wrong, and so on," he said. "Almost all other books, being shorter, would appear (countless times) before Shakespeare did."
Though Tim Harford might not realize it, in the world of real biology, an "evolutionary approach" might actually have to produce something more like "a flawless copy" than smaller 9-character strings. For example, Doug Axe's research has shown that functional protein folds are enormously specified, and rare in sequence space. He suggests that amino acid sequences that yield functional protein folds might be as rare as 1 in 10^74 sequences.
Is Darwinian evolution up to the job? Well, other research from Axe suggests that if a feature requires more than 6 mutations before conferring some advantage, it won't arise in the history of the earth.
In other words, Darwinian evolution isn't going to be able to produce fundamentally new protein folds. In fact, it probably wouldn't even be able to produce a single 9-character string of nucleotides in DNA, if that string would not be retained by selection until all 9 nucleotides were in place.
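For a sense of scale, the text figures quoted from the BBC article can be turned into search-space sizes. The sketch below is back-of-the-envelope arithmetic, not Dr Stewart's own calculation: it assumes a 26-letter alphabet (real text also has spaces, capitals, and punctuation, which would only enlarge the numbers) and a purely hypothetical budget of 10^18 guesses per second sustained for 13.8 billion years. The function names are illustrative.

```python
import math

ALPHABET_SIZE = 26                      # assumption: letters only
CHUNK_LENGTH = 9                        # chunk size used by the simulation
FULL_LENGTH = 3_695_990                 # character count quoted in the article
GUESSES_PER_SECOND = 1e18               # hypothetical guessing rate
AGE_OF_UNIVERSE_SECONDS = 13.8e9 * 365.25 * 24 * 3600


def log10_sequences(length, alphabet_size=ALPHABET_SIZE):
    """log10 of the number of possible strings of the given length."""
    return length * math.log10(alphabet_size)


def log10_attempts(rate, seconds):
    """log10 of the number of guesses made at `rate` per second over `seconds`."""
    return math.log10(rate) + math.log10(seconds)


short = log10_sequences(CHUNK_LENGTH)
full = log10_sequences(FULL_LENGTH)
tries = log10_attempts(GUESSES_PER_SECOND, AGE_OF_UNIVERSE_SECONDS)

print(f"9-character strings: about 10^{short:.1f}")
print(f"full-length strings: about 10^{full:,.0f}")
print(f"guesses available:   about 10^{tries:.1f}")
```

Working in logarithms avoids having to build a number with millions of digits. Under these assumptions the nine-character space is tiny compared with the guessing budget, while the exponent for the full-length space runs into the millions.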
This is called the combinatorial inflation problem. To summarize, the problem goes like this:
Natural selection works well when it can build structures in small incremental steps. But when multiple mutations are necessary to produce a selective advantage, the odds of the trait arising become very small. Michael Behe explains further:
[I]f only one mutation is needed to confer some ability, then Darwinian evolution has little problem finding it. But if more than one is needed, the probability of getting all the right ones grows exponentially worse.
The more mutations you need to gain some selective advantage, the more the probabilities multiply and thus get smaller at an exponential rate.
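The multiplication described here can be shown with a minimal sketch. It assumes each required mutation arises independently with the same probability and that none is retained until all are present. The per-mutation probability of 10^-8 is a hypothetical value chosen only to show the trend, and the function name `joint_probability` is illustrative. This is the arithmetic of the argument, not a population-genetics model.

```python
def joint_probability(per_mutation_probability, mutations_needed):
    """Probability that all required mutations occur together,
    assuming each one is independent and equally likely."""
    return per_mutation_probability ** mutations_needed


P = 1e-8  # hypothetical per-mutation probability, for illustration only

for k in range(1, 7):
    print(f"{k} mutation(s) needed: {joint_probability(P, k):.0e}")
```

Each additional required mutation multiplies the joint probability by the same small factor, so the exponent grows in proportion to the number of mutations needed.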
Darwin understood such a problem could devastate his theory. In Origin of Species, he noted that "If it could be demonstrated that any complex organ existed, which could not possibly have been formed by numerous, successive, slight modifications, my theory would absolutely break down." Though Darwin didn't know about mutations in DNA, modern biologists are fast approaching the realization that biochemical complexity pushes Darwinian theory well beyond the available probabilistic resources.