Can You Build WALL-E from Repeating Legos?
"A central question in protein evolution is the extent to which naturally occurring proteins sample the space of folded structures accessible to the polypeptide chain." Thus begins a new paper on sequence space for proteins, a concept that has been key to work by leading ID theorists Douglas Axe, Stephen Meyer, and William Dembski. This is the question: Out of the vast space of possible amino acid sequences, how many can fold into functional proteins? ID argues that functional space is such a small subset of sequence space, the probability that a blind search will find any is vanishingly small.
Nine researchers led by some of our Seattle neighbors over at the University of Washington, publishing in Nature, decided to investigate how much of the sequence space nature has sampled. It's obviously far too big a space to search, so they limited it to just "repeat proteins" -- those that use certain structural motifs over and over.
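To put a rough number on "far too big a space to search," here is a minimal Python sketch of the arithmetic. It assumes only the standard alphabet of 20 amino acids and counts every possible sequence of a given length. The function names and the sample lengths (10, 100, 200) are illustrative choices made for this sketch, not figures taken from the paper, and the count says nothing about how many of those sequences fold or function.

```python
import math

AMINO_ACIDS = 20  # size of the standard amino acid alphabet


def sequence_space_size(length: int) -> int:
    """Exact number of distinct amino acid sequences of the given length."""
    return AMINO_ACIDS ** length


def log10_sequence_space(length: int) -> float:
    """Base-10 logarithm of the sequence count, easier to read for long chains."""
    return length * math.log10(AMINO_ACIDS)


if __name__ == "__main__":
    # Sample lengths are illustrative assumptions, not values from the paper.
    for length in (10, 100, 200):
        exponent = log10_sequence_space(length)
        print(f"length {length}: about 10^{exponent:.0f} possible sequences")
```

Because each added residue multiplies the count by 20, the total grows exponentially with chain length, which is why any experiment can only ever sample a small corner of the space.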
To our knowledge, all designed repeat protein structures to date have been based on naturally occurring families. These families may cover all stable repeat protein structures that can be built from the 20 amino acids or, alternatively, natural evolution may only have sampled a subset of what is possible.
By applying experimental protein design, they show that you can get many more potential proteins by simply repeating certain "building blocks" over and over, something like assembling Lego pieces blindly. They manufactured some Lego-like protein kits by generating scads of "a simple helix-loop-helix-loop building block" and putting them together using an automated process. Out of 83 they built, 44 showed a stable fold. But is this experiment about evolution or intelligent design?
We have shown that a wide range of novel repeat proteins can be generated by tandem repeating a simple helix-loop-helix-loop building block. As illustrated by the comparison of 15 design models to the corresponding crystal structures (Fig. 4), our approach allows precise control over structural details throughout a broad range of geometries and curvatures. The design models and sequences are very different from each other and from naturally occurring repeat proteins, without any significant sequence or structural homology to known proteins (Extended Data Fig. 8). This work achieves key milestones in computational protein design: the design protocol is completely automatic, the folds are unlike those in nature, more than half of the experimentally tested designs have the correct overall structure as assessed by SAXS, and the crystal structures demonstrate precise control over backbone conformation for proteins over 200 amino acids. The observed level of control over the repeating helix-loop-helix-loop architecture shows that computational protein design has matured to the point of providing alternatives to naturally occurring scaffolds, including graded and tunable variation difficult to achieve starting from existing proteins. We anticipate that the 44 successful designs described in this work (Extended Data Fig. 9), and sets generated using similar protocols for other repeat units, will be widely useful starting points for the design of new protein functions and assemblies.
Note that word "function" at the end. A search of the paper shows nothing about whether any one of the design models actually does anything. Yet they seem to have one ear open to the possible whisper of Darwin speaking in the background:
Naturally occurring repeat protein families, such as ankyrins, leucine-rich repeats, TAL effectors and many others, have central roles in biological systems and in current molecular engineering efforts. Our results suggest that these families are only the tip of the iceberg of what is possible for polypeptide chains: there are clearly large regions of repeat protein space that are not sampled by currently known repeat protein structures. Repeat protein structures similar to our designs may not have been characterized yet, or perhaps may simply not exist in nature.
The authors only mention evolution twice. It's not really a focus in this paper. The word "design," however, appears a whopping 74 times, even before the Methods section. They did interesting and important work. But lest anyone think their conclusion weakens the arguments of Axe, Meyer, and Dembski by expanding the potential functional space accessible to random search within sequence space, let's apply a heavy dose of realism.
They sampled only part of the "repeat protein" portion of sequence space.
They began with "building block" motifs that already fold (helices and loops).
They used only left-handed (homochiral) amino acids.
They did not test to see if any of the stable structures perform a function.
They did not test to see if any of their structures could interact with other proteins or structures (on this problem, see an earlier article on the subject).
Their work was highly dependent on intelligent design (i.e., their own).
You could liken their results to a robot programmed to assemble Legos according to a rule: "fasten, twist, repeat." If the Lego pieces are already designed, the algorithm can say nothing about where the pieces came from. As all kids know, the holes in Lego pieces have to be spaced properly to fit together. Similarly, amino acids need to be properly sequenced to fold into a helix or loop. If that's a given, it's not surprising that you could generate quite a few unique structures by the algorithm "fasten, twist, repeat." Even WALL-E the robot could do that without thinking. Whether anything worthwhile would result is dubious.
Actually, you can assemble a WALL-E robot using Lego pieces now. The Lego company offers that and many other elaborate, complex kits that go well beyond the simple building-block sets from decades ago. A kid could put the WALL-E pieces together and show off his pride and joy in a matter of hours or maybe even minutes. But could nature pull that off by blind search? Think of the programming that would be required to get WALL-E to assemble his likeness out of Lego pieces! It's intelligent design all the way down.
Here's the take-home: Despite a hint of "protein evolution" in this paper, the experimental evidence has again vindicated ID. Without a mind directing the assembly of amino acids according to a design goal, nothing interesting will happen by chance or by aimless repetition. Sequence space is too vast, and functional space too vanishingly small, to expect success by blind search.