Code Busting: Genomes Where a "Stop" Sign Means "Go"
Until now, it has been widely believed that most organisms use the same genetic code to specify how to make proteins. A new article in Science, however, shows that this may not be true, and that considerably more variation in organisms' genetic codes may exist than was expected. Out there in the "microbial dark matter," in organisms that won't grow in the lab and that have not previously been examined, are species that have reassigned one of the canonical stop codons (TAG, TGA and TAA in DNA; UAG, UGA and UAA in the transcribed RNA) to encode an amino acid instead. And the percentage of organisms that have done so is much higher than anyone would have thought possible.
What are stop codons? They play an essential role in protein synthesis by telling the protein-making machine, the ribosome, when to stop making a particular protein. In bacteria, genes are often grouped in units of regulation known as operons; the genes in these operons are transcribed as one long message. The stop codons act like periods, telling the ribosome to "stop here" as it works its way down the message. The ribosome releases the protein it has just made, and then moves on down the message to the next one.
What these scientists have discovered is that in these organisms with non-canonical genetic codes, one of their "stop" codons doesn't mean stop, it means "insert an amino acid here." The result is that the ribosome reads through the "stop" codon, inserting an amino acid instead, and thus joins what would have been two proteins into one.
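To make the read-through idea concrete, here is a minimal Python sketch. The message, the tiny codon table, and the `translate` helper are all made up for illustration; the sketch also puts both coding regions in the same frame with nothing between them, which real operons need not do, and it only follows the first protein (a real ribosome would go on to start the second one).

```python
# Toy codon table: only the codons used below, with their standard meanings.
CODONS = {
    "AUG": "M", "AAA": "K", "GGA": "G", "CUG": "L", "GAA": "E",
    "UAA": "*", "UAG": "*", "UGA": "*",
}

def translate(mrna, reassigned=None):
    """Translate from the first codon and stop at the first stop codon.

    `reassigned` maps a codon to an amino acid and overrides the table,
    e.g. {"UGA": "G"} for the UGA-to-glycine reading described above.
    """
    table = dict(CODONS)
    if reassigned:
        table.update(reassigned)
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = table[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Hypothetical message: a short coding region, a UGA, a second region, then UAA.
message = "AUGAAACUG" + "UGA" + "AUGGAAGGA" + "UAA"

canonical = translate(message)                  # UGA means "stop"
readthrough = translate(message, {"UGA": "G"})  # UGA means glycine

print(canonical)    # MKL
print(readthrough)  # MKLGMEG
```

With UGA read as glycine, the two short products are fused into a single chain that only ends at the UAA.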
How did the scientists spot this phenomenon? According to the Science Daily article describing this work, they found a genome where, if the canonical three stop codons were used, many reading frames were only about 200 nucleotides long. These very short reading frames would make proteins only 60 to 70 amino acids in length, which is much shorter than the norm. They also noticed that many of the short reading frames terminated with the same stop codon, UGA. If they reassigned UGA to glycine, however, all the genes became normal in length. Now that they knew such a thing might happen, they looked for other examples in the wide, wonderful world of genomic samples, and they found many other cases where it appeared organisms had non-canonical genetic codes.
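The length clue can be sketched in a few lines of Python. This is not the method used in the paper, only an illustration of the reasoning: count how long the stop-free stretches are in one reading frame, first with all three stops and then with UGA treated as a sense codon. The sequence and the `frame_lengths` helper are invented for the example.

```python
def frame_lengths(rna, stops):
    """Lengths (in codons) of the stop-free stretches in reading frame 0."""
    lengths, run = [], 0
    for i in range(0, len(rna) - 2, 3):
        if rna[i:i + 3] in stops:
            lengths.append(run)   # a stop closes the current stretch
            run = 0
        else:
            run += 1
    if run:                       # trailing stretch with no stop after it
        lengths.append(run)
    return lengths

# Made-up sequence: runs of GCU (alanine) interrupted by UGA, ending in UAA.
toy = "GCU" * 5 + "UGA" + "GCU" * 4 + "UGA" + "GCU" * 6 + "UAA"

canonical = frame_lengths(toy, {"UAA", "UAG", "UGA"})
recoded = frame_lengths(toy, {"UAA", "UAG"})   # UGA now counts as a sense codon

print(canonical)  # [5, 4, 6]
print(recoded)    # [17]
```

Three suspiciously short frames become one longer frame once UGA stops being a stop, which is the pattern the researchers reportedly saw at genome scale.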
How might this happen? A mutation would have to occur in a gene encoding a tRNA. In the case of the stop codon UGA, it is one base different from the codon that specifies glycine, GGA. If a tRNA(gly) that normally recognizes GGA mutates so as to recognize UGA, that tRNA(gly) would insert glycine at UGA codons and prevent termination. Not so bad, you say? In bacteria, many transcriptional units are one long message containing the coding sequences for multiple genes. If a stop codon is incorrectly read in such messages, proteins end up as concatenated chains. UGA represents 29 percent of all stop codons in E. coli, so that could make a fine mess.
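The "one base different" point can be checked with a short Python sketch that builds the standard genetic code table and lists every codon a single base change away from UGA. The helper name `one_base_neighbors` is made up for this example, and the sketch works at the level of codons, whereas the mutation described above would actually sit in the tRNA's anticodon.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in UCAG order for each codon position ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def one_base_neighbors(codon):
    """Map each codon differing from `codon` at exactly one base to its meaning."""
    out = {}
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                neighbor = codon[:pos] + b + codon[pos + 1:]
                out[neighbor] = CODE[neighbor]
    return out

neighbors = one_base_neighbors("UGA")
for codon, aa in sorted(neighbors.items()):
    print(codon, aa)
```

GGA (glycine) shows up among the nine neighbors, alongside codons for arginine, leucine, serine, cysteine and tryptophan, and the stop codon UAA.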
In these newly identified genomes all the UGAs or all the UAAs or all the UAGs appear to be read as amino acids. Which it is depends on the organism. Bacteria seem to favor replacing UGAs. Another thing: the termination factor(s) that normally bind UGA and end translation have been inactivated. Otherwise there would be competition between the mutant tRNA and the termination factor(s) for UGA binding.
Thus, in bacteria with these alternate genetic codes all UGA stop codons are treated the same way. This only works, however, because none of those UGA codons are still supposed to be stop codons. In order to terminate proteins properly, any necessary stops must be either UAA or UAG, the remaining stop codons.
There are some puzzles here. If this kind of universal reassignment of tRNA(gly) to UGA were to happen suddenly in E. coli, and the termination factor(s) for UGA were to be inactivated, 29 percent of all proteins would potentially become concatenated with other proteins. This would be a catastrophe.
A smaller version of this can happen in E. coli. We know because we sometimes see suppressor mutations that restore truncated proteins to full length. Somewhere else in the same strain a mutation happens that makes a tRNA recognize the particular stop codon present in the truncated protein's coding sequence. The mutant tRNA "suppresses" the early termination by inserting its particular amino acid.
Suppressor strains are sickly, because not just the truncated gene gets affected. Other proteins that should be terminated are not because of the mutated tRNA. These strains can easily die out or revert unless maintained by strong selection for their suppressive effect. That is, if the mutation that they are suppressing is bad enough, then being sick is better than being dead.
However, the kind of change discussed in this article is far more radical: the organism now uses just two stop codons, and the missing former stop codon now specifies an amino acid instead. That means that if a switch occurred, any gene that used the missing stop codon had to substitute one of the other stop codons in its place. Genome-wide.
It's very difficult to change codes in midstream, so to speak, especially with respect to stop codons. To get a code switch like this right, the decoder (the tRNA) and the thing being decoded (the genome) have to switch the function of the signal (the codon) simultaneously and universally.
As an example of what can happen if a code switch occurs suddenly and universally, suppose your bank keeps records of your transactions with tabs to separate different deposits.
100 1020 30 400 550 etc.
A computer virus rewrites the code for your account so that where there were tabs you now have a 2, for example.
The above string becomes
1002102023024002550 etc.
The meaning is lost and the bank thinks you just stole the U.S. Treasury. You get arrested and thrown in jail, and all your assets are frozen.
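The bank analogy is easy to reproduce. A minimal Python sketch, using the five deposits from the example and assuming the records are tab-separated as described:

```python
deposits = ["100", "1020", "30", "400", "550"]

# The bank's record: deposits separated by tabs.
record = "\t".join(deposits)
parsed = record.split("\t")            # tabs still mean "next deposit"

# The "virus" rewrites every tab as the digit 2.
corrupted = record.replace("\t", "2")
misread = corrupted.split("\t")        # no tabs left, so only one field comes back

print(parsed)     # ['100', '1020', '30', '400', '550']
print(corrupted)  # 1002102023024002550
print(misread)    # ['1002102023024002550']
```

Five modest deposits collapse into one enormous number, the same way separate proteins collapse into one chain when the separator is suddenly read as content.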
Drawing of a tRNA molecule, taken from Meyer and Nelson (2011).
The Science paper implies that switches of this kind have arisen multiple times in evolution. It should be a very, very rare event, though, because of the number of steps involved for it to have happened gradually. For example, here's one scenario:
- One out of four tRNA(gly) genes switches to recognizing UGA. The resulting mutant tRNA(gly) then rescues a truncated protein that has an early termination codon (UGA) somewhere in its sequence.
- Because there are four tRNAs in E. coli that recognize GGA codons, if one mutates, there still would be three left unmutated, so GGA codons still would be read as glycine. And not all UGAs would be read as glycines either, because the termination factor for UGA would still be around, and could bind to some proportion of the messages that use UGA as a stop codon, and properly terminate their proteins. But the mutant strain would be sick.
- Under pressure from the detrimental effects of the mutant tRNA, some other genes could change their stop codons from UGA to UAA. Since these genes would no longer be affected, those cells would survive and the bacterial strain would get healthier.
- The cycle could continue. Another tRNA(gly) gene might mutate, and the strain would get sicker. Any change of UGAs to UAAs could potentially alleviate this, though how much is unknown.
- Perhaps the abnormal UGA-tRNAs might prevent some bacterial viruses from reproducing in them. But in order to be effective in messing up viral translation, the abnormal tRNAs would probably be just as effective at messing up the host's translation.
- Finally, the strain could lose the particular termination factor(s) that recognize UGA, and then any remaining messages with UGA codons would be forced to convert or adapt to concatenation or die.
Notice, this all depends on a sickly strain competing against healthy sisters in the wild long enough for other genes to mutate their stop codons. It's hard to maintain these strains in the lab, where competition is reduced and food is plentiful, so no one knows how likely this might be.
And to change the code across the whole genome is wildly improbable. Changing a UGA to UAA would have to happen at least 1160 times (29 percent of E. coli's more than 4000 genes use TGA, which is UGA in RNA). The steps in this scenario are testable, but they would need to be demonstrated as feasible under realistic conditions.
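The 1160 figure is simple arithmetic on the two numbers quoted above. As a quick check in Python, taking 4000 as a rounded lower bound on the gene count (an assumption drawn from "more than 4000"):

```python
total_genes = 4000   # lower bound quoted in the text ("more than 4000 genes")
uga_percent = 29     # percent of genes ending in TGA (UGA in RNA), as quoted

# Minimum number of stop codons that would need to change from UGA to another stop.
min_changes = total_genes * uga_percent // 100

print(min_changes)  # 1160
```

Because the real gene count is above 4000, the true number of required changes would be somewhat higher.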
Why do I go into all of this? The paper seems to imply that this process has happened multiple times. They report that the distribution of species with these non-canonical codes is patchy across evolutionary trees, suggesting multiple origins. But if it's very hard for such a transition to happen once, how likely is it that it has happened multiple times? The paper does not address how or why that might be the case.
Could there be another explanation? Perhaps all these species had unique codes to begin with. Now there's a thought that should bring us to a full stop.