A little over a month ago, we reported on advances in using DNA as a high-density information storage medium, and predicted that “the race for better DNA storage products will accelerate once it moves from lab to market.” Today there’s more news as market forces are stepping on the gas. Even more exciting, DNA is finding uses for more than just data storage.
DNA in Information Technology
A major improvement in DNA information storage and retrieval has been announced in the journal Science. Two scientists from Columbia University and the New York Genome Center report a new high bar for DNA data storage, over two orders of magnitude better than previous attempts. They were able to encode text, images, a movie, and an operating system, 2 megabytes of data in all, in DNA and retrieve it perfectly in multiple trials. News from Columbia University Data Science Institute says,
Humanity may soon generate more data than hard drives or magnetic tape can handle, a problem that has scientists turning to nature’s age-old solution for information-storage — DNA.
In a new study in Science, a pair of researchers at Columbia University and the New York Genome Center (NYGC) show that an algorithm designed for streaming video on a cellphone can unlock DNA’s nearly full storage potential by squeezing more information into its four base nucleotides. They demonstrate that this technology is also extremely reliable. [Emphasis added.]
In a video clip in the article, Yaniv Erlich describes what they did and why it is significant. The video teases with the words, “All your data in a drop of DNA.” The information storage capacity of DNA, its compactness and durability make it highly desirable for the next generation of computers. It can last for many thousands of years if kept in a cool, dry place.
“DNA won’t degrade over time like cassette tapes and CDs, and it won’t become obsolete — if it does, we have bigger problems,” said study coauthor Yaniv Erlich, a computer science professor at Columbia Engineering, a member of Columbia’s Data Science Institute, and a core member of the NYGC.
Erlich and Zielinski describe why the practical limit of Shannon information in DNA is 1.8 bits per nucleotide rather than the 2 bits one might expect from a four-letter alphabet: indexing and error-correction requirements take up part of the capacity. Their algorithm achieved 1.6 bits per nucleotide, pretty close to that limit.
They first compressed all their input into a single Linux archive and chopped it into 32-byte segments. To translate binary to DNA, they created a simple rule that represents pairs of binary digits (00, 01, 10, 11) as the four DNA nucleotides (A, C, G, T). After the translation, they sent the codes across the country to a company to generate the DNA.
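The translation rule described above can be sketched in a few lines of Python. This is only an illustration: the assignment of bit pairs to bases simply follows the order in which they are listed above, the zero-padding of the final segment is an assumption, and the function names are hypothetical rather than taken from the authors' software.

```python
# Illustrative 2-bits-per-base mapping. The pairing order (00->A, 01->C,
# 10->G, 11->T) mirrors the order given in the text; it is an assumption.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def chunk(data: bytes, size: int = 32) -> list[bytes]:
    """Split data into fixed-size segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def bytes_to_dna(segment: bytes) -> str:
    """Translate each pair of bits into one nucleotide letter."""
    bits = "".join(f"{byte:08b}" for byte in segment)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the mapping: four nucleotides become one byte."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
```

Under this rule each byte becomes four bases, so a 32-byte segment translates to 128 nucleotides, and `dna_to_bytes(bytes_to_dna(segment))` returns the original segment.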
In all, they generated a digital list of 72,000 DNA strands, each 200 bases long, and sent it in a text file to a San Francisco DNA-synthesis startup, Twist Bioscience, that specializes in turning digital data into biological data. Two weeks later, they received a vial holding a speck of DNA molecules.
To retrieve their files, they used modern sequencing technology to read the DNA strands, followed by software to translate the genetic code back into binary. They recovered their files with zero errors, the study reports. (In this short demo, Erlich opens his archived operating system on a virtual machine and plays a game of Minesweeper to celebrate.)
Their success relies on “fountain codes” (see the paper from Harvard), a technique widely used by streaming technologies. Unlike earlier transmission algorithms, fountain codes do not require corrupted or missing packets to be re-sent. The algorithm just sprays out the packets like droplets in a fountain. Erlich even calls the packets in their “DNA Fountain” “droplets”; each one is a DNA oligonucleotide of fixed length (200 bases). We should note that these droplets were specified in “DNA text” before being converted to actual molecules.
Included with each droplet is a 4-byte “seed” or barcode of extra information, provided by a transform algorithm first applied to the input data, plus two more bytes for error correction. The extra six bytes allow the receiver to reconstruct the sequence upon retrieval. This slight amount of overhead ensures error-free reconstruction of the original information, be it photos, movies, music, text, or even an operating system.
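To make the droplet idea concrete, here is a toy fountain code in Python. It is a sketch under stated assumptions, not the DNA Fountain software: the degree distribution is made up for illustration, the seed is a plain integer fed to Python's `random` module, the error-correction field is not modeled, and all function names are hypothetical. What it does show is the key property described above: the receiver needs only the seed to learn which segments a droplet combines, and lost droplets never have to be re-sent as long as enough others arrive.

```python
import random
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def segments_for_seed(seed: int, n_segments: int) -> list[int]:
    """Derive, from the seed alone, which segments a droplet combines."""
    rng = random.Random(seed)
    degree = rng.choice([1, 2, 2, 3, 4])  # toy degree distribution (assumption)
    degree = min(degree, n_segments)
    return sorted(rng.sample(range(n_segments), degree))


def make_droplet(seed: int, segments: list[bytes]) -> tuple[int, bytes]:
    """Build one droplet: the seed plus the XOR of the chosen segments."""
    payload = bytes(len(segments[0]))
    for i in segments_for_seed(seed, len(segments)):
        payload = xor_bytes(payload, segments[i])
    return seed, payload


def decode(droplets: list[tuple[int, bytes]],
           n_segments: int) -> Optional[list[bytes]]:
    """Peeling decoder: returns the segments, or None if too few droplets."""
    recovered: dict[int, bytes] = {}
    pending = [(set(segments_for_seed(seed, n_segments)), payload)
               for seed, payload in droplets]
    progress = True
    while progress and len(recovered) < n_segments:
        progress = False
        still_pending = []
        for indices, payload in pending:
            # Strip out every segment that is already known.
            for i in [i for i in indices if i in recovered]:
                payload = xor_bytes(payload, recovered[i])
                indices = indices - {i}
            if len(indices) == 1:
                recovered[indices.pop()] = payload  # one unknown left: solved
                progress = True
            elif indices:
                still_pending.append((indices, payload))
        pending = still_pending
    if len(recovered) < n_segments:
        return None
    return [recovered[i] for i in range(n_segments)]


if __name__ == "__main__":
    segments = [bytes([i]) * 32 for i in range(8)]
    droplets = [make_droplet(seed, segments) for seed in range(200)]
    survivors = droplets[::2]  # pretend every other droplet was lost
    print(decode(survivors, len(segments)) == segments)
```

The decoder never asks which droplets went missing; it simply keeps peeling until every segment is known, and returns `None` if the surviving droplets are not enough.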
In addition to higher storage density, the Columbia team also reached new heights of reliability. They demonstrated the method’s robustness by making copies of copies seven times in series. Even after this “deep copying,” they were able to retrieve the data without error.
To summarize, in this work, we reported an efficient and robust coding strategy that enables virtually unlimited data retrieval and high physical density while approaching the Shannon capacity of DNA storage closer than any previous design. We tested our framework with a relatively large file compared with those used in previous studies and were able to perfectly recover the data under various tests. Implementing our approach in concert with long-term preservation techniques, such as DNA embedding in silica beads, might require further fine-tuning of the redundancy levels. We expect that such fine-tuning can benefit from the high flexibility of the DNA Fountain framework, which allows determination of virtually any redundancy level without changing the software or affecting the decoding time.
Will companies flock to this technology? The main limitations are cost and speed. But remember the first silicon computers? Because DNA is so attractive, we can surely expect that inventors will find faster, cheaper ways to synthesize custom DNA strands. Read-out of DNA with PCR and cutting-edge sequencing techniques has already become cheap and easy since the days when the Human Genome Project made it seem a formidable “moonshot.” The authors expect that entrepreneurs will find “the sweet spot that optimally combines the highest chemical throughput and the highest possible rate (bits per nucleotide) of the coding design.” When that happens, “DNA might become an economically viable solution for long-term, high-latency storage.”
While we’re considering storage, why not consider computing with DNA? A short article on Futurism.com announces, “Self-Replicating ‘DNA Computers’ Are Set to Change Everything.” Read about how researchers at the University of Manchester are “working on turning strands of DNA into the next basis for computing.” Because DNA’s double helix can take two paths simultaneously, it can solve problems more quickly. That’s cool, but imagine if progress in biological input and output technology someday allows DNA computers to access DNA storage. (Hey; is that what happens in life?)
DNA in Electrical Engineering
Did you know that DNA conducts electricity? News from Arizona State begins with this remarkable statement: “DNA, the stuff of life, may very well also pack quite the jolt for engineers trying to advance the development of tiny, low-cost electronic devices.” A team led by Nongjiang Tao succeeded in making a controllable electrical switch out of DNA that is 1,000 times smaller than a human hair. You can read about their “engineering feat” in the open-access journal Nature Communications.
See also “Switched-on DNA” in an article from ASU’s well-named biomimetics division, the Biodesign Institute. This multi-million-dollar institute, with 800,000 square feet of research space, over 600 employees, and 62 tenured professors (one a Nobel laureate), is a good example of the gold rush in “bioinspired innovation.” Look for intelligent design concepts in the Overview page:
Created on the premise that scientists can overcome complex societal issues by re-imagining the “design rules” found in nature, the institute’s researchers are addressing an expansive array of global challenges by creating “bioinspired” solutions ….
[Mission Statement:] The Biodesign Institute at ASU addresses today’s critical global challenges in healthcare, sustainability and security by developing solutions inspired from natural systems and translating those solutions into commercially viable products and clinical practices.
DNA in Mechanical Engineering
Finally, last month we mentioned that a Purdue University team, “inspired by natural biological motors” like kinesin, made a “DNA walker” of their own. They envision using such synthetic molecular machines in biomedical and industrial applications, perhaps even as vehicles to deliver anti-cancer drugs inside the body.
A Remarkable Molecule
It stores information, it conducts electricity, and it moves. The fact that DNA’s sequences of molecular bases can be represented in letters (A, C, T, G), converted into binary digits (0, 1), and read out into a computer that can boot it up as an operating system and play a movie is really exciting. Erlich and Zielinski state that the theoretical information storage capacity of DNA is 680 petabytes (680 x 10^15 bytes) per gram. Imagine 680 million billion bytes of information in a speck the weight of a paperclip! No more long hallways of hard drives; the world’s information will fit in one room, or even a shoebox.
Theoretically, we note as an aside, single atoms could be carriers of information. However, a review in Nature looked into a recent test of this, and decided that it could only work in extreme conditions, “such as in an ultrahigh vacuum.” The atom would have to be mounted to a molecular surface anyway. In addition, serious limitations at the atomic scale would have to be worked out to reliably encode and decode information: controlling magnetic anisotropy, molecular vibrations, and interactions between the molecule and its surroundings. “These are antithetical requirements whose fulfillment will not be straightforward,” so don’t look for progress at that scale any time soon.
Back to DNA — the key to understanding its usefulness is its stored sequence. Random nucleotides do nothing. They don’t convey information, they don’t walk, and they don’t form electrical switches. Only by intelligently designing algorithms to carefully control the encoding and decoding of the sequence were these engineers able to accomplish what they did.
Erlich and Zielinski spoke of “evolutionarily optimized machinery” that faithfully replicates information. Perhaps they had to say that to pass the AAAS peer review process. We expect, however, that they would not be pleased if someone turned around and said that their achievement was done by blind, unguided processes. No; DNA is, in a sense, better understood as “Designed Natural Algorithms.”
Image credit: ColiN00B via Pixabay.