Everyone has heard of DNA, but many don’t appreciate its marvelous design. It stores all the information an organism needs to make proteins, regulate how they are made, and control how they are used. It does this by coding biological information in sequences of four nucleotide bases: adenine (A), thymine (T), guanine (G), and cytosine (C). The nucleotide bases link to one another to hold DNA’s familiar double-helix structure together. A can only link to T, and C can only link to G, so each pair of linked bases is called a base pair. DNA’s ingenious design allows it to store information in these base pairs more efficiently than any piece of human technology that has ever been devised.
What you might not realize is that pretty much any information can be stored in DNA. While the information necessary for life involves the production, use, and regulation of proteins, DNA is such a wonderfully designed storage system that it can efficiently store almost any kind of data. A scientist recently demonstrated this by storing his own book (which contained words, illustrations, and a JavaScript program) in the form of DNA.1
The way he and his colleagues did this was very clever. They took the digital version of their book, which was 5.27 megabits of 1’s and 0’s, and used it as a template for producing strands of DNA. Every time there was a “1” in the digital version of the book, they added a guanine (G) or a thymine (T) to the DNA strand. Every time the digital version of the book had a “0,” they added an adenine (A) or a cytosine (C). Unfortunately, human technology cannot come close to matching the incredible design of even the simplest living organism. While living organisms can produce DNA that is billions of base pairs long, human technology can produce only short strands of DNA.2 So while a single-celled organism could have produced one strand of DNA that contained the entire book (and then some), the scientists had to use 54,898 small strands of DNA to store it.
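The encoding rule described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors’ code: the rule for picking between the two candidate bases (use the first candidate unless it would repeat the previous base) is an assumption for illustration, and the 96-bit strand size is only inferred from the numbers above, since 5.27 million bits spread over 54,898 strands works out to about 96 bits per strand. Real strands also carry addressing and other extra sequence that this sketch leaves out.

```python
import math

# Candidate bases for each bit value, as described in the text:
# a 0 becomes A or C, a 1 becomes G or T.
CANDIDATES = {0: ("A", "C"), 1: ("G", "T")}

# Assumed payload per strand, inferred from 5.27 million bits / 54,898 strands.
BITS_PER_STRAND = 96


def encode_bits(bits):
    """Turn a sequence of 0/1 values into a base string, one base per bit.

    Assumed tie-break rule: use the first candidate unless it would repeat
    the previous base, in which case use the second. This guarantees that
    no two adjacent bases are identical.
    """
    bases = []
    for bit in bits:
        first, second = CANDIDATES[bit]
        base = second if bases and bases[-1] == first else first
        bases.append(base)
    return "".join(bases)


def split_into_strands(bits, size=BITS_PER_STRAND):
    """Cut a long bit sequence into fixed-size chunks and encode each one."""
    return [encode_bits(bits[i:i + size]) for i in range(0, len(bits), size)]


def strands_needed(total_bits, size=BITS_PER_STRAND):
    """How many fixed-size strands a message of total_bits requires."""
    return math.ceil(total_bits / size)


if __name__ == "__main__":
    print(encode_bits([0, 0, 1, 1, 0, 1]))  # ACGTAG
    print(strands_needed(54_898 * 96))      # 54898
```

Because every bit has two acceptable bases, the encoder has room to steer away from sequences that are hard to synthesize or read, such as long runs of the same base.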
Of course, just storing the information in DNA form wasn’t enough. To show that the data storage actually worked, they used a completely separate process to read the DNA, convert the information in it back into 1’s and 0’s, and produce a new copy of the book. The entire process worked incredibly well, although it was very slow: it took about two weeks to store the book in DNA form and then read the DNA back to reproduce the book. Nevertheless, it demonstrated that DNA can store pretty much any kind of information, and that it does so incredibly efficiently.
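Reading the book back reverses that mapping: the sequencer reports a string of bases, and each base is turned back into a bit. The sketch below (hypothetical helper names, not the authors’ pipeline) covers only that final conversion step, packing bits into bytes most significant bit first, which is an assumed convention. Sequencing itself, error correction, and putting the strands back in order are all left out.

```python
# Reverse of the encoding rule in the text: A or C means 0, G or T means 1.
BASE_TO_BIT = {"A": 0, "C": 0, "G": 1, "T": 1}


def decode_bases(strand):
    """Map each base back to a single bit."""
    return [BASE_TO_BIT[base] for base in strand]


def bits_to_bytes(bits):
    """Pack bits into bytes, most significant bit first (assumed convention)."""
    if len(bits) % 8:
        raise ValueError("bit count must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    strand = "AGCAGCACCTGAGACT"
    print(bits_to_bytes(decode_bases(strand)))  # b'Hi'
```

Since either base of a pair decodes to the same bit, the reader never needs to know which of the two choices the writer made.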
The entire book was stored in less than a trillionth of a gram of DNA, leading the authors to state that the theoretical limit of DNA’s storage capabilities is 455 exabytes per gram. In case you aren’t familiar with the term, an exabyte is a billion gigabytes. Think about that for a moment. I am incredibly impressed with my little flash drive that can hold 16 gigabytes of information. DNA can store more than a billion times that amount in a fraction of the flash drive’s mass. In fact, if the authors’ estimate is correct, all the information produced in the entire world over the course of a year could be stored in a mere 4 grams of DNA!
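The 455-exabyte figure can be roughly reproduced with back-of-the-envelope arithmetic. The sketch below assumes about 330 grams per mole for one nucleotide of single-stranded DNA (an approximate average, not a number from the paper) and 2 bits per nucleotide, the most that four bases can carry. Under those assumptions it lands close to 455 exabytes per gram; at the 1 bit per base used for the book, the ceiling would be about half that. It also checks the flash-drive comparison and what the 4-gram claim implies.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
NT_G_PER_MOL = 330.0      # assumed average mass of one single-stranded nucleotide
EXABYTE = 1e18            # bytes
GIGABYTE = 1e9            # bytes


def bytes_per_gram(bits_per_nt, nt_g_per_mol=NT_G_PER_MOL):
    """Theoretical storage per gram if every nucleotide carries bits_per_nt bits."""
    nucleotides_per_gram = AVOGADRO / nt_g_per_mol
    return nucleotides_per_gram * bits_per_nt / 8


if __name__ == "__main__":
    print(f"2 bits/nt: {bytes_per_gram(2) / EXABYTE:.0f} EB per gram")  # about 456
    print(f"1 bit/nt:  {bytes_per_gram(1) / EXABYTE:.0f} EB per gram")  # about 228
    # The flash-drive comparison from the text: 455 EB versus 16 GB.
    print(f"{455 * EXABYTE / (16 * GIGABYTE):.1e} times a 16 GB drive")
    # What "a year of the world's data in 4 grams" implies, in zettabytes.
    print(f"4 g holds about {4 * 455 * EXABYTE / 1e21:.2f} ZB")
```

The result is sensitive to the assumed nucleotide mass, so it should be read as a sanity check on the order of magnitude rather than an exact reproduction of the authors’ calculation.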
It’s no wonder the authors wanted to show that their storage and retrieval process could work. If we ever get to the point where human technology can produce and read DNA at even a fraction of the speed of a single-celled organism, the information storage possibilities would be mind-blowing! As the authors state:
DNA is particularly suitable for immutable, high-latency, sequential access applications such as archival storage. Density, stability, and energy efficiency are all potential advantages of DNA storage…
Indeed. The design of DNA is truly astounding. It only makes sense that we should try to use it in our own technology, which is primitive by comparison.
REFERENCES
1. George M. Church, Yuan Gao, and Sriram Kosuri, “Next-Generation Digital Information Storage in DNA,” Science DOI:10.1126/science.1226355, 2012.
2. Daniel G. Gibson, et al., “Complete Chemical Synthesis, Assembly, and Cloning of a Mycoplasma genitalium Genome,” Science 319:1215-1220, 2008.
I want dibs on the DNiPod. Of course, it’ll be able to store more songs than have ever been sung in the history of the world, but still, talk about bragging rights! Haha, just kidding, but really, wow... that could be something! Incredible.
DNiPod. That’s awesome!
Wouldn’t it be possible to store 2 bits in a single DNA character, much as a “nibble” (4 bits) can be rendered as a single hexadecimal digit? E.g. 00 = G, 01 = T, 10 = A, and 11 = C.
That should presumably mean we can store all that information in 2 grams of DNA.
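Josiah’s table can be written out directly. In this minimal sketch (the function names are hypothetical), each byte is split into four 2-bit pairs, most significant pair first, so a byte needs four bases instead of the eight used by the one-bit-per-base scheme.

```python
# Josiah's proposed table: two bits per base.
PAIR_TO_BASE = {0b00: "G", 0b01: "T", 0b10: "A", 0b11: "C"}
BASE_TO_PAIR = {base: pair for pair, base in PAIR_TO_BASE.items()}


def encode_2bit(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    bases = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            bases.append(PAIR_TO_BASE[(byte >> shift) & 0b11])
    return "".join(bases)


def decode_2bit(strand: str) -> bytes:
    """Reverse encode_2bit: every four bases rebuild one byte."""
    if len(strand) % 4:
        raise ValueError("strand length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(strand), 4):
        value = 0
        for base in strand[i:i + 4]:
            value = (value << 2) | BASE_TO_PAIR[base]
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    print(encode_2bit(b"\x1b"))      # GTAC
    print(encode_2bit(b"\x00\x00"))  # GGGGGGGG
    assert decode_2bit(encode_2bit(b"DNA")) == b"DNA"
```

The second example hints at the catch raised in the reply: with no choice of base, any run of repeated data becomes a run of identical bases.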
Josiah, that is correct. However, there is currently a technological problem with that. Current technology has difficulty producing and reading DNA that has many guanine/cytosine base pairs in close proximity or a long, repeating sequence. When you force each nucleotide base to represent one specific unit of information, it is possible to run into such situations. I would expect that as technology improves, those difficulties will be overcome, and your suggestion will be implemented. Right now, however, allowing a choice between two possible bases for each unit of information lets us avoid those technological problems.
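A writer that wants to avoid those trouble spots needs some way to score a candidate strand. The sketch below checks the two properties mentioned in the reply: long runs of one repeated base and a high share of G and C. The thresholds (no run longer than 3 bases, GC share between 40% and 60%) are illustrative assumptions, not limits taken from the paper.

```python
from itertools import groupby

# Illustrative thresholds (assumptions, not values from the paper).
MAX_RUN = 3
GC_RANGE = (0.4, 0.6)


def longest_run(strand: str) -> int:
    """Length of the longest stretch of one repeated base."""
    return max((len(list(group)) for _, group in groupby(strand)), default=0)


def gc_fraction(strand: str) -> float:
    """Share of bases that are G or C."""
    if not strand:
        return 0.0
    return sum(base in "GC" for base in strand) / len(strand)


def looks_writable(strand: str) -> bool:
    """True if the strand stays within both illustrative limits."""
    low, high = GC_RANGE
    return longest_run(strand) <= MAX_RUN and low <= gc_fraction(strand) <= high


if __name__ == "__main__":
    print(looks_writable("GGGGGGGG"))  # False: run of 8, and all G
    print(looks_writable("ACGTAGCT"))  # True
```

In the one-bit-per-base scheme, each bit can be written as either a G/C base or an A/T base (0 as C or A, 1 as G or T), so an encoder has enough freedom to keep a strand inside limits like these; the two-bit table has no such freedom.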
Would some sort of special methods be needed to preserve information stored in this way? When I shared this story with a friend, he suggested that the DNA would “expire” at some point, making the information stored on it unusable. My understanding was that DNA, being simply a large molecule, could be preserved more or less indefinitely—the still-viable seeds found in the Pyramids come to mind. But then, those seeds were stored in a very dry, dark, airtight environment; and seeds stored the way we normally store them don’t last nearly as long. Then again, a seed contains a lot more than just DNA; other factors could contribute to its eventually going bad. My question, then, is does the DNA itself have a shelf life or require special preservation methods to remain usable? If so, that would be one challenge to overcome before it became a widespread means of data storage.
Michael, there would be no special methods needed to preserve the information over a long period of time. DNA does degrade, but it typically takes a few thousand years to do so. I doubt that any digital system would survive for a thousand years without regular maintenance. I assume a DNA system would need maintenance as well, but probably less than a digital one. You are right that seeds stored normally don’t last for thousands of years, but that’s not because of DNA degradation. To get the germination process started, certain proteins must be functional in the seed. Those can degrade over a period of many years. The DNA is particularly resistant to degradation.
Even if it degrades to some extent, that doesn’t mean all is lost. As the scientists say in their paper, “Unlike most digital storage media, DNA storage is not restricted to a planar layer, and is often readable despite degradation in non-ideal conditions over millennia.”