Want to Get 70 Billion Copies of Your Book In Print? Print It In DNA

I have been meaning to read a book coming out soon called Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves. It’s written by Harvard biologist George Church and science writer Ed Regis. Church is doing stunning work on a number of fronts, from creating synthetic microbes to sequencing human genomes, so I’m definitely interested in what he has to say. I don’t know how many other people will be, so I have no idea how well the book will do. But in a tour de force of biochemical publishing, he has created 70 billion copies. Instead of paper and ink, or PDFs and pixels, he’s used DNA.

Much as PDFs are built on a digital system of 1s and 0s, DNA is a string of nucleotides, which can be one of four different types. Church and his colleagues turned his whole book – including illustrations – into a 5.27 MB file, which they then translated into a sequence of DNA. They stored the DNA on a chip and then sequenced it to read the text back. The book is broken up into little chunks of DNA, each of which carries a portion of the book itself as well as an address to indicate where it belongs. They recovered the book with only 10 wrong bits out of 5.27 million. Using standard DNA-copying methods, they then duplicated the DNA into 70 billion copies.
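The idea is simple enough to sketch in code. Here’s a hypothetical toy version in Python – not Church’s actual code – using one bit per base (a 0 can be written as A or C, a 1 as G or T) and splitting the data into 96-bit chunks, each prefixed with a 19-bit address; the `encode`/`decode` names and all the details are illustrative only:

```python
import random

# Toy illustration of DNA data storage: one bit per base, with each bit
# free to be either of two bases (0 -> A or C, 1 -> G or T), and the
# data split into addressed chunks that can be reassembled in any order.
BIT_TO_BASES = {0: "AC", 1: "GT"}
BASE_TO_BIT = {"A": 0, "C": 0, "G": 1, "T": 1}

def text_to_bits(text):
    # Expand each byte into 8 bits, most significant bit first.
    return [(byte >> i) & 1 for byte in text.encode("utf-8") for i in range(7, -1, -1)]

def bits_to_text(bits):
    data = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[j:j + 8]))
        for j in range(0, len(bits), 8)
    )
    return data.decode("utf-8")

def encode(text, chunk_bits=96, address_bits=19):
    """Split the bit stream into chunks, prefixing each with its address."""
    bits = text_to_bits(text)
    chunks = []
    for n, start in enumerate(range(0, len(bits), chunk_bits)):
        address = [(n >> i) & 1 for i in range(address_bits - 1, -1, -1)]
        payload = bits[start:start + chunk_bits]
        chunks.append("".join(random.choice(BIT_TO_BASES[b]) for b in address + payload))
    return chunks

def decode(chunks, address_bits=19):
    """Sort chunks by their address field and reassemble the bit stream."""
    parsed = []
    for chunk in chunks:
        bits = [BASE_TO_BIT[base] for base in chunk]
        address = sum(bit << (address_bits - 1 - i) for i, bit in enumerate(bits[:address_bits]))
        parsed.append((address, bits[address_bits:]))
    bits = [b for _, payload in sorted(parsed) for b in payload]
    return bits_to_text(bits)

text = "Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves"
chunks = encode(text)
random.shuffle(chunks)            # chunk order doesn't matter...
assert decode(chunks) == text     # ...the addresses restore it
```

Because every chunk carries its own address, the chunks can be synthesized, stored, and sequenced in any order and still reassemble into the original file.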

Scientists have stored little pieces of information in DNA before, but Church’s book is about 1,000 times bigger. I doubt anyone would buy a DNA edition of Regenesis on Amazon, since they’d need some expensive equipment and a lot of time to translate it into a format our brains can comprehend. But the costs are crashing, and DNA is a far more stable medium than that hard drive on your desk that you’re waiting to die. In fact, Regenesis could endure for centuries in its genetic form. Perhaps librarians of the future will need to get a degree in biology…


(Link to Church’s paper)

Photo by Today is a good day – via Creative Commons

19 thoughts on “Want to Get 70 Billion Copies of Your Book In Print? Print It In DNA”

  1. The DOI link seems busted now so here is a direct link to the abstract at Science:

    Next-Generation Digital Information Storage in DNA
    George M. Church,
    Yuan Gao,
    Sriram Kosuri

    On another topic, I wonder how long the info encoded in DNA would last if inserted into a bacterium – if one grew bacteria carrying it, would the information still be recoverable from the descendants in, say, 10,000 years?

    [CZ: Thanks. I fixed the link. The information would probably degrade pretty fast in bacteria, because of the high mutation rate and strong selection against extraneous books.]

    [MB: the info would probably be impossible to recover from a single descendant after 10,000 years because of degradation. However, if you had maintained a colony of 1,000,000 descendants, could you statistically recover it?]

  2. Seems amusing (to me) that I can read this in hard copy or in DNA, but not on Kindle. It is like jumping from writing on clay tablets to making electronic copies, bypassing the paper-bound book entirely.

  3. Mindblowing!

    If there were databases in DNA, perhaps programmers and database administrators would need to get a degree in biology.

  4. how can zero have an absolute value when zero has no bottom,,,,id need to go into much more detail to put forward a complete case,,,but zero is equivalent to infinite,,,bottomless well,,no boundary,,so how can we assume 0=0 when we cant see the vector only the viod,,,whatever edge we do see,is 0+x x being plank space than defines zero…in other words,,lets reverse this,,,,imagine youre at the bottom of the ocean,,,how could you possibly find a single drop of water,,,in this case 1=1-x2 without the zero to define 1 one would never equal 1,,,always x=0 relative to position…zero has two main values of +,and -,,,can be either or but always a set and value never =0

  5. I disagree – electronic will ALWAYS trump chemical storage, and I am a chemist and a bioinformatician. The only biological computers we will use are human brains, and maybe augmented ones, but at the molecular level this is not going to happen. The reason is the energy cost of transferring the information in and out. It costs a lot less to do a zero-to-one switch electronically than to make any chemical bond, which is a very costly process in energy.

    I also disagree about stability. A solid-state drive is perfectly stable for a long time. DNA decomposes, and I would certainly not expose it to variations in heat or to the biological agents commonly present in the environment. We do have prehistoric DNA, but the limit is about 40,000 years and the degradation is serious.

    The resequencing accuracy is impressive, but the human genome at 9,000,000,000 characters would still have 2000 errors.

  6. @Andrew, I’m quite sure I’ll sooner read the news that a 6,000-year-old mammoth has been re-birthed than the same news about any piece of electronics built after 1990. Not to mention the former will have better skills.

  7. @Andrew, solid state drives are not more stable than organic memory. That’s why they’re not used on the ISS or any other space vehicle that also can carry humans. Indeed, they’re not used in space at all.
    Radiation causes changes in the data state of the solid state drive. It causes DNA damage in organisms, but it appears that DNA repair mechanisms manage to keep up with the damage, as evidenced by the continued survival of experimental organisms and humans returning after extended stays in space.

    The key to maintaining DNA in a stable form is to make the book data necessary for the organism’s survival. As an example, if the book data were part of a cell’s DNA and also a required part of, say, the ATP processing chain (and other key systems as well), it would have to be conserved.
    Of course, engineering an organism whose survival depends on the book remaining intact would be tricky in the extreme, to put it mildly.
    DNA remains robust in a living organism. If it weren’t, there wouldn’t be much cellular life at all. Indeed, when one considers extremophiles, whose environments do not degrade their DNA, DNA proves extremely robust! DNA only breaks down under the most extreme conditions: cellular death, extremely high radiation flux, the most reactive chemical environments. Even for the latter it’s still dicey, considering how many bacterial spores can survive non-destructive sterilization methods.

    As to the resequencing accuracy, it’s far better than any encyclopedia I’ve ever looked at! Indeed, it’s pretty close to human DNA replication error rates (though many are corrected by cellular mechanisms), as I recall from research I’ve read over the years.

    Of course, the data would only remain intact over time if protected by existing inside of a living organism where the intact data is key to the survival of the organism, where corruption results in the death of the organism, thereby remaining tightly conserved.

    Update: I added the URL for a study of DNA replication error rates and correction rates. The corrected error rate is a bit lower than 2000 in 9 billion – extremely low.

  8. A back-of-the-envelope estimate for how stable the information would be after 10,000 years in generations of bacteria: Church’s experiment split the data into 96-bit chunks, each with a 19-bit address that was used to put the pieces back together, which would require 115/96 times 5.27 megabits (not MB as it says in the article), or about 6.3 million base pairs of storage for one copy of the book.

    If the data is stored in E. coli somewhere in the genome where it would not affect viability, the entire 6.3 megabits would not fit in the 4.2 Mbp E. coli genome. But the book could be spread over as many individual bacteria as necessary. According to the blog post at http://sandwalk.blogspot.co.nz/2007/07/mutation-rates.html the rate of neutral mutation (mutations that have no positive or negative effect on viability) in E. coli, after accounting for the DNA repair mechanisms, is approximately 1 in every 1e10 nucleotide base pair replications. Assuming 24 hours per generation (a more realistic real-world number than the faster replication under laboratory conditions) and approximating 365.25 days per year, that is about 3.6 million generations, or a total of 3.6 million times 6.3 million nucleotide replications, giving an estimated 3.6e6 times 6.3e6 times 1e-10, or about 2268 errors in each 6.3-million-bit copy of the book – less than a 0.04% error rate. That is a tiny amount that can be compensated for with a simple error-correcting code that would not add many bits to the total. By comparison, the Reed-Solomon code used for the first stage of error correction on an audio CD uses 32 bytes in all to encode 28 bytes of data, and it can correct up to 2 byte errors in every 32-byte block.
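The arithmetic in this estimate is easy to reproduce. A minimal sketch, where every input – the block sizes, the 1-in-1e10 neutral mutation rate, one generation per day – is an assumption stated in the comment, not a measured value:

```python
# Back-of-the-envelope check of the estimate above; all inputs are the
# commenter's assumptions, not measured values.
book_bits = 5.27e6                 # size of the book in bits
chunk_bits, address_bits = 96, 19  # data block and address sizes per chunk
storage_bits = book_bits * (chunk_bits + address_bits) / chunk_bits
print(f"storage needed: {storage_bits / 1e6:.2f} million base pairs")  # ~6.31

generations = 10_000 * 365.25      # one generation per day for 10,000 years
mutation_rate = 1e-10              # neutral mutations per base pair per replication
expected_errors = generations * storage_bits * mutation_rate
print(f"expected errors per copy: {expected_errors:.0f}")   # ~2300
print(f"error rate: {expected_errors / storage_bits:.3%}")  # ~0.037%
```

(The slight difference from the 2268 in the comment comes from rounding 3,652,500 generations down to 3.6 million.)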

    So if you could somehow arrange to maintain the proper environment to support the colony of e. coli for 10,000 years and have someone around to decode them at the end, the book should be readable.

    [CZ: Just to clarify–Church and his colleagues did not insert the DNA into a living organism. The molecule sits in a chip.]

  9. “john naddaf Says:
    how can zero have an absolute value when zero has no bottom,,,,id need to go into much more detail…<snip>…zero has two main values of +,and -,,,can be either or but always a set and value never =0”
    I wish I’d said that.


  10. Now if we can get the DNA book to bring itself up to date and keep bibliography and references up to date by keeping it in a “library soup” with all the other DNA books in the library, he may be onto something.

  11. If we could easily use DNA proofreading from organisms, we would be in a much better state for sequencing and amplification in general. Including the book in an organism would be tricky, as it is non-coding DNA – so it would need to sit in regulatory space or in introns, which makes it more likely to undergo mutations and rules out putting it in bacteria. It is an interesting proof of concept, but a concept I do not expect to see used very often.

    I am sure that Arcticio will never see a mammoth. They might see an elephant with mammoth DNA, but a mammoth will never come back. That is the big problem with the “DNA codes for an organism” viewpoint – it is like saying a blueprint codes for my house. It doesn’t, unless I have a builder. So unless someone finds a mammoth egg, that is not going to happen.

    As for stability – it depends on storage. A living organism is very environment-dependent. A nutrient-free organism is a dead one, but my SSD does not need a sandwich or a drink. The shuttles used regular hard drives for storage; they took more than a few shocks and could be restored. Organic storage is much less effective, and also much more error-prone in making subjective interpretations of observations and data.

    I am wondering how you can find out DNA replication error rates at a level below the accuracy of the measurement device (the sequencing machine itself). Surely that is impossible – I cannot measure to the nearest nanometre with a metre rule. Somatic mutation rates are difficult to estimate, and we always sequence populations. There is a move in recent papers to sequence single cells and compare them, for circulating cancers for example, and there can be large differences.
