
“Find and Replace” Across An Entire Genome

It couldn’t be easier to make sweeping edits on a computer document. If I were so inclined, I could find every instance of the word “genome” in this article and replace it with the word “cake”.

Now, a team of scientists from Yale and Harvard Medical School have done a similar trick for DNA. Geneticists have long been able to edit individual genes, but this group has developed a way of rewriting DNA en masse. And they’ve used it to recode the entire cake genome of a bacterium.

Their success was possible because the same genetic code underlies all life. It’s written in the four letters (nucleotides) that chain together to form DNA: A, C, G and T. Every set of three letters (or ‘codon’) corresponds to a different amino acid, the building blocks of proteins. For example, GCA codes for alanine; TGT means cysteine. The chain of letters is translated into a chain of amino acids until you get to a stop codon. These special triplets act as full stops that indicate when a protein is finished.
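For the programmers in the audience, here is a rough Python sketch of that reading process. The codon table is deliberately tiny: it holds only the two codons named above plus ATG (methionine), alongside the three standard stop codons. The function name `translate` and the example sequence are invented for illustration.

```python
# Partial codon table: just the codons named in the text, plus ATG.
CODON_TABLE = {"ATG": "Met", "GCA": "Ala", "TGT": "Cys"}
# The three stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def translate(dna):
    """Read dna three letters at a time until a stop codon appears."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            break  # the "full stop": nothing after this is translated
        # "?" marks codons missing from this deliberately partial table.
        protein.append(CODON_TABLE.get(codon, "?"))
    return protein


print(translate("ATGGCATGTTAGGCA"))  # ['Met', 'Ala', 'Cys']
```

The trailing GCA never makes it into the protein, because reading halts at TAG.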

This code is virtually the same in every gene on the planet. In every human, tree and bacterium, the same codons correspond to the same amino acids, with only minor variations. The code also includes a lot of redundancy. Four DNA letters can be arranged into 64 possible triplets, which are assigned to only 20 amino acids or a stop codon. So for example, GCT, GCA, GCC and GCG all code for alanine. And these surplus codons provide some wiggle room for geneticists to play around with.
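The arithmetic behind that redundancy is easy to check. This short Python sketch enumerates every triplet and picks out the four alanine codons listed above; the variable names are illustrative only.

```python
from itertools import product

BASES = "ACGT"

# Every ordered triplet of the four DNA letters: 4 * 4 * 4 = 64.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(codons))  # 64

# The four alanine codons named in the text all begin with "GC".
alanine = [c for c in codons if c.startswith("GC")]
print(alanine)  # ['GCA', 'GCC', 'GCG', 'GCT']

# Three triplets are stop codons in the standard code, which leaves
# 61 "sense" codons to share among 20 amino acids.
STOP_CODONS = {"TAA", "TAG", "TGA"}
sense = [c for c in codons if c not in STOP_CODONS]
print(len(sense))  # 61
```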

The Harvard and Yale team, led by Farren Isaacs and George Church, started by replacing TAG with TAA throughout the entire cake genome of the common gut bacterium Escherichia coli. Both are stop codons, so there’s no noticeable difference to the bacterium – it’s like replacing every full stop in this post with… a slightly different full stop. But to the team, the cake-genome-wide swap freed up the TAG codon, so that they could reassign it to other amino acids, beyond the usual 20. And that opens up many possible applications for what they’re calling a “genomically recoded organism” or GRO.


The team is pursuing three applications. First, by assigning codons to new amino acids, they can create a wider range of proteins than the ones living things currently use. These, in turn, could produce new types of drugs or substances, from polymers that can deliver drugs to specific parts of the body to coated surfaces that can prevent the growth of microbes. The idea is that new amino acids will provide chemists and engineers with more options for achieving these goals, just like adding new colours to an artist’s palette changes the range of things they can paint. “You can imagine converting these recoded organisms into factories for producing materials with new and exciting properties,” says Isaacs.

Second, the team could use the tweaked genetic codes to make living things resistant to viruses. Viruses make copies of themselves by hijacking the protein-making factories of their hosts. They depend on the fact that their proteins are encoded by the same triplets as those of their hosts. If their hosts stray from this universal genetic code, their factories will mangle the virus’s instructions, creating distorted and useless proteins. That would be useful for industry as well as medicine. The biotechnology company Genzyme had to shut down a manufacturing plant for several months after it was hit by a contaminating virus. Millions of dollars were lost.

And sure enough, the team’s recoded microbes were less susceptible to at least one type of phage – a virus that kills bacteria. They weren’t invincible by any means, but the colonies did take longer to die. The effect was small, but not unexpectedly so. The TAG codon is rare (which is why the team started with it) and only found at the end of genes. Reassigning it shouldn’t have done that much to hamper a virus. But it did, which suggests that bigger changes might even lead to complete protection.

Third, and for similar reasons, the altered codes could be used to contain genetically modified organisms, preventing them from breeding with wild populations. It’s the geneticist’s version of the Tower of Babel story – modified creatures would be imprisoned by their own genetic tweaks, unable to productively exchange genes with natural counterparts.


The recoding relied on two complementary technologies, invented by the team – MAGE, which substitutes TAA for TAG in separate pieces of bacterial DNA, and CAGE, which knits the pieces together into a whole genome.

MAGE, the older of the two techniques, made its debut two years ago. It stands for “multiplex automated genome engineering”, a fancy way of saying that it can easily change a genome many times over. It was originally used to create millions of small variants of bacterial genomes, producing a multitude of strains that can be tested for new abilities. As Jo Marchant puts it in her excellent feature, it’s an “evolution machine”. In its debut, within a matter of days, it had evolved a strain of E.coli that would produce large amounts of lycopene, a pigment that makes tomatoes red.

MAGE is a versatile editor. Not only can it create many diverse changes in a group of cells, it can also create many specific changes in a single cell. That’s what the team have now done. TAG appears in 321 places throughout the E.coli genome. For each one, the team created a small stretch of DNA that had TAA instead of TAG, surrounded by exactly the same letters. They fed these edited fragments into bacteria, which used them to build new copies of their own DNA. The result: daughter bacteria with edited genomes.
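As a toy illustration of the editing step (emphatically not the team’s actual design software), the Python sketch below builds a replacement fragment that carries TAA in place of TAG with the flanking letters untouched. It then shows why a blunt text-style find-and-replace would be the wrong tool. The function names, the `flank` length and the example sequences are all hypothetical, and the second function assumes the only TAG worth changing is the one that ends a gene’s reading frame.

```python
def design_fragment(genome, tag_pos, flank=4):
    """Return a fragment with TAA replacing the TAG at tag_pos,
    surrounded by up to `flank` unchanged letters on each side."""
    if genome[tag_pos:tag_pos + 3] != "TAG":
        raise ValueError("no TAG at that position")
    start = max(0, tag_pos - flank)
    end = min(len(genome), tag_pos + 3 + flank)
    return genome[start:tag_pos] + "TAA" + genome[tag_pos + 3:end]


def recode_terminal_stop(gene):
    """Swap a gene's final in-frame TAG for TAA; touch nothing else."""
    if len(gene) % 3 == 0 and gene.endswith("TAG"):
        return gene[:-3] + "TAA"
    return gene


print(design_fragment("CCCATGGCATGTTAGGGGAAA", tag_pos=12))  # ATGTTAAGGGA

# This gene reads ATG CTA GCA TAG, with an extra out-of-frame "TAG"
# straddling the second and third codons.
gene = "ATGCTAGCATAG"
print(recode_terminal_stop(gene))   # ATGCTAGCATAA (only the stop changes)
print(gene.replace("TAG", "TAA"))   # ATGCTAACATAA (GCA has become ACA)
```

The last line is the cautionary one: a naive string replacement also hits the out-of-frame TAG and silently turns an alanine codon into a threonine codon.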

In this way, they created 32 strains of E.coli that, between them, had every possible substitution of TAG to TAA. This might seem overly complicated, but replacing every TAG with TAA in a single step would be inefficient, slow, and error-prone. A single mistake could be lethal for the microbes. By taking things slowly, and spreading the substitutions among 32 strains, the team could better troubleshoot any tricky snags.

To combine the 32 strains into one, the team developed CAGE (or “conjugative assembly genome engineering”). The technique relies on the bacterial equivalent of sex – a process called conjugation where two cells sidle up, form a physical link between one another, and swap DNA.

The team matched their 32 strains up in pairs, in a league that looked like a knock-out sports tournament. One strain of each pair would deliver its edited genes into its partner, and the incoming genes were designed to merge with those of the recipient in specific ways. Thirty-two strains with 10 edits each became sixteen strains with 20 edits each. Sixteen turned into eight and eight into four.
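The tournament structure can be sketched in a few lines of Python. Here each strain is modelled simply as the set of TAG sites it has already converted, and pairing two strains takes the union of their sets. That is a big simplification of conjugation, and the even split of 321 sites into 32 contiguous chunks is an assumption made for illustration.

```python
def cage_tournament(strains):
    """Merge strains pairwise, round by round, until one remains.
    Assumes the number of starting strains is a power of two."""
    rounds = [strains]
    while len(strains) > 1:
        strains = [strains[i] | strains[i + 1]
                   for i in range(0, len(strains), 2)]
        rounds.append(strains)
    return rounds


N_SITES, N_STRAINS = 321, 32
sites = range(N_SITES)
# Give each starting strain a contiguous chunk of 10 or 11 sites.
starting = [
    set(sites[i * N_SITES // N_STRAINS:(i + 1) * N_SITES // N_STRAINS])
    for i in range(N_STRAINS)
]

rounds = cage_tournament(starting)
print([len(r) for r in rounds])  # [32, 16, 8, 4, 2, 1]
print(len(rounds[-1][0]))        # 321
```

Five rounds of pairing are enough to take 32 partially edited strains down to a single strain carrying every edit.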

When I first wrote about this in 2011, the team reached this “semi-final” stage. They had four strains of E.coli, each with a quarter of its genome stripped of TAG codons. Now, they’ve gone all the way, producing a single strain where every TAG is now a TAA. They also managed to get rid of release factor 1 (RF1), a protein that recognises TAG as a stop signal and halts the production of whatever protein’s being made.

The recoded microbe picked up 355 mutations along the way, but it seemed outwardly normal and  reproduced at a healthy pace. With TAG free from its duties as a punctuation mark, the team could reassign it to new amino acids, just as they planned. “In a plug and play manner, you can start to pop in new amino acids with new chemistries,” says Isaacs.

And as the team hoped, the new strain was more resistant to viruses than normal ones… but not completely resistant. To realise the ultimate goal of making virus-proof or genetically-contained organisms, they’ll have to do much more than replace one stop codon.

What next?

Next, the team need to start recoding the “sense codons”—the ones that actually correspond to amino acids.  And that is a lot harder. If you alter these sequences, you could screw up how genes are switched on or off, how efficiently or accurately they’re used to make proteins, how well those proteins work once they’re made, and more. And since bacterial genes overlap a lot, if you change a single instance of a single codon, you could be messing up three different genes at once. “There are a lot of things that can go wrong, and that’s not even an exhaustive list,” says Marc Lajoie, the lead author of the new research. “It’s just the stuff we know about.”

Also, sense codons are far more common than stop codons. E.coli has 321 instances of TAG in its genome. Add the next rarest codons—AGA and AGG—and you have upwards of 5,000 changes to make. If you want to recode just the 13 rarest ones (which the team calls the “forbidden codons”), you’d have to make 155,000 changes. Things get difficult fast.

To start with, Lajoie and Sriram Kosuri tried to recode the forbidden codons – completely substituting them with replacements that code for the same amino acid. And rather than doing it across the entire E.coli genome, they focused on recoding just 42 essential genes, one at a time. That makes for a manageable total of 405 changes rather than 155,000. Still, this is the sort of experiment where you imagine scientists interlacing their fingers, stretching their arms out to crack all of their knuckles, and then getting down to it.

“Changing TAG throughout the entire genome was a way of getting our feet wet. That project was intended to succeed,” says Lajoie. “In this one, we were actually looking to fail.” They wanted to see what would work and what wouldn’t.

They found that 26 of the 42 recoded genes were successful – that is, bacteria that carried them survived and, on average, grew just 20 percent slower than their normal kin. And perhaps more importantly, every single one of the 405 forbidden codons could be recoded either individually or in small groups. None of them in itself was a deal-breaker. All of them could be replaced to an extent.

“That was a surprise and very encouraging to us,” says Lajoie. It means that all of these are “amenable to genome-wide removal”. The circumstances that determine success or failure will lie in the quirks of each specific gene, and can potentially be dealt with.

“Through this tour de force of genome engineering, they’ve essentially shown that there are no large fundamental barriers to codon reassignment,” says Chang Liu, a biomedical engineer from the University of California, Irvine. “Rather, it is an exercise in overcoming an array of small hurdles, each of which we already have the technology to address.”

The team is now building on this pilot, and starting to replace sense codons across the entire E.coli genome. That will allow them to take their technique from the world of impressive demos into actual applications. But more than that, it will help them to probe the very nature of our genetic code. How did it evolve? Why is it structured the way it is, with three letters to a codon? And how malleable is it? “Only now do we have the ability to start making fundamental changes to the code and seeing the consequences,” says Isaacs.

Lajoie adds, “We’re only starting to see all of the tangled constraints that determine how genomes work. Nobody understands the full complexity – that’s why it’s so difficult.”

Reference: Lajoie, Rovner, Goodman, Aerni, Haimovich, Kuznetsov, Mercer, Wang, Carr, Mosberg, Rohland, Schultz, Jacobson, Rinehart, Church & Isaacs. 2013. Genomically Recoded Organisms Expand Biological Functions. Science http://dx.doi.org/10.1126/science.1241459

Lajoie, Kosuri, Mosberg, Gregg, Zhang & Church. 2013. Probing the Limits of Genetic Recoding in Essential Genes. Science http://dx.doi.org/10.1126/science.1241460

Note: This post builds upon an earlier one published in 2011. A bit like science, then.

15 thoughts on ““Find and Replace” Across An Entire Genome”

  1. I think there is something missing from this. “TAG” doesn’t just show up in stop codons, but it also shows up out-of-frame within genes. It could show up with the T at the end of one codon, and the AG in the beginning of another. So perhaps some of the disrupted genes could have had some of their other codons modified by their find-and-replace approach?
    Anyhow, I imagine that much more specific and rare-copy genome editing will be very useful, and I would regard this experiment as a success.

  2. Exciting, but I can’t help but feel just a bit of a frisson, a slight pause to wonder… might this be a case of a little knowledge being a dangerous thing? We don’t know what all we don’t know about life, and how viruses interact between living things, or how horizontal gene transfer might come about, perhaps faster than we think possible… are we being like precocious children who’ve discovered some programming tricks and merrily begun “souping up” dad’s computer?

  3. I have to say, this sounds like a really bad idea. First, the redundant coding makes genes resistant to SNP mutations, since they often end up coding the same protein anyway. Removing the redundancy will make them more prone to mutations that actually change function.

    Second, if this makes bacteria resistant to attack by viruses, might it not also make them resistant to attack by existing antibiotics, and by our immune system? Talk about playing with fire.

  4. Exactly, David Bump! The other day somebody sequenced the botulism bacterium (Shhh! They’re keeping it secret for national security.) Then there’s Craig Venter and his Digital Biology Converter that spews out DNA from computer code. What ever happened to the great precautions the original microbiologists used when they started messing with genes?

  5. In regard to the comments here, the “find and replace” approach only targeted TAG codons residing in protein coding genes, not just any instance of TAG in the genome. There are approximately 320 TAGs in protein coding regions of the E coli genome. TAGs that are out of the reading frame are not relevant to the proteins encoded by the ORF, and thus do not matter. Also, recoded strains could be sensitive to SNPs if more sense codons are re-assigned, but this is a worthy tradeoff in order to genetically encode non-standard amino acids – something that could never have been done with a wild type strain. Lastly, resistance to viruses based on genetic “isolation”(which occurs due to incompatibility of the viral genome with the recoded E. coli genome) is a completely different issue than antibiotic resistance, which is known to occur through genetic mutation affecting protein structure – not a fundamental code change. If anything these efforts will enable new types of antibiotics to control these strains specifically.

  6. Nothing I have read in genetics has scared me as much as this. We want to create a strain of bacteria, small, rapidly reproducing organisms well known for their ability to survive, contaminate and flourish, that are immune to viruses?

    Their release into the biosphere would be inevitable, and then what would happen? Crossing would be unlikely to produce viable offspring, (Though the unlikely has happened before.) but just the resistance to viruses would give the strain an immense advantage, what would stop its spread around the globe?

  7. Well, people seem to be freaking out over this. There are lots of sequences of deadly pathogens already available, for example the 1918 flu virus sequence. That doesn’t mean it’s going to spread like crazy.

    I wonder how the bacteria will handle the introduction of amino acids that weren’t part of life for the 4 billion years it has existed. They might just end up being toxic to bacteria but you might end up producing resistant strains.

  8. R.E. Hunter: “Second, if this makes bacteria resistant to attack by viruses, might it not also make them resistant to attack by existing antibiotics, and by our immune system?”

    No. No one who understands how antibiotics work, or how the immune system works, would agree with you. Antibiotics are small molecules which interfere in critical processes of bacterial cells; for example many of them interfere with the bacterial ribosome. But the changes discussed do not involve any changes to ribosomal sequences, either RNA or protein.

    Viruses, in contrast, require the cell protein manufacturing system (i.e. the ribosome) to translate their own gene sequences. This directly involves the extra-ribosomal portions of the translation machinery, i.e. the tRNAs and their charging systems. I.e. the stuff that is actually proposed to be changed.

    1. Bayesian Bouffant: I do understand how antibiotics work. They work by binding to key bacterial proteins to interfere with their normal function, in some cases causing their death, in others preventing them from reproducing.

      This binding is very specific, requiring a match between both the physical shapes and the charges of the two molecules. Already, bacteria develop resistance by mutations that make small changes in the proteins that still allow them to perform their essential function but reduce or eliminate the binding of the antibiotic molecules (this is one of the mechanisms, not the only one).

      So giving the bacteria access to new amino acids not used in nature gives them an entire array of new possibilities for mutated proteins with shape and/or charge differences that could prevent antibiotic binding.

      In fact, viruses rely on exactly the same binding processes to infect cells, except that they are doing protein-to-protein binding rather than small-molecule-to-protein binding.

      Also, if they are just adding new amino acid encodings, the virus encoding will still be a subset of the bacteria’s, so the virus will still be able to commandeer the bacteria’s genetic machinery. To prevent this, they would need to eliminate an existing encoding that the virus relies upon, and do this without breaking anything the bacteria relies upon, probably a close-to-impossible task with the current state of genetic engineering technology and understanding. And even if it is achieved, the virus can still mutate around it, far faster than we can develop new antibiotics.

  9. This is just nibbling around the edges, making slight alterations to existing systems. There is in principle no reason why a larger change could not be made: building a bacterium with a completely changed genetic code. To accomplish this, you would have to change every coding region at once, along with all of the tRNAs and charging systems. The scale is formidable, but to repeat, there is no reason to believe it would be impossible. This would be a Venter scale project. Venter and company have already chemically synthesized a complete bacterial chromosome.

  10. David Bump: ““souping up” dad’s computer?”

    And who is “dad” in your analogy?

    Kudzu: “but just the resistance to viruses would give the strain an immense advantage, what would stop its spread around the globe?”

    Build in some antibiotic vulnerabilities. Don’t worry too much about it acquiring resistance from other bacteria, horizontal gene transfer, like viral infection, depends upon a shared genetic code.

    1. @Bayesian Bouffant: Horizontal gene transfer relies on the genes being transferred sharing a genetic code. Given the rarity of some of these triplets it is likely that a resistance gene will be compatible or could easily mutate into a compatible gene.

      However antibiotic vulnerabilities are not a good way to control bacteria since they would allow killing of small populations that escape, leaving the rest untouched. Better would be to knock out a vital gene so that the bacterium cannot produce a nutrient it needs, then provide that nutrient in the lab. This would prevent any escaped bacteria growing in most conditions, but they could then pick up the gene again from wild bacteria.

      In either case I am highly skeptical of our ability to contain the bacteria involved.

  11. Guys, guys, guys…

    The mechanism by which these bacteria would be resistant to viruses is because they have a fundamentally altered biology, that makes them incompatible not only with viruses, but also with all other life as we know it. They wouldn’t survive a day outside the lab (well, perhaps survive for a short while, but not replicate).

    Also, the thing with unnatural amino acids is that they don’t appear in nature. A bacterium dependent on unnatural amino acids could not survive without them being provided (in the lab).

    [At this point, someone’s going to say “Life finds a way” and I’m just going to cry. – Ed]

    1. Life finds a way*

      @Bob (And also Ed) From the article I was led to believe their genetic code was rewritten but not their basic biology; they use the same basic amino acids, fats, sugars and so on. They can still consume organic material and reproduce. If there was an additional measure taken that prevents them from simply consuming and reproducing in the manner of microbes everywhere then this is far more reassuring, but I would like to know how that was done.

      *Excepting the vast majority of species that have become extinct either naturally or by human intervention. Cry me a river Ed.
