In July, an article by journalist Megan Molteni in the tech magazine Wired hit the headlines: "Scientists Upload a Galloping Horse GIF Into Bacteria With Crispr". Wow! A second 'wow!' followed a few weeks later, when an article by journalist Maev Kennedy in the British newspaper The Guardian carried the title "Mathematical secrets of ancient tablet unlocked after nearly a century of study". The article featured a recent paper by the Australian mathematicians Daniel Mansfield and Norman Wildberger (2), who propose a new interpretation of a Babylonian clay tablet first described in great detail by the historian of mathematics Otto Neugebauer and the Assyriologist Abraham Sachs in 1945 (1). You may wonder how some 70 years can generously be rounded up to "...nearly a century", and why we at 'Small Things Considered' care about what occupies the print and online media during the summer slump. Here we go...
Plimpton 322 is a postcard-sized clay tablet with a cuneiform inscription, most likely dating from ~1870 BCE and unearthed in the early 1900s from the rubble of the ancient Mesopotamian city of Larsa (now in Iraq) (Fig. 1). Thousands of these >3,000-year-old clay tablets are known and, according to Assyriologists, most of them document the ancient Babylonians' particular penchant for written contracts of every kind. (They had even developed a sophisticated 'backup' system: they wrapped clay around the original tablet and repeated the contract text on this 'envelope'. If the envelope became unreadable through damage, or the parties disagreed about the wording, the envelope was broken open and the original studied in the presence of a judge.) But some clay tablets, including Plimpton 322, testify to the remarkable versatility of the Babylonians in mathematics. In their 1945 study, Neugebauer and Sachs describe finding on Plimpton 322 a series of numbers belonging to Pythagorean triples, that is, integer solutions to the equation a² + b² = c² (1). In a fruitful collaboration, Assyriologists and mathematicians have since found that Babylonian mathematicians must have solved many problems, for example in astronomy, architecture, or arithmetic, using a sexagesimal (base-60) number system and integer ratios rather than trigonometry as the ancient Greeks did (they arrived at 1.41421296 as an approximation for the square root of 2, which is again wow!). However, Evelyn Lamb makes it pretty clear in her critique of the Mansfield & Wildberger paper in Scientific American that the Babylonian 'trig table' on Plimpton 322 is in no way more precise than the calculations done by our computers today (fun fact: the scribe of Plimpton 322 is also known to have made a few 'typos' and even calculation mistakes).
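The base-60 arithmetic behind these claims is easy to try out. Below is a minimal Python sketch (our own illustration, not taken from any of the cited papers) that converts sexagesimal digits to exact fractions, where 1;24,51,10 means 1 + 24/60 + 51/60² + 10/60³, and builds Pythagorean triples from a 'generating pair' p > q via a = p² − q², b = 2pq, c = p² + q². It assumes the classic reading going back to Neugebauer and Sachs that the generators are regular numbers (no prime factors other than 2, 3 and 5), which is why ratios such as (c/b)² terminate in base 60. The choice p = 12, q = 5 is ours; it yields 119 and 169, the two numbers commonly read in the first row of Plimpton 322.

```python
from fractions import Fraction


def sexagesimal_to_fraction(digits):
    """Exact value of base-60 digits [integer part, 1/60 place, 1/60**2 place, ...]."""
    return sum((Fraction(d, 60 ** place) for place, d in enumerate(digits)), Fraction(0))


def fraction_to_sexagesimal(x, max_places=10):
    """Expand a non-negative Fraction in base 60.

    Returns (digits, terminated); terminated is False if max_places ran out first.
    """
    integer = x.numerator // x.denominator
    digits = [integer]
    rest = x - integer
    for _ in range(max_places):
        if rest == 0:
            break
        rest *= 60
        d = rest.numerator // rest.denominator
        digits.append(d)
        rest -= d
    return digits, rest == 0


def is_regular(n):
    """True if n > 0 has no prime factors other than 2, 3 and 5, i.e. 1/n terminates in base 60."""
    if n <= 0:
        return False
    for p in (2, 3, 5):
        while n % p == 0:
            n //= p
    return n == 1


def triple_from_generators(p, q):
    """Pythagorean triple (a, b, c) from a generating pair p > q > 0."""
    return p * p - q * q, 2 * p * q, p * p + q * q


if __name__ == "__main__":
    root2 = sexagesimal_to_fraction([1, 24, 51, 10])
    print(f"1;24,51,10 = {float(root2):.8f}")

    p, q = 12, 5  # illustrative generating pair; both are regular numbers
    a, b, c = triple_from_generators(p, q)
    print((a, b, c), a * a + b * b == c * c, is_regular(b))
    print(fraction_to_sexagesimal(Fraction(c, b) ** 2))  # terminates because b is regular
    print(fraction_to_sexagesimal(Fraction(1, 7)))        # 7 is not regular: no termination
```

Exact fractions are used instead of floats so that 'terminates in base 60' can be checked without rounding noise: (169/120)² expands to 1;59,00,15 and stops, whereas 1/7 keeps going until max_places runs out.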
In their 'galloping horse' paper in Nature, Shipman et al. present an ingenious application of CRISPR that turns E. coli cells into living devices capable of capturing, storing, and propagating information over time (3). They used the CRISPR-associated (Cas) proteins Cas1 and Cas2, which function as an integrase complex, to acquire oligonucleotides delivered by electroporation (mimicking phage infection) and store them as 'spacers' in a CRISPR array on the host chromosome (Merry has discussed the intricacies of CRISPR several times in STC). They digitized five frames of Eadweard Muybridge's famous 'galloping horse' photograph series into 36 × 26 pixel pictures with 21 gray values. To 'translate' these into oligonucleotides, they grouped the pixels into sets of 9 non-neighboring pixels each, called 'pixets'. Each of the resulting 104 pixets was assigned a nucleotide barcode that allowed it to be re-projected onto the reconstructed picture frame after sequencing, and each gray value was assigned a degenerate triplet code. This code made it possible to avoid internal PAM sites and mononucleotide stretches and to keep the mean GC content of the resulting oligonucleotides at ~50%. Finally, each oligonucleotide was tagged with a PAM sequence to ensure its efficient acquisition as a spacer in a preferred orientation, and the complementary strand was fixed through a hairpin structure. The oligonucleotide set for each picture frame was introduced in bulk by electroporation into E. coli cells overexpressing Cas1 and Cas2, and the cells were allowed to recover for one day between rounds of transformation. CRISPR arrays are known to 'record' acquired spacers in chronological order, so sequencing the CRISPR arrays after day 5 should reveal not only the completeness of the oligonucleotide sets but also their temporal order, allowing the original image frames to be reconstructed.
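The encoding step can be made concrete with a toy version. The Python sketch below is hypothetical throughout: the grouping rule (pixels whose row-major index differs by a multiple of 104 share a pixet), the codebook (the 60 triplets that are not homopolymers, dealt out round-robin to the 21 gray values), the 4-nt base-4 barcode and the choice of AAG as the PAM to avoid are our assumptions, not the scheme of Shipman et al., and the PAM tag and hairpin are left out. What it does show is how a degenerate code leaves room to choose: for each pixel the encoder greedily picks whichever synonymous triplet keeps internal PAMs, runs of three identical nucleotides and GC deviation from 50% lowest, while decoding stays a simple table lookup.

```python
from itertools import product

WIDTH, HEIGHT = 36, 26                      # frame size given in the text
PIXET_SIZE = 9                              # pixels per pixet
N_PIXETS = WIDTH * HEIGHT // PIXET_SIZE     # 936 // 9 = 104
N_GRAY = 21                                 # gray values per pixel
PAM = "AAG"                                 # assumption: motif to keep out of the payload


def pixet_members(p):
    """Hypothetical grouping: (row, col) of the 9 pixels of pixet p, 104 apart in row-major order."""
    return [divmod(i, WIDTH) for i in range(p, WIDTH * HEIGHT, N_PIXETS)]


def build_codebook():
    """Deal the 60 non-homopolymer triplets round-robin to the 21 gray values (2-3 synonyms each)."""
    triplets = ["".join(t) for t in product("ACGT", repeat=3)]
    triplets = [t for t in triplets if len(set(t)) > 1]  # drop AAA, CCC, GGG, TTT
    codebook = {v: [] for v in range(N_GRAY)}
    for k, t in enumerate(triplets):
        codebook[k % N_GRAY].append(t)
    return codebook


CODEBOOK = build_codebook()
DECODE = {t: v for v, synonyms in CODEBOOK.items() for t in synonyms}


def penalty(seq):
    """Lower is better: forward-strand PAMs, then runs of 3 identical bases, then GC deviation from 50%."""
    pams = sum(seq[i:i + 3] == PAM for i in range(len(seq) - 2))
    runs = sum(seq[i] == seq[i + 1] == seq[i + 2] for i in range(len(seq) - 2))
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return 100 * pams + 10 * runs + abs(gc - 0.5)


def encode_pixet(gray_values):
    """Greedily pick, pixel by pixel, the synonymous triplet giving the lowest penalty so far."""
    seq = ""
    for v in gray_values:
        seq = min((seq + t for t in CODEBOOK[v]), key=penalty)
    return seq


def decode_pixet(seq):
    return [DECODE[seq[i:i + 3]] for i in range(0, len(seq), 3)]


def barcode(p, length=4):
    """Hypothetical barcode: pixet index in base 4 (A=0, C=1, G=2, T=3); 4**4 >= 104."""
    digits = []
    for _ in range(length):
        p, d = divmod(p, 4)
        digits.append("ACGT"[d])
    return "".join(reversed(digits))


if __name__ == "__main__":
    values = [0, 5, 20, 7, 7, 13, 2, 19, 10]  # made-up gray values for the 9 pixels of one pixet
    payload = encode_pixet(values)
    print(barcode(17) + payload, decode_pixet(payload) == values)
```

Spreading a pixet over pixels at least two rows apart means that losing one spacer blanks out scattered single pixels rather than a contiguous patch, which is one plausible motivation for grouping non-neighboring pixels.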
And indeed, they could reconstruct each frame and the order of the frames, and increasing the read depth raised the accuracy of the reconstruction to >90% overall (Fig. 2).
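Reconstruction can be sketched the same way. Because new spacers are integrated next to the leader, the spacer closest to the leader in an array is the most recent one. The hypothetical Python below assumes that each sequenced spacer can already be assigned to its oligonucleotide set (frame) and to its pixet; it orders the frames by pairwise votes (one frame ranks ahead of another if it was seen farther from the leader in more arrays) and calls each pixel by majority vote over all reads of a pixet. This is a generic voting scheme chosen for illustration, not the analysis pipeline of Shipman et al., but it shows why more reads help: a single erroneous read is outvoted.

```python
from collections import Counter
from itertools import combinations


def infer_order(arrays):
    """Order frame labels from earliest to latest acquisition.

    Each array lists the frame label of every spacer, leader-proximal (newest) first.
    """
    before = Counter()  # before[(a, b)]: how often a was seen as older than b
    labels = set()
    for array in arrays:
        labels.update(array)
        for i, j in combinations(range(len(array)), 2):
            newer, older = array[i], array[j]  # j > i, so array[j] sits farther from the leader
            if newer != older:
                before[(older, newer)] += 1

    def wins(a):
        # Number of other frames that a precedes in the majority of observations
        return sum(before[(a, b)] > before[(b, a)] for b in labels if b != a)

    return sorted(labels, key=lambda a: (-wins(a), a))


def consensus(reads):
    """Majority gray value per pixel over several reads of the same pixet."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*reads)]


def accuracy(decoded, truth):
    return sum(d == t for d, t in zip(decoded, truth)) / len(truth)


# Toy data (invented for illustration): five frames F1..F5 acquired in that order
ARRAYS = [
    ["F3", "F2", "F1"],
    ["F5", "F3", "F1"],
    ["F4", "F2"],
    ["F2", "F4"],  # contradicts the true order
    ["F4", "F3"],
    ["F5", "F4", "F2", "F1"],
]
TRUTH = [3, 3, 20, 0, 7, 7, 12, 5, 1]  # gray values of one pixet
READS = [
    TRUTH,
    [3, 3, 19, 0, 7, 7, 12, 5, 1],  # error in pixel 2
    [3, 3, 20, 0, 7, 8, 12, 5, 1],  # error in pixel 5
]

if __name__ == "__main__":
    print(infer_order(ARRAYS))
    print([accuracy(r, TRUTH) for r in READS], accuracy(consensus(READS), TRUTH))
```

With this toy data the contradictory array is outvoted and the frames come out as F1 to F5, while the consensus recovers the pixet exactly even though two of the three reads each carry one error.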
Shipman et al. make no claim to have developed a technique for preserving visual messages for people in the far-off future. Rather, they aim to turn human cells such as neurons into biological recording devices. "The E. coli is just a proof of concept to show what cool things you can do with this CRISPR system," says Jeff Nivala, a coauthor of the paper, in the Wired article, and explains, "our real goal is to enable cells to gather information about themselves and to store it in their genome for us to look at later." It will be a long way to persuading eukaryotic cells to use the bacterial CRISPR system to record "incoming" DNA snippets in chronological order for the curious scientist. It may well turn out to be more revealing to see how the cells avoid being recruited for the job! But turning CRISPR into a 'recording device' rather than using it as a mere 'gene editor' is certainly a cute idea, and worth a galloping horse.
Plimpton v. Muybridge
Despite some initial enthusiasm, scientists and engineers working on DNA as a means to store huge amounts of data in the smallest possible volume – 215 petabytes per gram of DNA, for example – have mostly abandoned the idea of using living cells for long-term storage. Data stored as DNA would inevitably take part in the evolution of its host cells, and unless this DNA conferred a selective advantage, it would first get corrupted and then lost, sooner rather than later. It is hard to imagine that a weirdo bacterium, or an eccentric yeast cell, could develop an addiction to DNA‑encoded GIFs of galloping horses or trigonometric tables. Despite their 'data recovery rate' of >90%, the results obtained by Shipman et al. make the point: 74% of the errors they detected were due to mutations, sequencing flaws, or 'typos' made during in vitro oligonucleotide synthesis, 12% to imprecise spacer incorporation into the CRISPR array by the Cas proteins, 10% to expected but missing spacers, and 3% to erroneously incorporated snippets of host chromosome DNA (the autoimmunity problem of CRISPR, its Achilles' heel). There is, however, a reasonable prospect of storing DNA ex vivo for prolonged periods; if it were otherwise, we would know nothing about the Denisovans. And even though recovering ancient DNA is technically demanding (the recovery rates are still underwhelming), we can safely assume that properly stored DNA is more stable than, say, parchment or clay tablets.
Yet when it comes to readability, clay tablets are the likely winners in the long run. Can you imagine high-tech DNA sequencers being standard equipment in every future library? There's a "prequel": a few years back, the U.S. Library of Congress, valued by academics of every flavor as one of the most professional knowledge archives in the world, faced a major disaster with increasingly deteriorating magnetic tapes and storage discs from the 1950s. Worse still, in many cases it wasn't only the storage media that were on the 'brink of extinction' but also the hardware and software required to actually access the stored data. Replacement tape readers were no longer available, the companies producing them gone, construction plans lost, people with knowledge of the software deceased...
Durable low-tech clay tablets can be read by eye, and their long-lost written language and content can be deciphered with scientific effort, at least partially, even when the cultural background is lost: the stored data are present in a way that literally invites reading. Filaments of crystallized DNA may be physically fine – and even aesthetically pleasing! – but they are plain dull, that is, they give no hint of galloping horses. Recall that nucleic acids were, for "...nearly a century", considered unlikely to have any capacity for encoding information at all.
(1) Neugebauer O, Sachs AJ. 1945. Mathematical Cuneiform Texts. American Oriental Series, 49. American Oriental Society, American Schools of Oriental Research
(2) Mansfield DF, Wildberger NJ. 2017. Plimpton 322 is Babylonian exact sexagesimal trigonometry. Hist Math, doi 10.1016/j.hm.2017.08.001 (Open Access)
(3) Shipman SL, Nivala J, Macklis JD, Church GM. 2017. CRISPR-Cas encoding of a digital movie into the genomes of a population of living bacteria. Nature, 547 (7663), 345–349. PMID 28700573 (Open Access)