Hard drives and NAND flash memory can store a lot more data than they could just a few years ago, but they've still got nothing on DNA. The genetic material in nearly every cell of your body has a vastly higher storage capacity than a hard drive, and it could potentially last for hundreds of thousands of years. The problem has been efficiently encoding information in DNA. Now a pair of researchers from Columbia University and the New York Genome Center have developed a process for storing 214 petabytes of data per gram of DNA.

The DNA in our cells contains the instructions for building all the proteins that keep us running. DNA is made up of repeating sequences of the nucleobases adenine, guanine, cytosine, and thymine (A, G, C, and T). Paired up across the double helix, they are called base pairs. Each sequence of three bases translates to an amino acid, the building blocks of proteins. It's data storage just like what we do with hard drives, only with much higher potential density.
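To make the hard drive comparison concrete: with four possible bases, each position in a strand can carry up to two bits. Here is a minimal Python sketch of that idea. The particular mapping (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions chosen for illustration, not something the researchers specify.

```python
# Hypothetical mapping: each base stands for two bits (illustrative only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def bytes_to_dna(data: bytes) -> str:
    """Translate bytes into a string of bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def dna_to_bytes(strand: str) -> bytes:
    """Reverse the mapping: bases back to bits, bits back to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = bytes_to_dna(b"Hi")
print(strand)                # CAGACGGC
print(dna_to_bytes(strand))  # b'Hi'
```

Every byte becomes four bases, so a strand of a few hundred bases holds only a few dozen bytes. The density comes from how little a single strand weighs, not from how much each one holds.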

The first time scientists were able to write and read digital data to DNA, they managed an effective capacity of 1.28 petabytes per gram. That's nice and all, but Yaniv Erlich and Dina Zielinski improved that by a factor of 100. They successfully encoded a full computer operating system, an 1895 French film called "Arrival of a train at La Ciotat," a $50 Amazon gift card, a computer virus, a Pioneer plaque, and a 1948 study by information theorist Claude Shannon. The key was not in producing the DNA, but in how the data was split up and encoded in the first place.

Erlich and Zielinski call their process a "DNA Fountain." First, all the files were compressed into a single master archive. An algorithm then took the binary code from that file and split it into short strings of digits. When translating the binary into base pair sequences, the algorithm was able to drop nucleotide sequences that are more likely to lead to read errors and replace them with others. Each bundle of strings is referred to as a droplet, and each droplet has a barcode in the sequence that tells the researchers where it fits when reassembling the file.
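The droplet idea can be sketched in a few dozen lines of Python. This is a simplified illustration under stated assumptions, not the researchers' software: the two-bits-per-base mapping, the 2-byte seed used as the barcode, the toy degree choices, the 8-byte segment length, and the screening thresholds (no run of more than three identical bases, 40 to 60 percent G/C content) are all placeholders picked for the example. Each droplet XORs together a pseudo-random handful of segments chosen by its seed, and any candidate strand that fails the screen is simply skipped in favor of the next seed.

```python
import random
import zlib

BASES = "ACGT"  # assumed mapping: 00->A, 01->C, 10->G, 11->T


def to_dna(data: bytes) -> str:
    """Translate bytes into bases, two bits per base."""
    return "".join(BASES[(b >> s) & 3] for b in data for s in (6, 4, 2, 0))


def from_dna(strand: str) -> bytes:
    """Reverse to_dna: every four bases become one byte."""
    v = [BASES.index(base) for base in strand]
    return bytes(
        (v[i] << 6) | (v[i + 1] << 4) | (v[i + 2] << 2) | v[i + 3]
        for i in range(0, len(v), 4)
    )


def passes_screen(strand: str, max_run: int = 3) -> bool:
    """Reject strands with long single-base runs or lopsided G/C content."""
    gc = (strand.count("G") + strand.count("C")) / len(strand)
    long_run = any(base * (max_run + 1) in strand for base in BASES)
    return 0.4 <= gc <= 0.6 and not long_run


def pick_segments(seed: int, n: int) -> list:
    """Pseudo-randomly choose which segments a droplet combines (toy rule)."""
    rng = random.Random(seed)
    degree = min(rng.choice([1, 1, 2, 3]), n)
    return rng.sample(range(n), degree)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def droplet_stream(segments):
    """Yield screened droplets: a 2-byte seed (the barcode) plus XORed payload."""
    for seed in range(2 ** 16):
        payload = bytes(len(segments[0]))
        for idx in pick_segments(seed, len(segments)):
            payload = xor(payload, segments[idx])
        strand = to_dna(seed.to_bytes(2, "big") + payload)
        if passes_screen(strand):  # failing candidates are dropped
            yield strand


def decode(droplets, n):
    """Peel droplets until every segment is known; None if more are needed."""
    pending = []
    for strand in droplets:
        raw = from_dna(strand)
        seed = int.from_bytes(raw[:2], "big")
        pending.append((set(pick_segments(seed, n)), raw[2:]))
    solved, progress = {}, True
    while progress and len(solved) < n:
        progress = False
        for i, (idxs, payload) in enumerate(pending):
            known = [j for j in idxs if j in solved]
            for j in known:  # strip out segments we already recovered
                payload = xor(payload, solved[j])
            idxs = idxs - set(known)
            pending[i] = (idxs, payload)
            if len(idxs) == 1:  # one unknown left: this payload is that segment
                solved[next(iter(idxs))] = payload
                progress = True
    return b"".join(solved[i] for i in range(n)) if len(solved) == n else None


message = b"DNA storage demo: " + b"droplets carry a seed so they can be reassembled. " * 2
archive = zlib.compress(message)  # stand-in for the single master archive
seg_len = 8
segments = [archive[i:i + seg_len].ljust(seg_len, b"\0")
            for i in range(0, len(archive), seg_len)]

stream, droplets = droplet_stream(segments), []
while (joined := decode(droplets, len(segments))) is None:
    droplets.append(next(stream))  # keep collecting until decoding succeeds

restored = zlib.decompress(joined[:len(archive)])
print(len(segments), "segments,", len(droplets), "droplets, intact:", restored == message)
```

Because the seed travels inside each strand, the decoder can regenerate exactly which segments went into a droplet, so the order in which droplets are read back does not matter. Compressing first also helps the screen, since compressed data looks close to random and rarely produces long runs once segments are XORed together.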

Drawback: Reading files on your hard drive does not require a pipette.

The researchers ended up with 72,000 DNA strands that contained the encoded data. The code was sent off to Twist Biosciences, a San Francisco company that can generate synthetic DNA from a provided sequence. In a few weeks Erlich and Zielinski received a vial containing the DNA molecules they coded. To read the data, they used standard DNA sequencing technology, and then special software to reverse the encoding process. They were left with the original files, all perfectly intact.

DNA data storage could have a number of benefits at this level of density. As mentioned above, DNA can last a long time, and you'll still be able to read it in a hundred years. It would be hugely difficult to read data stored on an antique 5.25-inch floppy disk, for example. Yaniv Erlich jokes that if the DNA data does become obsolete, we have bigger problems. Cost is still an issue, though. The team spent $7,000 creating the DNA archive and another $2,000 reading it. They also needed a lot of advanced equipment. Still, DNA storage might make some sense in the near future as a method of cold storage for large volumes of vital data.
