
Tuesday, August 27, 2024

2024 Optical Media Durability Update

Six years ago I posted Optical Media Durability and discovered:
Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary.
Here are the subsequent annual updates:
It is time once again for the mind-numbing process of feeding 45 disks through the readers to verify their checksums, and yet again this year every single MD5 was successfully verified. Below the fold, the details.
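The check itself is simple, if tedious. Here is a minimal Python sketch of the kind of verification involved, assuming an md5sum-style manifest of expected checksums; the manifest format and paths are assumptions for illustration, not a description of the actual scripts used:

```python
import hashlib
import sys

def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(manifest_path):
    """Verify files listed as '<md5>  <path>' lines, as in md5sum output."""
    failures = 0
    with open(manifest_path) as manifest:
        for line in manifest:
            expected, _, path = line.strip().partition("  ")
            actual = md5_of(path)
            ok = actual == expected
            failures += 0 if ok else 1
            print(f"{path}: {'OK' if ok else 'FAILED'}")
    return failures

if __name__ == "__main__":
    sys.exit(1 if verify(sys.argv[1]) else 0)
```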

Tuesday, May 28, 2024

Library of Congress: Designing Storage Architectures 2024

I participated virtually in the 2024 Library of Congress Designing Storage Architectures for Digital Collections meeting. As usual, there was a set of very interesting talks. The slides from the presentations are now online, so below the fold I discuss the talks I found particularly interesting.

Tuesday, March 5, 2024

Microsoft's Archival Storage Research

2016 Media Shipments

Medium      Exabytes   Revenue   $/GB
Flash       120        $38.7B    $0.320
Hard Disk   693        $26.8B    $0.039
LTO Tape    40         $0.65B    $0.016
Six years ago I wrote Archival Media: Not a Good Business and included this table. The argument went as follows:
  • The value that can be extracted from data decays rapidly with time.
  • Thus companies would rather invest in current than archival data.
  • Thus archival media and systems are a niche market.
  • Thus archival media and systems lack the manufacturing volume to drive down prices.
  • Thus although quasi-immortal media have low opex (running cost), they have high capex (purchase cost).
  • Especially now that interest rates are non-zero, the high capex makes the net present value of their lifetime cost high.
  • Archival media compete with legacy generic media, which have mass-market volumes and have already amortized their R&D, so have low capex but higher opex through their shorter service lives.
  • Because they have much higher volumes and thus much more R&D, generic media have much higher Kryder rates, meaning that although they need to be replaced over time, each new unit at approximately equal cost replaces several old units, reducing opex.
  • Especially now that interest rates are non-zero, the net present value of the lower capex but higher opex is likely very competitive (a toy comparison is sketched below).
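To make the capex/opex trade-off concrete, here is a toy net-present-value comparison in Python. All the numbers are purely illustrative assumptions, not data from any vendor or from the table above:

```python
def npv(cash_flows, rate):
    """Net present value of annual cash flows, year 0 first."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

def archival_medium(years, capex, opex):
    """Quasi-immortal medium: one big up-front purchase, low running cost."""
    return [capex + opex] + [opex] * (years - 1)

def generic_medium(years, capex, opex, service_life, kryder_rate):
    """Generic medium: replaced every service_life years at falling unit cost."""
    flows = []
    for year in range(years):
        cost = opex
        if year % service_life == 0:
            # Each replacement generation is cheaper thanks to the Kryder rate.
            cost += capex * (1 - kryder_rate) ** year
        flows.append(cost)
    return flows

# Purely illustrative numbers.
YEARS, RATE = 20, 0.05
print("archival NPV:", round(npv(archival_medium(YEARS, capex=100, opex=1), RATE), 1))
print("generic NPV: ", round(npv(generic_medium(YEARS, capex=30, opex=5,
                                                service_life=5, kryder_rate=0.15), RATE), 1))
```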
Below the fold I look into why, despite this, Microsoft has been pouring money into archival system R&D for about a decade.

Thursday, January 9, 2020

Library of Congress Storage Architecture Meeting

The Library of Congress has finally posted the presentations from the 2019 Designing Storage Architectures for Digital Collections workshop that took place in early September. I've greatly enjoyed the earlier editions of this meeting, so I was sorry I couldn't make it this time. Below the fold, I look at some of the presentations.

Tuesday, October 15, 2019

Nanopore Technology For DNA Storage

DNA assembly for nanopore data storage readout by Randolph Lopez et al from the UW/Microsoft team continues their steady progress in developing technologies for data storage in DNA.

Below the fold, some details and a little discussion.

Thursday, May 16, 2019

Review Of Data Storage In DNA

Luis Ceze, Jeff Nivala and Karin Strauss of the University of Washington and Microsoft Research team have published a fascinating review of the history and state-of-the-art in Molecular digital data storage using DNA. The abstract reads:
Molecular data storage is an attractive alternative for dense and durable information storage, which is sorely needed to deal with the growing gap between information production and the ability to store data. DNA is a clear example of effective archival data storage in molecular form. In this Review, we provide an overview of the process, the state of the art in this area and challenges for mainstream adoption. We also survey the field of in vivo molecular memory systems that record and store information within the DNA of living cells, which, together with in vitro DNA data storage, lie at the growing intersection of computer systems and biotechnology.
They include a comprehensive bibliography. Below the fold, some commentary and a few quibbles.

Tuesday, May 14, 2019

Storing Data In Oligopeptides

Bryan Cafferty et al have published a paper entitled Storage of Information Using Small Organic Molecules. There's a press release from Harvard's Wyss Institute at Storage Beyond the Cloud. Below the fold, some commentary on the differences and similarities between this technique and using DNA to store data.

Thursday, March 21, 2019

Cost-Reducing Writing DNA Data

In DNA's Niche in the Storage Market, I addressed a hypothetical DNA storage company's engineers and posed this challenge:
increase the speed of synthesis by a factor of a quarter of a trillion, while reducing the cost by a factor of fifty trillion, in less than 10 years while spending no more than $24M/yr.
Now, a company called Catalog plans to demo a significant step in the right direction:
The goal of the demonstration, says Park, is to store 125 gigabytes, ... in 24 hours, on less than 1 cubic centimeter of DNA. And to do it for $7,000.
That would be 1E12 bits for $7E3. At the theoretical maximum 2 bits/base, it would be $1.4E-8 per base, versus last year's estimate of 1E-4, or around 7,000 times better.
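The arithmetic, as a quick sketch (treating a gigabyte as 1E9 bytes):

```python
# Quick check of the cost-per-base arithmetic for Catalog's proposed demo.
bits = 125e9 * 8            # 125 GB at 8 bits/byte = 1e12 bits
cost = 7_000                # demo budget in dollars
bases = bits / 2            # theoretical maximum of 2 bits per base
cost_per_base = cost / bases
print(f"cost per base: ${cost_per_base:.1e}")                      # ~1.4e-08
print(f"improvement vs $1e-4/base: {1e-4 / cost_per_base:,.0f}x")  # ~7,000x
```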

If the demo succeeds, it marks a major achievement. But below the fold I continue to throw cold water on the medium-term prospects for DNA storage.

Thursday, October 18, 2018

Betteridge's Law Violation

Erez Zadok points me to Wasim Ahmed Bhat's Is a Data-Capacity Gap Inevitable in Big Data Storage? in IEEE Computer. It is a violation of Betteridge's Law of Headlines because the answer isn't no. But what, exactly, is this gap? Follow me below the fold.

Wednesday, May 16, 2018

Shorter talk at MSST2018

I was invited to give both a longer and a shorter talk at the 34th International Conference on Massive Storage Systems and Technology at Santa Clara University. Below the fold is the text, with links to the sources, of the shorter talk, an updated version of DNA's Niche in the Storage Market.

Longer talk at MSST2018

I was invited to give both a longer and a shorter talk at the 34th International Conference on Massive Storage Systems and Technology at Santa Clara University. Below the fold is the text, with links to the sources, of the longer talk, an updated version of The Medium-Term Prospects for Long-Term Storage Systems.

Tuesday, February 6, 2018

DNA's Niche in the Storage Market

I've been writing about storing data in DNA for the last five years, both enthusiastically about DNA's long-term prospects as a technology for storage, and pessimistically about its medium-term prospects. This time, I'd like to look at DNA storage systems as a product, and ask where their attributes might provide a fit in the storage marketplace.

As far as I know no-one has ever built a storage system using DNA as a medium, let alone sold one. Indeed, the only work I know on what such a system would actually look like is by the team from Microsoft Research and the University of Washington. Everything below the fold is somewhat informed speculation. If I've got something wrong, I hope the experts will correct me.

Thursday, December 21, 2017

Science Friday's "File Not Found"

Science Friday's Lauren Young has a three-part series on digital preservation:
  1. Ghosts In The Reels is about magnetic tape.
  2. The Librarians Saving The Internet is about Web archiving.
  3. Data Reawakening is about the search for a quasi-immortal medium.
Clearly, increasing public attention to the problem of preserving digital information is a good thing, but I have reservations about these posts. Below the fold, I lay them out.

Tuesday, October 31, 2017

Storage Failures In The Field

It's past time for another look at the invaluable hard drive data that Backblaze puts out quarterly. As Peter Bright notes at Ars Technica, despite being based on limited data, the current stats reveal two interesting observations:
  • Backblaze is seeing reduced rates of infant mortality for the 10TB and 12TB drive generations:
    The initial data from the 10TB and 12TB disks, however, has not shown that pattern. While the data so far is very limited, with 1,240 disks and 14,220 aggregate drive days accumulated so far, none of these disks (both Seagate models) have failed.
  • Backblaze is seeing no reliability advantage from enterprise as against consumer drives:
    the company has now accumulated 3.7 million drive days for the consumer disks and 1.4 million for the enterprise ones. Over this usage, the annualized failure rates are 1.1 percent for the consumer disks and 1.2 percent for the enterprise ones.
Below the fold, some commentary.
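For context, the annualized failure rate is simply failures scaled to accumulated drive-years. A quick sketch of the arithmetic; the failure counts below are back-computed from the quoted rates, so they are estimates rather than Backblaze's raw numbers:

```python
def annualized_failure_rate(failures, drive_days):
    """AFR: failures per accumulated drive-year, expressed as a percentage."""
    drive_years = drive_days / 365
    return 100 * failures / drive_years

# Failure counts implied by the quoted rates -- estimates, not Backblaze's data.
print(annualized_failure_rate(failures=111, drive_days=3_700_000))  # ~1.1% (consumer)
print(annualized_failure_rate(failures=46,  drive_days=1_400_000))  # ~1.2% (enterprise)
```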

Tuesday, September 5, 2017

Long-Lived Scientific Observations

[Image: BabelStone, CC BY-SA 3.0]
Keeping scientific data, especially observations that are not repeatable, for the long term is important. In our 2006 Eurosys paper we used an example from China. During the Shang dynasty:
astronomers inscribed eclipse observations on animal bones. About 3200 years later, researchers used these records to estimate that the accumulated clock error was about 7 hours. From this they derived a value for the viscosity of the Earth's mantle as it rebounds from the weight of the glaciers.
Last week we had another, if only one-fifth as old, example of the value of long-ago scientific observations. Korean astronomers' records of a nova in 1437 provide strong evidence that:
"cataclysmic binaries"—novae, novae-like variables, and dwarf novae—are one and the same, not separate entities as has been previously suggested. After an eruption, a nova becomes "nova-like," then a dwarf nova, and then, after a possible hibernation, comes back to being nova-like, and then a nova, and does it over and over again, up to 100,000 times over billions of years.
How were these 580-year-old records preserved? Follow me below the fold.

Thursday, May 4, 2017

Tape is "archive heroin"

I've been boring my blog readers for years with my skeptical take on quasi-immortal media. Among the many, many reasons why long media life, such as claimed for tape, is irrelevant to practical digital preservation is that investing in long media life is a bet against technological progress.

Now, at IEEE Spectrum, Marty Perlmutter's The Lost Picture Show: Hollywood Archivists Can’t Outpace Obsolescence is a great explanation of why tape's media longevity is irrelevant to long-term storage:
While LTO is not as long-lived as polyester film stock, which can last for a century or more in a cold, dry environment, it’s still pretty good.

The problem with LTO is obsolescence. Since the beginning, the technology has been on a Moore’s Law–like march that has resulted in a doubling in tape storage densities every 18 to 24 months. As each new generation of LTO comes to market, an older generation of LTO becomes obsolete. LTO manufacturers guarantee at most two generations of backward compatibility. What that means for film archivists with perhaps tens of thousands of LTO tapes on hand is that every few years they must invest millions of dollars in the latest format of tapes and drives and then migrate all the data on their older tapes—or risk losing access to the information altogether.

That costly, self-perpetuating cycle of data migration is why Dino Everett, film archivist for the University of Southern California, calls LTO “archive heroin—the first taste doesn’t cost much, but once you start, you can’t stop. And the habit is expensive.” As a result, Everett adds, a great deal of film and TV content that was “born digital,” even work that is only a few years old, now faces rapid extinction and, in the worst case, oblivion.
Note also that the required migration consumes a lot of bandwidth, meaning that in order to supply the bandwidth needed to ingest the incoming data you need a lot more drives. This reduces the tape/drive ratio, and thus decreases tape's apparent cost advantage. Not to mention that migrating data from tape to tape is far less automated and thus far more expensive than migrating between on-line media such as disk.
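A back-of-the-envelope sketch makes the point. The archive size, per-drive throughput, duty cycle and migration window here are all illustrative assumptions, not LTO specifications:

```python
def drives_needed(archive_tb, drive_mb_per_s, window_years, duty_cycle=0.7):
    """Drives required just to copy an archive within a migration window."""
    bytes_total = archive_tb * 1e12
    seconds = window_years * 365 * 24 * 3600 * duty_cycle
    per_drive = drive_mb_per_s * 1e6 * seconds
    # Reading the old tapes and writing the new ones each occupy a drive.
    return 2 * bytes_total / per_drive

# Illustrative: a 500 PB archive, 300 MB/s per drive, migrated every 3 years.
# These drives are needed on top of those serving ingest and access.
print(round(drives_needed(archive_tb=500_000, drive_mb_per_s=300, window_years=3)))
```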

Tuesday, December 13, 2016

The Medium-Term Prospects for Long-Term Storage Systems

Back in May I posted The Future of Storage, a brief talk written for a DARPA workshop of the same name. The participants were experts in one or another area of storage technology, so the talk left out a lot of background that a more general audience would have needed. Below the fold, I try to cover the same ground but with this background included, which makes for a long post.

This is an enhanced version of a journal article that has been accepted for publication in Library Hi Tech, with images that didn't meet the journal's criteria, and additional material reflecting developments since submission. Storage technology evolution can't be slowed down to the pace of peer review.

Tuesday, November 29, 2016

Talks at the Library of Congress Storage Architecture Meeting

Slides from the talks at last September's Library of Congress Storage Architecture meeting are now on-line. Below the fold, links to and commentary on three of them.

Wednesday, October 5, 2016

Another Vint Cerf Column

Vint Cerf has another column on the problem of digital preservation. He concludes:
These thoughts immediately raise the question of financial support for such work. In the past, there were patrons and the religious orders of the Catholic Church as well as the centers of Islamic science and learning that underwrote the cost of such preservation. It seems inescapable that our society will need to find its own formula for underwriting the cost of preserving knowledge in media that will have some permanence. That many of the digital objects to be preserved will require executable software for their rendering is also inescapable. Unless we face this challenge in a direct way, the truly impressive knowledge we have collectively produced in the past 100 years or so may simply evaporate with time.
Vint is right about the fundamental problem but wrong about how to solve it. He is right that the problem isn't not knowing how to make digital information persistent; it is not knowing how to pay to make digital information persistent. Yearning for quasi-immortal media makes the problem of paying for it worse, not better, because quasi-immortal media such as DNA are more expensive, and their cost is front-loaded. Copyability is inherent in on-line information; that's how you know it is on-line. Work with this grain of the medium, don't fight it.

Thursday, September 15, 2016

Nature's DNA storage clickbait

Andy Extance at Nature has a news article that illustrates rather nicely the downside of the claim by Marcia McNutt (editor-in-chief of Science) that one reason to pay the subscription to top journals is that:
Our news reporters are constantly searching the globe for issues and events of interest to the research and nonscience communities.
Follow me below the fold for an analysis of why no-one should be paying Nature to publish this kind of stuff.