Showing posts with label storage media. Show all posts

Tuesday, January 20, 2026

Internet Archive's Storage

Internet Archive Staff
Bruce Li introduces his The Long Now of the Web: Inside the Internet Archive’s Fight Against Forgetting thus:
This report delves into the mechanics of the Internet Archive with the precision of a teardown. We will strip back the chassis to examine the custom-built PetaBox servers that heat the building without air conditioning. We will trace the evolution of the web crawlers—from the early tape-based dumps of Alexa Internet to the sophisticated browser-based bots of 2025. We will analyze the financial ledger of this non-profit giant, exploring how it survives on a budget that is a rounding error for its Silicon Valley neighbors. And finally, we will look to the future, where the "Decentralized Web" (DWeb) promises to fragment the Archive into a million pieces to ensure it can never be destroyed.
It is long, detailed, comprehensive and well worth reading in full. Below the fold I comment on the part about storage.

Tuesday, October 28, 2025

The Bathtub Curve

The economics of long-term data storage are critically dependent not just upon the Kryder rate, the rate at which the technology improves cost per byte, but also upon the reliability of the media over time. You want to replace media because they are no longer economic, not because they are no longer reliable despite still being economic.

Source
For more than a decade Backblaze has been providing an important public service by publishing data on the reliability of their hard drives, and more recently their SSDs. Below the fold I comment on this month's post from their Drive Stats Team, Are Hard Drives Getting Better? Let’s Revisit the Bathtub Curve.

Wikipedia defines the Bathtub Curve as a common concept in reliability engineering:
The 'bathtub' refers to the shape of a line that curves up at both ends, similar in shape to a bathtub. The bathtub curve has 3 regions:
  1. The first region has a decreasing failure rate due to early failures.
  2. The middle region is a constant failure rate due to random failures.
  3. The last region is an increasing failure rate due to wear-out failures.
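As a rough illustration of the three regions, a bathtub-shaped failure rate can be modeled as the sum of three Weibull hazard functions: one with shape below 1 (early failures), one with shape equal to 1 (random failures), and one with shape above 1 (wear-out). This is only a sketch. The function names and every parameter value below are assumptions chosen to make the shape visible; they are not fitted to Backblaze's or anyone else's drive data.

```python
def weibull_hazard(t, shape, scale):
    """Weibull hazard rate h(t) = (shape/scale) * (t/scale)**(shape - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t):
    """Illustrative bathtub hazard at age t (in years); all parameters are made up."""
    early = weibull_hazard(t, shape=0.5, scale=50.0)    # decreasing: early failures
    random_ = weibull_hazard(t, shape=1.0, scale=50.0)  # constant: random failures
    wearout = weibull_hazard(t, shape=5.0, scale=8.0)   # increasing: wear-out failures
    return early + random_ + wearout


if __name__ == "__main__":
    for age in (0.1, 0.5, 1, 2, 3, 5, 8, 10):
        print(f"age {age:>4} years: hazard {bathtub_hazard(age):.4f} per year")
```

With these assumed parameters the hazard is high for very young drives, falls to a floor around two to three years, and climbs steeply after that.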

Thursday, September 18, 2025

Hard Disk Unexpectedly Not Dead

As I read Zak Killian's Expect HDD, SSD shortages as AI rewrites the rules of storage hierarchy — multiple companies announce price hikes, too, I realized I had forgotten to write this year's version of my annual post on the Library of Congress' Designing Storage Architectures meeting, which was back in March. So below the fold I discuss a few of the DSA talks, Killian's more recent post, and yet another development in DNA storage. The TL;DR is that the long-predicted death of hard disks is continuing to fail to materialize, and so is the equally long-predicted death of tape.

Tuesday, August 19, 2025

2025 Optical Media Durability Update

Seven years ago I posted Optical Media Durability and discovered:
Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary.
Here are the subsequent annual updates:
It is time once again for the mind-numbing process of feeding 45 disks through the readers to verify their checksums, and yet again this year every single MD5 was successfully verified. Below the fold, the details.
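The kind of check described above can be sketched in a few lines of Python. This is a minimal sketch, not the procedure actually used for these disks: the manifest layout (a mapping from file path to the MD5 recorded when the disk was written) and the function names are assumptions for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest):
    """Return the paths whose current MD5 differs from the recorded one.

    `manifest` is a hypothetical dict mapping file path -> recorded hex MD5.
    An empty result means every checksum verified.
    """
    return [
        path
        for path, recorded in manifest.items()
        if md5_of_file(path) != recorded.lower()
    ]
```

A read error from a failing disk would surface as an exception from `open` or `read` rather than as a mismatch, so a real run would want to catch and log `OSError` per file.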

Tuesday, June 17, 2025

The State Of Storage

The Register is running a series on The State Of Storage. Below the fold I flag some articles worth reading.

Monday, April 7, 2025

Paul Evan Peters Award Lecture

At the Spring 2025 Membership Meeting of the Coalition for Networked Information, Vicky and I received the Paul Evan Peters Award.

You can tell this is an extraordinary honor from the list of previous awardees, and the fact that it is the first time it has been awarded in successive years. Part of the award is the opportunity to make an extended presentation to open the meeting. Our talk was entitled Lessons From LOCKSS, and the abstract was:
Vicky and David will look back over their two decades with the LOCKSS Program. Vicky will focus on the Program's initial goals and how they evolved as the landscape of academic communication changed. David will focus on the Program's technology, how it evolved, and how this history reveals a set of seductive, persistent but impractical ideas.
CNI has posted the video of the entire opening plenary to YouTube. Don Waters' generous introduction starts at 14:28 and Vicky starts talking at 20:00.

Below the fold is the text with links to the sources, information that appeared on slides but was not spoken, and much additional information in footnotes.

Friday, March 14, 2025

Archival Storage

I gave a talk at the Berkeley I-school's Information Access Seminar entitled Archival Storage. Below the fold is the text of the talk with links to the sources and the slides (with yellow background).

Tuesday, January 7, 2025

Storage Roundup

It is time for another roundup of topics in storage that have caught my eye recently. Below the fold I discuss the possible ending of the HAMR saga and various developments in archival storage technology.

Tuesday, August 27, 2024

2024 Optical Media Durability Update

Six years ago I posted Optical Media Durability and discovered:
Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary.
Here are the subsequent annual updates:
It is time once again for the mind-numbing process of feeding 45 disks through the readers to verify their checksums, and yet again this year every single MD5 was successfully verified. Below the fold, the details.

Tuesday, May 28, 2024

Library of Congress: Designing Storage Architectures 2024

I participated virtually in the 2024 Library of Congress Designing Storage Architectures for Digital Collections meeting. As usual, there was a set of very interesting talks. The slides from the presentations are now online so, below the fold, I discuss the talks I found particularly interesting.

Tuesday, March 12, 2024

Petabit Optical Media?

Source
Sabine Hossenfelder does the good Dr. Pangloss proud in her report on A 3D nanoscale optical disk memory with petabit capacity by Miao Zhao et al. Their abstract claims that:
we increase the capacity of [optical data storage] to the petabit level by extending the planar recording architecture to three dimensions with hundreds of layers, meanwhile breaking the optical diffraction limit barrier of the recorded spots. We develop an optical recording medium based on a photoresist film doped with aggregation-induced emission dye, which can be optically stimulated by femtosecond laser beams. This film is highly transparent and uniform, and the aggregation-induced emission phenomenon provides the storage mechanism. It can also be inhibited by another deactivating beam, resulting in a recording spot with a super-resolution scale. This technology makes it possible to achieve exabit-level storage by stacking nanoscale disks into arrays, which is essential in big data centres with limited space.
Below the fold I discuss this technology.

Tuesday, March 5, 2024

Microsoft's Archival Storage Research

2016 Media Shipments

Medium     Exabytes  Revenue  $/GB
Flash      120       $38.7B   $0.320
Hard Disk  693       $26.8B   $0.039
LTO Tape   40        $0.65B   $0.016
Six years ago I wrote Archival Media: Not a Good Business and included this table. The argument went as follows:
  • The value that can be extracted from data decays rapidly with time.
  • Thus companies would rather invest in current than archival data.
  • Thus archival media and systems are a niche market.
  • Thus archival media and systems lack the manufacturing volume to drive down prices.
  • Thus although quasi-immortal media have low opex (running cost), they have high capex (purchase cost).
  • Especially now interest rates are non-zero, the high capex makes the net present value of their lifetime cost high.
  • Archival media compete with legacy generic media, which have mass-market volumes and have already amortized their R&D, so have low capex but higher opex through their shorter service lives.
  • Because they have much higher volumes and thus much more R&D, generic media have much higher Kryder rates, meaning that although they need to be replaced over time, each new unit at approximately equal cost replaces several old units, reducing opex.
  • Especially now interest rates are non-zero, the net present value of the lower capex but higher opex is likely very competitive.
Below the fold I look into why, despite this, Microsoft has been pouring money into archival system R&D for about a decade.
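The capex versus opex argument in the list above can be sketched as a net present value comparison. This is a toy model under stated assumptions: the function names, the treatment of the Kryder rate as a fixed annual fractional drop in purchase cost, and all the numbers in the example are hypothetical, not data from the post.

```python
def npv(cash_flows, discount_rate):
    """Net present value of yearly costs; cash_flows[0] is paid today."""
    return sum(c / (1 + discount_rate) ** year for year, c in enumerate(cash_flows))


def archival_costs(capex, opex_per_year, horizon_years):
    """Quasi-immortal medium: buy once, then pay a low running cost each year."""
    return [capex + opex_per_year] + [opex_per_year] * (horizon_years - 1)


def generic_costs(capex, opex_per_year, horizon_years, service_life, kryder_rate):
    """Generic medium: repurchase every `service_life` years.

    Assumes the purchase cost for the same capacity falls by the fraction
    `kryder_rate` each year, so later replacements are cheaper.
    """
    flows = []
    for year in range(horizon_years):
        cost = opex_per_year
        if year % service_life == 0:
            cost += capex * (1 - kryder_rate) ** year
        flows.append(cost)
    return flows


if __name__ == "__main__":
    # Hypothetical numbers, purely to exercise the model.
    horizon = 30
    archival = archival_costs(capex=1000.0, opex_per_year=5.0, horizon_years=horizon)
    generic = generic_costs(capex=200.0, opex_per_year=20.0, horizon_years=horizon,
                            service_life=5, kryder_rate=0.15)
    for rate in (0.0, 0.03, 0.06):
        print(f"rate {rate:.0%}: archival NPV {npv(archival, rate):.0f}, "
              f"generic NPV {npv(generic, rate):.0f}")
```

The point of the discount rate is that the archival medium pays almost everything up front, while the generic medium defers much of its cost into years that are discounted more heavily as interest rates rise.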

Thursday, January 18, 2024

A Lesson Learned

You know how backups work great until you really need them? Below the fold, a lesson learned from my recent example of this phenomenon.

Thursday, August 17, 2023

Optical Media Durability Update

Five years ago I posted Optical Media Durability and discovered:
Surprisingly, I'm getting good data from CD-Rs more than 14 years old, and from DVD-Rs nearly 12 years old. Your mileage may vary.
Four years ago I repeated the mind-numbing process of feeding 45 disks through the reader and verifying their checksums. Three years ago I did it again, and then again two years ago, and then again a year ago.

It is time again for this annual chore, and yet again this year every single MD5 was successfully verified. Below the fold, the details.

Tuesday, June 20, 2023

2023 Storage Roundup

It is time for another roundup of news from the storage industry, so follow me below the fold.

Thursday, April 13, 2023

Compute In Storage

Tobias Mann's Los Alamos Taps Seagate To Put Compute On Spinning Rust describes progress in the concept of computational storage. I first discussed this in my 2010 JCDL keynote, based on 2009's FAWN, the Fast Array of Wimpy Nodes by David Anderson et al. from Carnegie Mellon. Their work started from the observation that moving data from storage to memory for processing by a CPU takes time and energy, and the faster you do it the more energy it takes. So the less of it you do, the better. Below the fold I start from FAWN and end up with the work under way at Los Alamos.

Tuesday, October 11, 2022

The "DNA Typewriter"

It is time to catch up on a few developments in the field of storing data via chemicals, such as DNA. Below the fold I discuss a half-dozen recent reports.

Thursday, June 9, 2022

Backblaze On Hard Disk Reliability

It has been a long time since I blogged about the invaluable hard drive reliability data that Backblaze has been publishing quarterly since 2015, so I checked their blog and found Andy Klein's Star Wars themed Backblaze Drive Stats for Q1 2022, as well as his fascinating How Long Do Disk Drives Last?. Below the fold I comment on both.

Tuesday, May 17, 2022

Storage Update: Part 3

This is Part 3 of my storage update, on the 2022 Library of Congress "Designing Storage Architectures" meeting; see Part 1, on DNA storage, and Part 2, on SSD reliability. The agenda with links to the presentations is here. Below the fold I have comments on some of them.

Tuesday, March 22, 2022

Storage Update: Part 2

This is part 2 of my latest update on storage technology. Part 1, covering developments in DNA as a storage medium, is here. This part was sparked by a paper at Usenix's File And Storage Technologies conference from Bianca Schroeder's group at U. Toronto and NetApp on the performance of SSDs at scale. It followed on from their 2020 FAST "Best Paper" that I discussed in Enterprise SSD Reliability, and it prompted me to review the literature of this area. The result is below the fold.