
Tuesday, September 11, 2018

What Does Data "Durability" Mean?

In What Does 11 Nines of Durability Really Mean? David Friend writes:
No amount of nines can prevent data loss.

There is one very important and inconvenient truth about reliability: Two-thirds of all data loss has nothing to do with hardware failure.

The real culprits are a combination of human error, viruses, bugs in application software, and malicious employees or intruders. Almost everyone has accidentally erased or overwritten a file. Even if your cloud storage had one million nines of durability, it can’t protect you from human error.
Friend may be right that these are the top 5 causes of data loss, but over the timescale of preservation as opposed to storage they are far from the only ones. In Requirements for Digital Preservation Systems: A Bottom-Up Approach we listed 13 of them. Below the fold, some discussion of the meaning and usefulness of durability claims.
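To make the arithmetic concrete, here is a minimal sketch, assuming (as durability claims implicitly do) that "N nines" means a per-object annual loss probability of 10^-N and that losses are independent:

```python
# Minimal sketch: what an annual durability claim implies, assuming
# "N nines" means a per-object annual loss probability of 10**-N
# and that losses are independent.

def expected_annual_losses(num_objects: int, nines: int) -> float:
    """Expected number of objects lost per year at the claimed durability."""
    return num_objects * 10.0 ** -nines

# A billion objects at eleven nines: ~0.01 objects lost per year.
print(expected_annual_losses(10**9, 11))
# If human error, malware etc. reduce effective durability to six nines,
# the same archive expects to lose ~1,000 objects per year.
print(expected_annual_losses(10**9, 6))
```
The point is that the quoted nines bound only the failures inside the vendor's model; the dominant causes Friend lists are outside it.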

Thursday, October 5, 2017

Living With Insecurity

My post Not Whether But When took off from the Equifax breach, attempting to explain why the Platonic ideal of a computer system storing data that is safe against loss or leakage cannot exist in the real world. Below the fold, I try to cover some of the implications of this fact.

Tuesday, October 3, 2017

Not Whether But When

Richard Smith, the CEO of Equifax while the company leaked personal information on most Americans (and suffered at least one more leak that was active for about a year up to last March), was held accountable for these failings by being allowed to retire with a mere $90M. But at Fortune, John Patrick Pullen quotes him as uttering an uncomfortable truth:
"There's those companies that have been breached and know it, and there are those companies that have been breached and don't know it,"
Pullen points out that:
The speech, given by Smith to students and faculty at the university's Terry College of Business, covered a lot of ground, but it frequently returned to security issues that kept the former CEO awake at night—foremost among them was the company's large database.
Smith should have been losing sleep:
Though it was still 21 days before his company would reveal that it had been massively hacked, Equifax, at that time, had been breached and knew it.
Two years ago, the amazing Maciej Cegłowski gave one of his barn-burning speeches, entitled Haunted by Data (my emphasis):
imagine data not as a pristine resource, but as a waste product, a bunch of radioactive, toxic sludge that we don’t know how to handle. In particular, I'd like to draw a parallel between what we're doing and nuclear energy, another technology whose beneficial uses we could never quite untangle from the harmful ones. A singular problem of nuclear power is that it generated deadly waste whose lifespan was far longer than the institutions we could build to guard it. Nuclear waste remains dangerous for many thousands of years. This oddity led to extreme solutions like 'put it all in a mountain' and 'put a scary sculpture on top of it' so that people don't dig it up and eat it. But we never did find a solution. We just keep this stuff in swimming pools or sitting around in barrels.
The fact is that, just like nuclear waste, we have never found a solution to the interconnected problems of keeping data stored in real-world computer systems safe from attack and safe from leaking. It isn't a question of whether the bad guys will get into the swimming pools and barrels of data, and exfiltrate it. It is simply when they will do so, and how long it will take you to find out that they have. Below the fold I look at the explanation for this fact. I'll get to the implications of our inability to maintain security in a subsequent post.

Thursday, July 6, 2017

Archive vs. Ransomware

Archives perennially ask the question "how few copies can we get away with?"
This is a question I've blogged about in 2016, 2011, and 2010, when I concluded:
  • The number of copies needed cannot be discussed except in the context of a specific threat model.
  • The important threats are not amenable to quantitative modeling (see the sketch after this list).
  • Defense against the important threats requires many more copies than against the simple threats, to allow for the "anonymity of crowds".
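To illustrate why, here is a minimal Monte Carlo sketch; the failure probabilities and the single lumped "correlated event" are invented for illustration, not measurements:

```python
# Minimal sketch: why copy counts mean little without a threat model.
# The probabilities here are invented for illustration.
import random

def survives(copies: int, p_ind: float, p_cor: float) -> bool:
    """One trial: data survives the year if at least one copy survives."""
    if random.random() < p_cor:   # correlated event (e.g. malware,
        return False              # operator error) destroys every copy
    return any(random.random() >= p_ind for _ in range(copies))

def loss_rate(copies: int, p_ind: float, p_cor: float,
              trials: int = 100_000) -> float:
    return sum(not survives(copies, p_ind, p_cor)
               for _ in range(trials)) / trials

for n in (2, 4, 8):
    print(n, loss_rate(n, p_ind=0.01, p_cor=0.001))
# Adding copies drives the independent term toward zero exponentially,
# but the loss rate flattens at the correlated term (~0.001) no matter
# how many copies you add.
```
Replication alone cannot attack the correlated term; that takes many, dissimilar, independently administered copies.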
I've also written before about the immensely profitable business of ransomware. Recent events, such as WannaCrypt, NotPetya, and the details of the NSA's ability to infect air-gapped computers, should convince anyone that ransomware is a threat to which archives are exposed. Below the fold I look into how archives should be designed to resist this credible threat.

Thursday, March 23, 2017

Threats to stored data

Recently there's been a lively series of exchanges on the pasig-discuss mail list, sparked by an inquiry from Jeanne Kramer-Smyth of the World Bank as to any additional risks posed by media, such as disks, that perform encryption or compression internally. It morphed into a discussion of the "how many copies" question and related issues. Below the fold, my reflections on the discussion.

Tuesday, November 22, 2016

Lurking Malice in the Cloud

It is often claimed that the cloud is more secure than on-premises IT:
If you ask Greg Arnette if the cloud is more secure than on-premises infrastructure he’ll say “absolutely yes.” Arnette is CTO of cloud archive provider Sonian, which is hosted mostly in AWS’s cloud. The public cloud excels in two critical security areas, Arnette contends: Information resiliency and privacy.
But even if the cloud provider's infrastructure were completely secure, using the cloud does not free the user from all responsibility for security. In Lurking Malice in the Cloud: Understanding and Detecting Cloud Repository as a Malicious Service, a team from Georgia Tech, Indiana University Bloomington, and UCSB report on the alarming results of a survey of the use of cloud services to store malware components. Many of the malware stashes they found were hosted in cloud storage rented by legitimate companies, presumably the result of inadequate attention to security details by those companies. Below the fold, some details and comments.

Tuesday, August 9, 2016

Correlated Distraction

It is 11:44AM Pacific and I'm driving, making a left onto Central Expressway in Mountain View, CA, and trying to avoid another vehicle whose driver isn't paying attention, when an ear-splitting siren goes off in my car. After a moment of panic I see "Connected" on the infotainment system display. It's the emergency alert system. When it is finally safe to stop and check, I see this message:
Emergency Alert: Dust Storm Warning in this area until 12:00PM MST. Avoid travel. Check local media - NWS.
WTF? Where to even begin with this stupidity? Well, here goes:
  • "this area" - what area? In the Bay Area we have earthquakes, wildfires, flash floods, but we don't yet have dust storms. Why does the idiot who composed the message think they know where everyone who will read it is?
  • It's 11:44AM Pacific, or 18:44 UTC. That's 12:44PM Mountain, except we're both on daylight saving time. So did the message mean 12:00PM MDT, in which case the message was already 44 minutes too late? Or did it mean 12:00PM MST, or 19:00 UTC, in which case it had 16 minutes to run? (The arithmetic is sketched after this list.) Why send a warning 44 minutes late or use the wrong time zone?
  • A dust storm can be dangerous, so giving people 16 minutes (but not -44 minutes) warning could save some lives. Equally, distracting everyone in "this area" who is driving, operating machinery, performing surgery, etc. could cost some lives. Did anyone balance the upsides and downsides of issuing this warning, even assuming it only reached people in "this area"?
  • I've written before about the importance and difficulty of modelling correlated failures. Now that essentially every driver is carrying (but hopefully not talking on) a cellphone, the emergency alert system is a way to cause correlated distraction of every driver across the entire nation. Correlated distraction caused by rubbernecking at accidents is a well-known cause of additional accidents. But at least that is localized in space. Who thought that building a system to cause correlated distraction of every driver in the nation was a good idea?
  • Who has authority to trigger the distraction? Who did trigger the distraction? Can we get that person fired?
  • This is actually the third time the siren has gone off while I'm driving. The previous two were Amber alerts. Don't get me wrong. I think getting drivers to look out for cars that have abducted children is a good idea, and I'm glad to see the overhead signs on freeways used for that purpose. But it isn't a good enough idea to justify the ear-splitting siren and consequent distraction. So I had already followed instructions to disable Amber alerts. I've now also disabled Emergency alerts.
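For the record, here is the time-zone arithmetic as a sketch in Python (3.9+ for zoneinfo; the date is assumed from the post):

```python
# Sketch of the time-zone arithmetic above (Python 3.9+).
from datetime import datetime
from zoneinfo import ZoneInfo

# Alert received at 11:44AM Pacific (daylight time in August).
received = datetime(2016, 8, 9, 11, 44, tzinfo=ZoneInfo("America/Los_Angeles"))

# Reading "12:00PM MST" literally: MST is UTC-7 year-round (e.g. Phoenix).
expires_mst = datetime(2016, 8, 9, 12, 0, tzinfo=ZoneInfo("America/Phoenix"))
# Reading it as the sender probably meant: MDT, UTC-6 (e.g. Denver in August).
expires_mdt = datetime(2016, 8, 9, 12, 0, tzinfo=ZoneInfo("America/Denver"))

print((expires_mst - received).total_seconds() / 60)  # 16.0 minutes to run
print((expires_mdt - received).total_seconds() / 60)  # -44.0 minutes too late
```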
So, once again, because no-one thought What Could Possibly Go Wrong?, a potentially useful system has crashed and burned.