WHITE PAPER
Dealing with Dark Data
We’re in the difficult middle years of the
information age, where a nexus of factors
like cheap storage, rich HD media, ubiquitous
connectivity and more sophisticated SaaS
products is generating more data than we
can affordably store or meaningfully process.
Why is data growing so much?
Data is flooding in from a multitude of sources
– some known and some invisible – which
organizations today have neither the time nor
the resources to effectively manage, let alone
benefit from.
The trouble is, whilst big data and analytics
remain in vogue, neither the volume of
data produced, nor the impulse to store it
all, will change. In the pursuit of business
intelligence, many organizations are hoarding
– often unconsciously – useless data with
the expectation that its potential value will
eventually offset the costs of a bloated and
unnavigable storage environment.
Dark Data
The main culprit behind this trend is something
Gartner has called “dark data” – data which
accumulates through automatic and manual
processes, but which remains invisible to the
business: idle, unanalyzed and without a clear
owner. Because it is invisible, quantifying exactly
how much dark data organizations are struggling
with is difficult, but that hasn’t stopped the
major analysts from trying.
First, a 2013 survey conducted by IDG
Research Services found that only 28% of
stored data presents any value to the
day-to-day operations of a business, suggesting
a massive 72% is non-essential.
Second, IDC’s “Top 10 predictions for CMOs in
2014” corroborates these figures, suggesting
that organizations will fail to realize any value
from 80% of the customer data they hold
because of “immature enterprise value chains”.
Just in case you don’t speak analyst, that
means current data management practices
aren’t capable of locating and extracting the
supposedly valuable information hidden
amongst terabytes of collected data.
It’s expensive to maintain that much unused
data, as Gartner rightly points out: “…
organizations that fail to optimize the way they
manage and retain their data will be forced
to deal with constant increases in storage
costs”. But financial cost is only a part of the
reason dark data is so damaging. Perhaps
more importantly, dark data has become so
ubiquitous that it obscures the useful stuff.
It’s not just that organizations don’t have an
adequate tool to sift through the data heap;
it’s that in worshipping at the altar of analytics
prematurely, we are actively hoarding
useless data in the hope of one day extracting
enormous value from it.
As IDC’s CMO of Advisory Services put it,
whilst big data analytics is a hot topic, most of
this collected data “[is] garbage. IDC’s data
group researchers say that some 80% of data
collected has no meaning whatsoever.” Or at
least it won’t, until organizations are “smart
enough [to have] a tool be able to differentiate
between the signal and the noise.”
What does dark data look like?
Before we go on to look at what these tools
might look like, we should think about the scale
of the problem we expect them to fix. We must
categorize the types of dark data organizations
possess, and for each category, reconcile its
potential value against the cost of its storage.
For instance, server log files are individually
small and unobtrusive, and may contain
useful insights into customer behaviour when
processed together. Even if they’re dark, they
don’t represent a significant burden on the
storage environment.
Unstructured data, on the other hand, is
without exception the single biggest driver
of dark data growth. It’s a broad category of
data, which can include almost anything
that exists outside of semantically tagged
form fields and databases, and is estimated
to constitute around 70-80% of all data in an
average organization.
It’s often human-generated information in the
form of documents, presentations, reports,
graphics, videos and audio that all begin as
potentially valuable, but end up as half-finished
ideas, discarded early drafts or simply assets
that have served their purpose and are no longer
useful.
Why is there so much of it?
The answer to the spiraling growth of
unstructured data is the same as its cause –
data management practices (or rather, the
lack of them). We’ll go on to look at the way
tools can encourage better policy-based
management of the data lifecycle shortly, but it
is worth briefly reiterating that the solution to
dark data is not technology – it is management.
There’s no single cause behind the volume and
variety of unstructured data organizations
produce. Some of it is just a symptom of
technological progress. We are using,
producing and sharing more stuff – whether
that is documents, presentations, emails or
media – because the tools (and therefore their
output) have become more sophisticated and
the connectivity between us has become faster
and more reliable.
There is one common thread though:
standards of data management have not kept
up with the pace of data growth. Not by a long
shot.
One of the most common problems is poorly
maintained folder structures. In organizations
where users are free to create data and
folders within shared file stores, duplication
of both content and the effort required to
create it is incredibly common. Users become
less productive because they can’t find the
information they need, and the file stores
become a tangled mess of non-standardized
naming conventions, leading to massive
amounts of erroneous data that put a great
strain on storage.
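One way to see how much of a shared file store is duplicated content is to group files by a hash of their contents. The following is a minimal sketch, not a description of any particular product: the function names `file_digest` and `find_duplicates` are illustrative, and it assumes the file store is reachable as an ordinary directory tree.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_digest: dict[str, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Only groups with more than one member are duplicates.
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Hashing catches exact copies saved under different names or folders. It will not catch near-duplicates such as successive drafts, which need a different technique.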
Another common problem is that old and
unused file data is not actively retired once
it has been superseded or has become irrelevant. In the
Databarracks Data Health Check 2014, 49%
of 401 respondents did not actively distinguish
between unused and recently accessed file
data despite it being the largest cause of
storage growth.
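Distinguishing unused from recently accessed file data can start with the timestamps the file system already keeps. The sketch below is illustrative only: `split_by_activity` and the 365-day default are assumptions, not figures from the survey, and last-access times are unreliable on volumes mounted with access-time updates disabled.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86_400


def split_by_activity(root: Path, max_idle_days: int = 365, now=None):
    """Return (active, idle) lists of files under root.

    A file counts as idle when neither its last-access nor its
    last-modification time falls within the past max_idle_days.
    """
    now = time.time() if now is None else now
    cutoff = now - max_idle_days * SECONDS_PER_DAY
    active, idle = [], []
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        stat = path.stat()
        last_used = max(stat.st_atime, stat.st_mtime)
        (active if last_used >= cutoff else idle).append(path)
    return sorted(active), sorted(idle)
```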
What do we do about it?
There is an appetite for tools able to shed some light on dark data. IDG’s report found that whilst
77% of enterprises expressed interest in a single platform solution that automatically manages
data, only 10% actually had a completely automated process in place.
Of course, organizations struggling with dark data (which, to be clear, is everyone) must first
identify what they hope to achieve in finding it. Is it that there may be hidden value in documents
long forgotten about, or that they hope to retire useless data to enable more cost-effective
storage?
In truth, this is a bit of a false dilemma – the answer is probably a combination of the two.
However, it remains a useful distinction to make, if only so that organizations can make a more informed
decision about the capabilities they require from their chosen solution.
Prospective data analytics tools must offer three core capabilities to reveal the location and
condition of dark data, and minimize preventable growth in future.
Search
First, organizations need a strong search capability that indexes
both the metadata and the actual content of unstructured data.
This increases visibility into the dark areas of the storage
environment and connects users to the information they need
more quickly.
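As a rough illustration of searching both metadata and content, the sketch below builds a small inverted index from file names and, for plain-text files, from their contents. Everything here is an assumption made for the example: the names `build_index` and `search`, the short list of text suffixes, and the use of the file name as the only metadata. A real tool would also need to extract text from office documents, PDFs and media.

```python
import re
from collections import defaultdict
from pathlib import Path

TOKEN = re.compile(r"[a-z0-9]+")
TEXT_SUFFIXES = {".txt", ".md", ".csv", ".log"}  # assumption: plain-text types only


def tokenize(text: str) -> set[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return set(TOKEN.findall(text.lower()))


def build_index(root: Path) -> dict[str, set[Path]]:
    """Map each token to the files whose name or text content contains it."""
    index = defaultdict(set)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        tokens = tokenize(path.name)  # metadata: here just the file name
        if path.suffix.lower() in TEXT_SUFFIXES:
            tokens |= tokenize(path.read_text(encoding="utf-8", errors="ignore"))
        for token in tokens:
            index[token].add(path)
    return index


def search(index: dict[str, set[Path]], query: str) -> list[Path]:
    """Return the files that match every token in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)
```

A query such as `search(index, "marketing budget")` returns only files in which both words appear, whether in the name or in the text.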
Analyze
Secondly, organizations need powerful analytics and reporting
capabilities in order to extract actionable intelligence from large
volumes of dark data. This is a twofold challenge: half technical
and half design. The analytics must be accurate, responsive
and exhaustive, but they must also be beautifully visualized to
increase usability, comprehension and insight.
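Before any visualization, the underlying report is usually a simple aggregation. The sketch below totals stored bytes by file type and by age band. The function `storage_report` and the band boundaries (90 days, one year) are illustrative choices, not figures from this paper.

```python
import time
from collections import Counter
from pathlib import Path

# Illustrative age bands: (upper limit in days, label).
AGE_BANDS = [(90, "under 90 days"), (365, "90 days to 1 year")]
OLDEST_BAND = "over 1 year"


def age_band(age_days: float) -> str:
    """Return the label of the first band whose limit exceeds the age."""
    for limit, label in AGE_BANDS:
        if age_days < limit:
            return label
    return OLDEST_BAND


def storage_report(root: Path, now=None):
    """Return total bytes per file extension and per age band."""
    now = time.time() if now is None else now
    by_type, by_age = Counter(), Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        stat = path.stat()
        by_type[path.suffix.lower() or "(none)"] += stat.st_size
        by_age[age_band((now - stat.st_mtime) / 86_400)] += stat.st_size
    return by_type, by_age
```

The two counters can then be passed to whatever charting layer is in use. `by_type.most_common(10)` gives the ten file types consuming the most space.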
Archive
Finally, to address the problem of dark data in the long term,
data analytics tools must facilitate the transfer of old and unused
data to cheaper archive storage platforms. Cloud-based object
storage is a cheap and highly scalable alternative to costly
primary storage, and with the creation of management policies
based on usage-rates and compliance obligations, organizations
can automate the process of retiring inactive data.
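A policy of this kind can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: `ArchivePolicy` and `archive_inactive` are hypothetical names, a local directory stands in for the cloud object store, and a set of exempt file suffixes stands in for real compliance rules. It also assumes the archive location lies outside the source tree.

```python
import shutil
import time
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ArchivePolicy:
    """Hypothetical policy: names and defaults are illustrative only."""
    max_idle_days: int = 365
    # Stand-in for compliance obligations: suffixes that must stay in place.
    exempt_suffixes: set[str] = field(default_factory=set)


def archive_inactive(source: Path, archive: Path, policy: ArchivePolicy, now=None):
    """Move files idle longer than the policy allows from source into archive.

    The relative folder layout is preserved. Returns the new paths.
    """
    now = time.time() if now is None else now
    cutoff = now - policy.max_idle_days * 86_400
    moved = []
    # Materialise the listing first so moving files cannot disturb the walk.
    for path in sorted(source.rglob("*")):
        if not path.is_file() or path.suffix.lower() in policy.exempt_suffixes:
            continue
        stat = path.stat()
        if max(stat.st_atime, stat.st_mtime) >= cutoff:
            continue  # still in use
        target = archive / path.relative_to(source)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(target))
        moved.append(target)
    return moved
```

Run on a schedule, a function like this retires inactive data without anyone having to remember to do it, which is the point of managing by policy rather than by hand.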
To find out more visit www.kazoup.com.
Kazoup brings unstructured file data back under control in 3 steps: search, analyze and archive. Leveraging
beautiful data visualization, policy-based lifecycle management and cheap cloud object storage, Kazoup
helps you realize more value from your data whilst lowering the cost of storage.
