TRILLIUM SOFTWARE 2013 CUSTOMER CONFERENCE
(Who’s Afraid of…) The Big Bad Data Wolf?
The Big Bad Data Challenge – Big Data & the Data Quality Imperative
Presented By:
Nigel Turner
VP Information Management Strategy
The tale of the Three Little Pigs
© Copyright 2013, Trillium Software, Inc. All rights reserved.
Big Data – what is it?
• Set of new concepts, practices & technologies to manage & exploit digital data
• Can be defined as:
– “Data that exceeds the processing capability of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architecture” (Source: Edd Dumbill – O’Reilly Community)
• Its key premise is that all data has potential value if it can be collected, analysed and used to generate actionable insight
Where does Big Data come from?
SOCIAL MEDIA & SOCIAL NETWORKS
MACHINE GENERATED
WIDELY KNOWN SOURCES
What’s different about Big Data?
• New technologies which enable distributed & highly scalable MPP (Massively Parallel Processing), e.g.
– Apache Hadoop
– MapReduce
– NoSQL databases
• Strong emphasis on analytical approaches
– Emergence of “data science”
– Predictive Analytics
– Data Mining
• The “democratisation” of data
– Data made available to all (cf. Cloud Computing)
– Business-led, not IT-led, BI
Big Data & Data Quality – parallel worlds?
BIG DATA
DATA QUALITY
Parallel worlds… or are they (1)?
Shared with 100,000+ others and counting…
Parallel worlds… or are they (2)?
“I spend the vast majority of my time cleaning data systems… cleaning and preparing data sets makes everything I do better… it’s the highest value activity I do”
Josh Wills
Senior Director of Data Science
Cloudera
(From “Training a new generation of Data Scientists” – Cloudera video)
When Big Data & Data Quality worlds collide…
Big Data will expose Data Quality shortcomings
Poor Data Quality will undermine the value of Big Data investments
Big Data – building on solid foundations
BIG DATA / ANALYTICS
DATA QUALITY FOUNDATION
The 3Vs and the DQ challenge
Volume
• Exponential growth of data – predicted 40-60% per annum
• 2.5 quintillion bytes of data are created every day
• 90% of all digital data was created in the last two years
Variety
• Data generated is more varied and complex than before:
– Text, Audio, Images, Machine Generated etc.
• Much of this data is semi-structured or unstructured
• Traditional IT techniques are ill-equipped to process & analyse it
Velocity
• Data is often generated in real time
• Analysis and response need to be rapid, often also in real time
• Traditional BI / DW environments cannot cope – new approaches are needed
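To put the 40-60% annual growth figure in perspective, a quick compound-growth calculation may help. This is an illustrative sketch only: it assumes a constant annual growth rate, and the helper names `doubling_time_years` and `growth_factor` are hypothetical, not taken from the slides.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    if annual_growth_rate <= 0:
        raise ValueError("growth rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def growth_factor(annual_growth_rate: float, years: float) -> float:
    """Multiplier applied to the starting volume after the given number of years."""
    return (1 + annual_growth_rate) ** years


# The two ends of the predicted range quoted on the slide.
for rate in (0.40, 0.60):
    print(
        f"{rate:.0%} per annum: doubles in {doubling_time_years(rate):.2f} years, "
        f"x{growth_factor(rate, 5):.1f} after 5 years"
    )
```

Under that assumption, data volumes double roughly every two years at 40% growth and about every year and a half at 60%.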
Big Data – Foundations of Success
• Identifying the right data to solve the business problem or opportunity
• The ability to integrate & match varied data from multiple data sources
– structured, semi-structured, unstructured
• Building the right IT infrastructure to support Big Data applications
• Having the right capabilities & skills to exploit the data
Big Data – some vertical applications
• Retail: using point of sale & social media data to supplement & enrich traditional CRM / Marketing data
• Insurance & Banking: fraud detection
• Health: holistic patient analysis
• Utilities: consumption peaks & troughs & capacity planning
• Telcos: call routing optimisation & customer churn
• Manufacturing: predictive fault identification & supply chain optimisation
• Research: particle analysis, genomics etc.
Example Big Data benefit: The Open Big Data Cloud
SOURCE: LINKED OPEN DATA (LOD) COMMUNITY
Big Data in practice – Volvo
• Every Volvo vehicle has hundreds of microprocessors / sensors
• The data generated is used within the car itself but is also captured for analysis by Volvo and its dealers
• All data is loaded into a centralised analysis hub & integrated with CRM, dealership, product & social network data
• Used to optimise design & manufacturing, enhance customer interaction, improve safety & act on customer feedback
Big Data – Barriers & Pitfalls
• The sheer volume of data – what’s worth using?
• Data extraction challenges
• The ability to match data from disparate sources / formats / media
• The time taken to integrate new data sources
• The risks of mismatching and incorrect identification of individuals
• Legal & regulatory pitfalls
• Security concerns – corporate & individual
• Lack of skills & expertise
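The mismatching risk can be illustrated with a small sketch. It is not any vendor's matching engine: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure, and the records, the `dob` field, the 0.85 threshold and the function names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio between two names, in the range 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def is_match(rec_a, rec_b, name_threshold=0.85, check_dob=True):
    """Treat two records as the same individual if their names are similar.

    With check_dob=True the dates of birth must also agree, which guards
    against merging two different people who share a similar name.
    """
    similar = name_similarity(rec_a["name"], rec_b["name"]) >= name_threshold
    if not check_dob:
        return similar
    return similar and rec_a["dob"] == rec_b["dob"]


# Hypothetical records from three different sources.
crm_record = {"name": "Jon Smith", "dob": "1980-02-01"}
billing_record = {"name": "John Smith", "dob": "1980-02-01"}
social_record = {"name": "John Smith", "dob": "1975-11-23"}

print(is_match(crm_record, billing_record))                  # True
print(is_match(crm_record, social_record, check_dob=False))  # True: name-only match
print(is_match(crm_record, social_record))                   # False: dates of birth differ
```

Matching on name alone would merge the CRM and social records even though the dates of birth point to two different people. Adding a second attribute lowers that risk, at the cost of missing matches when the attribute is absent or wrong.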
Big Data – the data integration challenge
EXTERNAL DATA SOURCES: SOCIAL MEDIA, SENSORS, OPEN DATA, EMAIL, MOBILES
INTERNAL DATA SOURCES: CRM, BILLING, OPS, SALES, PRODS
ANALYTICS PLATFORMS 1, 2, 3 … n
ACTIONABLE INSIGHT & KNOWLEDGE
Big Data – the Data Quality Imperative (1)
• Need to profile external and internal data sources
• Need to classify data to define what data really matters
• Need to assure the quality of internal (and some external) data sources for accuracy, completeness, consistency
• Need to define & apply business rules & metadata management to how the data will be defined and used
• Need for a data governance framework to ensure consistency & control
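As a rough idea of what profiling a source for completeness and consistency involves, here is a minimal sketch in plain Python. The `profile` function, the field names and the sample CRM extract are hypothetical; a real profiling tool would report far more.

```python
from collections import Counter


def profile(records, fields):
    """Return per-field completeness, distinct-value count and top values."""
    total = len(records)
    report = {}
    for field in fields:
        values = [record.get(field) for record in records]
        # Treat None and empty strings as missing values.
        present = [value for value in values if value not in (None, "")]
        report[field] = {
            "completeness": len(present) / total if total else 0.0,
            "distinct": len(set(present)),
            "top_values": Counter(present).most_common(3),
        }
    return report


# Hypothetical CRM extract with gaps and inconsistent country codes.
customers = [
    {"id": 1, "name": "Ann Lee", "country": "UK", "email": "ann@example.com"},
    {"id": 2, "name": "Bob Ray", "country": "U.K.", "email": ""},
    {"id": 3, "name": "Cy Wu", "country": "United Kingdom", "email": None},
    {"id": 4, "name": "Di Fox", "country": "UK", "email": "di@example.com"},
]

report = profile(customers, ["name", "country", "email"])
for field, stats in report.items():
    print(field, stats)
```

In this sample the `email` field is only half populated (a completeness issue), and `country` holds three spellings of the same value (a consistency issue that standardisation would address).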
Big Data – the Data Quality Imperative (2)
• Need processes & tools to enable:
– Source data profiling
– Data integration
– Data parsing
– Data standardisation
– Business rule creation & management
– Metadata management & a shared business / IT glossary
– Data de-duplication
– Data normalisation
– Data matching
– Data enrichment
– Data audit
• Many of these functions must be capable of being carried out in real time with zero lag
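Three of the functions listed above (parsing, standardisation and de-duplication) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the synonym table, the field names and the helper functions are hypothetical, and real reference data would be far larger.

```python
import re

# Hypothetical reference table mapping country spellings to one standard code.
COUNTRY_SYNONYMS = {"uk": "GB", "u.k.": "GB", "united kingdom": "GB", "gb": "GB"}


def parse_name(raw):
    """Parse 'Family, Given' or 'Given Family' into (given, family)."""
    cleaned = re.sub(r"\s+", " ", raw).strip()
    if "," in cleaned:
        family, given = [part.strip() for part in cleaned.split(",", 1)]
    else:
        given, _, family = cleaned.rpartition(" ")
    return given.title(), family.title()


def standardise(record):
    """Return a copy of the record with parsed names and standardised values."""
    given, family = parse_name(record["name"])
    country = record["country"].strip()
    return {
        "given": given,
        "family": family,
        "email": record["email"].strip().lower(),
        "country": COUNTRY_SYNONYMS.get(country.lower(), country.upper()),
    }


def deduplicate(records):
    """Keep the first record seen for each standardised key."""
    seen = {}
    for rec in map(standardise, records):
        key = (rec["given"], rec["family"], rec["email"], rec["country"])
        seen.setdefault(key, rec)
    return list(seen.values())


raw_records = [
    {"name": "  ann   LEE ", "email": "Ann@Example.com ", "country": "UK"},
    {"name": "Lee, Ann", "email": "ann@example.com", "country": "United Kingdom"},
    {"name": "Bob Ray", "email": "bob@example.com", "country": "u.k."},
]

unique = deduplicate(raw_records)
for rec in unique:
    print(rec)
```

The first two raw records look different as strings but collapse to a single record once parsed and standardised, which is why de-duplication works best after those steps rather than before them.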
Big Data – DQ as the key enabler
EXTERNAL DATA SOURCES: SOCIAL MEDIA, SENSORS, OPEN DATA, EMAIL, MOBILES
INTERNAL DATA SOURCES: CRM, BILLING, OPS, SALES, PRODS
DATA QUALITY PLATFORM: PROFILE, PARSE, STANDARDISE, MATCH, ENRICH
ANALYTICS PLATFORMS 1, 2, 3 … n
ACTIONABLE INSIGHT & KNOWLEDGE
Big Data – some algorithms
1. BIG DATA + POOR DATA QUALITY = BIG PROBLEMS
2. DATA DEMOCRATISATION – DATA GOVERNANCE = ANARCHY
3. DATA MASH UPS – DATA QUALITY = DATA MESS
4. BIG DATA ANALYTICS + POOR DQ = WRONG RESULTS
5. BIG DATA – DATA ASSURANCE = JAIL
6. 3V + DATA QUALITY = 4V (VALIDITY)
Big Data & Data Quality – summary
• Big Data will depend on data quality to reap its claimed benefits – the GIGO truism
• The democratisation of data will expose poor DQ
• The need for Data Governance increases as data becomes more accessible
• Data skills will become more valued for ‘data science’
• Big Data will increase the 3Vs of data
• Control of data becomes more difficult – scope and variety of use increases
• Data standards & business rules become more complex
• Potential legal & regulatory minefield
What action should we take as data management / DQ professionals?
• Identify and get involved in any current or planned Big Data initiatives within our organisations
• Ensure that the Data Quality and Data Governance implications & imperatives of these initiatives are understood
• Plan for the new Data Quality and Data Governance challenges that these trends will pose
So who’s afraid of the Big Bad Data Wolf?
Questions
(Who’s Afraid of…) The Big Bad Data Wolf?
The Big Bad Data Challenge – Big Data & the Data Quality Imperative
