Moving Data Science from an Event to a Program: Considerations in Creating Sustainable and Reusable Data Sources
Wayne Applebaum, Ph.D.
What Gartner Sees
“… by 2017, 33 percent of Fortune 100 organizations will experience an information crisis, due to their inability to effectively value, govern and trust their enterprise information.”
Gartner Press Release, February 27, 2014
How it should work
Business processes and business decisions bracket a robust infrastructure of data tools and processes.
[Diagram: Business Processes → Transactional Information/Other Data → Load and Quality Processes → Target data store → Measures and Analytics Tools → Business Decisions]
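The pipeline in this slide can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration: the `Transaction` record, the two quality rules, the in-memory `TargetStore`, and the `revenue_by_region` measure are hypothetical stand-ins for real source systems, a real data store, and real analytics tools. It only shows the ordering the slide implies: records pass through quality processes on load before they reach the target store, and measures are computed from the store rather than from the raw feed.

```python
from dataclasses import dataclass, field

# Hypothetical record shape: one transaction captured by a business process.
@dataclass
class Transaction:
    order_id: str
    region: str
    amount: float

@dataclass
class TargetStore:
    """In-memory stand-in for the target data store (an assumption)."""
    rows: list = field(default_factory=list)
    rejected: list = field(default_factory=list)

def passes_quality(tx: Transaction) -> bool:
    # Example quality rules (assumptions): an ID is present and the amount is non-negative.
    return bool(tx.order_id) and tx.amount >= 0

def load(store: TargetStore, source: list) -> None:
    # Load step: every record goes through the quality processes before it lands.
    for tx in source:
        (store.rows if passes_quality(tx) else store.rejected).append(tx)

def revenue_by_region(store: TargetStore) -> dict:
    # A measure that analytics tools could expose to support a business decision.
    totals = {}
    for tx in store.rows:
        totals[tx.region] = totals.get(tx.region, 0.0) + tx.amount
    return totals

source = [
    Transaction("A1", "East", 120.0),
    Transaction("A2", "West", 80.0),
    Transaction("", "East", 50.0),     # missing ID -> rejected
    Transaction("A4", "East", -10.0),  # negative amount -> rejected
]
store = TargetStore()
load(store, source)
print(revenue_by_region(store))  # {'East': 120.0, 'West': 80.0}
print(len(store.rejected))       # 2
```

Keeping rejected records instead of silently dropping them gives the quality process something to report back to the business process that produced the data.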
How it usually works
Silos of data that are difficult to put together
“Those who don’t know history are destined to repeat it.” (often attributed to Edmund Burke; the idea is usually traced to George Santayana)
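To make the silo problem concrete, here is a minimal Python sketch. The two systems, their field names (`cust_no`, `account`), and the zero-padding convention are hypothetical; the point is only that records describing the same customer do not line up until someone decides how the keys correspond.

```python
# Two hypothetical silos describing the same customers with different conventions.
crm = [
    {"cust_no": "00042", "name": "Acme Corp."},
    {"cust_no": "00107", "name": "Globex, Inc"},
]
billing = [
    {"account": "42", "customer_name": "ACME CORPORATION"},
    {"account": "107", "customer_name": "Globex Inc."},
]

def naive_join(left, right):
    # Exact key comparison: "00042" != "42", so nothing lines up.
    return [(l, r) for l in left for r in right if l["cust_no"] == r["account"]]

def normalize_id(value: str) -> int:
    # Assumption: both systems use the same number, one of them zero-padded.
    return int(value)

def keyed_join(left, right):
    # Index one silo by the normalized key, then look up the other silo's records.
    index = {normalize_id(r["account"]): r for r in right}
    return [(l, index[normalize_id(l["cust_no"])])
            for l in left if normalize_id(l["cust_no"]) in index]

print(len(naive_join(crm, billing)))  # 0
print(len(keyed_join(crm, billing)))  # 2
```

Even after the keys match, the names (`Acme Corp.` vs. `ACME CORPORATION`) still disagree, so someone still has to decide which silo is authoritative for each attribute.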
Here’s a Data Scientist’s Viewpoint
• Identifying Data Sources
• Data Correctness
• Data Quality
• Business Involvement
• Multiple Sources
• Data Governance
• Flexibility
• Together, these take up 80% of their time
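Several items on this list (correctness, quality, multiple sources) usually start with profiling a dataset before using it. Below is a minimal Python sketch, assuming records arrive as a list of dicts; the `profile` function, its parameters, and the sample rows are hypothetical. It counts duplicate keys, missing required values, and values outside an allowed code set.

```python
from collections import Counter

def profile(rows, key, required, allowed=None):
    """Count a few common data-quality problems in a list of dict records.

    key:      column expected to be unique (extra occurrences are counted)
    required: columns that must be present and non-empty
    allowed:  optional {column: set_of_valid_values} for coded fields
    """
    allowed = allowed or {}
    report = {"rows": len(rows), "missing": Counter(), "invalid": Counter()}
    key_counts = Counter(r.get(key) for r in rows)
    report["duplicate_keys"] = sum(n - 1 for n in key_counts.values() if n > 1)
    for r in rows:
        for col in required:
            if r.get(col) in (None, ""):
                report["missing"][col] += 1
        for col, valid in allowed.items():
            value = r.get(col)
            # Only non-empty values are checked against the code set.
            if value not in (None, "") and value not in valid:
                report["invalid"][col] += 1
    return report

rows = [
    {"id": 1, "state": "NY", "email": "a@example.com"},
    {"id": 2, "state": "N.Y.", "email": ""},
    {"id": 2, "state": "CA", "email": "c@example.com"},
]
report = profile(rows, key="id", required=["email"], allowed={"state": {"NY", "CA"}})
print(report["duplicate_keys"], dict(report["missing"]), dict(report["invalid"]))
# 1 {'email': 1} {'state': 1}
```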
Why the Problem is Getting Worse
• The use of data, and the value placed on it, is increasing
• More decisions are being made in the same amount of time
• Answers aren’t in the silos; you need to cross the silos to get them
• Business demand for information-based decisions is not discussed in the popular media
Pressure and opportunity for data and analytics are rising
Reuse is becoming a business necessity
Emergence of Business Decision Data
[Diagram: Master Data, Transactional Data, and Business Decision Data]
While the basic rules of Data Governance remain the same, the scope is expanding.
Transactions vs. Decisions
• Transactions: process each transaction as quickly as possible.
• Decisions: consolidate information to make the correct decision as quickly as possible.
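One way to read this contrast is as two code paths with different goals. The minimal Python sketch below is an illustration under assumptions, not a reference design: `process_transaction` appends one event with no lookups, while `decision_view` consolidates a ledger, a returns feed, and a customer-segment lookup (all hypothetical) before producing an answer.

```python
from collections import defaultdict

# Transactional path: handle one event and return as fast as possible.
ledger = []

def process_transaction(event: dict) -> None:
    ledger.append(event)  # no cross-source lookups on the hot path

# Decision path: consolidate information from several sources before answering.
def decision_view(ledger_rows, returns_rows, customer_segments):
    net = defaultdict(float)
    for row in ledger_rows:
        net[customer_segments.get(row["customer"], "unknown")] += row["amount"]
    for row in returns_rows:
        net[customer_segments.get(row["customer"], "unknown")] -= row["amount"]
    return dict(net)

for e in [{"customer": "c1", "amount": 100.0}, {"customer": "c2", "amount": 40.0}]:
    process_transaction(e)

returns = [{"customer": "c1", "amount": 25.0}]  # hypothetical second source
segments = {"c1": "retail", "c2": "wholesale"}  # hypothetical master data
print(decision_view(ledger, returns, segments))
# {'retail': 75.0, 'wholesale': 40.0}
```

The decision path is only as good as the consistency of the keys (`customer`) and definitions (`amount`) across its inputs, which is where governance enters.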
The Data Governance “No Free Lunch” Rule
When it comes to integrating data sources, there is no free lunch: you have to understand the data and its context to be able to make decisions.
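A small Python illustration of why integration needs context, assuming two hypothetical sources that both expose a `sales` column but report it at different scales (dollars vs. thousands of dollars). The `unit_scale` field stands in for whatever metadata or documentation records that context; it is an assumption, not part of the original slides.

```python
# Two hypothetical sources both expose a column called "sales".
# Source A reports dollars; source B reports thousands of dollars.
source_a = {"unit_scale": 1, "sales": [1200.0, 800.0]}
source_b = {"unit_scale": 1000, "sales": [3.5, 1.5]}

def naive_total(*sources):
    # Treats the shared column name as a shared meaning.
    return sum(v for s in sources for v in s["sales"])

def governed_total(*sources):
    # Uses documented context (the unit scale) to convert to dollars first.
    return sum(v * s["unit_scale"] for s in sources for v in s["sales"])

print(naive_total(source_a, source_b))     # 2005.0
print(governed_total(source_a, source_b))  # 7000.0
```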
Creating the Data Hub: Overview
• Scope
• Identifying Key Objects/Values
• Creating the Controlled Vocabulary
• Object Mapping
• Creating the Canonical/Target Model
• Creating the Rules and Standards
• Implementing Data Retrieval
• Creating the User Interface
• Developing Load Procedures
Architecture Decisions: Ingestion, Database, Data Governance, Retrieval
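The hub-building steps listed on this slide can be sketched end to end in Python. This is a minimal, in-memory sketch under stated assumptions, not the author's implementation: the status vocabulary, the `FIELD_MAP` between source fields and canonical attributes, the `Customer` model, the validation rules, and the `load`/`get_customer` functions are all hypothetical. Scope, the user interface, and the architecture decisions (ingestion, database technology) are left out.

```python
from dataclasses import dataclass
from typing import Optional

# Controlled vocabulary: the only values allowed for a coded field,
# plus the source spellings that map onto each one (hypothetical).
STATUS_VOCAB = {
    "active": {"A", "ACT", "active", "open"},
    "closed": {"C", "CLS", "closed", "terminated"},
}

# Object mapping: which source field feeds which canonical attribute (hypothetical).
FIELD_MAP = {
    "crm":     {"cust_no": "customer_id", "nm": "name", "stat": "status"},
    "billing": {"account": "customer_id", "customer_name": "name", "state": "status"},
}

@dataclass(frozen=True)
class Customer:
    """Canonical/target model for the hub (illustrative fields only)."""
    customer_id: int
    name: str
    status: str

def to_vocab(raw: str) -> Optional[str]:
    # Translate a source spelling into a controlled-vocabulary term, or None.
    for term, synonyms in STATUS_VOCAB.items():
        if raw in synonyms:
            return term
    return None

def validate(rec: dict) -> list:
    # Rules and standards: return a list of violations (empty means valid).
    errors = []
    if not rec.get("name"):
        errors.append("name is required")
    if rec.get("status") is None:
        errors.append("status not in controlled vocabulary")
    return errors

def load(source: str, rows: list, hub: dict, rejects: list) -> None:
    # Load procedure: map fields, apply vocabulary, check rules, then store.
    mapping = FIELD_MAP[source]
    for row in rows:
        rec = {mapping[k]: v for k, v in row.items() if k in mapping}
        rec["customer_id"] = int(rec["customer_id"])
        rec["status"] = to_vocab(rec.get("status", ""))
        errors = validate(rec)
        if errors:
            rejects.append((source, row, errors))
        else:
            hub[rec["customer_id"]] = Customer(**rec)  # insert or overwrite

def get_customer(hub: dict, customer_id: int) -> Optional[Customer]:
    # Data retrieval: consumers query the canonical model, not the silos.
    return hub.get(customer_id)

hub, rejects = {}, []
load("crm", [{"cust_no": "00042", "nm": "Acme Corp", "stat": "ACT"}], hub, rejects)
load("billing", [{"account": "107", "customer_name": "Globex", "state": "terminated"},
                 {"account": "200", "customer_name": "Initech", "state": "??"}], hub, rejects)
print(get_customer(hub, 42))  # Customer(customer_id=42, name='Acme Corp', status='active')
print(len(hub), len(rejects))  # 2 1
```

The load step here simply overwrites an existing hub record when a later source supplies the same `customer_id` (last write wins); a real hub would need an explicit survivorship rule, decided as part of the rules and standards step.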
Where do we go from here?
• Implement Data Governance early
• Integrate Data Governance across silos
• Recognize that Data Governance doesn’t end with Master Data
• Big Data represents a new challenge because the meaning of a transaction is no longer defined on entry
• Create the governance and structures to support both transactions and decisions
• Consider Data Hubs for cross-silo integration
Governance is essential for reuse, and reuse is essential to maximizing value.
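The point that Big Data no longer fixes a transaction’s meaning on entry is often handled with a schema-on-read approach: keep raw events as they arrived and apply a governed interpretation when the data is read. Below is a minimal Python sketch, assuming JSON event strings; the event fields and the two versioned `purchase_intent` rules are hypothetical.

```python
import json

# Raw events are kept exactly as they arrived; no meaning is imposed on entry.
raw_events = [
    '{"ts": "2015-06-01T10:00:00", "user": "u1", "evt": "view", "sku": "X1"}',
    '{"ts": "2015-06-01T10:05:00", "user": "u1", "evt": "cart", "sku": "X1"}',
    '{"ts": "2015-06-01T10:07:00", "user": "u2", "evt": "view"}',
]

# A governed, versioned interpretation decides what an event means at read time.
INTERPRETATIONS = {
    "purchase_intent/v1": lambda e: e.get("evt") == "cart",
    "purchase_intent/v2": lambda e: e.get("evt") in {"cart", "view"} and "sku" in e,
}

def read(events, interpretation):
    # Parse on read and keep only events that match the named interpretation.
    rule = INTERPRETATIONS[interpretation]
    return [e for e in map(json.loads, events) if rule(e)]

print(len(read(raw_events, "purchase_intent/v1")))  # 1
print(len(read(raw_events, "purchase_intent/v2")))  # 2
```

Because each interpretation is named and versioned, two analyses that disagree can at least show which definition each one used; that registry of definitions is where governance applies in this setting.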
