Demystifying Data Science
Our Speaker
David Fussichen
Analytics8 President
AGENDA
1. What actually happens on data science projects
2. Data Science vs BI
3. How to get started
Data Science = Advanced Analytics
What Is Data Science?
DATA SCIENCE: a general term that includes the process of obtaining, transforming, analyzing, and communicating data to answer a question.
MACHINE LEARNING: using algorithms and computers to do a task better as more data is fed to the algorithm. It is a primary type of data science analysis and a subset of AI.
The 2 Domains of Machine Learning
• Unsupervised Learning. Purpose: Finding Groups
• Supervised Learning. Purpose: Predictions
Real Use Cases
FINANCIAL SERVICES
• Fraud detection
• Credit and pre-payment risk
• Likelihood of default
• Automatically rebalance portfolios
HEALTHCARE
• Predict and prevent readmissions
• ICU utilization prediction
• Predict payer fraud
• Likelihood of adhering to regimens
INSURANCE
• Predict how much a claim will cost
• Likelihood of converting a quote to a policy
• Predict fraudulent claims
GOVERNMENT
• Predict incoming threats
• Proactively block misuse of data
• Predict Medicare fraud
TRANSPORTATION
• Predict wear and tear
• Route optimization and schedule planning
MANUFACTURING, RETAIL, MARKETING, CYBER SECURITY, CUSTOMER SERVICE, & MORE!
Step 1: Define Project Objectives
Infigen Energy seeks to predict the likelihood of failure of the main bearing on each of its 1,500 wind turbines. Predicting the failure of a main bearing will save up to $300,000 per year.
The results of the model will be used to adjust the turbine speed to extend the life of a bearing while a replacement is planned.
The request: use their data to predict turbine failure in the next 30 days.
Step 2: Acquire and Explore Data
Training Data
Turbine | Farm        | Location | Make-Model            | Cycles (M) | Last Maint. Date | Output 30 Day | Maint. Types | Events | Failed
456     | Forsayth    | 2-26     | GE-Alstom-3MW ECO 100 | 4.7        | 2/5/2015         | 400 MW        | ST           | AR     | N
324     | Walkaway 2  | 14-16    | GE-Alstom-4.8MW-158   | 1.9        | 8/15/2015        | 375 MW        | ST           | GTR    | N
293     | Cherry Tree | 8-25     | GE-Alstom-3MW ECO 100 | 5.8        | 7/13/2016        | 427 MW        | OH           | D      | Y
323     | Woakwine    | 12-5     | GE-Alstom-4.8MW-158   | 8.6        | 12/1/2017        | 275 MW        | ST           | NM     | N
In the training data above, the columns that describe each turbine are the FEATURES, and the Failed column is the TARGET.
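As a rough sketch of what the FEATURES / TARGET split looks like in code, the four training rows from the slide can be separated into inputs and a 0/1 label. The column names, the `split_features_target` helper, and the choice to treat every column except Failed as a feature are assumptions made for illustration; they are not taken from the project itself.

```python
# Training rows transcribed from the slide. Column names are shortened,
# illustrative versions of the slide headers (an assumption).
COLUMNS = [
    "turbine", "farm", "location", "make_model", "cycles_m",
    "last_maint_date", "output_30_day_mw", "maint_type", "event", "failed",
]

TRAINING_ROWS = [
    (456, "Forsayth", "2-26", "GE-Alstom-3MW ECO 100", 4.7, "2/5/2015", 400, "ST", "AR", "N"),
    (324, "Walkaway 2", "14-16", "GE-Alstom-4.8MW-158", 1.9, "8/15/2015", 375, "ST", "GTR", "N"),
    (293, "Cherry Tree", "8-25", "GE-Alstom-3MW ECO 100", 5.8, "7/13/2016", 427, "OH", "D", "Y"),
    (323, "Woakwine", "12-5", "GE-Alstom-4.8MW-158", 8.6, "12/1/2017", 275, "ST", "NM", "N"),
]


def split_features_target(rows, columns, target="failed"):
    """Return (features, targets): one dict of inputs per row, and a 0/1 label."""
    target_index = columns.index(target)
    features = [
        {name: value for name, value in zip(columns, row) if name != target}
        for row in rows
    ]
    # Encode the slide's Y/N flag as 1/0 so a model can learn from it.
    targets = [1 if row[target_index] == "Y" else 0 for row in rows]
    return features, targets


X, y = split_features_target(TRAINING_ROWS, COLUMNS)
print(y)
```

In a real project the identifier column would probably be dropped from the features, since a turbine number carries no information about bearing wear.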
Skipping Ahead
Production Data before we run our model
Turbine | Farm        | Location | Make-Model            | Cycles (M) | Last Maint. Date | Output 30 Day | Maint. Types | Events | Failed
845     | Woodline    | 7-23     | GE-Alstom-3MW ECO 100 | 5.9        | 5/14/2016        | 435 MW        | OH           | NM     | ?
466     | Lake Bonney | 1-19     | GE-Alstom-4.8MW-158   | 6.9        | 1/22/2014        | 300 MW        | ST           | GTR    | ?
852     | Walkaway 2  | 6-9      | GE-Alstom-4.8MW-158   | 11.8       | 9/22/2015        | 510 MW        | OH           | GTR    | ?
156     | Bodangora   | 14-3     | GE-Alstom-4.8MW-158   | 7.6        | 6/7/2017         | 274 MW        | ST           | AR     | ?
Production Data after we run our model
Turbine | Farm        | Location | Make-Model            | Cycles (M) | Last Maint. Date | Output 30 Day | Maint. Types | Events | Failed
845     | Woodline    | 7-23     | GE-Alstom-3MW ECO 100 | 5.9        | 5/14/2016        | 435 MW        | OH           | NM     | 0.051
466     | Lake Bonney | 1-19     | GE-Alstom-4.8MW-158   | 6.9        | 1/22/2014        | 300 MW        | ST           | GTR    | 0.062
852     | Walkaway 2  | 6-9      | GE-Alstom-4.8MW-158   | 11.8       | 9/22/2015        | 510 MW        | OH           | GTR    | 0.371
156     | Bodangora   | 14-3     | GE-Alstom-4.8MW-158   | 7.6        | 6/7/2017         | 274 MW        | ST           | AR     | 0.013
Now we can make informed decisions about turbine speed and preventative maintenance!
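One way to turn the predicted probabilities into a decision is to rank turbines by risk and flag those above a cutoff. The sketch below uses the four scores from the slide. The 0.25 threshold and the function names are hypothetical; a real cutoff would come from the cost of a missed failure versus the cost of slowing a turbine.

```python
# Predicted failure probabilities from the slide: (turbine, farm, probability).
PREDICTIONS = [
    (845, "Woodline", 0.051),
    (466, "Lake Bonney", 0.062),
    (852, "Walkaway 2", 0.371),
    (156, "Bodangora", 0.013),
]

REVIEW_THRESHOLD = 0.25  # hypothetical cutoff, not from the presentation


def rank_by_risk(predictions):
    """Sort turbines from highest to lowest predicted failure probability."""
    return sorted(predictions, key=lambda row: row[2], reverse=True)


def flag_for_review(predictions, threshold):
    """Return the turbine numbers whose probability meets the threshold."""
    return [turbine for turbine, _farm, prob in predictions if prob >= threshold]


ranked = rank_by_risk(PREDICTIONS)
flagged = flag_for_review(PREDICTIONS, REVIEW_THRESHOLD)

for turbine, farm, prob in ranked:
    marker = "REVIEW" if turbine in flagged else ""
    print(f"{turbine:>4}  {farm:<12} {prob:.3f}  {marker}")
```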
Back to the Process – Step 3: Modeling
A “model” is just a math function, the same as we learned about in algebra class.
Model
• Algebra class: f(x) = y
• Machine learning: f(model number, type, cycles, . . .) = y
Example
• Algebra class: f(x) = 2x + 3, so f(5) = 13
• Machine learning: f([features]) = [TensorFlow Neural Network], so f([specific features]) = 0.035
The modeling process uses training data to pick the most appropriate model and determine the parameters for the model.
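To make "training determines the parameters" concrete, here is a minimal sketch that starts from the algebra-class function f(x) = 2x + 3, generates a few example points from it, and then recovers the slope and intercept from those points alone. It deliberately swaps in a straight line fitted by ordinary least squares for the neural network named on the slide; the function names are illustrative.

```python
def f(x):
    """The algebra-class model: parameters 2 and 3 are known in advance."""
    return 2 * x + 3


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training data": inputs paired with known outputs.
xs = [0, 1, 2, 3, 4]
ys = [f(x) for x in xs]

# Training picks the parameters; prediction then applies them to a new input.
slope, intercept = fit_line(xs, ys)
prediction = slope * 5 + intercept

print(f"learned f(x) = {slope:.1f}x + {intercept:.1f}")
print(f"f(5) = {prediction:.1f}")
```

A machine learning model for the turbines works the same way in principle, only with many input features and far more parameters than two.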
Step 3: Modeling
Modeling can be difficult to do without extensive academic training. Software helps.
Tweaking a model will only result in small, incremental improvements to prediction accuracy.
Feature Engineering is where dramatic improvements are possible. Create new features from your existing ones to improve model performance.
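As an illustration of creating new features from existing ones, the sketch below derives two columns from a training row: the number of days since the last maintenance date, and a megawatt figure parsed out of the Make-Model string. The `AS_OF` reference date, the new column names, and the reading of "3MW" as a rated capacity are all assumptions made for the example.

```python
import re
from datetime import date, datetime

AS_OF = date(2018, 2, 14)  # hypothetical scoring date (the webinar date)


def days_since_maintenance(last_maint, as_of=AS_OF):
    """Days elapsed between a M/D/YYYY maintenance date and the scoring date."""
    last = datetime.strptime(last_maint, "%m/%d/%Y").date()
    return (as_of - last).days


def rated_megawatts(make_model):
    """Pull the number in front of 'MW' out of a make-model string, if any."""
    match = re.search(r"(\d+(?:\.\d+)?)MW", make_model)
    return float(match.group(1)) if match else None


def add_engineered_features(row):
    """Return a copy of the row with the two derived columns added."""
    enriched = dict(row)
    enriched["days_since_maint"] = days_since_maintenance(row["last_maint_date"])
    enriched["rated_mw"] = rated_megawatts(row["make_model"])
    return enriched


row = {
    "turbine": 456,
    "make_model": "GE-Alstom-3MW ECO 100",
    "cycles_m": 4.7,
    "last_maint_date": "2/5/2015",
}
enriched = add_engineered_features(row)
print(enriched)
```

A raw date string is hard for most models to use directly, while an elapsed-days number gives the model something it can compare across turbines.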
Step 4: Interpret and Communicate
Let your BI system help you tell the story.
Load model results into your system for deeper analysis.
Turbine | Farm        | Make-Model            | Failed
845     | Woodline    | GE-Alstom-3MW ECO 100 | 0.051
466     | Lake Bonney | GE-Alstom-4.8MW-158   | 0.062
852     | Walkaway 2  | GE-Alstom-4.8MW-158   | 0.371
156     | Bodangora   | GE-Alstom-4.8MW-158   | 0.013
Step 5: Implement
Data Science vs. Business Intelligence
[Figure: BI and Data Science, with “Operationalized Data Science” (Augmented Analytics)]
BI vs. Data Science Today
           | Business Intelligence         | Data Science
Intent     | Data accessibility, discovery | Finding patterns in data
Scope      | As-was / As-is                | Future
Audience   | Broad                         | Limited
Output     | Dashboards, Reports           | PowerPoint
Approach   | Dimensional                   | Algorithmic
Technology | Relational, Associative       | Machine Learning
Developers | COE, Biz IT, Analysts         | Data Scientists
Getting Started
• Learning more about Data Science
• The first projects
• Operationalized Data Science (“Augmented Analytics”)
Learning More About Data Science
[Figure labels: STATS, TOOLS, CODE, IT; BLACK BOX; ENTERPRISE SOFTWARE]
Resources
• Amazon Machine Learning Tutorial
• Microsoft Azure Machine Learning Studio
• DataRobot Webinars
• Wikipedia
We’ll send these out after the webinar!
Getting Started – The First Projects
Ask yourself: What does it mean to me to make this prediction?
Assess and prepare your data
- Most companies’ data is a mess – it needs to be prepared for machine learning models.
Identify business reasons for implementing data science
- What value does it provide and how can it be measured?
- Build models around your business objectives.
Assess staffing needs. Do you have the staff, skills, and resources needed?
Make a plan: complete project prioritization matrix
Getting Started - Tips
1. Stick with simple models: Focus on reducing the time between the data acquisition and the development of the first simple predictive model.
2. Explore more problems: Instead of exploring 1 business problem with incredible sophistication, explore many, and build a simple predictive model for each one.
3. Learn from a sample of data: Instead of using massive computing resources to process big data, explore data subsamples, which will enable the exploration of more hypotheses.
4. Focus on automation: Streamline manual processes, and develop algorithms to help automate.
Via Harvard Business Review: “Why You’re Not Getting Value from Your Data
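The "learn from a sample of data" tip can be sketched in a few lines: draw a reproducible random subsample and check whether a quick estimate on it is close to the full-data figure. The data below is synthetic and generated only for the illustration; the 5% failure rate, the 10% sample fraction, and the helper names are assumptions.

```python
import random


def subsample(rows, fraction, seed=0):
    """Return a reproducible random subsample containing `fraction` of the rows."""
    rng = random.Random(seed)
    k = max(1, round(len(rows) * fraction))
    return rng.sample(rows, k)


def failure_rate(rows):
    """Share of rows whose 'failed' flag is 1."""
    return sum(row["failed"] for row in rows) / len(rows)


# Synthetic stand-in for a 1,500-turbine fleet (not real data).
generator = random.Random(42)
population = [
    {"turbine": i, "failed": 1 if generator.random() < 0.05 else 0}
    for i in range(1500)
]

sample = subsample(population, fraction=0.10, seed=1)

print(f"full data: {len(population)} rows, failure rate {failure_rate(population):.3f}")
print(f"subsample: {len(sample)} rows, failure rate {failure_rate(sample):.3f}")
```

Fixing the seed keeps the subsample stable between runs, which makes it easier to compare one hypothesis against another on the same rows.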
Moving to Operationalized Data Science
• Use your corporate data
• Store data in formats for human users AND machines
• Use Machine Learning models to prioritize data collection efforts
• Hire PhDs for oversight and program design work
• Train your data-literate professionals on Machine Learning
• Use Machine Learning tools instead of hacking away at code
• Use enterprise-class ETL tools to wrangle and organize data
• Use enterprise-class BI tools to prepare training data
• Use enterprise-class BI tools to tell the story
So how do you get there?
Questions?

Demystifying Data Science Webinar - February 14, 2018
