Agile Data Science
INFORMS BIG DATA CONFERENCE
6/23/2013
Presented by Joel S Horwitz
Follow me @JSHorwitz
Alpine Data Labs
Agile Manifesto
We are uncovering better ways of developing software by doing it and
helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on
the left more.
Agile Manifesto (applied to models)
We are uncovering better ways of developing models by doing it
and helping others do it. Through this work we have come to value:
Individuals and interactions over processes and tools
Working models over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on
the left more.
Linear workflows in non-agile culture
[Figure: a linear hand-off across Business, Analytics, and Technologist roles: Business Owner, Business Analyst, Data Owner, Data Scientist, Business Stakeholders]
Agile is about continuous interactions
Analytics
• Feature Creation
• Model / Scoring
• Evaluation
Technologist
• Wrangling
• Interpretation
• Pipelines
Business
• Presentation
• Deployment
• Productize
Minimally Viable Data Products (MVDP)
crossfilter.js or D3.js
Tableau
Dashboards
Product
Recommendations
Also, Product
Recommendations
People
Recommendations
One model, many use cases.
Agile Data Science Feedback Loop
Instrumentation / Data Collection
Analysis & Design
Results & Interpretation
Cultivating Data Intuition
What do you need?
1. Business Champion
2. Integrated Environment
3. Analytics Ninjas
1. Business Champion(s): Defining the Problem
Executive Sponsor(s) who have a vested interest… who are ready to act on the results and whose business goals are affected by them.
Chief Technology Officer
SVP of Product
Chief Marketing Officer
VP of Sales
Problem statement… Monthly active user count growth was stagnating.
Evidence of where to look… acquisition funnel, user engagement, and customer loyalty.
Question to answer… How do I connect each part of the acquisition funnel?
We started with the acquisition funnel… web visits, downloads, installs, and activations.
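One way to "connect each part of the acquisition funnel" is to compute the conversion rate between adjacent stages. The sketch below is only an illustration: the stage counts are hypothetical (the slides give no numbers), and the function names are assumptions made for this example.

```python
# Hypothetical stage counts, in funnel order; the slides give no real figures.
FUNNEL = [
    ("web_visits", 10000),
    ("downloads", 2500),
    ("installs", 2000),
    ("activations", 1000),
]


def step_conversion(funnel):
    """Return (from_stage, to_stage, rate) for each adjacent pair of stages."""
    rates = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        # Guard against an empty upstream stage to avoid dividing by zero.
        rate = count / prev_count if prev_count else 0.0
        rates.append((prev_name, name, rate))
    return rates


def overall_conversion(funnel):
    """Return the share of the first stage that reaches the last stage."""
    first, last = funnel[0][1], funnel[-1][1]
    return last / first if first else 0.0


if __name__ == "__main__":
    for src, dst, rate in step_conversion(FUNNEL):
        print(f"{src} -> {dst}: {rate:.1%}")
    print(f"overall: {overall_conversion(FUNNEL):.1%}")
```

The stage with the lowest step rate is a natural first place to look when growth stalls.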
2. Environment: Before
• Many data silos and technology limitations.
• Data not well defined.
• Multiple data formats.
• No place to build MVDPs at scale.
2. Environment: After
Open Source Technologies
Flume, Sqoop, Oozie, and others
Analytics Sandbox
Relational Database
Minimally Viable Data Products
Selectively include data that is relevant
How was our analytics platform deployed and connected to the network of systems?
First there was FTP… dump raw log files from web analytics, the content delivery network, and applications.
Second there was MS SQL… most of the data was in Microsoft SQL databases.
Third there was Hadoop (with Hive)… Engineering and Development backup to Hadoop.
Now there is web-based analytics…
What is Hadoop?
• Overview: Apache Hadoop is a framework for running applications on large clusters built of
commodity hardware.
• Storage: Hadoop Distributed File System (HDFS™) is the primary storage system used by
Hadoop applications. HDFS creates multiple replicas of data blocks and distributes them on
compute nodes throughout a cluster to enable reliable, extremely rapid computations.
• Applications:  Apache Hive is a large scale Data Warehouse system and Apache Mahout is a
machine learning system.
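The block replication described above can be pictured with a toy model. This is a simplified sketch, not how HDFS is implemented: real placement is rack-aware and managed by the NameNode, while the round-robin rule, node names, and function names below are assumptions made for illustration.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin.

    Simplification: real HDFS placement also considers racks and node load.
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds the number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


def survives_failure(placement, failed_node):
    """True if every block still has at least one replica off the failed node."""
    return all(
        any(node != failed_node for node in replicas)
        for replicas in placement.values()
    )


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
    blocks = split_into_blocks(b"x" * 10, block_size=4)
    placement = place_replicas(len(blocks), nodes)
    print(placement)
    print(all(survives_failure(placement, n) for n in nodes))
```

Because every block lives on several nodes, losing any single node leaves all data readable, which is the reliability property the slide refers to.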
3. Analytics Ninjas
Build a TIGER Team… subject matter expert, modeler, technologist, and storyteller
Curiosity
Willingness to learn
Resourceful
Risk Takers
Schedule Standups… daily is best, but the cadence can vary with the project
Take notes!
Open dialogue (peer review)
Share knowledge
Centralize and Version Control Your Work… files, code, knowledge, and data lineage are key to success
Create a Wiki
Work backwards from the end goal to the analysis to the data
Share, share, and share! Stand on the shoulders of giants! Why start from scratch when there is plenty of boilerplate to go around?
What is Analytics?
Analytics is the application of computer technology, operational research, and statistics to solve problems in business
and industry.
Historically, Analytics was heavily used in banking for portfolio assessment using social status, geographical location, net
value, and many other factors.
Today, Analytics is applied to a vast number of industries and is re-emerging due to the phenomenal explosion of data
from our connected world.
Big Data consists of data sets that grow so large and complex that they become awkward to work with using on-hand
database management tools.
McKinsey Global Institute estimates that big data analysis could save the American health care system $300 billion per
year and the European public sector €250 billion.
Analytics Example: Platform Monetization
Search Lifetime Value
• 3rd Party Distribution
• SEM
• Organic
Traffic Quality Analysis
• PPC
• PPD
• PPI
• PPA
1. Business Champion
• Sales & Product
2. Integrated Environment
• Web analytics / app data and SQL database.
3. Analytics Ninjas
• Search engine marketers, business analysts, and statisticians.
Examples of Analytics: Campaign Management
1. Business Champion
• Sales, Product, Marketing, and Project Managers.
2. Integrated Environment
• Data Warehouse, SQL DB, Hadoop, and Tableau.
3. Analytics Ninjas
• Web analytics, product managers, business analysts, and business intelligence.
Examples of Analytics: Customer Sentiment
Keywords by Ratings
1. Business Champion
• Product and Marketing
2. Integrated Environment
• Mobile analytics, app logfiles, and Hadoop.
3. Analytics Ninjas
• Web analytics, product managers, business analysts, and mobile developers.
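A "Keywords by Ratings" view can be approximated by counting the most frequent words in reviews, grouped by star rating. The sketch below is a guess at the general idea, not the method used in the talk: the sample reviews, the stopword list, and the function name are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# A tiny illustrative stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "to", "of", "this", "i"}


def keywords_by_rating(reviews, top_n=3):
    """Map each rating to its `top_n` most frequent non-stopword terms.

    `reviews` is an iterable of (rating, text) pairs.
    """
    counts = defaultdict(Counter)
    for rating, text in reviews:
        words = re.findall(r"[a-z']+", text.lower())
        counts[rating].update(w for w in words if w not in STOPWORDS)
    return {
        rating: [word for word, _ in counter.most_common(top_n)]
        for rating, counter in counts.items()
    }


if __name__ == "__main__":
    sample = [  # hypothetical app-store reviews
        (5, "Love the app, love it"),
        (1, "Crashes and crashes again"),
    ]
    print(keywords_by_rating(sample, top_n=1))
```

Comparing the top terms for low and high ratings gives a quick, rough read on what drives sentiment before investing in a heavier model.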
What to avoid
It's all about plugins for web analytics… Google Analytics, App Stores (iOS, Android, others), Social (Twitter, Facebook
FQL, others?).
Analysis lifecycle…
1. Give me data… import data and ETL it into submission.
2. What does this data mean… crunch, blend, join, pivot, predict, count, map, or whatever.
3. Show and Tell… static (PowerPoint, Excel, etc.) or dynamic (trends, filtering, drilldown). (No more dashboards…)
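The three lifecycle steps above can be sketched end to end with the Python standard library. Everything concrete here is an assumption for illustration: the CSV columns, the sample rows, and the function names do not come from the slides.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw export; the last row has a missing value on purpose.
RAW_LOG = """date,channel,installs
2013-06-01,organic,120
2013-06-01,sem,80
2013-06-02,organic,130
2013-06-02,sem,
"""


def load(raw):
    """Step 1a, give me data: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(raw)))


def clean(rows):
    """Step 1b, ETL: drop rows with missing installs and cast the rest to int."""
    return [
        {**row, "installs": int(row["installs"])}
        for row in rows
        if row["installs"].strip()
    ]


def pivot(rows):
    """Step 2, what does this data mean: total installs per channel."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["channel"]] += row["installs"]
    return dict(totals)


def show(totals):
    """Step 3, show and tell: render a static text summary."""
    return "\n".join(f"{channel}: {n}" for channel, n in sorted(totals.items()))


if __name__ == "__main__":
    print(show(pivot(clean(load(RAW_LOG)))))
```

Keeping each step a separate function makes it easy to swap the static summary in `show` for a dynamic view later without touching the import or analysis code.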
Known knowns = data puking (dashboards)
Interestingness rocks! Tell me where to look (I’m feeling lucky…)
Get people to "make love to data" to make actual good use of it
Viva la revolucion! Data democracy!
Thank you! Any Questions?
Want to jump-start your Agile Data Science project? Head over to
http://start.alpinenow.com
Follow me on @JSHorwitz
