Productionizing Data Science at Experience
A Startup's (ongoing) Journey building Machine Learning Products
Agenda
• Intro
• Setting the Scene
• Proposed Pipeline
• A Look Back
Who am I?
• Matt Mills
• Born and raised in Atlanta
• BS in Industrial and Systems Engineering 2014, MS in Analytics 2015
• @statmills or www.statmills.com
Experience's mobile commerce, ticketing, and data
solutions empower sports and entertainment
leaders to generate new revenue streams, sell more
tickets, and make smarter decisions.
www.expapp.com/solutions
What is Experience?
Data Science at the end of 2016
• ~13 Engineers and 1 (me!) Data Scientist
• What happens to my work?
[Diagram: Manager / Management, Other Departments, Partners]
Goal for 2017
• Make an Impact on our Customers (Fans)
[Diagram: Predictive Model, Influence Fan Behavior, Manager / Management, Other Departments, Partners]
Goal for 2017: Continued
• Create a process to deploy models into production and use predictions in real time
• Some considerations
• Minimal use of limited Engineering Resources
• Scalable (speed and processing power)
• Cheap, like, super cheap (read: Free)
• Had to handle data cleansing
Some Potential Solutions
• Build own R/Python Server
• Learn Scala/Spark
• Pay for ML Service
Scaling Experience Data Science with h2o
• ML Algorithms written in pure Java
• APIs written for R, Python, Scala, Spark
• Built for scale
• parallel and distributed out of the box
• Open Source
• Models exportable as Java objects (POJOs) to embed in other apps
• Can embed Python pre-processing scripts within the POJO
h2o Architecture
https://github.com/h2oai/h2o-meetups/blob/master/2017_09_12_Dublin/2017_09_12_H2O_Intro_and_AutoML.pdf
h2o Algorithm List
h2o vs scikit-learn Syntax and Process
https://github.com/h2oai/h2o-meetups/blob/master/2015_05_14_H2O_Overview/H2O_Overview.pdf
Experience Production Pipeline
Experience Predictive Modeling Pipeline
1. App Sends Data
2. Data Cleaning in Python
3. Predictions done in h2o
4. App Gets Prediction
[Diagram labels: input(), sys.stdout.flush(), {JSON}]
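The input(), sys.stdout.flush() and {JSON} labels suggest that the Python cleaning step runs as a long-lived process that reads one JSON request per line and writes one JSON response per line. The slides do not show the actual script, so the following is only a minimal sketch under that assumption. The function names (clean_record, handle_line, serve), the cleaning rule, and the "ok"/"data"/"error" response shape are all hypothetical.

```python
import json
import sys


def clean_record(record: dict) -> dict:
    """Hypothetical cleaning rule: trim strings and map blanks to None."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
            if value == "":
                value = None
        cleaned[key] = value
    return cleaned


def handle_line(line: str) -> str:
    """Turn one JSON request line into one JSON response line."""
    try:
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError("expected a JSON object")
        return json.dumps({"ok": True, "data": clean_record(record)})
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        return json.dumps({"ok": False, "error": str(exc)})


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Answer each request line until the parent closes our stdin."""
    for line in stdin:
        stdout.write(handle_line(line) + "\n")
        # Without an explicit flush the response can sit in the pipe
        # buffer and the calling process would wait indefinitely.
        stdout.flush()
```

Calling serve() at the bottom of the script would turn it into the worker. Reading with input() in a loop, as the slide label hints, behaves the same way for line-delimited requests; the stream parameters are only there so the loop can be exercised without a real pipe.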
Benefits of Using Open Source Software
Experience Predictive Modeling Setup
Model Deployment
• Code: pulled via GitHub
• Terraform: to create infrastructure and manage state
• Served: via ECS
• Dockerize: via Dockerfile and stored in ECR
• Discovery: via Consul
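To make the "Discovery via Consul" step more concrete, here is a rough sketch of the two JSON shapes involved when a service is registered with, and later looked up through, Consul's HTTP API. The slides do not say how registration is actually done at Experience, so the service name, address, port, and /health endpoint below are hypothetical, and the code only builds and parses payloads rather than contacting a Consul agent.

```python
import json


def build_registration(name: str, address: str, port: int) -> dict:
    """Build a body for Consul's PUT /v1/agent/service/register endpoint."""
    return {
        "ID": f"{name}-{address}-{port}",
        "Name": name,
        "Address": address,
        "Port": port,
        "Check": {
            # Hypothetical health endpoint exposed by the model container.
            "HTTP": f"http://{address}:{port}/health",
            "Interval": "10s",
        },
    }


def pick_endpoints(health_entries: list) -> list:
    """Extract (address, port) pairs from a /v1/health/service/<name> response."""
    endpoints = []
    for entry in health_entries:
        service = entry["Service"]
        # A service registered without its own address comes back with an
        # empty Service.Address, in which case the node address applies.
        address = service.get("Address") or entry["Node"]["Address"]
        endpoints.append((address, service["Port"]))
    return endpoints


if __name__ == "__main__":
    body = build_registration("model-scoring", "10.0.0.5", 8080)
    print(json.dumps(body, indent=2))
```

The app side would then call the health endpoint for the service name and pass the decoded JSON list to pick_endpoints to find a container to send data to.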
Pros and Cons of Current Set-Up
Pros
• Automated process to deploy models into production
• Can iterate on models with no/limited effort from engineering
Cons
• Can only use algorithms available in h2o (e.g. no multilevel models, GAMs, or Bayesian models)
• h2o drives Python; why not the other way around?
Conclusion and Questions
1. Lack of skills and/or support doesn’t have to stop you from putting models into production
2. What’s best for your Data Scientists might not be best for your Engineers and vice-versa
www.statmills.com
http://docs.h2o.ai/
https://www.expapp.com/about/#careers
