Machine Learning Startup
Ben Lackey
Cloud
• Amazon Web Services (AWS)
  • Machine Learning
    • Seems usable but difficult. The most mature of any cloud vendor.
• Microsoft Azure
  • Cortana Analytics
    • Seems unusable
    • Incredibly complex
  • Revolution
    • Surprised more isn’t going on with this
• Google Cloud Platform (GCP)
  • Cloud Machine Learning – “Today needs a data scientist. Next needs you.”
    • Based on TensorFlow
    • Private Beta
  • Image Recognition
    • Extremely usable
Open Source
• scipy – Probably the most usable thing out there. Actually quite nice. Doesn’t scale.
• R – Arcane but can be figured out. Doesn’t scale.
• Spark MLlib – Not convinced anyone outside of AMPLab can use it properly. Scales quite nicely if you don’t write bad Scala. Everyone writes bad Scala though. “It’s like Java without semicolons.”
• TensorFlow – Incomplete and only usable by Google people today. Can be used by a lay person to multiply large matrices.
Enterprise
• Dato (a pleasure to use)
• H2O.ai (linear regressions)
• Context Relevant (featurization)
• Arimo
• Alpine Data
• Skytree
• Ayasdi
• Palantir (consulting)
• Scaled Inference
• DataRobot
• Predixion
• Wise.io
• Serial Metrics
• BigML
• Kensho
• Sumo Logic
• Dataiku
• Brighterion
Desired Machine Learning Lifecycle (repeat)
1. Batch data wrangling
2. Data exploration
• Tell me interesting things about the data. I’m not sure what I’m looking for, but I think there’s some value here.
• Ideally Tableau on steroids
• In real life sometimes just gplot
3. Characterize a supervised learning problem (biggest gap I’ve seen).
4. Build a model
• Feature generation and selection
• Training and validation
5. Deploy a model
• Real time data wrangling and featurization reusing transformations from batch
• Real time scoring
• Proactively detect model degradation
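A minimal sketch of steps 4 and 5, assuming plain Python and standardization as the only transformation. The names `fit_scaler`, `featurize`, and `degraded` are hypothetical and not taken from any product. The point it illustrates is that the transformation learned in batch is the same one applied at scoring time, with a crude error-ratio check standing in for degradation detection.

```python
from statistics import mean, pstdev

def fit_scaler(rows):
    """Learn per-column mean and standard deviation from batch data."""
    columns = list(zip(*rows))
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    return [(mean(c), pstdev(c) or 1.0) for c in columns]

def featurize(row, scaler):
    """Apply the learned transformation; used unchanged in batch and real time."""
    return [(x - m) / s for x, (m, s) in zip(row, scaler)]

def degraded(recent_errors, baseline_error, tolerance=1.5):
    """Flag degradation when the recent mean error exceeds the baseline by `tolerance` times."""
    return mean(recent_errors) > tolerance * baseline_error
```

In use, `scaler = fit_scaler(batch_rows)` would run once during training, and `featurize(incoming_row, scaler)` would run per request. The `tolerance` default is an arbitrary placeholder, not a recommended value.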
The same old use cases…
• Predictive Maintenance
• GE Predix is going to wipe the floor with you
• Incremental improvements on linear regressions SAS users wrote 30 years ago
• Fraud Detection
• Up Sell / Cross Sell
• Loss Prediction / Actuarial
• High Frequency Trading
• How about just starting a hedge fund? It’ll be easier than selling software.
Where is the value?
• Used to be in the hardware
• Still lingering in the software
• Next the data
• Later the features
• Next the model (or features of features)
Why does ML matter?
• I want to predict the future.
• So I can delight my customers by anticipating their desires
• Brooks Brothers service at Walmart prices
• Lower my operating costs through automation
• And perhaps outmaneuver competitors
• I can use my data to do it, now all I need is one of these machine learning things.
Where’s my machine learning thing?
• Pulls in the data
• Ask it to predict a variable and it figures out how automagically.
• Maybe even ask it to say what is predictable from the data (mutual information) and it says so automatically.
• Scales automatically on the cloud so you can throw money at it to make it go faster.
• Easily embedded in the enterprise (.jar, REST)
• Explicable - you can pull up a web browser, understand what is happening and explain it to your compliance people
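The "what is predictable from the data" idea can be sketched with mutual information over discrete columns. This is an illustrative sketch only: `mutual_information` and `predictability` are hypothetical names, the table is assumed to be a dict of equal-length lists of categorical values, and continuous columns would need binning first.

```python
from collections import Counter
from math import log2

def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x,y) * log2( p(x,y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )

def predictability(table):
    """For each column, the highest mutual information it shares with any other column."""
    scores = {}
    for target, ys in table.items():
        others = [mutual_information(xs, ys)
                  for name, xs in table.items() if name != target]
        scores[target] = max(others) if others else 0.0
    return scores
```

A column with a high score is one that some other column says a lot about, which makes it a candidate prediction target. A score near zero suggests nothing in the table predicts it on its own.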
Problems in ML continued…
• Training a model isn’t hard, but it is computationally intensive.
• Featurization is both hard and computationally intensive. This is a large part of why neural nets are back (again).
Goal
• Building a standalone business to IPO is unlikely.
• Really want to get bought by a big three cloud provider and integrated into their ML portfolio.
• Customers want an end to end solution.
• Customers don’t want to think about infrastructure. ML people aren’t good at IT.
Ideas
• Automatically characterize ML problems and then plug into existing systems
• Machine learning replaces large chunks of my ERP.
• Or my core banking system
• Or billing system
• All very hard. Require lots of domain expertise.
• Consultancy – maybe cloud specific?
• Generate features? Little software package to do that. Aim to build and get acquired?
• Hosted ML
• Easiest to build on Google.
• Easiest to get bought by Microsoft.
• Method might not matter. Use GBT for all classification problems.
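For readers unfamiliar with GBT (gradient boosted trees), here is a toy sketch for binary classification, assuming depth-one trees (stumps), log loss, and a fixed learning rate. All names (`fit_stump`, `fit_gbt`, `predict`) and the default `rounds` and `lr` values are illustrative assumptions. A real system would use a maintained library rather than this.

```python
from math import exp

def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))

def fit_stump(xs, residuals):
    """Pick the feature/threshold split whose two leaf means best fit the residuals."""
    m = sum(residuals) / len(residuals)
    best = (float("inf"), 0, float("inf"), m, m)  # fallback: constant prediction
    for j in range(len(xs[0])):
        for t in sorted({row[j] for row in xs}):
            left = [r for row, r in zip(xs, residuals) if row[j] <= t]
            right = [r for row, r in zip(xs, residuals) if row[j] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]

def predict_score(model, row):
    """Sum of the scaled stump outputs (a log-odds score)."""
    return sum(lr * (lv if row[j] <= t else rv) for lr, j, t, lv, rv in model)

def fit_gbt(xs, ys, rounds=20, lr=0.5):
    """Each round fits a stump to y - p, the negative gradient of log loss."""
    model = []
    for _ in range(rounds):
        residuals = [y - sigmoid(predict_score(model, row)) for row, y in zip(xs, ys)]
        j, t, lv, rv = fit_stump(xs, residuals)
        model.append((lr, j, t, lv, rv))
    return model

def predict(model, row):
    return 1 if sigmoid(predict_score(model, row)) >= 0.5 else 0
```

The appeal behind "method might not matter" is visible even here: the same loop works on any numeric feature table without per-problem tuning of the model family.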
Pricing Model
• Docker image – Free, single node
  • Embeds training engine
  • Admin webpage
  • REST scoring endpoint
• Cloud
  • Training – Multitenant, charge for
    • Storage
    • Compute
  • Scoring
    • REST endpoint, charge per call
    • Export a jar with a time bomb, charge per export (probably not the best way to enforce compliance)
• Data Clearinghouse
  • Rebate on storage if data set is made public
