Dato Confidential
Neel Kishan – Technical Sales Lead
neel@dato.com
Hello my name is
Neel Kishan
Technical Sales Lead
(former neuroscientist, GPU programmer,
Eagle Scout, Chicago sports fan)
neel@dato.com
Let’s Schedule a Time to Talk:
https://calendly.com/dato-neel
We empower developers to create intelligent applications with real-time machine learning services quickly and easily.
Figure: Machine Learning Lifecycle. Intelligent applications are built on the Dato Platform: GraphLab Create and Dato Predictive Services.
Teams have found ways to build intelligent applications…
• Recommenders
• Lead Scoring
• Churn Prediction
• Multi-channel Targeting
• Auto-Summarization
• Fraud Detection
• Intrusion Detection
• Demand Forecasting
• Data Matching
• Failure Prediction
Why do these projects take so long?
• Lengthy code rewrites for scalable production services
• Mundane tasks to integrate libraries, transform data to specific formats, fill in missing values, etc.
• Many tools are just slow
Challenges for developing intelligent apps
• Algorithm-centric APIs create confusion and a steep learning curve
• Understanding models has been a craft passed down only through tribal knowledge
• Production services are hard to maintain and manage
Dato Machine Learning
Built to rapidly deliver intelligent applications
• Intuitive APIs: easy to learn, with smart defaults so your first application comes together fast
• Deploy instantly as REST: eliminates the lengthy rewrites needed to integrate models and serve them live, at scale
• Integrated libraries for any data: deep learning, graph, text, and image tools share a common scalable data structure, eliminating glue code and context switching
What makes Dato special?
The Dato Machine Learning Platform
Figure: the platform cycle, running on your infrastructure: Develop and Train (GraphLab Create & Dato Distributed), Deploy Models, Serve (REST API) and Monitor (Dato Predictive Services), with Experiments and Feedback closing the loop.
GraphLab Create & Dato Distributed
• Creating models
• Data engineering
• Evaluation & visualization
Dato Predictive Services
• Serving models
• Live experimentation
• Model management
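To make "serving models" concrete, here is a minimal sketch of a model exposed as a REST-style JSON endpoint using only Python's standard-library WSGI support. It is not Dato Predictive Services or its API. The `/query` route, the request shape `{"user": ..., "k": ...}`, and the in-memory `MODEL` lookup table are hypothetical stand-ins for a real deployed model.

```python
import json
from wsgiref.simple_server import make_server

# Hypothetical "model": precomputed ranked recommendations per user.
MODEL = {
    "alice": ["m3", "m1", "m7"],
    "bob": ["m2", "m3"],
}


def predict(user, k):
    """Return up to k recommendations for a user (empty list if unknown)."""
    return MODEL.get(user, [])[:k]


def app(environ, start_response):
    """WSGI app: POST /query with a JSON body {"user": ..., "k": ...}."""
    if environ.get("REQUEST_METHOD") != "POST" or environ.get("PATH_INFO") != "/query":
        start_response("404 Not Found", [("Content-Type", "application/json")])
        return [b'{"error": "not found"}']
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = json.loads(environ["wsgi.input"].read(length) or b"{}")
        result = predict(body["user"], int(body.get("k", 5)))
    except (ValueError, KeyError, TypeError, AttributeError) as exc:
        # Malformed JSON, missing "user", or a non-object body
        start_response("400 Bad Request", [("Content-Type", "application/json")])
        return [json.dumps({"error": str(exc)}).encode()]
    start_response("200 OK", [("Content-Type", "application/json")])
    return [json.dumps({"response": result}).encode()]


def serve(host="127.0.0.1", port=8000):
    """Run the app with the standard-library reference server (blocks forever)."""
    with make_server(host, port, app) as server:
        server.serve_forever()
```

Calling `serve()` starts a local server that can be queried with, for example, `curl -X POST localhost:8000/query -d '{"user": "alice", "k": 2}'`. A production service would add model versioning, monitoring, and a real application server, which is the management work the slide attributes to Predictive Services.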
Scalable Data Structures for Machine Learning
• SFrame: on-disk, columnar, partitioned table
• SGraph: graph structure composed of multiple tables
• TimeSeries: table with a time index
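The key property of an on-disk columnar table is that each column is stored separately, so a query that touches two columns never reads the others. The sketch below illustrates only that idea by writing one JSON file per column into a directory. It is not SFrame's file format, and it omits compression and partitioning. The `ColumnStore` class and its methods are hypothetical.

```python
import json
import os
import tempfile


class ColumnStore:
    """Toy on-disk columnar table: one JSON file per column in a directory."""

    def __init__(self, directory):
        self.directory = directory
        os.makedirs(directory, exist_ok=True)

    def _path(self, name):
        return os.path.join(self.directory, name + ".json")

    def write(self, columns):
        """Write a dict of equal-length column lists, one file per column."""
        if len({len(values) for values in columns.values()}) > 1:
            raise ValueError("all columns must have the same length")
        for name, values in columns.items():
            with open(self._path(name), "w") as f:
                json.dump(values, f)

    def column_names(self):
        return sorted(n[:-5] for n in os.listdir(self.directory) if n.endswith(".json"))

    def read(self, names):
        """Load only the requested columns from disk."""
        out = {}
        for name in names:
            with open(self._path(name)) as f:
                out[name] = json.load(f)
        return out

    def filter_rows(self, names, column, predicate):
        """Return the requested columns for rows where predicate(value) is true."""
        data = self.read(set(names) | {column})
        keep = [i for i, v in enumerate(data[column]) if predicate(v)]
        return {name: [data[name][i] for i in keep] for name in names}


with tempfile.TemporaryDirectory() as tmp:
    store = ColumnStore(tmp)
    store.write({"user": ["a", "b", "c"], "rating": [5, 2, 4], "title": ["x", "y", "z"]})
    names = store.column_names()
    # Only the "user" and "rating" files are opened; "title" is never read.
    high = store.filter_rows(["user"], "rating", lambda r: r >= 4)
```

The same layout is why columnar stores compress well. Each file holds values of a single type, so type-specific encodings can be applied column by column.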
High performance machine learning
Figure: test error vs. time (hr); legend includes "H2O.ai: 10 machines/80 cores". Panels: recommenders, deep learning & images, graph analytics.
Faster algorithms accelerate teams
Fails to complete on other systems!
Intuitive API – Easily create a live machine learning service
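import graphlab as gl

# Load ratings data (user, movie, rating columns) into an SFrame
data = gl.SFrame.read_csv('my_data.csv')

# Create a recommender; the toolkit selects a model automatically
model = gl.recommender.create(
    data,
    user_id='user',
    item_id='movie',
    target='rating')

# Top 5 recommendations for each user
recommendations = model.recommend(k=5)

# Load the deployment stored at 's3://path' (a placeholder) and add the model as a named service
cluster = gl.deploy.load('s3://path')
cluster.add('servicename', model)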
• Create a recommender in 5 lines of code
• Toolkit with automatic model selection
• Deploy in minutes
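For readers curious about what a recommender computes, here is a minimal item-based collaborative-filtering sketch in plain Python. It is not GraphLab Create's algorithm or its automatic model selection. The cosine item-item similarity, the similarity-weighted scoring rule, and the `recommend(user, k)` helper are illustrative assumptions chosen to mirror the shape of the `model.recommend(k=5)` call in the slide's code.

```python
import math
from collections import defaultdict

# (user, item, rating) triples, shaped like the user/movie/rating columns
ratings = [
    ("u1", "m1", 5), ("u1", "m2", 3),
    ("u2", "m1", 4), ("u2", "m2", 2), ("u2", "m3", 5),
    ("u3", "m2", 4), ("u3", "m3", 4), ("u3", "m4", 5),
]

# Index ratings by item and by user
by_item = defaultdict(dict)
by_user = defaultdict(dict)
for user, item, rating in ratings:
    by_item[item][user] = rating
    by_user[user][item] = rating


def cosine(a, b):
    """Cosine similarity between two items' rating vectors (keyed by user)."""
    dot = sum(a[u] * b[u] for u in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend(user, k=5):
    """Rank unseen items by the similarity-weighted sum of the user's ratings."""
    seen = by_user.get(user, {})
    scores = {}
    for candidate in by_item:
        if candidate in seen:
            continue
        scores[candidate] = sum(
            cosine(by_item[candidate], by_item[item]) * rating
            for item, rating in seen.items()
        )
    # Highest score first; ties broken by item name for stable output
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]
```

Here `recommend("u1")` favors `m3` over `m4` because `m3` was rated by users who also rated both of `u1`'s items. A production toolkit would also handle implicit feedback, cold-start users, and model selection.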
Dato Machine Learning Toolkits
Applications
• recommender
• sentiment_analysis
• churn_predictor
• data_matching
• pattern_mining
• anomaly_detection
Fundamentals
• regression
• classifier
• nearest_neighbors
• clustering
• deeplearning
• text_analytics
• graph_analytics
Utilities
• model_parameter_search
• cross_validation
• evaluation
• comparison
• feature_engineering
Join us April 7th for a webinar on Deep Learning: Image Similarity and Beyond
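The `cross_validation` and `model_parameter_search` utilities address a common workflow: try several settings of a hyperparameter and keep the one with the best held-out score. The sketch below shows that workflow in plain Python. It uses k-fold cross-validation to pick the neighbor count of a tiny one-dimensional nearest-neighbor classifier. It does not use or reproduce the GraphLab Create toolkits, and the data and every function name are hypothetical.

```python
from collections import Counter

# Toy 1-D labeled data: (feature, label)
DATA = [(0.5, "a"), (1.0, "a"), (1.2, "a"), (1.8, "b"), (2.1, "a"),
        (3.0, "b"), (3.3, "b"), (3.9, "b"), (4.2, "a"), (4.8, "b")]


def knn_predict(train, x, k):
    """Majority label among the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(data, k_neighbors, folds=5):
    """Mean validation accuracy over interleaved folds (row i goes to fold i % folds)."""
    accuracies = []
    for f in range(folds):
        val = data[f::folds]
        train = [p for i, p in enumerate(data) if i % folds != f]
        if not val or not train:
            continue
        correct = sum(knn_predict(train, x, k_neighbors) == y for x, y in val)
        accuracies.append(correct / len(val))
    return sum(accuracies) / len(accuracies)


def parameter_search(data, candidates):
    """Return (best_k, scores), where scores maps each candidate k to CV accuracy."""
    scores = {k: k_fold_accuracy(data, k) for k in candidates}
    best = max(candidates, key=lambda k: scores[k])
    return best, scores


best_k, scores = parameter_search(DATA, [1, 3, 5])
```

Each candidate is scored only on folds it was not trained on, so the chosen value reflects generalization rather than fit to the training rows.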
Demo of GraphLab Create (GLC) & Predictive Services (PS)
Deployment scenarios
Neel Kishan – Technical Sales Lead
neel@dato.com
Appendix
And Supporting Material
Dato is becoming the backbone of intelligent applications for 80+ customers
• Commercialization of a Carnegie Mellon machine learning project; company founded by Professor Carlos Guestrin in 2013
• Vibrant user community of 40,000+ from Coursera and open source projects
• Major customers in retail, finance, media, and software
Appendix: Deployment Scenarios & Pricing
Machine Learning Deployment Options
• Dato Predictive Services
• Batch write of predictions
• Embedded process or script
• Export (e.g. PMML)
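Of these options, batch write of predictions is the simplest to picture: score every row offline and write the results somewhere downstream systems can read them. Below is a minimal sketch using Python's `csv` module. The `score` function is a hypothetical stand-in for a trained model, and the `id` and `x` column names are assumptions.

```python
import csv
import io


def score(row):
    """Hypothetical model: a fixed linear function of one numeric feature."""
    return round(0.5 * float(row["x"]) + 1.0, 3)


def batch_predict(reader, out_stream, id_field="id"):
    """Read rows from a csv.DictReader, write id,prediction rows, return the row count."""
    writer = csv.writer(out_stream)
    writer.writerow([id_field, "prediction"])
    count = 0
    for row in reader:
        writer.writerow([row[id_field], score(row)])
        count += 1
    return count


# In-memory example; real jobs would open files with newline="" instead
source = io.StringIO("id,x\n1,0\n2,4\n3,1\n")
sink = io.StringIO()
n = batch_predict(csv.DictReader(source), sink)
```

Because the job streams one row at a time, memory use stays flat regardless of input size. The trade-off against a live REST service is freshness: predictions are only as current as the last batch run.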
Pricing
• Subscription license, which includes support and upgrades
• Licensed per user for GraphLab Create and per machine for production use
• Training and technical services also available
Use Cases
Our customers are leading the creation of intelligent applications
Quantifying the value – Fastest to Production & Reduced Operational Cost
• Built a 90%-accurate sentiment analyzer for hotel reviews after 30 minutes of trying Dato’s GraphLab Create
• Created an efficient pipeline (40 minutes in Dato vs. 33 days in R) with a 46% relative lift in accuracy (from 65% to 95%)
• “[Dato’s] GraphLab Create™ gives us easy access to some of the most advanced machine learning and this lets us iterate on our ideas faster”
• Simplify the process to develop and deploy internal services for Salesforce PDS and adjacent teams: reduced the hundreds of tools to manage, the complexity of the solution, and development time
• Achieved in 2 days with Dato’s GraphLab Create what took 2 weeks in R; cut the time from concept to deployment from months to minutes
• Replace a heuristic-heavy job ranking system to improve job search relevance: developed in weeks, with a significant increase in clickthrough after years of no growth
Fraud Detection and Security
Customer Success Story
“Merchant intelligence for safer, more profitable commerce.”
Others like Alan & G2 Web Services:
WHO: Alan Krumholz, Principal Data Scientist
INSPIRATION: Score merchants based on their web presence and actions to help their banking customers identify fraudulent merchants.
VALUE: Accelerate business decisions, reducing the manual intervention required and minimizing false positives.
OUTCOME: Achieved in 2 days with GraphLab Create what took 2 weeks in R. Cut deployment time from months to minutes.
Data Matching
Customer Success Story
“Fast, free, thorough home search.”
Others like Nick & Zillow:
WHO: Nicholas McClure, Senior Data Scientist
INSPIRATION: Build a service that matches property listings across many inbound data feeds and collapses each set of matches into the single most accurate listing.
VALUE: Data and listing quality is critical to Zillow’s core product.
OUTCOME: Built an efficient pipeline (40 minutes in GraphLab Create vs. 33 days for the R pipeline) with much higher accuracy (95%, up from 65%).
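Matching listings across feeds and collapsing each group into one record is a record-linkage task. The sketch below shows one common approach under simple assumptions. It normalizes addresses, links records whose token sets are similar (Jaccard similarity at or above a threshold), groups linked records with union-find, and keeps the most complete record in each group. It is not Zillow's pipeline or GraphLab Create's `data_matching` toolkit. The listing fields, the threshold, the abbreviation table, and "fewest missing fields" as a proxy for "most accurate" are all hypothetical.

```python
import re
from itertools import combinations

# Hypothetical listings arriving from different feeds
LISTINGS = [
    {"id": 1, "address": "123 Main Street, Apt 4", "beds": 2, "price": None},
    {"id": 2, "address": "123 main st apt 4", "beds": 2, "price": 350000},
    {"id": 3, "address": "9 Oak Avenue", "beds": 3, "price": 500000},
    {"id": 4, "address": "9 Oak Ave.", "beds": None, "price": 500000},
    {"id": 5, "address": "77 Pine Road", "beds": 1, "price": 200000},
]

ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "apartment": "apt"}


def tokens(address):
    """Lowercase, drop punctuation, and abbreviate common street words."""
    words = re.findall(r"[a-z0-9]+", address.lower())
    return {ABBREVIATIONS.get(w, w) for w in words}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def match_and_collapse(listings, threshold=0.8):
    """Group listings with similar addresses; keep the most complete one per group."""
    parent = list(range(len(listings)))
    toks = [tokens(r["address"]) for r in listings]
    for i, j in combinations(range(len(listings)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(len(listings)):
        groups.setdefault(find(parent, i), []).append(listings[i])
    # "Most accurate" is approximated here as "fewest missing fields"
    return [max(g, key=lambda r: sum(v is not None for v in r.values()))
            for g in groups.values()]
```

On the sample data, listings 1 and 2 and listings 3 and 4 collapse, leaving three records. Comparing every pair is quadratic, so real systems usually add a blocking step (for example, only comparing listings in the same ZIP code) and merge fields across matches rather than picking one record.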
Recommenders
Customer Success Story
“Advice and support on pregnancy and parenting.”
Others like Shelley & BabyCenter:
WHO: Shelley Klopp, DBA & Chief Architect
INSPIRATION: Build and deploy their first recommender to increase session engagement by recommending relevant content.
VALUE: The initial model increased the average session by multiple page views.
OUTCOME: First prototype built in under 1 week; ongoing model experimentation is increasing engagement.
Sentiment and Text Analysis
Customer Success Story
“Get hired. Love your job.”
Others like Marcos & Glassdoor:
WHO: Marcos Sainz, Lead Machine Learning Engineer
INSPIRATION: Replace a heuristic-heavy job ranking system with an ML-driven system to improve job search relevance.
VALUE: More relevant jobs led to happier users and higher clickthrough.
OUTCOME: Concept to production in weeks.
Image Analytics and Deep Features
Customer Success Story
“Smart waste management.”
Others like Ben & Compology:
WHO: Ben Chehebar, Co-founder/Lead of Product
INSPIRATION: Use machine learning to predict how full dumpsters are.
VALUE: The model augments their human classification, done through Mechanical Turk, and lets them scale their operations.
OUTCOME: Concept to deployed service in less than a month, with accuracy as good as or better than the human classifiers.

Machine Learning with GraphLab Create


Editor's Notes

  • #5: StumbleUpon: content tagging (take out Tapjoy); Scruff: content recommendation; Glassdoor: personalization; TPT: recommender; LivingSocial
  • #7: Scalable, performant production services; disconnect between data science (DS) and engineering (Eng)
  • #11: I have struggled to present this; it is really difficult to explain what it is. Only recently did I figure out why: it is not one thing, it is really three or four things. - A Python API, heavily Pandas-inspired, that does a ton of stuff, with a rather nice scalable graph data structure to go with it. - A physical storage layer: a heavily compressed column store with type-specific compression routines, especially aggressive for numeric types. It comes with a file system abstraction (for C++ people: fstream, general_fstream) that can read from many places, plus a special “cache” file system, basically an in-memory file that dumps to disk when memory gets full. This is how we get compressed in-memory performance. - And I am not even talking about our graph data structure. Talk to me if you want to hear more.
  • #14: “join us next week for toolkits”
  • #24: Move this up
  • #25: Deck ends here
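The note on slide #11 describes a "cache" file system: a file that lives in memory and spills to disk once memory fills up. A minimal sketch of that behavior follows. It uses a hypothetical `SpillBuffer` class with an arbitrary byte threshold and is not the SFrame implementation. Python's standard library ships a comparable ready-made tool, `tempfile.SpooledTemporaryFile`.

```python
import io
import tempfile


class SpillBuffer:
    """Append-only byte buffer that stays in memory until it exceeds
    max_memory bytes, then moves its contents to a temporary file on disk."""

    def __init__(self, max_memory=1024):
        self.max_memory = max_memory
        self._file = io.BytesIO()
        self.spilled = False

    def write(self, data):
        self._file.write(data)
        # Writes always append, so the position equals the total size
        if not self.spilled and self._file.tell() > self.max_memory:
            self._spill()

    def _spill(self):
        # Copy the in-memory contents into a real temporary file on disk
        disk = tempfile.TemporaryFile()
        disk.write(self._file.getvalue())
        self._file = disk
        self.spilled = True

    def read_all(self):
        """Return all bytes written so far, wherever they currently live."""
        position = self._file.tell()
        self._file.seek(0)
        data = self._file.read()
        self._file.seek(position)
        return data

    def close(self):
        self._file.close()


buf = SpillBuffer(max_memory=16)
buf.write(b"small")               # 5 bytes: stays in memory
in_memory_after_first = not buf.spilled
buf.write(b"x" * 32)              # 37 bytes total: moves to disk
```

Callers see the same `write`/`read_all` interface either way, which is the point of the abstraction: small data gets memory speed, and large data degrades gracefully to disk instead of failing. Call `close()` when done so the temporary file is released.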