Challenges to Make Machine Learning Easy
Francisco J Martin, Ph.D.
BigML Co-founder & CEO
ACM San Francisco Bay Area Professional Chapter
eBay Whitman Campus
June 3rd, 2013
BigML Inc, 2013
Sampling the Audience
Expert: Published papers at KDD, ICML, NIPS, etc., or developed own ML algorithms used at large scale.
Aficionado: Understands pros/cons of different techniques and/or can tweak algorithms as needed.
Practitioner: Very familiar with ML packages (Weka, Scikit, R, etc.).
Newbie: Just taking the Coursera ML class or reading an introductory book on ML.
Absolute beginner: ML sounds like science fiction.
Why make ML easy?
[Image: "Data, data everywhere. A special report on managing information"]
In the age of data, Machine
Learning is the key component to:
‣ make data-driven decisions
‣ develop smart applications
‣ build predictive analytics
Why make ML easy?
However, Machine Learning is COMPLEX:
‣ tools are complicated and do not scale well
‣ solutions are costly
‣ experts with industry experience are scarce
http://ttic.uchicago.edu/~samory/
BigML
A cloud-based service that makes
Machine Learning SIMPLE
$ bigmler --train customer2012.csv \
          --test new_customers.csv \
          --objective churn
>>> from bigml.api import BigML
>>> api = BigML()
>>> source = api.create_source("s3://bigml-public/csv/sales.csv")
>>> dataset = api.create_dataset(source)
>>> model = api.create_model(dataset)
$ curl "https://bigml.io/model?$BIGML_AUTH" \
       -X POST \
       -H "content-type: application/json" \
       -d '{"dataset": "dataset/50ca447b3b56356ae0000029"}'
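The curl call above can be read as a general pattern: each resource is created by POSTing a small JSON body that names its parent resource. The following is a minimal sketch of that pattern which only builds the request and never sends it. The helper name `build_create_request` and the sample credentials are hypothetical, and the `username=...;api_key=...` shape of `BIGML_AUTH` is an assumption rather than something stated on the slide.

```python
import json
import os

BIGML_URL = "https://bigml.io"  # base URL taken from the curl example


def build_create_request(resource, parent_type, parent_id, auth=None):
    """Build (url, headers, body) for a POST that creates `resource`
    from an existing parent resource. Nothing is sent over the network."""
    # Assumption: BIGML_AUTH holds the query-string credentials.
    if auth is None:
        auth = os.environ.get("BIGML_AUTH", "")
    url = f"{BIGML_URL}/{resource}?{auth}"
    headers = {"content-type": "application/json"}
    body = json.dumps({parent_type: f"{parent_type}/{parent_id}"})
    return url, headers, body


# Hypothetical credentials, for illustration only.
url, headers, body = build_create_request(
    "model", "dataset", "50ca447b3b56356ae0000029",
    auth="username=alice;api_key=XYZ",
)
print(url)
print(body)
```

The same helper would describe the dataset-from-source call by passing `"dataset", "source", <source id>`.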
Agenda
BigML web-based interface (10-15 min)
BigML API, API Bindings, BigMLer (5 min)
$ bigmler --train customer2012.csv \
          --test new_customers.csv \
          --objective churn
>>> from bigml.api import BigML
>>> api = BigML()
>>> source = api.create_source("s3://bigml-public/csv/iris.csv")
>>> dataset = api.create_dataset(source)
>>> model = api.create_model(dataset)
$ curl "https://bigml.io/dataset?$BIGML_AUTH" \
       -X POST \
       -H "content-type: application/json" \
       -d '{"source": "source/50ca447b3b56356ae0000029"}'
Challenges (10-15 min)
#1 Machine Learning Breadth and Depth
#2 User Diversity
#3 Simplicity
#4 Scalability
#5 Measuring Impact
#6 Pricing
Questions (10-15 min)
How it works
BigML Resources
Sources
‣ local and remote
‣ csv, arff, xls
‣ https, s3, azure, odata
Datasets
‣ Stream histograms
‣ Statistics
Models
‣ Interactive
‣ Compoundable: Random Decision Forests
‣ Actionable: exportable to rules, code, pmml
Predictions
‣ Form-based Predictions
‣ Question by Question
‣ Local predictions
Evaluations
‣ Classification
‣ Regression
‣ Comparison
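The "Stream histograms" item under Datasets is not explained on the slide. As a rough illustration, here is a minimal sketch of one common way to summarize a numeric field in a single pass: keep a fixed number of (centroid, count) bins and merge the two closest bins whenever the limit is exceeded. This is an assumption about the general technique, not a description of BigML's implementation, and the class name `StreamingHistogram` is hypothetical.

```python
import bisect


class StreamingHistogram:
    """Fixed-size histogram over a stream: at most `max_bins`
    [centroid, count] pairs, merging the two closest on overflow."""

    def __init__(self, max_bins=4):
        self.max_bins = max_bins
        self.bins = []  # kept sorted by centroid

    def insert(self, value):
        centroids = [b[0] for b in self.bins]
        i = bisect.bisect_left(centroids, value)
        if i < len(self.bins) and self.bins[i][0] == value:
            self.bins[i][1] += 1  # exact centroid already present
        else:
            self.bins.insert(i, [float(value), 1])
        if len(self.bins) > self.max_bins:
            self._merge_closest()

    def _merge_closest(self):
        # Find the adjacent pair with the smallest gap and replace it
        # with a single bin at the count-weighted mean.
        gaps = [self.bins[i + 1][0] - self.bins[i][0]
                for i in range(len(self.bins) - 1)]
        i = gaps.index(min(gaps))
        (c1, n1), (c2, n2) = self.bins[i], self.bins[i + 1]
        merged = [(c1 * n1 + c2 * n2) / (n1 + n2), n1 + n2]
        self.bins[i:i + 2] = [merged]

    def total(self):
        return sum(n for _, n in self.bins)

    def mean(self):
        return sum(c * n for c, n in self.bins) / self.total()


hist = StreamingHistogram(max_bins=4)
for x in [1, 2, 2, 3, 10, 11, 12, 50, 51, 100]:
    hist.insert(x)
print(hist.bins)
```

Because merges use count-weighted means, the total count and the overall mean are preserved even though individual values are lost.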
BigML API
3,500+ users
35,000+ models
BigML
FREE subscription?
mail your username to:
acm@bigml.com
Challenges
#1 Machine Learning breadth and depth
#2 User Diversity
#3 Simplicity
#4 Scalability
#5 Measuring Machine Learning Impact
#6 Pricing
#1 machine learning breadth and depth
#1 Supervised learning
#2 Unsupervised learning
#3 Semi-supervised learning
#4 Reinforcement learning
#5 Learning to Learn
...or you can deal with that!
The stages of an ML application
‣ Phrase a problem as an ML task
‣ Data Wrangling
‣ Feature Engineering
‣ Learn from Data
‣ Pre-evaluate
‣ Measure Impact
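The stages listed above can be sketched end to end on a toy churn problem. Everything in this sketch is an assumption made for illustration: the data is made up, the feature (support calls per month) is hypothetical, and the learner is a one-threshold decision stump rather than anything BigML ships. The last stage, measuring impact, happens after deployment and is deliberately left out of the code.

```python
# 1. Phrase the problem as an ML task: binary classification,
#    "will this customer churn?" Rows are made-up
#    (support_calls, months_as_customer, churned) strings.
RAW = [
    ("40", "2", "yes"), ("36", "3", "yes"), ("5", "10", "no"),
    ("8", "24", "no"), ("", "6", "no"), ("30", "2", "yes"),
    ("6", "12", "no"), ("45", "3", "yes"), ("4", "8", "no"),
    ("28", "2", "yes"),
]


def wrangle(rows):
    """2. Data wrangling: drop incomplete rows, convert types."""
    clean = []
    for calls, months, churn in rows:
        if calls == "" or months == "":
            continue
        clean.append((float(calls), float(months), churn == "yes"))
    return clean


def engineer(rows):
    """3. Feature engineering: support calls per month."""
    return [(calls / months, label) for calls, months, label in rows]


def learn(examples):
    """4. Learn from data: choose the threshold with the fewest
    training errors; predict churn when feature > threshold."""
    best_t, best_err = None, None
    for t in sorted({x for x, _ in examples}):
        err = sum((x > t) != y for x, y in examples)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t


def pre_evaluate(threshold, examples):
    """5. Pre-evaluate: accuracy on examples not used for training."""
    correct = sum((x > threshold) == y for x, y in examples)
    return correct / len(examples)


examples = engineer(wrangle(RAW))
train, holdout = examples[:6], examples[6:]
threshold = learn(train)
accuracy = pre_evaluate(threshold, holdout)
print(threshold, accuracy)
```

Note how little of the code is the learning step itself; most of it is preparing data and checking the result.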
#1 machine learning breadth and depth
Problems, Techniques, Applications
Classification, Regression, Clustering, Density Estimation, Manifold learning, Active learning, etc.
By solving just a couple of problems with a few techniques, thousands of applications can be developed:
churn prevention, date matching, decision making, diagnostics, fraud detection, detecting tumors, detecting investment opportunities, human body pose estimation, pedestrian tracking, predictive analytics, recommendation systems, risk analysis, spam detection, etc.
Why Trees first?
‣ Understanding the past
‣ Predicting the future
Why Trees?
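One plausible reading of the trees slides is that a single tree serves both purposes: read as rules it explains past data, and walked from root to leaf it predicts new cases. The sketch below illustrates that idea with a hand-written toy tree. The tree, its field names, and the helper names `predict` and `to_rules` are all hypothetical and are not output from BigML.

```python
# A hand-written toy tree (hypothetical, for illustration only).
TREE = {
    "feature": "support_calls", "threshold": 3,
    "left": {"predict": "stays"},            # support_calls <= 3
    "right": {                                # support_calls > 3
        "feature": "months", "threshold": 6,
        "left": {"predict": "churns"},       # months <= 6
        "right": {"predict": "stays"},       # months > 6
    },
}


def predict(node, row):
    """Predicting the future: walk from the root to a leaf."""
    while "predict" not in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["predict"]


def to_rules(node, conditions=()):
    """Understanding the past: one IF/THEN rule per leaf."""
    if "predict" in node:
        premise = " AND ".join(conditions) or "TRUE"
        return [f"IF {premise} THEN {node['predict']}"]
    f, t = node["feature"], node["threshold"]
    return (to_rules(node["left"], conditions + (f"{f} <= {t}",))
            + to_rules(node["right"], conditions + (f"{f} > {t}",)))


rules = to_rules(TREE)
for rule in rules:
    print(rule)
print(predict(TREE, {"support_calls": 5, "months": 2}))
```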
#1 machine learning breadth and depth
A Machine Learning application involves more tasks than just learning from data, and some of them are even more important.
Solving just one more problem will enable a huge number of additional applications.
Which problem(s) should be tackled next, and which techniques should be used?
#2 user diversity
Experts
Aficionados
Practitioners
Newbies
Absolute beginners
How to prioritize what to build next? More features for the experts, or more simplification for the newbies?
#2 user diversity
[Chart: axes labeled Time-to-productivity and Expertise]
#2 user diversity
Most users believe their data is much bigger than it really is.
[Figure: two scales running from MBs to PBs, labeled Size and Actual size]
#2 user diversity
[Chart: axes labeled Number of Jobs and Size of Job]
#3 simplicity
“Any fool can make something complicated. It takes a genius to make it simple.”
― Woody Guthrie
#3 simplicity
Simple means much more than an easy-to-use interface:
‣ install
‣ configure
‣ use
‣ train
‣ understand
‣ test
‣ pre-evaluate
‣ measure impact
‣ deploy
‣ scale
‣ access programmatically (API)
#4 scalability
‣ 1 JOB from 1 USER
‣ N CONCURRENT JOBS from 1 CUSTOMER
‣ N JOBS from M CUSTOMERS
Infrastructure
#5 measuring machine learning impact
Measuring “actual” impact is complex and goes beyond traditional performance evaluation.
Imagine that an algorithm predicts that user Alice is going to buy a Magic Potion.
‣ But Magic Potions are out of stock.
‣ Should we blame
  ‣ the algorithm for the “false positive” prediction?
  ‣ the data scientist for not including that feature?
  ‣ operations for running out of stock on things that customers want to buy?
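The Magic Potion scenario can be made concrete with a small made-up log. The sketch below, under the assumption that a purchase is only observed when the item is in stock, scores the same predictions two ways and lists the sales lost to stock-outs. All names, rows, and function names are hypothetical.

```python
# Hypothetical log rows:
# (customer, item, predicted_buy, wanted_to_buy, in_stock)
EVENTS = [
    ("alice", "magic potion", True, True, False),
    ("bob", "magic potion", True, True, False),
    ("carol", "broom", True, True, True),
    ("dave", "broom", False, False, True),
    ("erin", "cauldron", True, False, True),
]


def offline_accuracy(events):
    """Traditional evaluation: prediction vs. observed purchase.
    A purchase is only observed when the item was in stock."""
    hits = sum(pred == (wanted and stock)
               for _, _, pred, wanted, stock in events)
    return hits / len(events)


def intent_accuracy(events):
    """Prediction vs. what the customer actually wanted."""
    hits = sum(pred == wanted for _, _, pred, wanted, _ in events)
    return hits / len(events)


def lost_sales(events):
    """Correctly predicted demand that could not be fulfilled."""
    return [customer for customer, _, pred, wanted, stock in events
            if pred and wanted and not stock]


print(offline_accuracy(EVENTS))  # counts Alice and Bob as false positives
print(intent_accuracy(EVENTS))   # counts Alice and Bob as correct
print(lost_sales(EVENTS))
```

The gap between the two scores comes entirely from the stock-outs, which is why a single accuracy number cannot answer the "who should we blame" question.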
Kiri Wagstaff, Machine Learning that Matters, ICML, 2012
The stages of an ML research program
Very inspirational!!!
The stages of an ML application
‣ Phrase a problem as an ML task
‣ Data Wrangling
‣ Feature Engineering
‣ Learn from Data
‣ Pre-evaluate
‣ Measure Impact !!!!!
#6 pricing
‣ Pre-pay-as-you-go
‣ Subscriptions
Machine Learning made easy?
You can deal with this...
...or you can deal with that!
BigML 1-click model
Machine Learning made easy?
[Chart: Ease-of-use over time, 2013]
Machine Learning made Easy!!!
[Chart: Ease-of-use over time, 2013 to 2018]
Questions
Learning from Data
Based on Learning from Data by Y. Abu-Mostafa, M. Magdon-Ismail and H. Lin
Unknown Model: f : X -> Y
  Example: ideal credit approval formula
Training Examples: (x1, l1), (x2, l2), ..., (xN, lN)
  Example: historical records of credit customers
  [Table: rows x1 ... xN; columns f1, f2, ..., fn, label]
Models: M
  Example: set of candidate credit approval formulas
Learning Algorithm
Final Model: g ~ f
  Example: learned credit approval formula
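The diagram's pieces map directly onto a few lines of code. In this minimal sketch, the training examples are made-up (income, approved) pairs, the model set M is a handful of candidate threshold formulas, and the learning algorithm simply returns the candidate with the lowest training error as the final model g. The data, the threshold form of the candidates, and the function names are assumptions chosen for illustration.

```python
# Training examples (x_n, l_n): made-up (annual income in k$, approved?).
TRAIN = [(20, False), (35, False), (50, True),
         (65, True), (80, True), (30, False)]

# Models M: candidate formulas "approve if income >= t".
HYPOTHESES = {t: (lambda x, t=t: x >= t) for t in range(10, 100, 10)}


def training_error(h, examples):
    """Fraction of training examples that h labels incorrectly."""
    return sum(h(x) != label for x, label in examples) / len(examples)


def learn(hypotheses, examples):
    """Learning algorithm: pick the candidate with the lowest
    training error (the first one found in case of ties)."""
    best_t = min(hypotheses,
                 key=lambda t: training_error(hypotheses[t], examples))
    return best_t, hypotheses[best_t]


# Final model g, which we hope approximates the unknown f.
t, g = learn(HYPOTHESES, TRAIN)
print(t, training_error(g, TRAIN))
print(g(45))
```

Zero training error does not show that g matches f on new applicants; that is what the pre-evaluation and impact stages are for.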

