Building Custom
Machine Learning Algorithms
with Apache SystemML
Fred Reiss
Chief Architect, IBM Spark Technology Center
Member, IBM Academy of Technology
Roadmap
• What is Apache SystemML?
• Demo!
• How to get SystemML
What is Apache SystemML?
Origins of the SystemML Project
Timeline: 2007 to 2016 (2015-2016: You are here.)
2007-2008: Multiple projects at IBM Research – Almaden involving machine learning on Hadoop.
2009: We form a dedicated team for scalable ML.
2009-2010: Through engagements with customers, we observe how data scientists create ML solutions.
Case Study: An Auto Manufacturer
• Inputs: Warranty Claims, Repair History, Diagnostic Readouts
• Goal: Predict Reacquired Cars
• Diagram: the inputs supply Features and Labels to a Machine Learning Algorithm (shown alongside several alternative algorithms); False Positives are called out
• Result: 25x improvement in precision!
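For readers who want the metric behind the "25x improvement in precision" result spelled out, here is a minimal sketch. Precision is the fraction of cars flagged as likely reacquisitions that really were reacquired, so it rises as false positives fall. The counts below are hypothetical and chosen only to illustrate a 25x ratio; the slides do not give the actual numbers.

```python
def precision(true_positives: int, false_positives: int) -> float:
    """Fraction of flagged cars that really were reacquired."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("no positive predictions to score")
    return true_positives / flagged


# Hypothetical counts (not from the case study), picked to show a 25x ratio.
baseline = precision(true_positives=20, false_positives=980)
improved = precision(true_positives=20, false_positives=20)
improvement = improved / baseline
print(f"baseline={baseline:.3f} improved={improved:.3f} ratio={improvement:.1f}x")
```

In this illustration the number of true positives is unchanged; the whole gain comes from flagging far fewer cars that were never reacquired.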
The Iterative Development Process
• Build a pipeline
• Results good enough?
– Yes: done
– No: customize part of the pipeline, then check the results again
State-of-the-Art: Small Data
Diagram: a Data Scientist writes R or Python on a Personal Computer, turning Data into Results.
State-of-the-Art: Big Data
Diagram: a Data Scientist writes R or Python; a Systems Programmer translates the algorithm into Scala to produce Results.
😞 Days or weeks per iteration
😞 Errors while translating algorithms
The SystemML Vision
Diagram: a Data Scientist writes R or Python; SystemML produces the Results.
😃 Fast iteration
😃 Same answer
2011-2014: Research
2015-2016: Apache SystemML
• June 2015: IBM announces open-source SystemML
• September 2015: Code available on GitHub
• November 2015: SystemML enters Apache incubation
• February 2016: First release (0.9) of Apache SystemML
• June 2016: Second Apache release (0.10)
SystemML at
• Built algorithms for predicting treatment
outcomes
– Substantial improvement in accuracy
• Moved from Hadoop MapReduce to Spark
– SystemML supports both frameworks
– Exact same code
– 300X faster on 1/40th as many nodes
SystemML at Cadent Technology
“SystemML allows Cadent to implement advanced numerical programming methods in Apache Spark, empowering us to leverage specialized algorithms in our predictive analytics software.”
Michael Zargham, Chief Scientist
Cadent is a leading provider of TV advertising and data solutions, reaching over 140 million homes and trusted by the world’s largest service providers.
Demo!
Demo Scenario
• Application: Targeted ads using demographic
information tied to cookies
• Problem: The information is incomplete
• Solution: Estimate the missing values
– Treat the problem as a matrix completion problem
Data
• The U.S. Census Public Use Microdata Sample
(PUMS) data set for 2010
• 10% sample of the U.S. population
– We’ll use just California today
• Use this full data set to generate synthetic
incomplete data
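One way to picture "use the full data set to generate synthetic incomplete data" is to hide a random subset of cells from a complete table. The sketch below is an assumption about the general idea, not the demo's actual code: the function name `hide_cells`, the `keep_fraction` parameter, and the tiny example table are all hypothetical.

```python
import random


def hide_cells(full, keep_fraction, seed=0):
    """Return a copy of `full` in which each cell is kept with probability
    `keep_fraction` and replaced by None (missing) otherwise."""
    rng = random.Random(seed)  # fixed seed so the masking is repeatable
    return [
        [value if rng.random() < keep_fraction else None for value in row]
        for row in full
    ]


# Hypothetical complete table: one row per person, one column per field.
full_table = [
    [34, 2, 1],
    [51, 4, 0],
    [27, 1, 1],
    [45, 3, 0],
]
incomplete = hide_cells(full_table, keep_fraction=0.6, seed=42)
for row in incomplete:
    print(row)
```

Because the complete table is retained, estimates for the hidden cells can later be compared against the true values.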
Demo Scenario
• Application: Identify products that are
complementary (often purchased together)
• Problem: Customers are not currently buying
the best complements at the same time
• Solution: Suggest new product pairings
– Treat the problem as a matrix completion problem
Matrix Factorization
• Demographics matrix: rows are users (i), columns are demographic fields (j); each cell holds the value of demographic field j for customer i
• Multiply the two factors (Left Factor × Top Factor) to produce a less-sparse matrix
• New nonzero values become interpolated demographic information
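The factorization idea on this slide can be sketched in plain Python. This is a minimal illustration using stochastic gradient descent on the observed cells only; the slides do not say which algorithm the demo uses, and the demo itself runs in SystemML rather than Python. The function names, hyperparameters, and the small rank-1 example matrix are hypothetical.

```python
import random


def factorize(observed, n_rows, n_cols, rank=1, steps=2000,
              lr=0.01, reg=0.01, seed=0):
    """Fit left (n_rows x rank) and top (rank x n_cols) factors so that
    left @ top matches the observed cells. `observed` maps (i, j) -> value."""
    rng = random.Random(seed)
    left = [[rng.uniform(0.1, 1.0) for _ in range(rank)] for _ in range(n_rows)]
    top = [[rng.uniform(0.1, 1.0) for _ in range(n_cols)] for _ in range(rank)]
    cells = list(observed.items())
    for _ in range(steps):
        for (i, j), value in cells:
            pred = sum(left[i][k] * top[k][j] for k in range(rank))
            err = value - pred
            for k in range(rank):
                l, t = left[i][k], top[k][j]
                # Gradient step on squared error with L2 regularization.
                left[i][k] += lr * (err * t - reg * l)
                top[k][j] += lr * (err * l - reg * t)
    return left, top


def predict(left, top, i, j):
    """Entry (i, j) of the product of the two factors."""
    return sum(left[i][k] * top[k][j] for k in range(len(top)))


# Hypothetical 3x3 matrix with one missing cell, (1, 2), whose true value is 6.
true_matrix = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
observed = {(i, j): true_matrix[i][j]
            for i in range(3) for j in range(3) if (i, j) != (1, 2)}
left, top = factorize(observed, n_rows=3, n_cols=3)
estimate = predict(left, top, 1, 2)
print(f"estimated value for the missing cell: {estimate:.2f}")
```

Only observed cells contribute to the updates, so the missing cell is filled in purely from the structure the two factors learn from the rest of the matrix.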
Demo Part 1: Data wrangling
Demo Part 2: Custom algorithm
Key Points
• SystemML, Spark, and Zeppelin work together
• Linear algebra is great for data science
• Customization is important
How to get Apache SystemML
The Apache SystemML Web Site
http://systemml.apache.org
• Download the binary release!
• Try out some tutorials!
• Browse the source!
• Contribute to the project!
THANK YOU.
Please try out Apache SystemML!
http://systemml.apache.org
Special thanks to Nakul Jindal and Mike
Dusenberry for helping with the demo!
