Bringing an AI Ecosystem to the
Domain Expert and Enterprise AI
Developer
Fred Reiss
Vijay Bommireddipalli
IBM Center for Open-Source Data & AI Technologies
(http://codait.org)
IBM’s history of strong AI leadership
1997: Deep Blue
• Deep Blue became the first machine to beat a world chess
champion in tournament play
2011: Jeopardy!
• Watson beat two top
Jeopardy! champions
1968: “2001: A Space Odyssey”
• IBM was a technical advisor
• HAL is “the latest in machine intelligence”
2018: Open Tech, AI & emerging standards
• New IBM centers of gravity for AI
• Open source projects increasing exponentially
• Emerging global standards in AI
Center for Open Source
Data and AI Technologies
• CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
• Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
• codait (French) = coder/coded
• https://m.interglot.com/fr/en/codait
• CODAIT
• codait.org
CODAIT by the numb3rs
• The team contributes to over 10 open source projects, including Spark, TensorFlow, Keras, SystemML, Arrow, Bahir, Toree, Livy, Zeppelin, R4ML, Stocator, and Jupyter Enterprise Gateway
• 17 committers and many contributors in Apache projects: Spark, Arrow, SystemML, Bahir, Toree, Livy
• Over 980 JIRAs and 50,000 lines of code committed to Apache Spark itself, and over 65,000 lines of code into SystemML
– Established IBM as the number 1 contributor to Spark Machine Learning in the Spark 2.0 release
• Over 25 product lines within IBM leverage Apache Spark in some form or another. CODAIT engineers have interacted and interlocked with many of them.
• Speakers at over 100 conferences, meetups, unconferences, etc.
[Chart: Spark code contribution growth by week]
Improving the Enterprise AI lifecycle in Open Source
• Code – Build and improve practical frameworks to enable more developers to realize immediate value (e.g. FfDL, TensorFlow, Jupyter, Spark)
• Content – Showcase solutions to complex, real-world AI problems
• Community – Bring developers and data scientists to engage with IBM (e.g. MAX)
[Diagram: AI lifecycle stages (Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model, Maintain Model) shown with open source tools: Python Data Science Stack, Scikit-Learn, Pandas, Apache Spark, Jupyter, Keras + TensorFlow, Fabric for Deep Learning (FfDL), MLeap + PFA, Model Asset eXchange]
Fabric for Deep Learning
https://github.com/IBM/FfDL
https://www.youtube.com/watch?v=nQsYWmkfLP4
• FfDL provides a scalable, resilient, and fault-tolerant deep learning framework
• Fabric for Deep Learning, or FfDL (pronounced “fiddle”), is an open source project that aims to make deep learning easily accessible to the people it matters to most, i.e. data scientists and AI developers.
• FfDL provides a consistent way to deploy, train, and visualize deep learning jobs across multiple frameworks such as TensorFlow, Caffe, PyTorch, and Keras.
• FfDL is being developed in close collaboration with IBM Research and IBM Watson. It forms the core of Watson’s Deep Learning service in open source.
• FfDL GitHub page: https://github.com/IBM/FfDL
• FfDL dwOpen page: https://developer.ibm.com/code/open/projects/fabric-for-deep-learning-ffdl/
• FfDL announcement blog: http://developer.ibm.com/code/2018/03/20/fabric-for-deep-learning
• FfDL technical architecture blog: http://developer.ibm.com/code/2018/03/20/democratize-ai-with-fabric-for-deep-learning
• Deep Learning as a Service within Watson Studio: https://www.ibm.com/cloud/deep-learning
• Research paper: “Scalable Multi-Framework Management of Deep Learning Training Jobs”, http://learningsys.org/nips17/assets/papers/paper_29.pdf
Jupyter Enterprise
Gateway
March 30 2018 / © 2018 IBM Corporation
• Jupyter Enterprise Gateway at IBM Code
• https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/
• Jupyter Enterprise Gateway source code at GitHub
• https://github.com/jupyter-incubator/enterprise_gateway
• Jupyter Enterprise Gateway Documentation
• http://jupyter-enterprise-gateway.readthedocs.io/en/latest/
• A lightweight, multi-tenant, scalable
and secure gateway that enables
Jupyter Notebooks to share
resources across an Apache Spark
or Kubernetes cluster for
Enterprise/Cloud use cases
Road Map
• Background: Deep Learning Models
• The IBM Code Model Asset eXchange
• Demo
• What’s Next
CODAIT: Enabling End-to-End AI in
the Enterprise
[Diagram: AI lifecycle stages (Gather Data, Analyze Data, Machine Learning, Deep Learning, Deploy Model, Maintain Model) shown with open source tools: Python Data Science Stack, Scikit-Learn, Pandas, Apache Spark, Jupyter, Keras + TensorFlow, Fabric for Deep Learning (FfDL), MLeap + PFA, Model Asset eXchange]
Making AI as
Ubiquitous as
the Telephone
This talk is about enabling domain experts
to use deep learning in the enterprise.
Q: What is deep learning?
A: Machine learning using deep neural
networks.
Q: What is a deep neural network?
A: A neural network with multiple hidden
layers.
Q: What is a neural network?
What is a neural network?
Linear regression: $y = a x_1 + b x_2 + c x_3$
[Diagram: inputs x1, x2, x3 connected to output y through weights a, b, c]
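As a minimal sketch of the linear regression on this slide, the snippet below computes y as a weighted sum of x1, x2, and x3. The coefficient values and the function name `linear_regression` are illustrative assumptions, not taken from the talk.

```python
def linear_regression(x, weights):
    """Return the weighted sum y = a*x1 + b*x2 + c*x3 for one input row."""
    if len(x) != len(weights):
        raise ValueError("x and weights must have the same length")
    return sum(w * xi for w, xi in zip(weights, x))


# Hypothetical coefficients a, b, c and one hypothetical input (x1, x2, x3).
a, b, c = 0.5, -1.0, 2.0
y = linear_regression([1.0, 2.0, 3.0], [a, b, c])
print(y)
```

No intercept term is included because the slide's formula does not show one.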
What is a neural network?
Multiple linear regressions at the same time
[Diagram: inputs x1, x2, x3, each connected to outputs y1, y2, y3, y4]
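One way to picture "multiple linear regressions at the same time" is a weight matrix with one column per output. The sketch below assumes a 3×4 matrix (3 inputs, 4 outputs); the weight values and the helper name `dense` are made up for illustration.

```python
def dense(x, w):
    """Run one linear regression per output column: y_j = sum_i x[i] * w[i][j]."""
    n_in, n_out = len(w), len(w[0])
    if len(x) != n_in:
        raise ValueError("input length must match the number of weight rows")
    return [sum(x[i] * w[i][j] for i in range(n_in)) for j in range(n_out)]


# Hypothetical 3x4 weights: row i belongs to input x_i, column j to output y_j.
W = [
    [1.0, 0.0, 0.5, -1.0],
    [0.0, 1.0, 0.5, 2.0],
    [1.0, 1.0, 0.0, 0.0],
]
ys = dense([1.0, 2.0, 3.0], W)
print(ys)
```

Each entry of `ys` is an independent linear regression over the same three inputs, which is what a single dense layer computes before any activation function.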
What is a neural network?
Multilayer Perceptron Neural Network
[Diagram: Input (3) → Dense (3×4) → Dense (4×2) → Output (2)]
• Same network in a more compact notation
• Second layer of linear regressions
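The two-layer network in this notation can be sketched by chaining two dense layers. The weights below are hypothetical, and the ReLU between the layers is an assumption that the slide does not show; it is included because stacked linear layers with nothing in between reduce to a single linear layer.

```python
def dense(x, w):
    """One linear regression per output column: y_j = sum_i x[i] * w[i][j]."""
    return [sum(x[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]


def relu(values):
    """Assumed activation (not on the slide): clamp negative values to zero."""
    return [max(0.0, v) for v in values]


def mlp(x, w1, w2):
    hidden = relu(dense(x, w1))  # Dense (3x4): first layer of regressions
    return dense(hidden, w2)     # Dense (4x2): second layer of regressions


# Hypothetical weights chosen only to make the arithmetic easy to follow.
W1 = [
    [1.0, 0.0, 0.5, -1.0],
    [0.0, 1.0, 0.5, 2.0],
    [1.0, 1.0, 0.0, -3.0],
]
W2 = [
    [1.0, 0.0],
    [0.0, 1.0],
    [2.0, 0.0],
    [0.0, 1.0],
]
out = mlp([1.0, 2.0, 3.0], W1, W2)
print(out)
```

The four intermediate values form the single hidden layer; the two final values are the network's outputs.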
[Diagram: Input (3) → Dense (3×4) → Dense (4×2) → Output (2)]
Q: What is a deep neural network?
A: A neural network with multiple
hidden layers.
[Diagram: Input (3) → Dense (3×8) → Dense (8×6) → Dense (6×4) → Dense (4×2) → Output (2)]
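To make "multiple hidden layers" concrete, the sketch below takes the layer shapes from the deeper diagram (3×8, 8×6, 6×4, 4×2), checks that they chain together, and counts the hidden layers and weights. Bias terms are left out, which is an assumption, since the slides do not mention them; the function names are illustrative.

```python
# Layer shapes (inputs, outputs) as they appear in the slide's diagram.
LAYER_SHAPES = [(3, 8), (8, 6), (6, 4), (4, 2)]


def validate_chain(shapes):
    """Each layer must accept exactly as many values as the previous one emits."""
    for (_, prev_out), (next_in, _) in zip(shapes, shapes[1:]):
        if prev_out != next_in:
            raise ValueError(f"cannot feed {prev_out} outputs into {next_in} inputs")


def count_weights(shapes):
    """Total weights across all layers, ignoring bias terms (an assumption)."""
    return sum(n_in * n_out for n_in, n_out in shapes)


validate_chain(LAYER_SHAPES)
# Every layer except the last one produces a hidden (intermediate) result.
hidden_layers = len(LAYER_SHAPES) - 1
total_weights = count_weights(LAYER_SHAPES)
print(hidden_layers, total_weights)
```

With these shapes the network has three hidden layers (of sizes 8, 6, and 4), compared with one hidden layer in the 3×4, 4×2 network.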
Q: What is deep learning?
A: Machine learning using deep neural networks.
[Figure: InceptionV3 convolutional neural net (a “medium-sized” deep learning model)]
Image source: https://github.com/tensorflow/models/blob/master/research/inception/g3doc/inception_v3_architecture.png
Characteristics of Deep Learning (1)
• State-of-the-art prediction quality in many domains
– Image classification
– Machine translation
– Facial recognition
– Time series prediction
– Many more
Characteristics of Deep Learning (2)
• Large, complex models
– Model size generally determined by “how big a model can you fit on your device?”
[Figure: InceptionV3 convolutional neural net (a “medium-sized” deep learning model). Each box ≈ between 32 and 768 linear regression models]
Characteristics of Deep Learning (3)
Poorly understood today, even by experts
– Why do the models converge?
– Why do the models converge with low loss?
– Why do the models generalize?
Focus of this Talk
Incorporating well-understood deep learning models into enterprise applications.
Sounds easy!
The Parts of a Deep Learning Model
• Neural network graph: Input (3) → Dense (3×8) → Dense (8×6) → Dense (6×4) → Dense (4×2) → Output (2)
• Weights (not to scale)
• Driver program
[Diagram: example output label “cat”]
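The three parts named on this slide can be kept separate in code. The following is a toy sketch under stated assumptions: the graph is a list of layer descriptions, the weights are a JSON string standing in for a weights file, and the driver loads one into the other. All names, shapes, and values here are hypothetical and much smaller than a real model.

```python
import json

# Part 1: the neural network graph (structure only, no numbers).
GRAPH = [
    {"name": "dense_1", "shape": (3, 2)},
    {"name": "dense_2", "shape": (2, 1)},
]

# Part 2: the weights, stored separately (here a JSON string standing in for a file).
WEIGHTS_JSON = '{"dense_1": [[1, 0], [0, 1], [1, 1]], "dense_2": [[1], [-1]]}'


def run_graph(graph, weights, x):
    """Push an input through every layer of the graph using the given weights."""
    for layer in graph:
        w = weights[layer["name"]]
        n_in, n_out = layer["shape"]
        if len(w) != n_in or any(len(row) != n_out for row in w):
            raise ValueError(f"weights for {layer['name']} do not match the graph")
        x = [sum(x[i] * w[i][j] for i in range(n_in)) for j in range(n_out)]
    return x


# Part 3: the driver program, which loads the weights and runs the graph.
def driver(x):
    weights = json.loads(WEIGHTS_JSON)
    return run_graph(GRAPH, weights, x)


result = driver([1.0, 2.0, 3.0])
print(result)
```

The shape check in `run_graph` hints at why weights from one source cannot simply be reused with a graph from another: the two have to agree layer by layer.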
Example: Get an Image Classifier
Step 1: Find a suitable neural network graph.
– Need to read some papers
Step 2: Find code to generate the neural network graph.
– [Screenshot: TensorFlow code to build ResNet50 neural network graph]
Step 3: Find some pre-trained weights for your graph.
– [Screenshot: Caffe2 ResNet50 model weights*]
– * Caffe2 only. Find a different binary file if your framework is not Caffe2.
Step 4: Find example code that performs model inference.
– [Screenshot: TensorFlow code for training and batch inference* on ResNet50]
– * Single-crop inference only. Additional code required to use multiple crops.
Step 5: Write your own code to perform model inference on one image at a time.
Step 6: Package your inference code, graph creation code, and pre-trained weights together.
Step 7: Deploy your package.
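Step 5 often amounts to wrapping batch-oriented example code so that it accepts a single image. The sketch below illustrates only that wrapping pattern; `batch_predict` is a hypothetical stand-in that labels an image by its average pixel value, not a real classifier or any framework's API.

```python
def batch_predict(images):
    """Hypothetical stand-in for example code that only scores whole batches."""
    return ["cat" if sum(img) / len(img) > 0.5 else "not cat" for img in images]


def predict_one(image):
    """Wrap the single image in a batch of one, then unwrap the single result."""
    return batch_predict([image])[0]


# Toy "images": flat lists of pixel intensities between 0 and 1.
print(predict_one([0.9, 0.8, 0.7]))
print(predict_one([0.1, 0.2, 0.3]))
```

Real inference code would also need to handle image decoding, resizing, and normalization, which this sketch leaves out.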
Model Marketplaces
• Collections of well-understood deep learning models
• Provide a central place to find known-good implementations of these models
The IBM Code Model Asset eXchange
• Free, open-source models.
• Wide variety of domains.
• Multiple deep learning frameworks.
• Vetted and tested code and IP.
• Build and deploy a web service in 30 seconds.
• Start training on Watson Studio in minutes.
• Demo!
Model Asset eXchange: Summary
• Free, open-source models.
• Wide variety of domains.
• Multiple deep learning frameworks.
• Vetted and tested code and IP.
• Build and deploy a web service in 30 seconds.
• Start training on Watson Studio in minutes.
Model Asset eXchange: What’s Next
• More models
• More deployment options
• Code Patterns showing how to use the models (including today’s demo!)
Thank you!
• http://codait.org
• https://developer.ibm.com/code/exchanges/models/
• github.com/codait
• developer.ibm.com/code
Call for Code inspires developers
to solve pressing global problems
with sustainable software
solutions, delivering
on their vast potential to do good.
Bringing together NGOs, academic
institutions, enterprises, and
startup developers to compete to
build effective disaster mitigation
solutions, with a focus on health
and well-being.
International Federation of Red
Cross/Red Crescent, The
American Red Cross, and the
United Nations Office of Human
Rights combine for the Call for
Code Award to elevate the profile
of developers.
Award winners will receive long-term
support through open source
foundations, financial prizes, the
opportunity to present their solution
to leading VCs, and will deploy their
solution through IBM’s Corporate
Service Corps.
Developers will jump-start their
project with dedicated IBM Code
Patterns, combined with optional
enterprise technology to build
projects over the course of three
months.
Judged by the world’s most renowned
technologists, the grand prize will be
presented in October at an Award
Event.
developer.ibm.com/callforcode
IBM Sessions at Spark+AI Summit 2018 (Tuesday, June 5)
Date | Time | Location, Duration | Session Title | Speakers
Tue, June 5 | 11 AM | 2010-2012, 30 mins | Productionizing Spark ML Pipelines with the Portable Format for Analytics | Nick Pentreath (IBM)
Tue, June 5 | 2 PM | 2018, 30 mins | Making PySpark Amazing—From Faster UDFs to Dependency Management and Graphing! | Holden Karau (Google), Bryan Cutler (IBM)
Tue, June 5 | 2 PM | Nook by 2001, 30 mins | Making Data and AI Accessible for All | Armand Ruiz Gabernet (IBM)
Tue, June 5 | 2:40 PM | 2002-2004, 30 mins | Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System | Rajesh Bordawekar (IBM T.J. Watson Research Center)
Tue, June 5 | 3:20 PM | 3016-3022, 30 mins | Dynamic Priorities for Apache Spark Application’s Resource Allocations | Michael Feiman (IBM Spectrum Computing), Shinnosuke Okada (IBM Canada Ltd.)
Tue, June 5 | 3:20 PM | 2001-2005, 30 mins | Model Parallelism in Spark ML Cross-Validation | Nick Pentreath (IBM), Bryan Cutler (IBM)
Tue, June 5 | 3:20 PM | 2007, 30 mins | Serverless Machine Learning on Modern Hardware Using Apache Spark | Patrick Stuedi (IBM)
Tue, June 5 | 5:40 PM | 2002-2004, 30 mins | Create a Loyal Customer Base by Knowing Their Personality Using AI-Based Personality Recommendation Engine | Sourav Mazumder (IBM Analytics), Aradhna Tiwari (University of South Florida)
Tue, June 5 | 5:40 PM | 2007, 30 mins | Transparent GPU Exploitation on Apache Spark | Dr. Kazuaki Ishizaki (IBM), Madhusudanan Kandasamy (IBM)
Tue, June 5 | 5:40 PM | 2009-2011, 30 mins | Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for Deep Neural Networks | Yonggang Hu (IBM), Chao Xue (IBM)
IBM Sessions at Spark+AI Summit 2018 (Wednesday, June 6)
Date | Time | Location, Duration | Session Title | Speakers
Wed, June 6 | 12:50 PM | not listed | Birds of a Feather: Apache Arrow in Spark and More | Bryan Cutler (IBM), Li Jin (Two Sigma Investments, LP)
Wed, June 6 | 2 PM | 2002-2004, 30 mins | Deep Learning for Recommender Systems | Nick Pentreath (IBM)
Wed, June 6 | 3:20 PM | 2018, 30 mins | Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer | Frederick Reiss (IBM), Vijay Bommireddipalli (IBM Center for Open-Source Data & AI Technologies)
Meet us at the IBM booth in the Expo area.
Thank you!
http://codait.org
https://developer.ibm.com/code/exchanges/models/
https://www.linkedin.com/in/fred-reiss/
https://www.linkedin.com/in/vijayrb
github.com/codait
developer.ibm.com/code


Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer with Fred Reiss and Vijay Bommireddipalli

  • 1. Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer Fred Reiss Vijay Bommireddipalli IBM Center for Open-Source Data & AI Technologies (http://codait.org) 1
  • 2. 2 IBM’s history of strong AI leadership 1997: Deep Blue • Deep Blue became the first machine to beat a world chess champion in tournament play 2011: Jeopardy! • Watson beat two top Jeopardy! champions 1968, 2001: A Space Odyssey • IBM was a technical advisor • HAL is “the latest in machine intelligence” 2018: Open Tech, AI & emerging standards • New IBM centers of gravity for AI • OS projects increasing exponentially • Emerging global standards in AI
  • 3. 3 Center for Open Source Data and AI Technologies • CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise • Relaunch of the Spark Technology Center (STC) to reflect expanded mission • codait (French) • = coder/coded • https://m.interglot.co m/fr/en/codait • CODAIT • codait.org
  • 4. 4 CODAIT by the numb3rs • The team contributes to over 10 open source projects. These projects include - Spark, Tensorflow, Keras, SystemML, Arrow, Bahir, Toree, Livy, Zeppelin, R4ML, Stocator, Jupyter Enterprise Gateway • 17 committers and many contributors in Apache projects- Spark, Arrow, systemML, Bahir, Toree, Livy • Over 980 JIRAs and 50,000 lines of code committed to Apache Spark itself, and Over 65,000 LoC into SystemML – Established IBM as the number 1 contributor to Spark Machine Learning in Spark 2.0 release • Over 25 product lines within IBM leveraging Apache Spark in some form or another. CODAIT engineers have interacted and interlocked with many of them. • Speakers at over 100 conferences, MeetUps, un- conferences etc. • codait (French) • = coder/coded • https://m.interglot.co m/fr/en/codait Spark code contribution growth by week • CODAIT • codait.org
  • 5. Improving the Enterprise AI lifecycle in Open Source 5 Center for Open Source Data and AI Technologies • Code - Build and improve practical frameworks to enable more developers to realize immediate value (e.g. FfDL, Tensorflow Jupyter, Spark) • Content – Showcase solutions to complex and real world AI problems • Community – Bring developers and data scientists to engage with IBM (e.g. MAX) • codait (French) • = coder/coded • https://m.interglot.co m/fr/en/codait Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow CODAIT codait.org
  • 6. 6 Fabric for Deep Learning https://github.com/IBM/FfDL https://www.youtube.com/watch?v=nQsY WmkfLP4 • FfDL provides a scalable, resilient, and fault tolerant deep-learning framework • Fabric for Deep Learning or FfDL (pronounced as ‘fiddle’) is an open source project which aims at making Deep Learning easily accessible to the people it matters the most i.e. Data Scientists, and AI developers. • FfDL Provides a consistent way to deploy, train and visualize Deep Learning jobs across multiple frameworks like TensorFlow, Caffe, PyTorch, Keras etc. • FfDL is being developed in close collaboration with IBM Research and IBM Watson. It forms the core of Watson`s Deep Learning service in open source. • FfDL Github Page https://github.com/IBM/FfDL FfDL dwOpen Page https://developer.ibm.com/code/open/proj ects/fabric-for-deep-learning-ffdl/ FfDL Announcement Blog http://developer.ibm.com/code/2018/03/20 /fabric-for-deep-learning FfDL Technical Architecture Blog http://developer.ibm.com/code/2018/03/20 /democratize-ai-with-fabric-for-deep- learning Deep Learning as a Service within Watson Studio https://www.ibm.com/cloud/deep-learning • Research paper: “Scalable Multi- Framework Management of Deep Learning Training Jobs” http://learningsys.org/nips17/assets/paper s/paper_29.pdf • FfDL
  • 7. Jupyter Enterprise Gateway March 30 2018 / © 2018 IBM Corporation • Jupyter Enterprise Gateway at IBM Code • https://developer.ibm.com/code/openprojects/jupyter-enterprise-gateway/ • Jupyter Enterprise Gateway source code at GitHub • https://github.com/jupyter-incubator/enterprise_gateway • Jupyter Enterprise Gateway Documentation • http://jupyter-enterprise-gateway.readthedocs.io/en/latest/ 7 • A lightweight, multi-tenant, scalable and secure gateway that enables Jupyter Notebooks to share resources across an Apache Spark or Kubernetes cluster for Enterprise/Cloud use cases
  • 8. Road Map • Background: Deep Learning Models • The IBM Code Model Asset Exchange • Demo • What’s Next 8
  • 9. CODAIT: Enabling End-to-End AI in the Enterprise 9 Gather Data Analyze Data Machine Learning Deep Learning Deploy Model Maintain Model Python Data Science Stack Fabric for Deep Learning (FfDL) Mleap + PFA Scikit-LearnPandas Apache Spark Apache Spark Jupyter Model Asset eXchange Keras + Tensorflow
  • 10. 10 Making AI as Ubiquitous as the Telephone
  • 11. 11 This talk is about enabling domain experts to use deep learning in the enterprise. Q: What is deep learning? A: Machine learning using deep neural networks. Q: What is a deep neural network? A: A neural network with multiple hidden layers.
  • 12. 12 Q: What is a neural network?
  • 13. What is a neural network? 13 ! = #$%+ &$'+ &$( x1 x2 x3 y a b c Linear regression
  • 14. What is a neural network? Multiple linear regressions at the same time 14 x1 x2 x3 y3 y1 y4 y2
  • 15. What is a neural network? • Multilayer Perceptron Neural Network: a second layer of linear regressions • Same network in a more compact notation: Input (3) → Dense (3×4) → Dense (4×2) → Output (2)
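The compact notation above can be read as two weight matrices applied in sequence. The sketch below uses made-up weights with the 3×4 and 4×2 shapes from the slide. One assumption goes beyond the slide: a ReLU nonlinearity is applied between the layers, as is usual in practice, because two purely linear layers would collapse into a single linear map.

```python
def dense(x, W):
    """Apply a dense layer; W[i][j] is the weight from input i to output j."""
    n_out = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(n_out)]

def relu(values):
    """Assumed activation (not shown on the slide): clamp negatives to zero."""
    return [max(0.0, v) for v in values]

# Hypothetical weights with the shapes from the slide.
W1 = [[1, 0, -1, 2],
      [0, 1, 1, -1],
      [1, -1, 0, 0]]   # Dense (3x4)
W2 = [[1, 0],
      [0, 1],
      [1, 1],
      [0, -1]]         # Dense (4x2)

x = [1.0, 2.0, 3.0]           # Input (3)
hidden = relu(dense(x, W1))   # 4 hidden values
output = dense(hidden, W2)    # Output (2)
print(hidden, output)
```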
  • 16. Q: What is a deep neural network? A: A neural network with multiple hidden layers. • Figure: Input (3) → Dense (3×4) → Dense (4×2) → Output (2)
  • 17. Q: What is a deep neural network? A: A neural network with multiple hidden layers. • Figure: Input (3) → Dense (3×8) → Dense (8×6) → Dense (6×4) → Dense (4×2) → Output (2)
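A deeper network is the same idea with more layers chained together. The sketch below builds the four layer shapes from the figure with random weights (seeded for repeatability) and runs one forward pass. The random initialization and the ReLU between layers are assumptions for illustration, not something the slide specifies.

```python
import random

# Layer shapes taken from the figure: (inputs, outputs) per dense layer.
LAYER_SHAPES = [(3, 8), (8, 6), (6, 4), (4, 2)]

def make_layer(n_in, n_out, rng):
    """Create an n_in x n_out weight matrix with random values (illustrative only)."""
    return [[rng.uniform(-1, 1) for _ in range(n_out)] for _ in range(n_in)]

def forward(x, layers):
    """Run x through every layer, applying ReLU after all but the last one."""
    for idx, W in enumerate(layers):
        x = [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]
        if idx < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x

rng = random.Random(0)
layers = [make_layer(n_in, n_out, rng) for n_in, n_out in LAYER_SHAPES]
num_weights = sum(n_in * n_out for n_in, n_out in LAYER_SHAPES)

output = forward([1.0, 2.0, 3.0], layers)
print(num_weights, len(output))
```

Even this toy network has 104 weights, which hints at why real models such as InceptionV3 are large.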
  • 18. Q: What is deep learning? A: Machine learning using deep neural networks. • Figure: InceptionV3 Convolutional Neural Net (a “medium-sized” deep learning model) • Image source: https://github.com/tensorflow/models/blob/master/research/inception/g3doc/inception_v3_architecture.png
  • 19. Characteristics of Deep Learning (1) • State-of-the-Art prediction quality in many domains – Image classification – Machine translation – Facial recognition – Time series prediction – Many more 19
  • 20. Characteristics of Deep Learning (2) • Large, complex models – Model size generally determined by “how big a model can you fit on your device?” • Figure: InceptionV3 Convolutional Neural Net (a “medium-sized” deep learning model); each box ≈ between 32 and 768 linear regression models
  • 21. Characteristics of Deep Learning (3) Poorly understood today …even by experts – Why do the models converge? – Why do the models converge with low loss? – Why do the models generalize? 21
  • 22. Focus of this Talk • Incorporating well-understood deep learning models into enterprise applications.
  • 24. The Parts of a Deep Learning Model • Neural network graph: Input (3) → Dense (3×8) → Dense (8×6) → Dense (6×4) → Dense (4×2) → Output (2) • Weights (not to scale) • Driver program • Example output: “cat”
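The three parts named on this slide can be kept separate in code. The sketch below is a toy stand-in rather than any framework's real API: the graph is an ordered list of layer names, the weights are a dictionary keyed by those names, and the driver program feeds an input through the graph and maps the result to a label. All names, weights, and labels are hypothetical.

```python
# Part 1: the graph, here just the order in which layers are applied.
GRAPH = ["dense_1", "dense_2"]

# Part 2: the weights, stored separately and keyed by layer name.
WEIGHTS = {
    "dense_1": [[1, 0], [0, 1], [1, 0]],  # 3 inputs -> 2 values
    "dense_2": [[1, 0], [0, 1]],          # 2 values -> 2 class scores
}

LABELS = ["cat", "dog"]  # hypothetical class names

def run_graph(graph, weights, x):
    """Apply each layer of the graph in order using its weights."""
    for name in graph:
        W = weights[name]
        x = [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(len(W[0]))]
    return x

# Part 3: the driver program, which ties graph and weights to an answer.
def classify(features):
    scores = run_graph(GRAPH, WEIGHTS, features)
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best]

print(classify([0.9, 0.1, 0.8]))
```

Keeping the pieces apart mirrors how they are usually distributed: graph-building code, a binary weights file, and example driver scripts often come from different places.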
  • 25. Example: Get an Image Classifier Step 1: Find a suitable neural network graph. – Need to read some papers 25
  • 26. Example: Get an Image Classifier Step 2: Find code to generate the neural network graph 26 TensorFlow code to build ResNet50 neural network graph
  • 27. Example: Get an Image Classifier Step 3: Find some pre-trained weights for your graph • Figure: Caffe2 ResNet50 model weights* • * Caffe2 only. Find a different binary file if your framework is not Caffe2.
  • 28. Example: Get an Image Classifier Step 4: Find example code that performs model inference 28 TensorFlow code for training and batch inference* on ResNet50 * Single-crop inference only. Additional code required to use multiple crops.
  • 29. Example: Get an Image Classifier • Step 5: Write your own code to perform model inference on one image at a time • Step 6: Package your inference code, graph creation code, and pre-trained weights together • Step 7: Deploy your package
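Steps 5 and 6 can be sketched as follows. This is a simplified illustration under stated assumptions: `batch_inference` is a hypothetical stand-in for whatever batch API the example code from Step 4 provides, the "model" is a single linear layer with made-up weights, and packaging is reduced to one JSON document holding the weights and labels (graph creation code is omitted). Deployment (Step 7) is not covered.

```python
import json

def batch_inference(batch, weights):
    """Hypothetical stand-in for a framework's batch API: one score list per image."""
    return [[sum(p * w for p, w in zip(image, row)) for row in weights]
            for image in batch]

def predict_one(image, weights, labels):
    """Step 5: wrap a single image in a batch of size 1 and return the top label."""
    scores = batch_inference([image], weights)[0]
    best = max(range(len(scores)), key=scores.__getitem__)
    return labels[best]

def package_model(weights, labels):
    """Step 6: bundle weights and labels into a single artifact (JSON text here)."""
    return json.dumps({"weights": weights, "labels": labels})

def load_model(blob):
    """Restore weights and labels from a packaged artifact."""
    bundle = json.loads(blob)
    return bundle["weights"], bundle["labels"]

# Hypothetical model: one row of weights per class, 3 "pixels" per image.
weights = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
labels = ["cat", "dog"]

blob = package_model(weights, labels)
restored_weights, restored_labels = load_model(blob)
label = predict_one([0.9, 0.1, 0.8], restored_weights, restored_labels)
print(label)
```

A model marketplace aims to remove the need to write this glue for every model by shipping the pieces already packaged.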
  • 30. Model Marketplaces • Collections of well-understood deep learning models • Provide a central place to find known-good implementations of these models 30
  • 31. The IBM Code Model Asset eXchange • Free, open-source models. • Wide variety of domains. • Multiple deep learning frameworks. • Vetted and tested code and IP. • Build and deploy a web service in 30 seconds. • Start training on Watson Studio in minutes. 31
  • 33. Model Asset eXchange: Summary • Free, open-source models. • Wide variety of domains. • Multiple deep learning frameworks. • Vetted and tested code and IP. • Build and deploy a web service in 30 seconds. • Start training on Watson Studio in minutes. 33
  • 34. Model Asset eXchange: What’s Next • More models • More deployment options • Code Patterns showing how to use the models (including today’s demo!) 34
  • 35. Thank you! • http://codait.org • https://developer.ibm.com/code/exchanges/models/ • github.com/codait • developer.ibm.com/code
  • 36. Call for Code inspires developers to solve pressing global problems with sustainable software solutions, delivering on their vast potential to do good. It brings together NGOs, academic institutions, enterprises, and startup developers to compete to build effective disaster mitigation solutions, with a focus on health and well-being. The International Federation of Red Cross/Red Crescent, the American Red Cross, and the United Nations Office of Human Rights combine for the Call for Code Award to elevate the profile of developers. Award winners will receive long-term support through open source foundations, financial prizes, and the opportunity to present their solution to leading VCs, and will deploy their solution through IBM’s Corporate Service Corps. Developers will jump-start their project with dedicated IBM Code Patterns, combined with optional enterprise technology, to build projects over the course of three months. Entries will be judged by the world’s most renowned technologists, and the grand prize will be presented in October at an Award Event. developer.ibm.com/callforcode
  • 37. IBM Sessions at Spark+AI Summit 2018 (Tuesday, June 5) • Columns: Date | Time | Location | Duration | Session title | Speakers • Tue, June 5 | 11 AM | 2010-2012 | 30 mins | Productionizing Spark ML Pipelines with the Portable Format for Analytics | Nick Pentreath (IBM) • Tue, June 5 | 2 PM | 2018 | 30 mins | Making PySpark Amazing—From Faster UDFs to Dependency Management and Graphing! | Holden Karau (Google), Bryan Cutler (IBM) • Tue, June 5 | 2 PM | Nook by 2001 | 30 mins | Making Data and AI Accessible for All | Armand Ruiz Gabernet (IBM) • Tue, June 5 | 2:40 PM | 2002-2004 | 30 mins | Cognitive Database: An Apache Spark-Based AI-Enabled Relational Database System | Rajesh Bordawekar (IBM T.J. Watson Research Center) • Tue, June 5 | 3:20 PM | 3016-3022 | 30 mins | Dynamic Priorities for Apache Spark Application’s Resource Allocations | Michael Feiman (IBM Spectrum Computing), Shinnosuke Okada (IBM Canada Ltd.) • Tue, June 5 | 3:20 PM | 2001-2005 | 30 mins | Model Parallelism in Spark ML Cross-Validation | Nick Pentreath (IBM), Bryan Cutler (IBM) • Tue, June 5 | 3:20 PM | 2007 | 30 mins | Serverless Machine Learning on Modern Hardware Using Apache Spark | Patrick Stuedi (IBM) • Tue, June 5 | 5:40 PM | 2002-2004 | 30 mins | Create a Loyal Customer Base by Knowing Their Personality Using AI-Based Personality Recommendation Engine | Sourav Mazumder (IBM Analytics), Aradhna Tiwari (University of South Florida) • Tue, June 5 | 5:40 PM | 2007 | 30 mins | Transparent GPU Exploitation on Apache Spark | Dr. Kazuaki Ishizaki (IBM), Madhusudanan Kandasamy (IBM) • Tue, June 5 | 5:40 PM | 2009-2011 | 30 mins | Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for Deep Neural Networks | Yonggang Hu (IBM), Chao Xue (IBM)
  • 38. IBM Sessions at Spark+AI Summit 2018 (Wednesday, June 6) • Columns: Date | Time | Location | Duration | Session title | Speakers • Wed, June 6 | 12:50 PM | Birds of a Feather: Apache Arrow in Spark and More | Bryan Cutler (IBM), Li Jin (Two Sigma Investments, LP) • Wed, June 6 | 2 PM | 2002-2004 | 30 mins | Deep Learning for Recommender Systems | Nick Pentreath (IBM) • Wed, June 6 | 3:20 PM | 2018 | 30 mins | Bringing an AI Ecosystem to the Domain Expert and Enterprise AI Developer | Frederick Reiss (IBM), Vijay Bommireddipalli (IBM Center for Open-Source Data & AI Technologies) • Meet us at the IBM booth in the Expo area.