Bringing Deep Learning into production
Brief introduction
• CTO & co-founder of Agile Lab
• Data & tech addict
• Contributor to Spark Notebook
• Spark early adopter
• Certified Cassandra Architect
• DeepLearning enthusiast
Who is Agile Lab ?
GO BIG (data) or GO HOME
http://www.meetup.com/it-IT/Torino-Scala-Programming-Big-Data-Meetup/
What we do
Applications
High scalability
Decision Support Systems
data engineering, data mining and data «meaning»
Big Data Strategies
Training
Reactive, NoSQL, Big Data, Machine learning
Why Deep Learning
Deep Learning is trending
What is Deep Learning
• Deep learning is just another name for artificial neural networks
• An algorithm is deep if the input is passed through several non-linearities
before being output
• Deep learning is discovering the features that best represent the
problem, rather than just a way to combine them
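To make the "several non-linearities" point concrete, here is a minimal sketch of a forward pass in plain Python. The layer sizes, weights and choice of activations are hypothetical, picked by hand for illustration; a real network would learn them during training.

```python
import math


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dense(inputs, weights, biases, activation):
    """One layer: a weighted sum per unit, followed by a non-linearity."""
    return [
        activation(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Pass the input through each (weights, biases, activation) layer in turn."""
    out = inputs
    for weights, biases, activation in layers:
        out = dense(out, weights, biases, activation)
    return out


# Hypothetical network: 2 inputs -> 2 units (ReLU) -> 2 units (tanh) -> 1 unit (sigmoid)
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0], math.tanh),
    ([[1.0, -1.0]], [0.0], sigmoid),
]

prediction = forward([2.0, 1.0], LAYERS)
```

The input crosses three non-linearities (ReLU, tanh, sigmoid) before it is output, which is what makes the model "deep" in the sense of the second bullet.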
Deep Learning: Use cases
Do you want to start with Deep
Learning?
Let’s choose the right tools!!
Deep Learning Frameworks
• Deeplearning4J
• TensorFlow
• Caffe
• Theano
• Torch
• Spark ML MultilayerPerceptrons
• H2O
• CNTK
• MatLab
• maxDNN
And many others
How to choose
Background
Target Environment
Vision
Background
Productivity !!
• Big Data Engineer: Scala, Java
• Math Engineer: Java, Python
• Statistician: R, Python
Target Environment
Trained model should be deployable!!
[Diagram labels: Trained Model, Dev Env, Prod Env]
Target Environment
[Diagram labels: Prod Env, Dev Env, Training, Data Cleaning, ETL, Scheduling]
ML Pipeline
- Track model performance over time
- Care about SLA
- Continuous tweaks
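As a rough illustration of tracking model performance over time, the sketch below keeps a rolling window of accuracy measurements and flags the model when the window average falls too far below a baseline. The class name, the 0.05 tolerance and the window of 3 are hypothetical choices, not values from the talk.

```python
from collections import deque


class PerformanceTracker:
    """Keeps the most recent accuracy scores and compares them to a baseline."""

    def __init__(self, baseline, tolerance=0.05, window=3):
        self.baseline = baseline          # accuracy measured when the model was deployed
        self.tolerance = tolerance        # allowed drop before raising a flag
        self.recent = deque(maxlen=window)

    def record(self, accuracy):
        self.recent.append(accuracy)

    def rolling_mean(self):
        return sum(self.recent) / len(self.recent)

    def needs_retraining(self):
        # Wait until the window is full so one bad batch does not trigger a flag.
        if len(self.recent) < self.recent.maxlen:
            return False
        return self.rolling_mean() < self.baseline - self.tolerance


tracker = PerformanceTracker(baseline=0.90)
for accuracy in [0.91, 0.89, 0.90]:
    tracker.record(accuracy)
healthy = tracker.needs_retraining()

for accuracy in [0.80, 0.78]:
    tracker.record(accuracy)
degraded = tracker.needs_retraining()
```

In a production pipeline the same check would typically run from the scheduler after each scoring batch, so that SLA breaches and retraining decisions are driven by measured numbers.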
Enterprise Architecture
[Diagram. Components shown: Hadoop, Online DataStore, Enterprise Service Bus, Data Integration Layer, External Sources, Internal Business Sources, Internal System Sources, Analytics, Value Added Services, API Services, DeepLearning]
Easy Wins
• Training pipeline should run on Spark or Hadoop
• Trained model should be represented in Java objects
Vision: keep in mind Scaling
• High-level dynamic languages are incredibly productive for
prototyping and data exploration
• Scaling to larger data sets quickly runs into performance
limitations
• Keep scaling requirements in mind from the beginning
Vision: simplify the pipeline
• Copy & sample data from Dev Env to Data Scientist Env
• Prototype in Python or R
• Train model
• Predict on validation data
• Translate model to match Prod Env (Java, MapReduce, Spark)
• Deploy training pipeline and model
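The prototyping half of this pipeline (sample, train, predict on validation data) can be sketched in a few lines of plain Python. The data set, the 10% sampling fraction and the one-variable linear model are all hypothetical stand-ins; the point is only to show what has to be re-implemented later when the model is translated for the production environment.

```python
import random


def sample_rows(rows, fraction, seed=42):
    """Copy a random sample of the data into the data scientist's environment."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def split(rows, validation_fraction=0.25):
    """Hold out the tail of the sample for validation."""
    cut = int(len(rows) * (1 - validation_fraction))
    return rows[:cut], rows[cut:]


def fit_line(rows):
    """Train: ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in rows)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


def mse(model, rows):
    """Predict on validation data and report the mean squared error."""
    slope, intercept = model
    return sum((slope * x + intercept - y) ** 2 for x, y in rows) / len(rows)


# Hypothetical source data that follows y = 3x + 1
full_data = [(float(x), 3.0 * x + 1.0) for x in range(1000)]

sampled = sample_rows(full_data, fraction=0.1)
train, validation = split(sampled)
model = fit_line(train)
validation_error = mse(model, validation)
```

Everything here runs on a single machine against a sample. The remaining two steps, translating the model and deploying the training pipeline, are where the extra cost sits.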
Easy Wins
• Data scientists should work directly on the distributed
environment
• Data scientists and big data engineers should co-operate
on the same platform
SWOT Analysis
TensorFlow
Strengths:
- Powered By Google
- Nice UI
Weaknesses:
- Powered By Google
- No support for “inline” matrix operations (slow)
Opportunities:
- Awesome community
Threats:
- No Scala or Java integration
- No commercial support
Theano
Strengths:
- Grand Daddy of deep learning
- RNN and CNN
- Computational graph abstraction
- Python
Weaknesses:
- No support for Hadoop or Spark
- No plug & play nets
Opportunities:
- Great community
Threats:
- No Scala or Java integration
- No commercial support
Torch
Strengths:
- GPU support
- Lots of pretrained models and packages
- Easy to use
Weaknesses:
- Lua language
Opportunities:
- Backed by DeepMind and Facebook
Threats:
- No Scala or Java integration
- No commercial support
Caffe
Strengths:
- C++ & Python
- Good Performance
- GPU Support
Weaknesses:
- Focused on image processing
Opportunities:
- Backed by Yahoo for Spark integration
- GPU clustering
Threats:
- No commercial support
DeepLearning4j
Strengths:
- GPU support
- Java and Scala
- Full DNN set
- Supports Hadoop, Spark & Akka
Weaknesses:
- Not for dummies
Opportunities:
- Commercial support - SkyMind
Threats:
- Not so sexy for data scientists because of
Java/Scala
H2O
• Easy to use Web UI
• Multi language API
• Runs directly on HDFS or S3
• Model is a Java POJO
• Big Data Ready
• Really Fast
• Compressed data
• Regularization
• Grid Search
• GPU is still on roadmap
• CNN and RNN too
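Regularization and grid search work together: the grid tries several regularization strengths and keeps the one that scores best on held-out data. The sketch below shows the idea in plain Python with a one-variable ridge regression. It is not H2O's API; the function names, the tiny data set and the `l2` grid are hypothetical.

```python
from itertools import product


def fit_ridge(rows, l2):
    """Closed-form 1-D ridge regression through the origin.

    The l2 penalty shrinks the weight towards zero.
    """
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + l2)


def mse(weight, rows):
    return sum((weight * x - y) ** 2 for x, y in rows) / len(rows)


def grid_search(train, validation, grid):
    """Try every combination in the grid and return the best (error, params, weight)."""
    keys = sorted(grid)
    results = []
    for values in product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        weight = fit_ridge(train, **params)
        results.append((mse(weight, validation), params, weight))
    return min(results, key=lambda r: r[0])


# Hypothetical data: the true relation is y = 2x, the training targets are biased upwards.
train = [(1.0, 2.5), (2.0, 4.5), (3.0, 6.5)]
validation = [(4.0, 8.0)]

best_error, best_params, best_weight = grid_search(
    train, validation, {"l2": [0.0, 1.0, 10.0]}
)
```

On this data the unregularized fit overshoots, the strongest penalty undershoots, and the middle value wins on the validation row, which is the trade-off a grid search is meant to expose.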
H2O - Flow
H2O – Sparkling Water
• Python, R and Scala API
• Best Kagglers use H2O
• Tons of tools for profiling and tuning
• Leverages Spark
• Best-in-class algorithms, battle tested
• Regularization
• Grid search
H2O – Sparkling Water
Workflow
POJO Java
Training Set
Embeddable in:
• J2EE App
• Spark Job
• MR Job
• DWH as UDF
training
Spark as middleware
Using Spark as middleware, you can leverage:
• Deeplearning4J
• H2O
• TensorFlow (Arimo extension)
• Caffe (Yahoo extension)
• ML MultilayerPerceptrons and future implementations
No tech provider lock-in
Our Stack for Enterprise
• Ready for Enterprise and Hadoop World
• Deployable into Java Env
• Notebook ( Flow )
• H2O for out of the box algorithms
• Deeplearning4J for advanced DNN and
n-dimensional array manipulation
• Good usability for both DataScientists and
Big Data Engineers
• Enterprise Support along the whole stack
Thanks!
We are hiring !
paolo.platter@agilelab.it
