Why Scala for Data
Science?
HELLO!
I am Guglielmo
Iozzia
I am here because I love AI and the
With the Best conference series
You can follow me at
@GuglielmoIozzia
2
Something about me
✘ Big Data Delivery Lead at
(UHG)
✘ Previously at and of the UN
✘ Current fields of expertise are Big
Data, ML/DL and DevOps
✘ Author of the upcoming book “Hands-
on Deep Learning with Apache Spark”
✘ I love preparing
homemade pizza
3
What is Scala?
Let’s get everyone on the same
page
The Scala PL
Scala is a programming language
that blends object-oriented and
functional programming concepts on
the JVM.
5
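As a minimal sketch of that blend (the shape names below are illustrative and not taken from the talk), a class hierarchy supplies the object-oriented side while immutable data, higher-order functions and pattern matching supply the functional side:

```scala
// Object-oriented side: a trait with concrete case-class implementations.
sealed trait Shape { def area: Double }
final case class Circle(radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}
final case class Rectangle(width: Double, height: Double) extends Shape {
  def area: Double = width * height
}

object ShapeDemo {
  // Functional side: an immutable list transformed with a higher-order function.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  // Pattern matching deconstructs the case classes defined above.
  def describe(shape: Shape): String = shape match {
    case Circle(r)       => s"circle of radius $r"
    case Rectangle(w, h) => s"rectangle $w x $h"
  }

  def main(args: Array[String]): Unit = {
    val shapes = List(Rectangle(2.0, 3.0), Rectangle(1.0, 1.0))
    println(totalArea(shapes)) // 7.0
    shapes.map(describe).foreach(println)
  }
}
```

Everything compiles to ordinary JVM classes, so the same code can be packaged and deployed like any Java application.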
Functional Programming
✘ In FP you write pure functions.
✘ Given the same input, a function
always returns the same output and
produces no side effects.
✘ A function is first-class: it can be used
like any other value.
✘ That means it can be assigned to
a variable, passed as a parameter to
another function or returned by a
function.
6
Functional Programming
in Scala
An example of
functional
programming in
Scala.
7
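The original screenshot is not part of this transcript. The following is a small stand-in sketch, not the author's example, covering the points listed on the previous slide: a pure function, a function stored in a variable, a function passed as a parameter, and a function returned by another function. All names are made up for illustration.

```scala
object FunctionalDemo {
  // A pure function: the result depends only on the argument, with no side effects.
  def square(x: Int): Int = x * x

  // A function value assigned to a variable.
  val increment: Int => Int = _ + 1

  // A function passed as a parameter to another function.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // A function returned by another function.
  def multiplier(factor: Int): Int => Int = x => x * factor

  def main(args: Array[String]): Unit = {
    println(applyTwice(increment, 5))                   // 7
    println(multiplier(3)(4))                           // 12
    println(List(1, 2, 3).map(square))                  // List(1, 4, 9)
    // andThen composes two functions into a new one: increment first, then square.
    println(increment.andThen(x => square(x))(2))       // 9
  }
}
```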
Why Scala for Data
Science?
Let’s move towards the main topic
of this talk
The Python’s Temptation
When it comes to Data Science the first programming
language people take into consideration is Python.
9
Here are three valid reasons to
consider Scala.
10
#1 Robustness
Robustness and performance when it
comes to production systems and
large datasets.
11
#2 Integration
Most of the systems and tools in the
Big Data/ML space run on the JVM.
12
Think about these systems
you most probably have in
your production tech stack.
They all run in JVMs.
13
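Because Scala compiles to JVM bytecode, Java classes can be called from Scala directly, which is what makes this integration cheap. A minimal sketch using only JDK classes (the map contents are made-up placeholders, not a statement about any particular stack):

```scala
import java.time.LocalDate
import java.util.TreeMap

object JvmInteropDemo {
  // A Java collection instantiated and used from Scala without any wrapper.
  def stack(): TreeMap[String, String] = {
    val roles = new TreeMap[String, String]()
    roles.put("kafka", "streaming")
    roles.put("hadoop", "storage")
    roles.put("spark", "processing")
    roles
  }

  // A Java API (java.time) called with ordinary Scala syntax.
  def nextDay(year: Int, month: Int, day: Int): String =
    LocalDate.of(year, month, day).plusDays(1).toString

  def main(args: Array[String]): Unit = {
    println(stack().firstKey())   // hadoop (TreeMap keeps keys sorted)
    println(nextDay(2018, 1, 31)) // 2018-02-01
  }
}
```

The same mechanism applies to the client libraries of JVM-based systems: their Java APIs are usable from Scala as they are.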
#3 Libraries
Good availability of production-ready
open source ML/DL
frameworks and libraries.
14
Scala Open Source Projects for AI/ML/DL
✘ Spark MLlib: Spark’s library for ML
algorithms, feature extraction,
dimensionality reduction, linear
algebra, etc.
✘ ND4J: a linear algebra and matrix
manipulation library that supports
n-dimensional arrays and is integrated
with Apache Hadoop and Spark.
15
Scala Open Source Projects for AI/ML/DL
✘ DeepLearning4J: a distributed deep-
learning framework written for Java
and Scala. It is integrated with Hadoop
and Apache Spark, for use on
distributed GPUs and CPUs.
✘ BigDL: a distributed deep learning
framework for Apache Spark, created
at Intel.
16
Scala Open Source Projects for AI/ML/DL
✘ XGBoost: a scalable, portable and
distributed Gradient Boosting library.
✘ PredictionIO: an Apache template
system for creating machine learning
engines.
✘ Smile: a fast and comprehensive
machine learning system.
✘ Saddle: a high-performance data
manipulation library.
17
Scala Open Source Projects for AI/ML/DL
✘ Deeplearning.scala: a simple library
for creating complex neural networks.
It can be used either in standalone
JVM applications or Jupyter
Notebooks.
✘ ScalaNLP: a suite of ML and
numerical computing libraries. It
includes Breeze and Epic.
18
Code Examples
Let’s get practical!
import org.nd4j.linalg.factory.Nd4j

object Nd4JScalaSample {
  def main(args: Array[String]): Unit = {
    // Create arrays using a NumPy-like syntax
    var arr1 = Nd4j.create(4)
    val arr2 = Nd4j.linspace(1, 10, 10)
    // Fill an array with the value 5 (equivalent to the fill method in NumPy)
    println("Assigned value of 5 to the array: " + arr1.assign(5))
    // Basic stats methods
    println("Mean of the array: " + Nd4j.mean(arr1))
    println("Standard deviation of the array: " + Nd4j.std(arr2))
    // `var` is a Scala keyword, so the method name needs backticks
    println("Variance of the array: " + Nd4j.`var`(arr2))
    // ...
  }
}
ND4J Example
ND4J tries to fill the
gap between JVM
languages and
Python
programmers in
terms of availability
of powerful data
analysis tools.
20
DL4J Example (1 of 3)
Multilayer Neural
Network
configuration in
Scala with DL4J.
21
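The screenshot for this slide is not available in the transcript. As a stand-in, here is a minimal sketch of a multilayer network configuration, assuming the builder API of the DL4J 1.0.0-beta line. The layer sizes (784 inputs, 128 hidden units, 10 classes), the seed and the learning rate are illustrative assumptions, not values from the talk.

```scala
import org.deeplearning4j.nn.conf.{MultiLayerConfiguration, NeuralNetConfiguration}
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.weights.WeightInit
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.learning.config.Adam
import org.nd4j.linalg.lossfunctions.LossFunctions

object Dl4jConfigSketch {
  // Hypothetical sizes for an MNIST-like problem (assumption, not from the slides).
  val numInputs = 784
  val numHidden = 128
  val numClasses = 10

  val conf: MultiLayerConfiguration = new NeuralNetConfiguration.Builder()
    .seed(123)                       // fixed seed for reproducible weight initialization
    .weightInit(WeightInit.XAVIER)
    .updater(new Adam(1e-3))         // optimizer and learning rate
    .list()
    // Hidden fully connected layer
    .layer(0, new DenseLayer.Builder()
      .nIn(numInputs).nOut(numHidden)
      .activation(Activation.RELU)
      .build())
    // Softmax output layer for classification
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
      .nIn(numHidden).nOut(numClasses)
      .activation(Activation.SOFTMAX)
      .build())
    .build()

  def main(args: Array[String]): Unit =
    println(conf.toJson) // the configuration can be inspected or stored as JSON
}
```

The configuration is only a description of the network; no weights are allocated until a network is created from it and initialized.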
DL4J Example (2 of 3)
Network
initialization and
training in Scala
with DL4J.
22
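The screenshot for this slide is also missing. The sketch below shows one plausible shape of the initialization and training step, again assuming the DL4J 1.0.0-beta API. To stay self-contained it trains on a tiny in-memory XOR dataset, which is an assumption made for illustration; the talk most likely used a real dataset and an iterator.

```scala
import org.deeplearning4j.nn.conf.NeuralNetConfiguration
import org.deeplearning4j.nn.conf.layers.{DenseLayer, OutputLayer}
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.deeplearning4j.optimize.listeners.ScoreIterationListener
import org.nd4j.linalg.activations.Activation
import org.nd4j.linalg.dataset.DataSet
import org.nd4j.linalg.factory.Nd4j
import org.nd4j.linalg.learning.config.Adam
import org.nd4j.linalg.lossfunctions.LossFunctions

object Dl4jTrainingSketch {
  def buildAndTrain(epochs: Int): MultiLayerNetwork = {
    // Small 2-4-2 network (illustrative sizes).
    val conf = new NeuralNetConfiguration.Builder()
      .seed(42)
      .updater(new Adam(0.05))
      .list()
      .layer(0, new DenseLayer.Builder().nIn(2).nOut(4)
        .activation(Activation.TANH).build())
      .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.MCXENT)
        .nIn(4).nOut(2).activation(Activation.SOFTMAX).build())
      .build()

    // Initialization: allocate and initialize the parameters.
    val model = new MultiLayerNetwork(conf)
    model.init()
    // Log the score every 100 iterations.
    model.setListeners(new ScoreIterationListener(100))

    // XOR inputs and one-hot labels held in memory.
    val features = Nd4j.create(Array(
      Array(0f, 0f), Array(0f, 1f), Array(1f, 0f), Array(1f, 1f)))
    val labels = Nd4j.create(Array(
      Array(1f, 0f), Array(0f, 1f), Array(0f, 1f), Array(1f, 0f)))
    val data = new DataSet(features, labels)

    // Training: one pass over the (single-batch) dataset per epoch.
    (1 to epochs).foreach(_ => model.fit(data))
    model
  }

  def main(args: Array[String]): Unit = {
    val model = buildAndTrain(500)
    println(model.output(Nd4j.create(Array(Array(0f, 1f)))))
  }
}
```

With a real dataset the in-memory `DataSet` would typically be replaced by a `DataSetIterator` passed to `fit`.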
DL4J Example (3 of 3)
The DL4J web UI
(training time).
23
Can Scala and Python
co-exist in Data Science
projects?
Is there any bridge between these
two worlds?
139,000
Google search results for MNN models
implemented with TensorFlow
8,330,000
Google search results for models in general
implemented with TensorFlow
120,000
Google search results for MNN examples
implemented with TensorFlow
25
TensorFlow Pros and Cons
✘ Big community
✘ Lots of models, examples and use
cases available
✘ Stunning features
Mostly Python. The Java API is currently
experimental and is not covered by the
TensorFlow API stability guarantees.
26
Keras to the Rescue
✘ It is an open source neural network
library written in Python
✘ It can run on top of TensorFlow (and
other backend engines)
✘ Easy prototyping
✘ Lightweight
✘ Models built with it in Python can be
imported into DL4J
27
TensorFlow + Keras + DL4J
28
Importing Keras Models
into DL4J: example
DL4J provides a
Java/Scala API to
import a pre-trained
TensorFlow model
through Keras.
29
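The screenshot is not included in this transcript. A minimal sketch of the import step, assuming the `deeplearning4j-modelimport` module is on the classpath and that the Keras model is a Sequential model saved to a single HDF5 file with its weights. The file path is a hypothetical placeholder.

```scala
import org.deeplearning4j.nn.modelimport.keras.KerasModelImport
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork

object KerasImportSketch {
  // Loads architecture and weights from an HDF5 file written by Keras.
  // Sequential models map to a DL4J MultiLayerNetwork.
  def loadSequentialModel(h5Path: String): MultiLayerNetwork =
    KerasModelImport.importKerasSequentialModelAndWeights(h5Path)

  def main(args: Array[String]): Unit = {
    // Hypothetical path, replace with a real exported model.
    val model = loadSequentialModel("/path/to/keras_model.h5")
    println(model.summary())
  }
}
```

Models built with the Keras functional API are handled by a separate entry point, `KerasModelImport.importKerasModelAndWeights`, which returns a `ComputationGraph` instead.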
Importing Keras Models
into DL4J: example
The imported model
can then be used in
a DL4J application
implemented
through Java or
Scala only.
30
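Once imported, the model is an ordinary DL4J network, so inference needs no Python at runtime. A minimal sketch, assuming the imported model is a `MultiLayerNetwork` and the input rows already have the shape the model expects (the object and method names are illustrative):

```scala
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork
import org.nd4j.linalg.api.ndarray.INDArray

object ImportedModelInference {
  // Raw network outputs (for a softmax output layer, class probabilities),
  // one row per input example.
  def scores(model: MultiLayerNetwork, features: INDArray): INDArray =
    model.output(features)

  // Index of the highest-scoring class for each input example.
  def classes(model: MultiLayerNetwork, features: INDArray): Array[Int] =
    model.predict(features)
}
```

Both helpers work the same way whether the network was trained in DL4J or imported from Keras.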
Conclusion
Bridging the Gap between Data
Engineers and Data Scientists
The Missing Link
Data Engineers
• Scala/Java skills
and experience
• Hands-on Big Data
and Streaming tools
(Hadoop, HBase,
Spark, Kafka, Beam,
etc.)
• DevOps mindset
• Attention to testing,
performance,
scalability
• Containerization
• Often no skills in
ML/DL
Data Scientists
• Strong ML/DL skills
• Python and R users
• Good data
understanding
• Model training and
evaluating strategies
• Probably some knowledge
of Big Data and
Streaming tools
• No DevOps mindset
• Research more than
production
32
To Leverage the Specific Skills of Each Team
DL4J
Keras
TensorFlow
Data Engineers Data Scientists
33
To Leverage the Specific Skills of Each Team
Keras
Scala
(DL4J)
TensorFlow
(Python)
34
Hands-on Deep Learning
with Apache Spark
More on some topics
covered in this talk
can be found in this
book.
https://tinyurl.com/y9jkvtuy
35
THANK
YOU!
Any questions?
You can find me at
✘ @GuglielmoIozzia
✘ https://ie.linkedin.com/in/giozzia
✘ googlielmo.blogspot.com/
✘ https://dzone.com/users/2532948/virtualramblas.html
36
Credits
Special thanks to all the people who made
and released these awesome resources for
free:
✘ Presentation template by SlidesCarnival
✘ The painting in slide 9 is a detail of “Eve
Tempted” (1887) by John Roddam
Spencer Stanhope
37
