DATA SCIENCE AND MACHINE LEARNING IN THE KUBERNETES ECOSYSTEM
Hans-Peter Zorn, Stefan Igel
Heidelberg, 26 September 2018
Agenda
● Use case: analysis of imaging mass spectrometry data
● Data science workflows & ML platforms
● K8S as the basis for ML platforms
● Tools & components for DS workflows
● Outlook
Use Case: EMQ
Project setup
› Expert system for quality assessment and evaluation of 3-dimensional mass spectrometry data
› R&D project of Hochschule Mannheim and inovex
› Duration: 01.11.2017 - 31.10.2019
MALDI Mass Spectrometry
Basic workflow & application
[Figure: data acquisition with a Bruker Rapiflex MALDI-TOF/TOF mass spectrometer; kidney tissue slice; microscopic image]
Typical applications
• Clinical diagnostics
• Pharmaceutical monitoring
• Histological research
Image sources:
Nature Reviews Cancer 10, 639-646 09/2010
Molecular Oncology 4, Issue 6, 529-538 12/2010
MSI Datacubes
A state-of-the-art MALDI imaging dataset comprises up to 100k spectra. Each raw spectrum holds the intensities of usually 10k – 100k small m/z bins and describes up to hundreds of different molecules.
Data generation time: sample preparation (30 – 90 min), data acquisition (~ 14 h at 2 pixels / sec; ~ 30 – 50 min with the next-generation MALDI system at up to 50 pixels / sec), data analysis (~ 1 h) → total time ~ 2 – 3.5 h / tissue sample.
Jones, Emrys A., et al. Journal of Proteomics 75.16 (2012): 4962-4989.
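The acquisition times quoted above follow from dividing the number of pixels by the scan rate. The following is a minimal sketch of that arithmetic, assuming one spectrum per pixel and the upper bound of 100k spectra; the function name is illustrative and not taken from the talk.

```python
def acquisition_hours(num_pixels: int, pixels_per_second: float) -> float:
    """Return the scan time in hours for a given pixel count and scan rate."""
    return num_pixels / pixels_per_second / 3600.0


# Assumption: one spectrum per pixel, upper bound of 100k spectra per dataset.
NUM_PIXELS = 100_000

slow = acquisition_hours(NUM_PIXELS, 2)   # 2 pixels / sec
fast = acquisition_hours(NUM_PIXELS, 50)  # up to 50 pixels / sec

print(f"2 pixels/s:  {slow:.1f} h")
print(f"50 pixels/s: {fast * 60:.0f} min")
```

With these inputs the slow scan comes out just under 14 hours and the fast scan at roughly half an hour, which matches the ranges given on the slide.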
Data Science / Machine Learning Platforms
Goal: professionalizing data science
1. Support data science team processes
2. Democratization of data
3. Democratization of machine learning
Data Science / Machine Learning Platforms
support machine learning workflows:
› Scalable
› Reliable
› Reproducible
› Easy-to-use
› Flexible
› Automated
› Offline and online
Manage Data → Train Models → Evaluate Models → Deploy Models → Make Predictions → Monitor Predictions
https://eng.uber.com/michelangelo/
EMQ Machine Learning Platform
[Diagram: stages Ingest, (Pre-)Process, Explore, Train, Infer, Control; data artifacts Raw Data, Prep. Data Set, Training Set, Model, Result; supporting services Monitoring, Logging, Metadata; Runtime Environment]
EMQ Machine Learning Platform
Runtime Environment
Scalable? Sounds like Big Data ...
Is there anything beyond Hadoop?
Layer                  | Hadoop Stack               | Kubernetes Stack
Operating System       | Linux Kernel               | Linux Kernel
Cluster Management     | YARN, Zookeeper            | CoreOS, Kubernetes
Distributed Storage    | HDFS                       | S3, NFS, Ceph, Quobyte, ...
Processing Core Unit   | JVM                        | Docker
Distributed Processing | MapReduce, Tez, Spark, ... | Spark, Tensorflow, ...
Distributed Serving    | HBase                      | elastic, Cassandra, Druid, ...
Ingredients for K8S Solutions
Bare Metal, Public & Private Cloud
› everything you need to build and scale
› build, ship and run any app, anywhere
› container orchestration, automated management, deployment, scaling
› package manager for K8S Apps
https://www.inovex.de/fileadmin/files/Vortraege/2017/big-data-in-der-cloud-zorn-kreiling-29.09.2017.pdf
Packaging
Docker, because …
● Most widely used container format
● Lightweight
● Resource limits
● Availability of registries
https://www.inovex.de/fileadmin/files/Vortraege/2017/big-data-in-der-cloud-zorn-kreiling-29.09.2017.pdf
Deployment
Kubernetes, because of …
● Hardware abstraction
● Container scheduling and management
● Service discovery & networking
● Configuration management
● Monitoring
● Load balancing
● Rolling upgrades
https://www.inovex.de/fileadmin/files/Vortraege/2017/big-data-in-der-cloud-zorn-kreiling-29.09.2017.pdf
Dependency Management
Helm, for ...
● Package manager
● Convenience
● Numerous templates
● Templating functionality
https://www.inovex.de/fileadmin/files/Vortraege/2017/big-data-in-der-cloud-zorn-kreiling-29.09.2017.pdf
Continuous Integration
Terraform, because ...
› Infrastructure as Code
› Cloud provider agnostic
› Software-defined networking
› Disposable environments
Continuous Integration
GitLab CI, because ...
• Integration with GitLab
• CI pipelines that are easy to define
• Integrated Docker registry
https://www.inovex.de/fileadmin/files/Vortraege/2017/big-data-in-der-cloud-zorn-kreiling-29.09.2017.pdf
CI / CD Pipeline
[Diagram labels: git push, GitLab, docker push, helm install, kubectl, Service, Deployment / StatefulSet, docker pull, Pod, Pod]
https://www.inovex.de/fileadmin/files/Vortraege/2017/big-data-in-der-cloud-zorn-kreiling-29.09.2017.pdf
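The pipeline stages named on this slide (git push, docker push, helm install) can be sketched as the commands a CI job might run after a push. This is a minimal sketch, not the authors' pipeline: the image name, release name, chart path and the `image.tag` chart value are hypothetical, and `helm upgrade --install` is used in place of the slide's `helm install` so that the job can be re-run for an existing release. The script only prints the commands and does not execute them.

```python
import shlex
from typing import List


def pipeline_commands(image: str, tag: str, release: str, chart: str) -> List[List[str]]:
    """Return the build, push and deploy commands for one pipeline run."""
    ref = f"{image}:{tag}"
    return [
        ["docker", "build", "-t", ref, "."],
        ["docker", "push", ref],
        # Assumption: the chart exposes an `image.tag` value.
        ["helm", "upgrade", "--install", release, chart, "--set", f"image.tag={tag}"],
    ]


commands = pipeline_commands(
    image="registry.example.com/emq/preprocess",  # hypothetical registry and image
    tag="abc123",                                 # e.g. the commit hash
    release="emq-preprocess",                     # hypothetical release name
    chart="./chart",                              # hypothetical chart path
)
for cmd in commands:
    print(" ".join(shlex.quote(part) for part in cmd))
```

Tagging the image with the commit hash keeps each deployment traceable to the push that triggered it.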
EMQ Machine Learning Platform
Ingest & Store
Ingest & Store
[Diagram labels: Online - Streaming, Offline - Batch, Msg, File Transfer, Stream Processing, Distributed File System, Data Lake, NoSQL DB (twice), Runtime Environment]
Kubernetes on OpenStack
Kubernetes in the cloud
Kubernetes alongside Hadoop
HDFS Kubernetes
(managed) Kubernetes
Kubernetes alongside MapR-FS
EMQ Machine Learning Platform
(Pre-)Processing
(Pre-)Processing
Standardized Data Processing
• integrate legacy algorithms
• different programming languages (C++, R, Python, ...)
• different base images
(Pre-)Processing
Orchestrate data processing steps
● reproducible
● flexible
● scalable
(Pre-)Processing
Argo architecture
› Kubernetes API extension (CRD)
› Batch job pattern
› Data handling via buckets (S3)
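To make the three points above concrete, the sketch below builds an Argo Workflow manifest (the custom resource Argo adds to the Kubernetes API) with two sequential batch steps that read their input from an S3 bucket. It is only an illustration: the image, script name, step names, endpoint, bucket and key are hypothetical, and the S3 credential references a real cluster would need are omitted.

```python
import json


def make_workflow(image: str, bucket: str, key: str) -> dict:
    """Build a two-step Argo Workflow manifest as a plain dict."""

    def step(stage: str) -> dict:
        return {
            "name": stage,
            "template": "run-step",
            "arguments": {"parameters": [{"name": "stage", "value": stage}]},
        }

    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "preprocess-"},
        "spec": {
            "entrypoint": "pipeline",
            "templates": [
                # Outer list: steps run in sequence; inner list: steps run in parallel.
                {"name": "pipeline", "steps": [[step("normalize")], [step("peak-picking")]]},
                {
                    "name": "run-step",
                    "inputs": {
                        "parameters": [{"name": "stage"}],
                        # Input data is fetched from the bucket into the container.
                        "artifacts": [{
                            "name": "raw",
                            "path": "/data/raw",
                            "s3": {"endpoint": "s3.example.com", "bucket": bucket, "key": key},
                        }],
                    },
                    "container": {
                        "image": image,
                        "command": ["python", "process.py"],  # hypothetical script
                        "args": ["--stage", "{{inputs.parameters.stage}}", "--input", "/data/raw"],
                    },
                },
            ],
        },
    }


workflow = make_workflow("registry.example.com/emq/preprocess:abc123", "emq-raw", "sample-001")
print(json.dumps(workflow, indent=2))
```

Because every step is just a container, legacy algorithms in different languages can sit behind the same `run-step` pattern with a different image.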
EMQ Machine Learning Platform
Explore & Analyze
Train Models
Jupyter Hub
› Jupyter notebooks
  › Language of choice (Python, R, Scala, ...)
  › Notebooks can be shared (git, ...)
  › Big data integration (Apache Spark)
  › pandas, scikit-learn, ggplot2, TensorFlow
› JupyterHub
  › Multi-user hub for data science workgroups
  › Spawns, manages, and proxies multiple instances of the single-user Jupyter notebook server.
Train Models
Jupyter Hub
› multi-user Hub (tornado process)
› configurable http proxy (node-http-proxy)
› multiple single-user Jupyter notebook servers (Python/Jupyter/tornado)
› REST API for administration of the Hub and its users.
https://github.com/jupyterhub/jupyterhub
https://jupyterhub.readthedocs.io/en/stable/
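The REST API mentioned above is served under `/hub/api` and authenticated with an API token in the `Authorization` header. As a minimal sketch, the code below only builds (and does not send) a request that would list the Hub's users; the Hub URL and the token are placeholders, and the helper name is illustrative.

```python
import urllib.request


def build_list_users_request(hub_url: str, api_token: str) -> urllib.request.Request:
    """Build a GET request for the JupyterHub user list without sending it."""
    url = hub_url.rstrip("/") + "/hub/api/users"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"token {api_token}"},
        method="GET",
    )


# Placeholders: replace with the real Hub address and an admin API token.
req = build_list_users_request("http://jupyterhub.example.org", "REPLACE_ME")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` against a running Hub would return the user list as JSON.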
EMQ Machine Learning Platform
Model Training & Inference
Deep Learning
TensorFlow
› Autumn 2015, Google
› “library for high performance numerical computation”
› ML / DL support
› TensorBoard
https://www.inovex.de/fileadmin/files/Vortraege/2018/skalieren-von-deep-learning-frameworks-m3-26.04.2018.pdf
Deep Learning
Scaling TensorFlow
› Parameter server
› Multi CPU / GPU, multi node
› Infrastructure: no prerequisites
› IP addresses / hostnames + port
[Diagram: three workers connected to a parameter server]
Carnegie Mellon University, Baidu, Google: “Scaling Distributed Machine Learning with the Parameter Server” (2014)
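Since a parameter-server setup only needs each process to know the host:port of every other process, the cluster can be described in a small JSON document. The sketch below builds such a description in the `TF_CONFIG` format that TensorFlow's higher-level distributed APIs read from the environment; whether the authors used `TF_CONFIG` is not stated in the talk, and the hostnames and port are hypothetical.

```python
import json
from typing import List


def tf_config(ps_hosts: List[str], worker_hosts: List[str], task_type: str, task_index: int) -> str:
    """Return a TF_CONFIG JSON string for one process of the cluster."""
    cluster = {"ps": list(ps_hosts), "worker": list(worker_hosts)}
    if task_type not in cluster or not 0 <= task_index < len(cluster[task_type]):
        raise ValueError("task does not refer to a process in the cluster")
    return json.dumps({"cluster": cluster, "task": {"type": task_type, "index": task_index}})


# Hypothetical addresses: one parameter server and three workers, as in the diagram.
PS_HOSTS = ["ps-0.example:2222"]
WORKER_HOSTS = [f"worker-{i}.example:2222" for i in range(3)]

config = tf_config(PS_HOSTS, WORKER_HOSTS, "worker", 1)
print(config)
```

Every process receives the same `cluster` section and differs only in its `task` entry, which maps well onto pods that get the value injected as an environment variable.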
Deep Learning
Apache MXNet
› Distributed (Deep) Machine Learning Community (DMLC)
› “A flexible and efficient library for deep learning.”
› Amazon's framework of choice
› (TensorBoard support)
https://www.inovex.de/fileadmin/files/Vortraege/2018/skalieren-von-deep-learning-frameworks-m3-26.04.2018.pdf
Deep Learning
Scaling Apache MXNet
› Distributed KVStore
› Multi CPU / GPU, multi node
› Infrastructure: SSH / MPI / YARN / SGE
› Hostfile with IP addresses / hostnames
[Diagram labels: GPU 1, GPU 2, GPU 1, GPU 2]
T. Chen et al.: “MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems” (2015)
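In contrast to the host:port list above, MXNet's distributed mode is typically started through a launcher that reads a hostfile. The sketch below generates such a hostfile and the matching command line for the `tools/launch.py` script from the MXNet repository with the SSH launcher. Treat it as an assumption-laden illustration: the host addresses and the training script `train.py` with its `--kv-store` option are hypothetical, and the command is only printed, not run.

```python
import shlex
from typing import List


def hostfile_text(hosts: List[str]) -> str:
    """Return hostfile content: one IP address or hostname per line."""
    return "\n".join(hosts) + "\n"


def launch_command(hostfile: str, num_workers: int, num_servers: int,
                   train_cmd: List[str]) -> List[str]:
    """Return the launcher invocation that starts workers and servers via SSH."""
    return [
        "python", "tools/launch.py",
        "-n", str(num_workers),   # number of worker processes
        "-s", str(num_servers),   # number of KVStore server processes
        "--launcher", "ssh",
        "-H", hostfile,
    ] + train_cmd


hosts = ["10.0.0.1", "10.0.0.2"]  # hypothetical node addresses
command = launch_command(
    "hosts", num_workers=2, num_servers=2,
    train_cmd=["python", "train.py", "--kv-store", "dist_sync"],  # hypothetical script
)
print(hostfile_text(hosts), end="")
print(" ".join(shlex.quote(part) for part in command))
```

The SSH requirement is what makes this setup harder to fit into plain Kubernetes pods than the TensorFlow variant.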
Deep Learning
GPU Support with Kubernetes
› Install the device plugin
› Base image: nvidia/cuda
› Use GPU resources
https://www.inovex.de/fileadmin/files/Vortraege/2018/skalieren-von-deep-learning-frameworks-m3-26.04.2018.pdf
resources:
  limits:
    nvidia.com/gpu: {{ $numGpus }}
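The `resources.limits` fragment above belongs inside a container spec. As a minimal sketch of where it sits, the code below builds a complete Pod manifest that requests GPUs through the `nvidia.com/gpu` resource exposed by the device plugin. The pod name, container name, image tag and command are assumptions chosen for illustration.

```python
import json


def gpu_pod(name: str, image: str, num_gpus: int) -> dict:
    """Build a Pod manifest whose single container is limited to num_gpus GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [{
                "name": "trainer",  # hypothetical container name
                "image": image,
                "command": ["nvidia-smi"],  # lists the GPUs visible in the container
                # GPUs are requested under limits; the scheduler picks a node with free GPUs.
                "resources": {"limits": {"nvidia.com/gpu": num_gpus}},
            }],
        },
    }


# Assumption: an image based on nvidia/cuda; the tag is illustrative.
pod = gpu_pod("gpu-check", "nvidia/cuda:9.0-base", num_gpus=1)
print(json.dumps(pod, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.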
Train Models
Distributed Machine Learning
3 ways to run Spark on k8s:
● Spark in standalone mode:
https://github.com/helm/charts/tree/master/stable/spark
● Spark operator on Kubernetes:
https://github.com/GoogleCloudPlatform/spark-on-k8s-operator
● Using spark-submit:
https://spark.apache.org/docs/2.3.0/running-on-kubernetes.html
Train Models
Distributed Machine Learning
spark-submit:
● Spark creates a Spark driver running within a k8s pod.
● The driver creates executors running within k8s pods, connects to them, and executes application code.
https://spark.apache.org/docs/2.3.0/running-on-kubernetes.html
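The two steps above are triggered by a single `spark-submit` call in cluster mode. The sketch below assembles such a call, following the options described in the Spark 2.3 documentation linked on the slide; the API server address, container image and jar path are placeholders, and the command is printed rather than executed.

```python
import shlex
from typing import List


def spark_submit_command(api_server: str, image: str, app_jar: str,
                         main_class: str, executors: int) -> List[str]:
    """Return a spark-submit invocation that targets a Kubernetes cluster."""
    return [
        "bin/spark-submit",
        "--master", f"k8s://{api_server}",  # Kubernetes API server as cluster manager
        "--deploy-mode", "cluster",         # the driver itself runs in a pod
        "--name", "spark-pi",
        "--class", main_class,
        "--conf", f"spark.executor.instances={executors}",
        "--conf", f"spark.kubernetes.container.image={image}",
        app_jar,
    ]


command = spark_submit_command(
    api_server="https://k8s.example.org:6443",        # placeholder API server
    image="registry.example.com/spark:2.3.0",         # placeholder Spark image
    app_jar="local:///opt/spark/examples/jars/spark-examples.jar",  # placeholder path inside the image
    main_class="org.apache.spark.examples.SparkPi",
    executors=5,
)
print(" ".join(shlex.quote(part) for part in command))
```

The `local://` scheme tells Spark that the jar is already present inside the container image rather than on the submitting machine.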
EMQ Machine Learning Platform
Logging & Monitoring
Logging & Monitoring
[Diagram: logging stack with components for collecting logs, buffering and transformation, database, frontend]
Logging & Monitoring
[Diagram: monitoring stack with components for collecting metrics, database, frontend]
EMQ Machine Learning Platform
Metadata Management
Metadata
… data about data
● about the environment
● about the data
● about the workflows
● about the models
● about the business domain
● ...
EMQ Machine Learning Platform
Putting it all together
Outlook
› Platform hardening
› Adaptation and extension for new use cases
› NLP / semantic search
› IIoT
› Metadata
› Model management
› Dissemination
Manage Data → Train Models → Evaluate Models → Deploy Models → Make Predictions → Monitor Predictions
The team
… without whom none of this would exist at our company
› Sebastian Schmidt
› Alexander Grizschancew
› Sebastian Jäger
› Alexander Lontke
› Julien Heitmann
› Marcel Hofmann
› Kevin Exel
› David Waidner
› Matthias Schwartz
› Stanislav Frolov
› David Schmidt
› Daniel Bäurer
› Nils Domrose
› Hans-Peter Zorn
› Stefan Igel
Thank you
Hans-Peter Zorn
Head of Machine Perception & AI
hzorn@inovex.de
Dr. Stefan Igel
Head of Big Data Solutions
sigel@inovex.de
