Using Docker Containers for
Scientific Environments — On-
Premises and in the Cloud
Sergey Yakubov, Martin Gasthuber, Birgit Lewendel
KEK, Tsukuba, 18.10.2017
Contents
Introduction
Scientific environments on-premises
• IT-Managed containers
• Custom user containers
Scientific environments in hybrid clouds
• HNSciCloud project
• Using cloud to extend local resources
Conclusions and outlook
| Sergey Yakubov | 18.10.2017 | Hepix Fall 2017 | KEK, Tsukuba
Introduction
• Batch farm (HTCondor) – see talk by T. Finnern
• HPC cluster Maxwell (SLURM)
• Large storage, fast network and CPUs
• 12,000 cores, InfiniBand, 76 TB memory, 3.3 PB storage
• Used mostly for offline data analyses/numerical simulations
• But also for online analyses (more in the future)
• Docker containers
Compute resources at DESY
Introduction
• Using Docker container technology we can create environments that allow us to:
• separate IT and user requirements/dependencies
• separate responsibilities - IT focuses on scaling and container template construction,
physicists on application development
• provide compute resources dynamically and quickly, whether on top of existing local
resources or in the cloud
• control provisioned resources - storage, CPUs, memory, networks, …
• Can we do this with OpenStack & Co? Probably yes, but …
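As an illustration of the resource-control point above, Docker itself exposes per-container limits that such an environment can set; the flags are real Docker options, but the image name, command, and values below are made-up placeholders, not DESY's configuration:

```shell
# Illustrative per-container resource limits (placeholder values);
# --cpus, --memory and --network are standard docker run options.
limits="--cpus=4 --memory=8g --network=none"

# Compose the invocation a provisioning layer could issue:
cmd="docker run $limits analysis-image:latest ./run_analysis"
echo "$cmd"
```

The same mechanism caps CPU shares, memory, and network access independently for every user container on a shared node.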
Containerized scientific environments
Scientific environments on-premises
• A Dockerfile is created by IT/group admins (e.g. a Debian image with software for a
specific experiment) and stored as a Puppet resource
• Puppet automatically builds an image whenever the Dockerfile changes and pushes it to DESY's
Docker registry
• Compute resources are reserved via SLURM
• At a specified time a SLURM job starts a Docker container with an sshd daemon on each of
the allocated compute nodes
• Users with the corresponding rights can log in and do their work
IT-Managed Containers
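A minimal sketch of what such an IT-managed image recipe could look like; the base image, package list, and sshd setup below are illustrative, not the actual DESY template:

```dockerfile
# Illustrative experiment image: Debian base plus analysis software
# and an sshd so users can log in to the running container.
FROM debian:9
RUN apt-get update && apt-get install -y \
        openssh-server \
        python3 python3-numpy \
    && mkdir -p /var/run/sshd
# Site-specific user accounts and authentication configuration
# would be injected here (e.g. via Puppet-managed build arguments).
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
```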
Scientific environments on-premises
• User submits a SLURM job script with Docker commands
• Compute resources are allocated via SLURM
• SLURM executes the specified Docker containers on each of the allocated compute nodes
• Any Docker image can be used
• A Docker authorization plugin takes care of security
Custom user containers
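A custom-container job of this kind could look roughly as follows; as a batch script it only runs under SLURM, so it is shown as a sketch, and the partition name, image, and command are placeholders:

```shell
#!/bin/sh
#SBATCH --partition=maxwell    # placeholder partition name
#SBATCH --nodes=2

# Run the user's own image on every allocated node;
# the Docker authorization plugin vets each docker invocation.
srun docker run --rm my-user-image:latest ./my_analysis --input /data
```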
Scientific environments on-premises
Example - SIMEX
SimEx - photon science simulation platform
https://github.com/eucall-software/simex_platform
X-ray wavefront propagation calculator
• Propagation of light through optical elements
• Utilizes SRW (Synchrotron Radiation Workshop) library
• C++ core + Python wrappers
• Hybrid OpenMP/MPI parallelization
[Figure: speed-up vs. number of cores, single source file, up to ~14x on 40 cores]

40 source files:
Threads x MPI processes   Number of nodes   Total time   Time/file
1x1                       1                 11 h         1031 s
40x1                      1                 65 min       98 s
4x10                      4                 7.5 min      45 s
8x5                       8                 4.2 min      51 s
160x speed-up overall (40 files on 8 nodes vs. serial processing)
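The 160x figure follows from the table: serial processing of all 40 files would take 40 x 1031 s, versus 4.2 min (about 252 s) total on 8 nodes:

```shell
serial=$((40 * 1031))    # 41240 s if the 40 files ran one after another
parallel=252             # 4.2 min total on 8 nodes, from the table
echo $((serial / parallel))   # ~163, i.e. roughly the quoted 160x
```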
Helix Nebula Science Cloud
Joint Pre-Commercial Procurement
Procurers: CERN, CNRS, DESY, EMBL-EBI, ESRF, IFAE, INFN, KIT, STFC, SURFSara
Experts: Trust-IT & EGI.eu
The group of procurers has committed
• Procurement funds
• Manpower for testing/evaluation
• Use-cases with applications & data
• In-house IT resources
Resulting services will be made available
to end-users from many research
communities
Co-funded via H2020 Grant Agreement
687614
Total procurement budget >5M€
* Thanks to the CERN IT Group for the provided HNSciCloud slides
Helix Nebula Science Cloud
• Compute and storage
• support a range of virtual machine and container configurations including HPC working
with datasets in the petabyte range
• Transparent Data Access
• provide transparent access from the cloud to the user's on-premises data
• Network connectivity
• provide high-end network capacity via GEANT for the whole platform
• Federated Identity Management
• provide common identity and access management
Technical challenges
Helix Nebula Science Cloud
Preparation
• Analysis of requirements,
current market offers and
relevant standards
• Build stakeholder group
• Develop tender material
Implementation & Sharing (Jan'16 - Dec'18)
• Tender (Jul'16): 4 Designs
• Call-off (Feb'17): 3 Prototypes
• Call-off (Dec'17): 2 Pilots
Each step is competitive - only contractors that successfully complete the previous step can bid in the next.
We are here: prototype phase, before the Dec'17 call-off.
Project phases
Scientific environments in hybrid clouds
Resources, fast network, and transparent data access from HNSciCloud; elastic computing via SLURM.
[Diagram: SLURM control node with local compute nodes]
Using cloud to extend local resources
[Diagram: the same SLURM cluster extended with cloud compute nodes]
Scientific environments in hybrid clouds
Example - test.sh:
#!/bin/sh
#SBATCH --partition=cloudXXX
#SBATCH --workdir=/test_id
#SBATCH --nodes=1
id -u > cloud_id.txt
docker run centos:7 id -u > cloud_docker_id.txt

local-node$ sbatch test.sh
local-node$ id -u
12345
local-node$ cat cloud_id.txt
12345
local-node$ cat cloud_docker_id.txt
12345
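The matching UIDs inside and outside the container are what the authorization plugin enforces; conceptually it is equivalent to forcing every container to run under the submitting user's identity, as sketched below. The wrapper and the UID/GID values are illustrative, not DESY's actual plugin:

```shell
# Compose the docker invocation a UID-enforcing wrapper would issue;
# 12345/6789 stand in for $(id -u) and $(id -g) of the submitting user.
uid=12345
gid=6789
cmd="docker run --user ${uid}:${gid} centos:7 id -u"
echo "$cmd"
```

With `--user`, `id -u` inside the container reports the host user's UID, which is why cloud_id.txt and cloud_docker_id.txt agree in the example above.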
Conclusions and outlook
Containerized scientific environment
• Implemented via Docker
• Isolates work of different users/groups
• Same performance as on underlying infrastructure
• Portable
• More user experience to be gained
Hybrid clouds
• Dynamic cloud resource allocation/deallocation
• Transparent to the user
• user submits job to local scheduler
• transparent data access from the cloud
• thanks to Docker no need to install user software on the cloud VM
• Performance to be tested
Thank you for your attention!