Deploying OpenNebula in an HPC environment
Alfred Gil
Chief Computational Scientist & Cofounder
OpenNebula Cloud TechDay
Barcelona, May 2019
• HPCNow! company overview
• Motivation
• Architecture
• Implementation
• Conclusions
Quick introduction to HPCNow!
● Global HPC consulting company
● IT + scientific background
● HPC services and solutions
● User-oriented company
● Hardware agnostic
Company overview
System Administrators
and User Support
Top500 Supercomputer Users
Company overview
IISW
● Batch schedulers: Slurm, LSF, PBS, Torque, SGE
● Cluster managers: sNow!, xCAT, Rocks, Bright
● Monitoring and alerting tools: Ganglia, Nagios, Icinga, Grafana, Elasticsearch
● Parallel file systems: BeeGFS, Lustre, GPFS, HDFS, Ceph
Company overview
● User environment: user libraries, Modules, EasyBuild, Spack
● Development tools: compilers (GNU, Intel, PGI, IBM XL); debuggers and profilers (VTune, DDT, GDB)
● Scientific and engineering applications: more than 100 references. Contact us to learn more.
Company overview
● Virtualization: OpenNebula, OpenStack, VMware, Xen-Source
● Containers: Singularity, Docker, Docker Swarm, LXD
● Remote visualization: TurboVNC, VirtualGL, WebSocket, DCV, X2Go
● HPC portal: EnginFrame
Company overview
Contributions to HPC Community
Company overview
Public sector Private Companies
Company overview
Partners
HW SW
Company overview
What is High Performance Computing?
Many tasks and/or threads working together to solve different parts of a single larger problem.
This is achieved with parallel programming, which usually requires either large shared-memory systems or a low-latency, high-bandwidth network.
Motivation
HPC users need more than just a compute solution
❅ Workflow: pre-processing and post-processing, workflow frameworks, ...
❅ Web services: RStudio, Galaxy, Jupyter Notebook, JMS, ...
❅ Software managers: Anaconda, EasyBuild, Spack, ...
❅ Prebuilt software: Docker, Singularity, VM images (NeuroDebian, ...), ...
Motivation
Convergence Solution
HPC Cluster, Singularity, Docker Swarm, OpenNebula
Allows the HPC solution to be dynamically re-architected and re-purposed to accommodate different roles and user needs.
Motivation
Dynamic Provisioning
[Diagram: hybrid nodes and spare nodes shared among OpenNebula, Slurm and Docker Swarm]
1. Use Resource
scontrol update node=X state=RESUME
onehost enable X
docker node update --availability active X
2. Release Resource
scontrol update node=X state=DOWN reason="re-purposed"
onehost offline X
docker node update --availability drain X
Motivation
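The use and release commands above could be wrapped in a small helper so that a hybrid node is handed from one service to another in a single step. This is a minimal sketch, not the tooling used in the talk: the function names `use_node`, `release_node` and `move_node`, the `DRY_RUN` switch, the service labels and the `Reason` text are all assumptions made for illustration. Slurm expects a reason when a node is set to DOWN, so one is passed here.

```shell
#!/bin/sh
# Sketch: hand a hybrid node to one service, or take it back.
# DRY_RUN=1 (the default in this sketch) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# use_node SERVICE NODE: make NODE available to SERVICE
use_node() {
    case "$1" in
        slurm)      run scontrol update NodeName="$2" State=RESUME ;;
        opennebula) run onehost enable "$2" ;;
        swarm)      run docker node update --availability active "$2" ;;
        *)          echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# release_node SERVICE NODE: withdraw NODE from SERVICE
release_node() {
    case "$1" in
        slurm)      run scontrol update NodeName="$2" State=DOWN Reason=repurposed ;;
        opennebula) run onehost offline "$2" ;;
        swarm)      run docker node update --availability drain "$2" ;;
        *)          echo "unknown service: $1" >&2; return 1 ;;
    esac
}

# move_node FROM TO NODE: release NODE from FROM, then enable it in TO
move_node() {
    release_node "$1" "$3" && use_node "$2" "$3"
}
```

For example, `move_node slurm opennebula node01` prints the two commands that would take the hypothetical host `node01` out of Slurm and enable it in OpenNebula; setting `DRY_RUN=0` runs them instead.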
Use case
[Diagram: cluster layout with management (mgmnt), compute, hybrid and storage nodes]
Architecture
Management node (mgmnt)
● VMs (Xen)
○ slurm01: slurmctld
○ slurmdb01: slurmdbd
○ ceph01: ceph-deploy
○ oneceph01: oned, Sunstone, OneFlow, OneGate
○ login01
○ ldap01
● Exports /home via NFS
Architecture
Global configuration
● OpenNebula v5.6.0
● Ceph v13.2.1 (Mimic)
● Datastores
○ Standard Ceph configuration
■ cephds: type Image
■ ceph_system: type System
● Nodes with the KVM hypervisor
● NICs with the virtio model
Architecture
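The two Ceph datastores listed above could be described with templates along the following lines. This is only a sketch under stated assumptions: the datastore names and the pool name `one` appear in the slides, but the monitor host, Ceph user and bridge host below are placeholders, and the `CEPH_SECRET` attribute that a real deployment needs for libvirt authentication is left out.

```shell
#!/bin/sh
# Sketch: write templates for the image and system Ceph datastores.
# CEPH_HOST, CEPH_USER and BRIDGE_LIST are placeholder values.
POOL_NAME="one"
CEPH_HOST="ceph01"
CEPH_USER="libvirt"
BRIDGE_LIST="oneceph01"

# write_image_ds FILE: template for the image datastore "cephds"
write_image_ds() {
    cat > "$1" <<EOF
NAME        = "cephds"
TYPE        = "IMAGE_DS"
DS_MAD      = "ceph"
TM_MAD      = "ceph"
DISK_TYPE   = "RBD"
POOL_NAME   = "$POOL_NAME"
CEPH_HOST   = "$CEPH_HOST"
CEPH_USER   = "$CEPH_USER"
BRIDGE_LIST = "$BRIDGE_LIST"
EOF
}

# write_system_ds FILE: template for the system datastore "ceph_system"
write_system_ds() {
    cat > "$1" <<EOF
NAME        = "ceph_system"
TYPE        = "SYSTEM_DS"
TM_MAD      = "ceph"
POOL_NAME   = "$POOL_NAME"
CEPH_HOST   = "$CEPH_HOST"
CEPH_USER   = "$CEPH_USER"
BRIDGE_LIST = "$BRIDGE_LIST"
EOF
}

# Typical use on the front-end, once the placeholders are filled in:
#   write_image_ds cephds.tmpl   && onedatastore create cephds.tmpl
#   write_system_ds cephsys.tmpl && onedatastore create cephsys.tmpl
```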
Stumbling blocks along the way
● Snapshots
○ Image datastore configured as raw
■ Recommended for Ceph with RBD
○ Images are stored as raw, even when created as qcow2
○ Snapshot of the system disk, and recovery from Ceph
■ rbd ls -l -p one
■ The listing shows the disks and their snapshots, e.g. one-2-103-0, one-2-103-0@0, one-2-104-0
● Bridge destroyed when no virtual NIC is linked
○ Set keep_empty_bridge to true in /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf
■ A bug prevented the config from being transferred to the hypervisors at /var/tmp/one/etc/vnm/OpenNebulaNetwork.conf
○ Create the virtual network with PHYDEV unset
Implementation
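The keep_empty_bridge change above can be scripted. The sketch below assumes the option is stored as a YAML-style line beginning with `:keep_empty_bridge:`, which is how this file is laid out in OpenNebula 5.x as far as we know; the function name `set_keep_empty_bridge` and the host name in the comments are hypothetical.

```shell
#!/bin/sh
# Sketch: force ":keep_empty_bridge: true" in an OpenNebulaNetwork.conf file.
# Assumes the option is a line starting with ":keep_empty_bridge:".
set_keep_empty_bridge() {
    conf="$1"
    tmp="$conf.tmp.$$"
    if grep -q '^:keep_empty_bridge:' "$conf"; then
        # Rewrite the existing line, keeping the original file (and its
        # ownership and permissions) in place.
        sed 's/^:keep_empty_bridge:.*/:keep_empty_bridge: true/' "$conf" > "$tmp" \
            && cat "$tmp" > "$conf"
        rm -f "$tmp"
    else
        # Option not present yet: append it.
        echo ':keep_empty_bridge: true' >> "$conf"
    fi
}

# On the front-end:
#   set_keep_empty_bridge /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf
#   onehost sync --force
# If the file still does not reach a hypervisor, copy it by hand, e.g.
# (node01 is a placeholder host name):
#   scp /var/lib/one/remotes/etc/vnm/OpenNebulaNetwork.conf \
#       node01:/var/tmp/one/etc/vnm/OpenNebulaNetwork.conf
```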
Stumbling blocks along the way
● VMs could not communicate with each other
○ Set the net.bridge.bridge-nf-call-iptables parameter to 0
○ Tried to make it persistent in /etc/sysctl.d/bridge-nf-call.conf and /usr/lib/sysctl.d/00-system.conf
■ This did not work: when sysctl runs at boot, the bridge kernel module is not yet loaded
○ Fixed by modifying /usr/lib/systemd/system/libvirtd.service
Type=notify
EnvironmentFile=-/etc/sysconfig/libvirtd
ExecStart=/usr/sbin/libvirtd $LIBVIRTD_ARGS
+ExecStartPost=/usr/bin/sleep 30s
+ExecStartPost=/usr/sbin/sysctl -w net.bridge.bridge-nf-call-iptables=0
+ExecStartPost=/usr/sbin/sysctl -p
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
Implementation
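The slide edits the packaged unit file directly, and a package update can overwrite such an edit. As a variation, not the approach taken in the talk, the same three ExecStartPost lines could go into a systemd drop-in. The sketch below assumes the standard drop-in directory for libvirtd.service; the file name `bridge-nf.conf` and the function name are hypothetical.

```shell
#!/bin/sh
# Sketch: write the extra ExecStartPost lines to a systemd drop-in
# instead of editing /usr/lib/systemd/system/libvirtd.service.
# The drop-in file name "bridge-nf.conf" is an arbitrary choice.
write_libvirtd_dropin() {
    dir="${1:-/etc/systemd/system/libvirtd.service.d}"
    mkdir -p "$dir" || return 1
    cat > "$dir/bridge-nf.conf" <<'EOF'
[Service]
ExecStartPost=/usr/bin/sleep 30s
ExecStartPost=/usr/sbin/sysctl -w net.bridge.bridge-nf-call-iptables=0
ExecStartPost=/usr/sbin/sysctl -p
EOF
}

# On each hypervisor, as root:
#   write_libvirtd_dropin
#   systemctl daemon-reload
#   systemctl restart libvirtd
#   sysctl -n net.bridge.bridge-nf-call-iptables   # check the resulting value
```

Passing a directory argument, as in `write_libvirtd_dropin /tmp/preview`, writes the file somewhere harmless so it can be inspected first.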
Stumbling blocks along the way
● VM creation from Sunstone ended with FAILED status
○ error: Cannot check QEMU binary /usr/bin/qemu-system-x86_64: No such file or directory
■ ln -s /usr/libexec/qemu-kvm /usr/bin/qemu-system-x86_64
Implementation
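When the symlink fix above is rolled out to several hypervisors, it helps to make it safe to repeat. A minimal sketch, with the hypothetical helper name `ensure_link`, that only creates the link when it is missing and the target is executable:

```shell
#!/bin/sh
# ensure_link TARGET LINK: create LINK -> TARGET unless LINK already exists.
ensure_link() {
    target="$1"
    link="$2"
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "exists: $link"
        return 0
    fi
    if [ ! -x "$target" ]; then
        echo "missing target: $target" >&2
        return 1
    fi
    ln -s "$target" "$link" && echo "linked: $link -> $target"
}

# On each hypervisor, as root:
#   ensure_link /usr/libexec/qemu-kvm /usr/bin/qemu-system-x86_64
```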
Conclusions
● We architected and implemented a solution that deploys nodes with a hybrid role.
● This solution allows the cluster to be dynamically re-purposed to accommodate user needs.
● OpenNebula proved to be a really easy tool to install, deploy and manage.
● The forum provided useful tips and collaboration for troubleshooting issues.
Conclusions
info@hpcnow.com
www.hpcnow.com
Marie Curie, 8 - 08042 Barcelona (Spain)
34 Fernly Rise, 2019 Auckland (New Zealand)
