Savanna - Hadoop on OpenStack
Mirantis, 2013
Sergey Lukjanov
Savanna Technical Lead
Agenda
● Savanna Overview
● Savanna Use Cases
● Roadmap & Current Status
● Architecture & Features Overview
● Hadoop vs. Virtualization
Savanna - Elastic Hadoop on OpenStack
● Open source native OpenStack component
● Supports different Hadoop distributions
● Covers both the bare cluster provisioning use case and "analytics as a service"
● Managed through REST API
● Web UI as part of the OpenStack Dashboard
● Flexible templates of Hadoop configurations
Savanna - Elastic Hadoop on OpenStack
● Project home - https://launchpad.net/savanna
○ bug tracking
○ blueprints
○ answers
● Code review (gerrit) - https://review.openstack.org
● Sources - https://github.com/stackforge/savanna
● Mailing list - savanna-all@lists.launchpad.net
● CI - https://jenkins.openstack.org and http://jenkins.savanna.mirantis.com
Savanna - Participants
● Contributors:
○ large core team from Mirantis
○ teams from Red Hat, Hortonworks
○ several minor contributors
● Intel joined recently
● Several upcoming customers
Savanna Use Cases
● Administrators - centralized cluster management and monitoring
● Dev and QA teams - fast cluster provisioning
● Data Scientists/Analysts - API to run analytic jobs, with infrastructure provisioning happening under the hood
● Making resources dedicated to the IaaS cloud available for Hadoop workloads
Administrators Use Case
● Central point of control over infrastructure
● Enables self-service capabilities, including the choice of Hadoop distribution
● Integration with vendor tooling:
○ Ambari for Apache/Hortonworks
○ Cloudera Management Console
○ Intel Hadoop
● Utilization of free IaaS capacity for Hadoop tasks
Dev and QA Use Cases
● Fast on-demand provisioning of environments
● Increase agility and speed of innovation
● Controlled access to data from production
Analytics Use Cases
● Simplified task execution - the complexity of provisioning and managing a cluster is hidden under the hood
○ Access to higher-level interfaces (e.g. Pig, Hive)
● Bursty workloads: ad-hoc queries that need significant resources for only a short time
● Utilization of free IaaS capacity for Hadoop tasks
Roadmap for Hadoop in Cloud
Phase 1
Basic cluster provisioning of Apache Hadoop
Phase 2
Cluster operation support and integration with tooling,
advanced configuration (HDFS, Swift, etc.)
Phase 3
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OpenStack
Phase 1 - Basic Cluster Operation
● Cluster provisioning
● Deployment Engine implementation for pre-installed images
● Templates for Hadoop cluster configuration
● REST API for cluster startup and operations
● Web UI integrated into OpenStack Dashboard
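To make the "templates plus REST API" idea concrete, here is a minimal sketch of a cluster template serialized into a JSON request body. Every name in it (`NodeGroup`, `ClusterTemplate`, the field names, the flavor names, the image tag) is an assumption made for illustration; it is not Savanna's actual schema or API.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class NodeGroup:
    """A set of identical VMs running the same Hadoop processes (hypothetical shape)."""
    name: str
    processes: list  # e.g. ["namenode", "jobtracker"]
    flavor: str      # illustrative flavor name
    count: int


@dataclass
class ClusterTemplate:
    """Hypothetical template; field names are assumptions, not Savanna's schema."""
    name: str
    image_tag: str
    node_groups: list = field(default_factory=list)
    hadoop_params: dict = field(default_factory=dict)

    def to_request_body(self) -> str:
        """Serialize the template as a JSON body for a 'start cluster' request."""
        return json.dumps({"cluster": asdict(self)}, sort_keys=True)


template = ClusterTemplate(
    name="demo",
    image_tag="hadoop-preinstalled",
    node_groups=[
        NodeGroup("master", ["namenode", "jobtracker"], "m1.medium", 1),
        NodeGroup("worker", ["datanode", "tasktracker"], "m1.small", 3),
    ],
    hadoop_params={"dfs.replication": 2},
)
body = template.to_request_body()
```

Keeping the template as plain data means the same object could back both a REST call and a dashboard form.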
Phase 2 - Advanced Configuration
● Hadoop cluster configuration support:
○ Solutions for HDFS data reliability issue
○ Configurable DN storage location
○ Configurable topology of DN, NN, TT, JT
○ Add/remove nodes
○ More Hadoop parameters
● Integration with vendor
deployment/management tooling
● Basic monitoring support
Phase 3 - Analytics as a Service
● API to execute Map/Reduce jobs without
exposing details of underlying infrastructure
(similar to AWS EMR)
● User-friendly UI for ad-hoc analytics queries
based on Hive or Pig
Roadmap for Hadoop in Cloud
Phase 1 [Released - April, 10]
Basic cluster provisioning of Apache Hadoop
Phase 2 [In progress - July 15]
Cluster operation support and integration with tooling,
advanced configuration (HDFS, Swift, etc.)
Phase 3 [Planned - October 15]
"Analytics as a service": job execution framework, support for different scripting languages, deeper integration with OpenStack
Further Roadmap
● Autoscaling
● HA for NameNode
● Deeper HDFS and Swift integration
○ Caching of Swift data on HDFS
● Integration with logging and error handling
● HBase support
Architecture Overview
[Diagram. Components shown: Horizon, Savanna Pages, Savanna Python Client, REST API, Auth, Keystone, Cluster Configuration Manager, DAL, Provisioning Plugin, Instance Interop Helper, Image Registry, Nova, Glance, Swift, Hadoop VMs]
Hadoop vs. Virtualization
● HDFS Reliability
● Data Persistence
● I/O Performance
● etc.
HDFS Reliability: the issue
[Diagram: two Compute hosts, each running several DN (DataNode) VMs, and a Data Block]
HDFS Reliability: single DN per host
[Diagram: Compute hosts each running a single DN VM (one labeled TT | DN), grouped into Cluster A and Cluster B]
HDFS Reliability: HADOOP-8468
Hypervisor awareness for the HDFS scheduler
[Diagram: Compute hosts running multiple DN VMs that together form one HDFS, with a Data Block]
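The reliability issue and the idea behind hypervisor awareness can be sketched in a few lines. This is a toy placement function written for illustration only; it is not the HADOOP-8468 code, and the host and DataNode names are made up. It assumes the problem is that all replicas of a block may land on DataNode VMs sharing one physical host.

```python
from collections import defaultdict


def place_replicas(datanodes, replication):
    """Pick DataNodes for one block, at most one per physical host.

    datanodes maps a DataNode name to the compute host it runs on.
    """
    by_host = defaultdict(list)
    for dn, host in sorted(datanodes.items()):
        by_host[host].append(dn)
    if replication > len(by_host):
        raise ValueError("not enough physical hosts for the requested replication")
    # Take the first DataNode from each of the first `replication` hosts.
    return [by_host[host][0] for host in sorted(by_host)[:replication]]


def survives_host_failure(replicas, datanodes, failed_host):
    """True if at least one replica lives on a host other than failed_host."""
    return any(datanodes[dn] != failed_host for dn in replicas)


# Hypothetical layout: two compute hosts, three DataNode VMs each.
datanodes = {
    "dn1": "compute-a", "dn2": "compute-a", "dn3": "compute-a",
    "dn4": "compute-b", "dn5": "compute-b", "dn6": "compute-b",
}
naive = ["dn1", "dn2", "dn3"]         # three replicas, all on compute-a
aware = place_replicas(datanodes, 2)  # one replica per physical host
```

With the naive placement, losing `compute-a` loses every copy of the block even though the replication factor is three; the host-aware placement keeps one copy on `compute-b`.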
HDFS Reliability: HADOOP-8545
Enables Swift for Hadoop
[Diagram: Swift holds the initial input and the final output; Hadoop Job #1, Job #2, ... Job #N run in sequence, with HDFS between jobs]
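One reading of this slide is that Swift stores the durable data (initial input, final output) while HDFS only carries intermediate results between jobs. The sketch below plans the input and output locations for such a chain. The function name, the HDFS scratch path, and the `swift://` URIs are illustrative assumptions, not values taken from the deck.

```python
def plan_job_chain(n_jobs, swift_input, swift_output, hdfs_tmp="hdfs:///tmp/chain"):
    """Return (input, output) URI pairs for a chain of n_jobs jobs.

    The first job reads from Swift, the last job writes to Swift,
    and everything in between stays in HDFS.
    """
    if n_jobs < 1:
        raise ValueError("need at least one job")
    plan = []
    src = swift_input
    for i in range(1, n_jobs + 1):
        dst = swift_output if i == n_jobs else f"{hdfs_tmp}/step-{i}"
        plan.append((src, dst))
        src = dst  # the next job consumes what this one produced
    return plan


# Hypothetical container and path names.
plan = plan_job_chain(3, "swift://data.example/input", "swift://data.example/output")
```

Under this layout, losing the cluster's HDFS costs only intermediate results, which can be recomputed from the input kept in Swift.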
Configurable topology of DN, NN, TT, JT
● Master node(s)
● Worker nodes
[Diagram: master layouts labeled JT | NN, JT, NN+; worker layouts labeled TT, TT | DN, DN; node counts 10, 6, 8]
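A configurable topology needs a validity check before anything is provisioned. The sketch below assumes a Hadoop 1.x style cluster with exactly one NameNode (NN) and one JobTracker (JT) and at least one DataNode (DN) and TaskTracker (TT). The function and the example layouts are hypothetical; the node counts only echo the numbers on the slide and are not a statement of what the diagram assigns them to.

```python
from collections import Counter


def validate_topology(node_groups):
    """Check a list of (processes, count) tuples and return a list of problems.

    processes is a set drawn from {"NN", "JT", "DN", "TT"}.
    An empty result means the layout is acceptable.
    """
    totals = Counter()
    for processes, count in node_groups:
        for proc in processes:
            totals[proc] += count
    problems = []
    for master in ("NN", "JT"):
        if totals[master] != 1:
            problems.append(f"expected exactly one {master}, found {totals[master]}")
    for worker in ("DN", "TT"):
        if totals[worker] < 1:
            problems.append(f"need at least one {worker}")
    return problems


# Illustrative layouts: combined master, split masters, and an invalid one.
combined = [({"JT", "NN"}, 1), ({"TT", "DN"}, 10)]
split = [({"JT"}, 1), ({"NN"}, 1), ({"TT"}, 6), ({"DN"}, 8)]
broken = [({"NN"}, 2), ({"DN"}, 3)]
```

Returning a list of problems rather than raising on the first one lets a UI show every issue with a template at once.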
HDFS Placement Options
● Ephemeral drive
/var/lib/nova/instances/instance-xxx/disk -> /mnt/ephemeral
● Block storage volume
Cinder Volume -> /mnt/volume
● Bare hard drive support
/dev/sdb -> /mnt/sdb
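The three placement options end up as mount points inside the VM, so choosing between them amounts to choosing which directories the DataNode writes to. The sketch below builds a comma-separated directory list of the kind Hadoop 1.x accepts in `dfs.data.dir`. The option keys, the function name, and the `hdfs/data` subdirectory are assumptions for illustration; only the mount points come from the slide.

```python
# Mount points as listed on the slide; the dictionary keys are made up.
MOUNT_POINTS = {
    "ephemeral": "/mnt/ephemeral",
    "volume": "/mnt/volume",
    "bare": "/mnt/sdb",
}


def datanode_dirs(options, subdir="hdfs/data"):
    """Build a comma-separated directory list for the selected placement options."""
    unknown = [o for o in options if o not in MOUNT_POINTS]
    if unknown:
        raise ValueError(f"unknown placement option(s): {unknown}")
    return ",".join(f"{MOUNT_POINTS[o]}/{subdir}" for o in options)


value = datanode_dirs(["ephemeral", "volume"])
```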
Q&A
We are hiring!
Phase 1 deployment mechanism
[Diagram: Savanna provisions VMs with pre-installed Hadoop, then configures the Hadoop cluster across the Hadoop VMs]
Tool usage scenarios
Scenario I: the tool manages a Hadoop cluster on existing Hadoop VMs
Scenario II: the tool provisions and manages a Hadoop cluster on plain VMs
Extensible Provisioning
Plugin
● get extra configs
● validate input
● launch/terminate cluster
● add/remove nodes
Instance Interop
● launch/terminate VMs
● get VM status
● ssh/scp to VM
Image registry
● register image in Savanna
● add/remove tags
● get image by tag
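The plugin operations listed above suggest a small contract between Savanna and each provisioning plugin. The sketch below expresses that contract as an abstract base class with a toy in-memory implementation. The class names, method names, and the dict-based cluster are all hypothetical; this is not Savanna's actual plugin interface.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract mirroring the operations on the slide."""

    @abstractmethod
    def get_extra_configs(self):
        """Return plugin-specific parameters the user may set."""

    @abstractmethod
    def validate(self, cluster):
        """Raise ValueError if the cluster description is unusable."""

    @abstractmethod
    def launch_cluster(self, cluster):
        """Bring the cluster up."""

    @abstractmethod
    def terminate_cluster(self, cluster):
        """Tear the cluster down."""

    @abstractmethod
    def scale(self, cluster, delta):
        """Add (delta > 0) or remove (delta < 0) worker nodes."""


class InMemoryPlugin(ProvisioningPlugin):
    """Toy implementation that only updates a dict; it starts no VMs."""

    def get_extra_configs(self):
        return {"dfs.replication": 3}

    def validate(self, cluster):
        if cluster["workers"] < 1:
            raise ValueError("cluster needs at least one worker")

    def launch_cluster(self, cluster):
        self.validate(cluster)
        cluster["state"] = "active"

    def terminate_cluster(self, cluster):
        cluster["state"] = "terminated"

    def scale(self, cluster, delta):
        # Validate the resized copy first so a bad request leaves the cluster untouched.
        resized = dict(cluster, workers=cluster["workers"] + delta)
        self.validate(resized)
        cluster["workers"] = resized["workers"]


plugin = InMemoryPlugin()
cluster = {"name": "demo", "workers": 2, "state": "new"}
plugin.launch_cluster(cluster)
plugin.scale(cluster, +1)
```

Validating inside both `launch_cluster` and `scale` means a request is rejected before any state changes.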
Provisioning Interaction
[Sequence diagram between User, Savanna, and Plugin. Message labels: get extra parameters; get extra parameters for the plugin; validate cluster parameters; launch cluster; add/remove nodes]
Provisioning: Launching a Cluster
[Diagram: the Plugin gets an image by tag from the Image Registry, launches VMs through the Instance Interop Helper, then installs and configures Hadoop on the Hadoop VMs by passing commands via ssh and scp]
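The launch flow can be sketched end to end with in-memory stand-ins for the Image Registry and the Instance Interop Helper. Nothing here talks to Nova, Glance, or SSH; the classes, the image id, and the `configure-hadoop` command are placeholders invented for the example.

```python
class ImageRegistry:
    """In-memory stand-in that maps image ids to sets of tags."""

    def __init__(self):
        self._tags = {}

    def register(self, image_id, tags):
        self._tags.setdefault(image_id, set()).update(tags)

    def get_by_tag(self, tag):
        for image_id, tags in sorted(self._tags.items()):
            if tag in tags:
                return image_id
        raise LookupError(f"no image tagged {tag!r}")


class InstanceInterop:
    """Records actions instead of launching VMs or opening SSH sessions."""

    def __init__(self):
        self.log = []

    def launch_vms(self, image_id, count):
        self.log.append(("launch", image_id, count))
        return [f"vm-{i}" for i in range(1, count + 1)]

    def run(self, vm, command):
        self.log.append(("ssh", vm, command))


def launch_cluster(registry, interop, tag, count):
    """Get an image by tag, launch VMs from it, then configure each VM."""
    image_id = registry.get_by_tag(tag)
    vms = interop.launch_vms(image_id, count)
    for vm in vms:
        interop.run(vm, "configure-hadoop")  # placeholder command
    return vms


registry = ImageRegistry()
registry.register("img-123", {"hadoop", "vanilla"})
interop = InstanceInterop()
vms = launch_cluster(registry, interop, "hadoop", 2)
```

Because the plugin only sees the two helper objects, the same flow can be exercised in tests without a cloud behind it.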
