© Hortonworks Inc. 2013
Hadoop + OpenStack
integration Roadmap
Himanshu Bari
June 28th, 2013
Sr. Product Manager
hbari@hortonworks.com
Disclaimer
•  This document may contain product features and technology directions
that are under development or may be under development in the future.
•  Technical feasibility, market demand, user feedback, and the Apache
Software Foundation community development process can all affect
timing and final delivery.
•  This document’s description of these features and technology
directions does not represent a contractual commitment from
Hortonworks to deliver these features in any generally available
product.
•  Product features and technology directions are subject to change, and
must not be included in contracts, purchase orders, or sales
agreements of any kind.
Agenda
•  Why Hadoop on OpenStack
•  Use cases
•  A bit under the hood
Big Data & Cloud intersection point → 2013
Big Data & Cloud are top priorities for CIOs
OpenStack is an open source cloud management platform (Apache License)
•  Glance: Image Service
•  Keystone: Identity Service
•  Horizon: Dashboard
•  Quantum: Networking
•  Nova: Compute
•  Cinder: Block Store
•  Swift: Object Store
•  Ceilometer: Metering (integrated)
•  Heat: Orchestration (integrated)
Multi-hypervisor & guest OS support
OpenStack has overtaken Amazon AWS in market awareness…
Source: Google Trends
Maturing quickly with broad support:
•  Pushed by 150+ vendors
•  Millions of dollars in venture capital
•  Early adoption across all verticals
Why Hadoop & OpenStack?
Hadoop provides a greenfield use case
•  Net new workload
•  Needs scale-out infrastructure
•  Shared platform
OpenStack provides the perfect cloud platform
•  Operational agility
•  Supports scale-out architecture
•  Deployment choice across public & private clouds
1.  Open source communities provide the fastest path to innovation
2.  Open source is changing the game as economics and accessibility serve to accelerate cloud & big data market trends
3.  Both are attracting major ecosystem players: IBM, Red Hat (RHT), HP, Rackspace (RAX), etc.
Marries two of the largest open source movements
Accelerate Adoption of Hadoop on OpenStack
•  The leading contributor to Apache Hadoop
•  The leading system integrator for OpenStack
•  The leading contributor to OpenStack
Apache Hadoop… the killer app for OpenStack
Collaborating on Project Savanna
Project Savanna: automate deployment of Apache Hadoop on OpenStack
[Diagram: Savanna, the elastic Hadoop controller, running on OpenStack infrastructure alongside Swift storage, a Hadoop cluster, and Ambari for Hadoop management]
1.  Cluster templates: deploy preconfigured Hadoop clusters in seconds from Horizon or Ambari
2.  HDFS-Swift connectors: move data between HDFS and Swift object storage
3.  Simplified elasticity
Agenda
•  Why Hadoop on OpenStack
•  Use cases
•  A bit under the hood
Project Savanna design goals
•  Focus on API-driven tight integration
•  Hide Hadoop complexity through APIs
•  “It Just Works” experience
•  Fully leverage virtualization
•  Scalability, reliability, performance
Problems driving use cases
Cluster requirements vary by business unit (Finance, Compliance, IT, Marketing), data type (Web, Mobile, Sensor), analytics use case (Interactive, Batch) and environment (Dev, QA, Prod), which leads to:
•  Operational nightmare of supporting multiple cluster flavors
•  Lack of agility
•  Underutilized resources
•  Maintenance complications
•  Can’t migrate from public to private cloud
Provisioning related use cases
-  Frequent dev/test/staging cluster provisioning requests
-  Migrations from staging to prod and vice versa
-  Reduce operator error in cluster provisioning
-  Migrate ad hoc analytics requests away from Amazon EMR to support experimentation
Simplified provisioning (Phase-1 and Phase-2)
Template based provisioning
•  Cluster template, e.g. QA cluster
•  Node template
   a.  Resource based, e.g. node.Large
   b.  Function based, e.g. node.NameNode
•  Use as is: single-click provisioning
•  Modify: update VM resource allocation, service-to-VM mapping and service config, then provision and/or save the template
Hadoop as a service (job flow based provisioning)
•  Pick a job type: Cascading, streaming & custom jar
•  Upload data to Swift
•  Get results in Swift
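To make the two template kinds concrete, here is a minimal Python sketch, assuming a cluster template is simply a set of node templates with a VM count each. `NodeTemplate`, `ClusterTemplate`, `expand` and `modify` are hypothetical names, and the `m1.large` flavor and process names are placeholders; this is not Savanna's actual template schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NodeTemplate:
    """Hypothetical node template: combines the resource-based aspect (VM flavor)
    with the function-based aspect (Hadoop processes). Field names are illustrative."""
    name: str              # e.g. "node.Large" or "node.NameNode"
    flavor: str            # VM size requested from OpenStack
    processes: tuple = ()  # Hadoop services that run on this node


@dataclass
class ClusterTemplate:
    """Hypothetical cluster template: node templates and how many VMs of each."""
    name: str
    node_groups: dict = field(default_factory=dict)  # NodeTemplate -> VM count

    def modify(self, template: NodeTemplate, count: int) -> "ClusterTemplate":
        """'Modify' path: return a new template with one group's count changed."""
        groups = dict(self.node_groups)
        groups[template] = count
        return ClusterTemplate(self.name, groups)

    def expand(self) -> list:
        """Flatten the template into (vm_name, flavor, processes) tuples to boot."""
        vms = []
        for tmpl, count in self.node_groups.items():
            for i in range(1, count + 1):
                vms.append((f"{tmpl.name}-{i}", tmpl.flavor, tmpl.processes))
        return vms


# "Use as is": a QA cluster with one master and three workers.
master = NodeTemplate("node.NameNode", "m1.large", ("namenode", "jobtracker"))
worker = NodeTemplate("node.Large", "m1.large", ("datanode", "tasktracker"))
qa = ClusterTemplate("QA cluster", {master: 1, worker: 3})
qa_vms = qa.expand()

# "Modify": same template with more workers, kept as a separate template.
bigger_qa = qa.modify(worker, 5)
```

Keeping templates immutable (`modify` returns a copy) means a saved template such as the QA cluster is never silently changed by a one-off edit.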
Ambari embedded in Horizon
Swift object store support
Phase-1: read/write data from/to Swift object stores
•  Option-1: copy data from Swift to HDFS, run MapReduce and copy results back to Swift
•  Option-2: run MapReduce directly on top of Swift (output data still needs to be copied from HDFS to Swift)
Phase-2: bug fixes & optimizations
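The two Phase-1 options differ only in where the job reads its input. The sketch below builds, without running, the command lines for each, assuming the `swift://<container>.<service>/<path>` URL form used by Hadoop's OpenStack Swift filesystem module and an already configured Swift service; the container, service, jar and class names are hypothetical.

```python
def swift_url(container: str, service: str, path: str) -> str:
    """Build a swift://<container>.<service>/<path> URL (assumed hadoop-openstack form)."""
    return f"swift://{container}.{service}/{path.lstrip('/')}"


def option1_plan(container, service, src, jar, main_class, work_dir="/tmp/job"):
    """Option-1: copy Swift -> HDFS, run the job on HDFS, copy results back to Swift."""
    hdfs_in, hdfs_out = f"{work_dir}/input", f"{work_dir}/output"
    results = swift_url(container, service, src.rstrip("/") + "-results")
    return [
        ["hadoop", "distcp", swift_url(container, service, src), hdfs_in],
        ["hadoop", "jar", jar, main_class, hdfs_in, hdfs_out],
        ["hadoop", "distcp", hdfs_out, results],
    ]


def option2_plan(container, service, src, jar, main_class, work_dir="/tmp/job"):
    """Option-2: read input straight from Swift; output lands in HDFS and is copied out."""
    hdfs_out = f"{work_dir}/output"
    results = swift_url(container, service, src.rstrip("/") + "-results")
    return [
        ["hadoop", "jar", jar, main_class, swift_url(container, service, src), hdfs_out],
        ["hadoop", "distcp", hdfs_out, results],
    ]


# Hypothetical names throughout; on a real cluster each argv list could be
# passed to subprocess.run(argv, check=True).
plan1 = option1_plan("logs", "myprovider", "/2013/06", "wordcount.jar", "WordCount")
plan2 = option2_plan("logs", "myprovider", "/2013/06", "wordcount.jar", "WordCount")
for argv in plan1 + plan2:
    print(" ".join(argv))
```

Option-2 saves the initial bulk copy, which matters most when the input is large and read only once.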
Elasticity related use cases
-  Commission a new node, or decommission a node for maintenance
-  For dev/test/staging clusters: automatically vary cluster data & compute capacity based on tenant, workload, time of day, resource utilization, etc.
-  Automatically vary compute capacity for production clusters
Elasticity (Phase-1 and Phase-2)
Node elasticity (compute and/or data): manual or rule based
Cluster life: long lived or short lived (Swift or HDFS used for storage)
•  Handle variable workloads, e.g. alter the cluster compute node count for peak/off-peak hours
•  Job flow based clusters for ad hoc analysis
•  Best for Dev/QA use
•  Best for predictable workloads
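A rule-based policy of the kind described above (peak/off-peak compute node counts, adjusted for utilization) could look like the following sketch. All thresholds and node counts are hypothetical, and it only varies compute nodes: removing DataNodes would additionally require HDFS decommissioning so their blocks are re-replicated first.

```python
from datetime import time

# Hypothetical policy parameters; a real deployment would tune these per cluster.
PEAK_START, PEAK_END = time(8, 0), time(20, 0)
PEAK_NODES, OFF_PEAK_NODES, MAX_NODES = 10, 3, 20
BUSY_UTILIZATION, BURST_NODES = 0.8, 2


def target_compute_nodes(now: time, utilization: float) -> int:
    """Rule-based target for compute-only nodes: time of day plus a utilization bump."""
    target = PEAK_NODES if PEAK_START <= now < PEAK_END else OFF_PEAK_NODES
    if utilization > BUSY_UTILIZATION:
        target += BURST_NODES
    return min(target, MAX_NODES)


def scaling_action(current: int, target: int) -> tuple:
    """Translate a target into an add/remove instruction for the controller."""
    if target > current:
        return ("add", target - current)
    if target < current:
        return ("remove", current - target)
    return ("none", 0)


# 23:00 with a mostly idle cluster: shrink 10 compute nodes to the off-peak size.
print(scaling_action(10, target_compute_nodes(time(23, 0), 0.2)))
```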
Multi-tenancy related use cases
-  Improve server utilization by creating a common server pool for Hadoop and non-Hadoop workloads
-  Simplify maintenance & upgrade testing with the ability to run multiple Hadoop clusters with different versions on the same server pool
-  Support varying SLAs based on tenant and workload through resource isolation provided by VMs
-  Simplify chargeback/showback
Multi-tenancy
Phase-1
•  Access isolation
   -  Single sign-on for Ambari & HUE through Keystone integration
   -  Dedicated Ambari & HUE instance per cluster per tenant
•  Resource isolation
   -  CPU, memory isolation through VMs
   -  Ability to pin a Hadoop VM to a given set of physical hosts to enable per-tenant physical host isolation
•  Version isolation
   -  Choice of Hadoop versions for tenants
Phase-2
•  Access isolation
   -  Single Ambari instance per tenant (multi-cluster support with Ambari)
   -  Keystone enhancements to support Hadoop job-flow-level RBAC for Hadoop as a service
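Per-tenant physical host isolation boils down to a placement filter. In practice this would be enforced by the OpenStack compute scheduler (for example through host aggregates); the sketch below only illustrates the filtering logic, with a hypothetical `eligible_hosts` helper and a plain dict standing in for the tenant-to-host mapping.

```python
def eligible_hosts(tenant, hosts, tenant_host_pools):
    """Hosts a tenant's Hadoop VMs may be scheduled on.

    Pinned tenants are restricted to their own pool; every other tenant is kept
    off all pinned hosts, so pinned tenants get physical host isolation.
    """
    if tenant in tenant_host_pools:
        pool = tenant_host_pools[tenant]
        return [h for h in hosts if h in pool]
    reserved = set().union(*tenant_host_pools.values())
    return [h for h in hosts if h not in reserved]


hosts = ["host1", "host2", "host3", "host4"]
pools = {"finance": {"host1", "host2"}}  # hypothetical tenant pinned to two hosts
print(eligible_hosts("finance", hosts, pools))
print(eligible_hosts("marketing", hosts, pools))
```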
Agenda
•  Why Hadoop on OpenStack
•  Use cases
•  A bit under the hood
Savanna logical architecture
•  Horizon + Savanna UI
•  Savanna Controller, exposed through an API: configuration, elasticity, orchestration, plugin manager
•  HDP Savanna plugin, with its own API: Hadoop provisioning, Ambari template management
•  OpenStack infrastructure: network, storage, security, compute
•  Hadoop cluster: Ambari + API
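The plugin manager is what lets the Savanna controller stay distribution-neutral while the HDP plugin delegates to Ambari. Below is a minimal sketch of that split, assuming a simple abstract plugin contract; `ProvisioningPlugin`, `PluginManager`, `FakeHDPPlugin` and their method names are hypothetical and not Savanna's actual plugin API.

```python
from abc import ABC, abstractmethod


class ProvisioningPlugin(ABC):
    """Hypothetical plugin contract; method names are illustrative only."""
    name = "abstract"

    @abstractmethod
    def validate(self, cluster_spec: dict) -> None:
        """Reject specs this plugin cannot build."""

    @abstractmethod
    def configure_cluster(self, cluster_spec: dict, vms: list) -> dict:
        """Install and configure Hadoop on already-booted VMs."""

    @abstractmethod
    def start_cluster(self, cluster_state: dict) -> dict:
        """Start services and return the updated cluster state."""


class PluginManager:
    """Registry the controller uses to look up a plugin by name."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin: ProvisioningPlugin) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"plugin {plugin.name!r} already registered")
        self._plugins[plugin.name] = plugin

    def get(self, name: str) -> ProvisioningPlugin:
        if name not in self._plugins:
            raise LookupError(f"no plugin named {name!r}")
        return self._plugins[name]


class FakeHDPPlugin(ProvisioningPlugin):
    """Stand-in for the HDP plugin: records what it would ask Ambari to do."""
    name = "hdp"

    def validate(self, cluster_spec):
        if "template" not in cluster_spec:
            raise ValueError("cluster_spec needs an Ambari template")

    def configure_cluster(self, cluster_spec, vms):
        # A real plugin would install Ambari on one VM and pass it the template.
        return {"ambari_server": vms[0], "hosts": list(vms),
                "template": cluster_spec["template"], "status": "configured"}

    def start_cluster(self, cluster_state):
        return {**cluster_state, "status": "started"}


manager = PluginManager()
manager.register(FakeHDPPlugin())
plugin = manager.get("hdp")
spec = {"template": "qa-cluster"}
plugin.validate(spec)
state = plugin.start_cluster(plugin.configure_cluster(spec, ["vm1", "vm2", "vm3"]))
```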
Provisioning workflow overview
1.  Horizon sends a cluster request to the Savanna Controller + HDP OpenStack plugin
2.  Savanna provisions vanilla VMs through Nova from a Glance VM image (OS only, or preloaded with HDP bits)
3.  The HDP plugin installs Ambari
4.  The HDP plugin passes the cluster template to Ambari
5.  Ambari configures all services and starts the cluster
Resulting Hadoop cluster: Ambari Server, HUE, NameNode (NN), JobTracker (JT), DataNodes (DN), …
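The same workflow can be written as an ordered sequence of stub steps. Every step below only appends a description to a log; the image names and the point at which HDP packages get installed for an OS-only image are assumptions, and a real controller would call Glance, Nova and the Ambari REST API instead.

```python
def provision_cluster(request: dict, log: list) -> list:
    """Walk the provisioning steps in order, appending a description of each to log.

    Every step is a stub; a real controller would call Glance, Nova and Ambari here.
    """
    image = request.get("image", "os-only")  # hypothetical: "os-only" or "hdp-preloaded"
    log.append(f"glance: select image {image}")
    vms = [f"{request['name']}-vm{i}" for i in range(1, request["size"] + 1)]
    log.append(f"nova: boot {len(vms)} vanilla VMs")
    log.append(f"plugin: install Ambari server on {vms[0]}")
    log.append(f"plugin: pass template {request['template']} to Ambari")
    if image == "os-only":
        # Assumption: with an OS-only image the HDP bits must be installed at this point.
        log.append("ambari: install HDP packages")
    log.append("ambari: configure all services and start the cluster")
    return vms


log = []
vms = provision_cluster({"name": "qa", "size": 3, "template": "qa-cluster"}, log)
print("\n".join(log))
```

A preloaded image skips the package installation step, which is the main reason to maintain one.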
Ambari based cluster templates
Preconfigured information shared across all clusters using this template:
•  HDP stack information: services, components & packages; description; package dependencies
•  Hadoop topology: component / host group mapping
•  Hadoop configuration: all Hadoop configuration for the cluster (hundreds of parameters and their values)
Per cluster pluggable data:
•  User names
•  Passwords
•  Host names
•  Host VM flavors (CPU/memory)
•  Node count per host group
•  …
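The split between shared template content and per-cluster pluggable data can be sketched as a merge step. The dictionary shape below is a hypothetical simplification, not Ambari's actual template format; the component names and the `dfs.replication` setting are only examples.

```python
import copy

# Hypothetical shared template: stack, topology and configuration reused by every cluster.
TEMPLATE = {
    "stack": {"name": "HDP", "version": "1.3"},
    "topology": {  # host group -> components
        "master": ["NAMENODE", "JOBTRACKER"],
        "worker": ["DATANODE", "TASKTRACKER"],
    },
    "configuration": {"hdfs-site": {"dfs.replication": "3"}},
}

REQUIRED_CLUSTER_KEYS = {"user_names", "passwords", "host_names", "vm_flavors", "node_counts"}


def render_cluster(template: dict, cluster_data: dict) -> dict:
    """Merge the shared template with one cluster's pluggable data."""
    missing = REQUIRED_CLUSTER_KEYS - set(cluster_data)
    if missing:
        raise ValueError(f"missing per-cluster data: {sorted(missing)}")
    counts = cluster_data["node_counts"]
    if len(cluster_data["host_names"]) != sum(counts.values()):
        raise ValueError("need exactly one host name per node")
    host_groups = [
        {"name": group, "components": list(components),
         "flavor": cluster_data["vm_flavors"][group], "count": counts[group]}
        for group, components in template["topology"].items()
    ]
    # Passwords are validated as present but not copied into the rendered spec.
    return {
        "stack": dict(template["stack"]),
        "configuration": copy.deepcopy(template["configuration"]),
        "host_groups": host_groups,
        "hosts": list(cluster_data["host_names"]),
        "users": dict(cluster_data["user_names"]),
    }


qa_data = {
    "user_names": {"hdfs_user": "hdfs", "mapred_user": "mapred"},
    "passwords": {"ambari_admin": "change-me"},
    "host_names": ["qa-master", "qa-worker1", "qa-worker2", "qa-worker3"],
    "vm_flavors": {"master": "m1.large", "worker": "m1.medium"},
    "node_counts": {"master": 1, "worker": 3},
}
qa_cluster = render_cluster(TEMPLATE, qa_data)
```

Leaving passwords out of the rendered spec means the result can be logged or stored without leaking secrets.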
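Swift object store support (HADOOP-8545)
•  An HDFS + Swift bridge lets MapReduce, Pig & Hive read input data from, and write output results to, Swift stores (Swift store-1 … Swift store-n), authenticating through Keystone
•  Supported operations: create, read, write, delete, mkdir, ls, mv & stat
•  A directory Dir holding file1, file2 and file3 is stored as objects named Dir/file1, Dir/file2 and Dir/file3 inside Swift containers (Container-1, Container-2, …)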
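Swift has a flat namespace, so a bridge like this has to emulate directories on top of object names. This toy in-memory `FlatObjectStore` (hypothetical, not the HADOOP-8545 code) shows how `ls` becomes a prefix match and why `mv` is more expensive than in HDFS.

```python
class FlatObjectStore:
    """Toy stand-in for a Swift container: a flat map of object name -> bytes.

    Object stores have no real directories, so a bridge has to emulate them.
    """

    def __init__(self):
        self.objects = {}

    @staticmethod
    def _key(path: str) -> str:
        return path.strip("/")

    def write(self, path: str, data: bytes) -> None:
        self.objects[self._key(path)] = data

    def read(self, path: str) -> bytes:
        return self.objects[self._key(path)]

    def ls(self, directory: str) -> list:
        """Emulate a directory listing by prefix-matching object names."""
        prefix = self._key(directory) + "/"
        children = {name[len(prefix):].split("/", 1)[0]
                    for name in self.objects if name.startswith(prefix)}
        return sorted(children)

    def mv(self, src: str, dst: str) -> None:
        """Emulated rename: every object under src is re-keyed under dst.

        In a real object store each object would be copied and then deleted one
        at a time, so unlike an HDFS rename this is not atomic.
        """
        src_key, dst_key = self._key(src), self._key(dst)
        for name in [n for n in self.objects if n == src_key or n.startswith(src_key + "/")]:
            self.objects[dst_key + name[len(src_key):]] = self.objects.pop(name)


store = FlatObjectStore()
for name in ("file1", "file2", "file3"):
    store.write(f"Dir/{name}", name.encode())
print(store.ls("/Dir"))
store.mv("Dir", "Results")
print(store.ls("Results"), store.ls("Dir"))
```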
Hadoop virtualization extensions (HVE)
•  Account for the additional ‘node group’ layer so replicas do not end up on VMs in the same hypervisor
•  Available in HDP 1.3; work in progress to enable in HDP 2.0 (YARN & HDFS)
Topology: Data Center → Rack-1, Rack-2 → Node group-1, Node group-2 (one per hypervisor) → VM1, VM2
Node-group-aware policies:
-  Replica (place, choose & remove) policies
-  Balancer policies
-  Task placement & container allocation (YARN)
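The effect of the node group layer on replica placement can be shown with a simplified sketch over the topology in the diagram. It is not Hadoop's HVE implementation (which lives in node-group-aware placement policy classes in HDFS); `VM`, `TOPOLOGY` and `place_replicas` are hypothetical names.

```python
from collections import namedtuple

VM = namedtuple("VM", "name rack node_group")

# The diagram's topology: 2 racks x 2 node groups (hypervisors) x 2 VMs.
TOPOLOGY = [
    VM(f"r{r}-ng{g}-vm{v}", f"rack-{r}", f"nodegroup-{g}")
    for r in (1, 2) for g in (1, 2) for v in (1, 2)
]


def place_replicas(writer: VM, vms: list, replicas: int = 3) -> list:
    """Node-group-aware placement: never put two replicas in the same node group.

    Loosely follows the default HDFS pattern (writer, then a remote rack, then
    the same remote rack), with "different node" tightened to "different node group".
    """
    chosen = [writer]
    used = {(writer.rack, writer.node_group)}  # node group names repeat per rack

    def pick(accept) -> bool:
        for vm in vms:
            key = (vm.rack, vm.node_group)
            if key not in used and accept(vm):
                chosen.append(vm)
                used.add(key)
                return True
        return False

    if replicas >= 2:
        if not pick(lambda vm: vm.rack != writer.rack):
            pick(lambda vm: True)
    if replicas >= 3:
        remote_rack = chosen[-1].rack
        if not pick(lambda vm: vm.rack == remote_rack):
            pick(lambda vm: True)
    while len(chosen) < replicas and pick(lambda vm: True):
        pass
    return chosen


placed = place_replicas(TOPOLOGY[0], TOPOLOGY)
print([vm.name for vm in placed])
```

Without the node group check, the third replica in this example could land on `r2-ng1-vm2`, the same hypervisor as the second replica, so a single host failure would take out two copies.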

Apache Ambari BOF - OpenStack - Hadoop Summit 2013
