Docker-Based Hadoop Provisioning on Cisco InterCloud
Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco
Rakesh Saha, Product Management, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Cautionary Statement Regarding Forward-Looking Statements
This presentation contains forward-looking statements involving risks and uncertainties. Such forward-looking statements in this
presentation generally relate to future events, our ability to increase the number of support subscription customers, the growth in
usage of the Hadoop framework, our ability to innovate and develop the various open source projects that will enhance the
capabilities of the Hortonworks Data Platform, anticipated customer benefits and general business outlook. In some cases, you can
identify forward-looking statements because they contain words such as “may,” “will,” “should,” “expects,” “plans,” “anticipates,”
“could,” “intends,” “target,” “projects,” “contemplates,” “believes,” “estimates,” “predicts,” “potential” or “continue” or similar terms
or expressions that concern our expectations, strategy, plans or intentions. You should not rely upon forward-looking statements as
predictions of future events. We have based the forward-looking statements contained in this presentation primarily on our current
expectations and projections about future events and trends that we believe may affect our business, financial condition and
prospects. We cannot assure you that the results, events and circumstances reflected in the forward-looking statements will be
achieved or occur, and actual results, events, or circumstances could differ materially from those described in the forward-looking
statements.
The forward-looking statements made in this prospectus relate only to events as of the date on which the statements are made and we
undertake no obligation to update any of the information in this presentation.
Trademarks
Hortonworks is a trademark of Hortonworks, Inc. in the United States and other jurisdictions. Other names used herein may be
trademarks of their respective owners.
Speakers
Rakesh Saha, Product Management, Hortonworks
Dmitri Chtchourov, Innovation Architect, CIS CTO Group, Cisco
Agenda
• About Hortonworks
• Cloudbreak – Docker-based Hadoop provisioning tool
• Introduction to Docker
• Hadoop Provisioning using Docker
• Cisco and Hortonworks Collaboration
About Hortonworks
• The only 100% open source Apache Hadoop data platform
• Founded in 2011
• First Hadoop distribution to go public: IPO Fall 2014 (NASDAQ: HDP)
• 322 subscription customers
• 600+ employees across 17 countries
• 1000+ technology partners
Hortonworks
Mission: Power your Modern Data Architecture with HDP and Enterprise Apache Hadoop
Customer Momentum
• 300+ customers in seven quarters, growing at 75+/quarter
• Two thirds of customers come from the F1000
Hortonworks and Hadoop at Scale
• HDP in production on the largest clusters on the planet
• Multiple 1,000+ node clusters, including 35,000 nodes at Yahoo! and 800 nodes at Spotify
Company
• Founded in 2011
• Original 24 architects, developers, operators of Hadoop from Yahoo!
• We are leaders in the Hadoop community
• 500+ employees
HDP is deeply integrated in the data center
[Diagram labels: Sources (existing systems, clickstream, web & social, geolocation, sensor & machine, server logs, unstructured); Data System (RDBMS, EDW, MPP); Applications; Operational Tools; Dev & Data Tools; Infrastructure; HDP (Governance & Integration, Security, Operations, Data Access, Data Management, YARN)]
Deep Partnerships
Hortonworks engages in deep engineered relationships with the leaders in the data center, such as Cisco, Microsoft, EMC, Pivotal, Teradata, Red Hat, SAS & SAP.
Broad Partnerships
Over 1,000 partners work with us to certify their applications to work with Hadoop so they can extend big data to their users.
Agenda
Cloudbreak Docker Provisioning Collaboration
Cloudbreak
• Developed by SequenceIQ
• Open source with Apache 2.0 license [ Apache project soon ]
• Deploys selected services to public and private cloud via Ambari Blueprints
• Elastic – can spin up any number of nodes, add/remove on the fly
• Provides full cloud lifecycle management post-deployment
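To make "Ambari Blueprints" concrete, here is a minimal sketch of what a blueprint document can look like, written as a Python dictionary and serialized to JSON. The blueprint name, the host group names, the stack version and the choice of components are assumptions made for illustration; they are not taken from the slides, and a real cluster would need more components than shown here.

```python
import json

# Hypothetical blueprint: "demo-small", "master" and "worker" are
# illustrative names, and the component list is deliberately incomplete.
blueprint = {
    "Blueprints": {
        "blueprint_name": "demo-small",
        "stack_name": "HDP",
        "stack_version": "2.2",  # assumed version, not stated in the deck
    },
    "host_groups": [
        {
            "name": "master",
            "cardinality": "1",
            "components": [
                {"name": "NAMENODE"},
                {"name": "RESOURCEMANAGER"},
                {"name": "ZOOKEEPER_SERVER"},
            ],
        },
        {
            "name": "worker",
            "cardinality": "3",
            "components": [
                {"name": "DATANODE"},
                {"name": "NODEMANAGER"},
            ],
        },
    ],
}


def components_by_group(bp):
    """Return a mapping of host group name to its list of component names."""
    return {
        group["name"]: [c["name"] for c in group["components"]]
        for group in bp["host_groups"]
    }


if __name__ == "__main__":
    # Print the JSON document that would be handed to Ambari.
    print(json.dumps(blueprint, indent=2))
```

The point of the format is that the service layout is declared once per host group, so the same document can describe a small test cluster or a larger one by changing only how many hosts join each group.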
Launch HDP on Any Cloud for Any Application
Cloudbreak
1. Pick a Blueprint
2. Choose a Cloud
3. Launch HDP!
Example Ambari Blueprints:
• BI / Analytics (Hive)
• IoT Apps (Storm, HBase, Hive)
• Dev / Test (all HDP services)
• Data Science (Spark)
Hadoop in Cloud Provisioning with Cloudbreak
1. Create Templates
2. Provide Blueprint
3. Associate Credentials
4. Launch Cluster
[Screenshots, one per step: Provisioning: Template; Provisioning: Blueprint; Provisioning: Provider Credentials; Provisioning: Launch]
Specialized Blueprints
Quick productivity with pre-configured cluster blueprints
• Lambda Architecture
• Machine Learning
• Batch ETL
• …
Autoscaling Policy
Optimize cloud usage via Elastic Clusters
• Policies based on any Ambari metrics
• Coordinates with YARN
• Policies are based on Metrics or Time
• Scaling can be service or component type specific
[Diagram: an auto-scale policy attached to each example cluster: BI / Analytics (Hive), IoT Apps (Storm, HBase, Hive), Dev / Test (all HDP services), Data Science (Spark)]
Scaling for Static and Dynamic Clusters
[Diagram labels: Cloudbreak (provisioning, enforces policies, scales cluster/YARN apps); Ambari Metrics and Ambari Alerts (metrics and alerts feed); YARN; static and dynamic clusters]
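The policy bullets above can be illustrated with a small sketch of metric-based and time-based scaling decisions. This is not Cloudbreak's implementation: the class names, the metric name and the node limits are all hypothetical, chosen only to show how "policies based on Metrics or Time" could combine into a target cluster size.

```python
from dataclasses import dataclass


@dataclass
class MetricPolicy:
    metric: str        # hypothetical metric name fed from monitoring
    threshold: float   # policy fires when the metric exceeds this value
    adjustment: int    # nodes to add (positive) or remove (negative)


@dataclass
class TimePolicy:
    hour: int          # hour of day (0-23) at which the policy fires
    adjustment: int    # nodes to add (positive) or remove (negative)


def desired_nodes(current, metrics, hour, policies, min_nodes=1, max_nodes=100):
    """Apply every matching policy, then clamp the result to the allowed range."""
    target = current
    for policy in policies:
        if isinstance(policy, MetricPolicy):
            if metrics.get(policy.metric, 0.0) > policy.threshold:
                target += policy.adjustment
        elif isinstance(policy, TimePolicy):
            if hour == policy.hour:
                target += policy.adjustment
    return max(min_nodes, min(max_nodes, target))


if __name__ == "__main__":
    policies = [
        MetricPolicy("pending_containers", threshold=10, adjustment=2),
        TimePolicy(hour=20, adjustment=-3),
    ]
    # A backlog during the day grows the cluster; the evening policy shrinks it.
    print(desired_nodes(5, {"pending_containers": 25}, 9, policies))
    print(desired_nodes(5, {}, 20, policies))
```

Clamping to a minimum and maximum is a design choice added here so that a misconfigured policy cannot scale a cluster to zero or without bound.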
Provisioning – How it works
1. Start VMs with a running Docker daemon
2. Cloudbreak Bootstrap
   • Start Consul cluster
   • Start Swarm cluster (Consul for discovery)
3. Start Ambari servers/agents (Swarm API)
4. Ambari services registered in Consul (Registrator)
5. Post Blueprint
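The final "Post Blueprint" step can be sketched against the Ambari REST API, which accepts a blueprint at `/api/v1/blueprints/<name>` and a cluster creation request at `/api/v1/clusters/<name>`. The sketch below only builds the two requests and does not send them. How Cloudbreak actually issues these calls is an assumption, the host names and cluster name are hypothetical, and authentication is omitted.

```python
import json
import urllib.request


def ambari_post(base_url, path, payload):
    """Build (but do not send) a POST request for the Ambari REST API."""
    return urllib.request.Request(
        url=f"{base_url}/api/v1/{path}",
        data=json.dumps(payload).encode("utf-8"),
        # Ambari rejects modifying requests that lack X-Requested-By.
        headers={"X-Requested-By": "ambari", "Content-Type": "application/json"},
        method="POST",
    )


def blueprint_requests(base_url, blueprint, cluster_name, hosts_by_group):
    """Return the two requests: register the blueprint, then create the cluster."""
    name = blueprint["Blueprints"]["blueprint_name"]
    cluster = {
        "blueprint": name,
        "host_groups": [
            {"name": group, "hosts": [{"fqdn": host} for host in hosts]}
            for group, hosts in hosts_by_group.items()
        ],
    }
    return [
        ambari_post(base_url, f"blueprints/{name}", blueprint),
        ambari_post(base_url, f"clusters/{cluster_name}", cluster),
    ]


if __name__ == "__main__":
    # Hypothetical blueprint and hosts, for illustration only.
    bp = {"Blueprints": {"blueprint_name": "demo-small"}, "host_groups": []}
    hosts = {"worker": ["node1.example.com", "node2.example.com"]}
    for req in blueprint_requests("http://ambari.example.com:8080", bp, "demo", hosts):
        print(req.get_method(), req.full_url)
```

In a setup like the one on the slide, the host names would presumably come from the service discovery layer rather than from a hard-coded list.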
Agenda
Cloudbreak Docker Provisioning Collaboration
Docker is a “Shipping Container” System for Code
An engine that enables any payload to be encapsulated as a lightweight, portable, self-sufficient container
[Diagram: a multiplicity of stacks (static website, web frontend, user DB, queue, analytics DB) and a multiplicity of hardware environments (development VM, QA server, public cloud, contributor’s laptop, production cluster, customer data center)]
Docker
• Lightweight, portable
• Build once, run anywhere
• VM – without the overhead of a VM
• Isolated containers
• Automated and scripted
Why Is Docker So Exciting?
For Developers: Build once…run anywhere
• A clean, safe, and portable runtime environment for your app
• No missing dependencies, packages etc.
• Run each app in its own isolated container
• Automate testing, integration, packaging
• Reduce/eliminate concerns about compatibility on different platforms
• Cheap, zero-penalty containers to deploy services
For DevOps: Configure once…run anything
• Make the entire lifecycle more efficient, consistent, and repeatable
• Eliminate inconsistencies between SDLC stages
• Support segregation of duties
• Significantly improve the speed and reliability of CI/CD
• Significantly more lightweight than VMs
Docker: Containers vs. VMs
[Diagram, VM: Server, Host OS, Hypervisor (Type 2), then one Guest OS with its own Bins/Libs for each of App A, App A’ and App B]
[Diagram, Container: Server, Host OS kernel, Docker, then containers holding only the apps (AppA, AppB) with their bins and libs]
Containers are isolated, share only the kernel
…result is significantly faster deployment, much less overhead, easier migration, faster restart
Agenda
Cloudbreak Docker Provisioning Collaboration
HDP as Docker Containers via Cloudbreak
Run Hadoop as Docker Containers
• Running Ambari Cluster in Containers
• Use Blueprint to define services
• All HDP services share a single container
[Diagram: Cloudbreak provisions VMs from cloud providers (cloud provider or bare metal; Linux VMs running Docker), installs Ambari on the VMs, and instructs Ambari to build the HDP cluster]
Swarm + Consul for Placement and Discovery
Run Hadoop as Docker containers
[Diagram sequence: (1) Cloudbreak and six Docker hosts; (2) one amb-ser container and five amb-agn containers on those hosts, with the Blueprint supplied; (3) services on the amb-agn containers: hdfs + hbase, hdfs + hive, hdfs + yarn, hdfs + zookpr, nmnode + hdfs]
Benefits of running Hadoop on Docker
• Quick installation with pre-pulled rpms
• Same process/images for dev/qa/prod
• Same process for single/multi-node
Demo
Agenda
Cloudbreak Docker Provisioning Collaboration
Cisco and Hortonworks’ Partnership
100% open source Hadoop Distribution,
Support and Training
Integrated Infrastructures for Big Data
Cisco and Hortonworks are partnering to help you build your big data solution and reach massive scalability, superior efficiency and dramatically lower total cost of ownership thanks to a validated joint architecture.
Results of the collaboration
• Efficient Hadoop as a service
• Adoption of Docker for enterprise Hadoop deployment
Task                                 Cisco InterCloud   Public Cloud Provider
HDP installation                     15:04 mins         11:55 mins
Teragen (avg of 3 executions)        7:08 mins          22:15 mins
Terasort (avg of 3 executions)       32:09 mins         60:12 mins
Teravalidate (avg of 3 executions)   2:31 mins          10:40 mins
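The timings above are easier to compare as ratios. The short sketch below converts each mm:ss value to seconds and divides the public cloud time by the Cisco InterCloud time. It assumes the first value in each row belongs to Cisco InterCloud and the second to the public cloud provider, following the column order on the slide.

```python
def to_seconds(mmss):
    """Convert a 'minutes:seconds' string such as '7:08' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (Cisco InterCloud, public cloud provider), as read from the slide.
results = {
    "HDP installation": ("15:04", "11:55"),
    "Teragen": ("7:08", "22:15"),
    "Terasort": ("32:09", "60:12"),
    "Teravalidate": ("2:31", "10:40"),
}


def ratios(table):
    """Public cloud time divided by InterCloud time, per task."""
    return {
        task: to_seconds(public) / to_seconds(intercloud)
        for task, (intercloud, public) in table.items()
    }


if __name__ == "__main__":
    for task, ratio in ratios(results).items():
        print(f"{task}: public cloud time / InterCloud time = {ratio:.2f}")
```

A ratio above 1 means the task finished faster on Cisco InterCloud. On these numbers the three benchmark tasks come out above 1, while HDP installation comes out below 1, meaning installation was faster on the public cloud provider.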
Observations
• Docker is maturing inside enterprises
• Interest in running Docker on top of bare metal
• Big data app developers are leaning towards containerization of apps
• YARN is becoming an application deployment platform beyond big data apps
• Demand for native, containerized, fully managed apps on YARN
Future Collaboration
• Run Docker natively on OpenStack
• Run Docker on YARN
• OpenStack bare metal
Conclusion
HDP + Cisco InterCloud - Efficient Hadoop-as-a-service
[Diagram labels: Blueprints (Data Science, IoT, BI / Analytics, Dev / Test); HDP]
Learn More
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
More about Cisco & Hortonworks
http://hortonworks.com/partner/cisco/
More about Hortonworks’ Acquisition of SequenceIQ
http://bit.ly/1R1ktxO



Editor's Notes

  • #2: Deploying Hadoop on OpenStack has never been easier: the Hortonworks and Cisco collaboration over the last few months makes it completely automated and seamless.
  • #3: This is a cautionary statement, as this presentation may describe product and collaboration directions that are subject to change.
  • #6: We were founded in 2011 by 24 developers from Yahoo, where Hadoop was conceived to address data challenges at internet scale. What we now know as Hadoop really started in 2005, when a team at Yahoo was directed to build out a large-scale data storage and processing technology that would allow them to improve their most critical application, Search. Their challenge was essentially two-fold. First they needed to capture and archive the contents of the internet, and then process the data so that users could search through it effectively and efficiently. Clearly, traditional approaches were both technically (due to the size of the data) and commercially (due to the cost) impractical. The result was the Apache Hadoop project, which delivered large-scale storage (HDFS) and processing (MapReduce). Today we have over 600 employees and have partnered with over 1000 companies who are the leaders in the data center. We have also been very fortunate to achieve very significant customer adoption, with over 330 customers as of the end of 2014, spanning nearly every vertical. Hortonworks was founded with the sole intent of making Hadoop an enterprise data platform. With YARN as its foundation, HDP delivers a centralized architecture with true multi-tenancy for data processing and shared services for Security, Governance and Operations to satisfy enterprise requirements, all deeply integrated and certified with leading datacenter technologies. We are uniquely focused on this transformation of Hadoop and on doing our work completely in open source. This is all predicated on our leadership in the community, which enables us not only to best support users of Hadoop but also to represent customer requirements within this open, thriving community.
  • #7: Hortonworks' approach is quite clear: we are focused on delivery of enterprise-grade Hadoop as a reliable data platform that will enable your transition to a modern data architecture. To this end, we work solely within the broad open source community, with a focus on innovation at the core of Apache Hadoop with YARN as a foundation, and then within all the related projects that deliver on the key requirements for the enterprise such as governance, security and operations. Since our inception just three years ago, we have grown to more than 450 employees and have partnered closely with the leaders in the datacenter, all of whom share this vision: to enable a modern data architecture with Hadoop in order to allow their customers to address the architectural challenge that they all are facing due to exploding data volumes.
  • #8: Hortonworks' open platform approach enables us to partner and co-exist with other data center technologies. Our deep engineering relationship with data center leaders like Cisco makes it possible for customers to augment their data center with Hadoop technologies for their next-generation modern data architecture.
  • #9: The Hortonworks Hadoop platform already enables deploying Hadoop in any environment, from Linux to Windows and from bare metal to cloud, so that the choice of deployment environment is a business decision rather than a technical one. Continuing this Hadoop Everywhere vision, Hortonworks' recent acquisition of SequenceIQ added a provisioning and auto-scaling toolset that makes it even easier to deploy Hadoop in private and public clouds, accelerating the time-to-value of a Hadoop deployment.
  • #10: Cloudbreak was developed by SequenceIQ, a company from the beautiful city of Budapest. Hortonworks acquired them in April. Cloudbreak is open source under the Apache 2.0 license and uses many other open source technologies, including Docker, as building blocks. It is a Hadoop cluster deployment and management tool that can deploy any application-specific or use-case-specific Hadoop cluster to public and private cloud environments in a matter of minutes. It also provides ongoing cluster infrastructure management, including policy-based auto-scaling of clusters to optimize infrastructure usage.
  • #12: Cloudbreak enables launching a Hadoop cluster in 4 easy steps.
  • #13: The Create Template step captures your Hadoop cluster infrastructure definition: node size and network setup. Cloudbreak supports heterogeneous instances for building the Hadoop cluster, since services and service components differ in their resource requirements.
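As a rough illustration of the template idea in the note for slide 13, the sketch below models an infrastructure definition with heterogeneous node groups. It is not Cloudbreak's actual template schema or API: the class names, field names, flavor names and the network range are all hypothetical, chosen only to show how node size and network setup might be captured per group.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceTemplate:
    """Hypothetical node-size definition (not Cloudbreak's real schema)."""
    name: str
    instance_type: str    # illustrative flavor name
    volume_count: int
    volume_size_gb: int


@dataclass
class HostGroup:
    """A group of identical nodes built from one instance template."""
    name: str
    template: InstanceTemplate
    node_count: int


@dataclass
class ClusterDefinition:
    """Network setup plus a list of (possibly different) host groups."""
    network_cidr: str
    host_groups: list = field(default_factory=list)

    def total_nodes(self):
        return sum(g.node_count for g in self.host_groups)

    def raw_storage_gb(self):
        return sum(
            g.node_count * g.template.volume_count * g.template.volume_size_gb
            for g in self.host_groups
        )


# Master and worker groups use different (assumed) node sizes.
master = InstanceTemplate("master-small", "small-flavor", 1, 100)
worker = InstanceTemplate("worker-large", "large-flavor", 2, 400)
cluster = ClusterDefinition(
    "10.0.0.0/24",
    [HostGroup("master", master, 1), HostGroup("worker", worker, 10)],
)

print(cluster.total_nodes())
print(cluster.raw_storage_gb())
```

Keeping the node size on the group rather than on the cluster is what allows a small master group and a large worker group to live in the same definition.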
  • #18: Cloudbreak not only simplifies Hadoop cluster provisioning in the Cisco OpenStack cloud but also automatically scales the Hadoop clusters based on SLA or time-based policies. The SLA is monitored through Hadoop service metrics captured by Ambari. This way Cloudbreak lets you get elastic Hadoop clusters very quickly in the Cisco OpenStack cloud.
  • #19: Cloudbreak actively monitors Ambari metrics to assess the health of every Hadoop service. It allows defining policies based on these metrics for every cluster that is deployed and enabled for auto-scaling. Based on these metrics and user-defined policies, Cloudbreak can scale clusters or services by adding nodes or allocating more YARN containers, depending on the type of Hadoop service.
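The metric-driven policies described in the note for slide 19 can be pictured with a minimal sketch like the one below. It is an assumption-laden illustration, not Cloudbreak's implementation: the `ScalingPolicy` class, the `evaluate` function, the metric names and the thresholds are all hypothetical, and the metrics are passed in as a plain dictionary rather than fetched from Ambari.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy: fire when a metric crosses a threshold."""
    metric: str        # illustrative metric name
    threshold: float
    comparison: str    # "gt" (greater than) or "lt" (less than)
    adjustment: int    # nodes to add (positive) or remove (negative)


def evaluate(policies, metrics, current_nodes, min_nodes, max_nodes):
    """Return the target node count after applying the first policy that fires."""
    for policy in policies:
        value = metrics.get(policy.metric)
        if value is None:
            continue  # metric not reported, skip this policy
        if policy.comparison == "gt":
            fired = value > policy.threshold
        else:
            fired = value < policy.threshold
        if fired:
            target = current_nodes + policy.adjustment
            # Clamp to the allowed cluster size.
            return max(min_nodes, min(max_nodes, target))
    return current_nodes


policies = [
    ScalingPolicy("pending_containers", 50, "gt", 2),   # scale up under load
    ScalingPolicy("memory_used_pct", 20, "lt", -1),     # scale down when idle
]

scale_up = evaluate(policies, {"pending_containers": 80}, 10, 3, 11)
scale_down = evaluate(
    policies, {"pending_containers": 5, "memory_used_pct": 10}, 10, 3, 11
)
print(scale_up, scale_down)
```

The clamp to a minimum and maximum size is a design choice of this sketch, so that a noisy metric cannot grow or shrink the cluster without bound.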
  • #20: View from 10,000 ft. The only thing it needs is a Docker daemon. All cloud providers are moving towards Docker, including Cisco Intercloud.
  • #22: Quick question: how many of you have used Docker before? Docker is a container-based virtualization framework. It is an open platform for developers and admins to build, ship, and run distributed applications.
  • #23: Consisting of Docker Engine, a portable, lightweight runtime and packaging tool, and Docker Hub, a cloud service for sharing applications and automating workflows, Docker enables apps to be quickly assembled from components and eliminates the friction between development, QA, and production environments. Docker is like a lightweight, portable VM, but without the overhead of a VM.
  • #25: Unlike traditional virtualization, Docker is fast, lightweight and easy to use. Docker allows you to create containers holding all the dependencies for an application. Each container is kept isolated from the others, and nothing gets shared.
  • #28: Steps: Can spawn Docker containers remotely on hosts, considering: 1. Resource management: aware of the cluster resources (e.g. can schedule with bin packing, anywhere 1GB of memory is available) or randomly 2. Constraints using labels (label one node and start the container based on labels) 3. Affinity: containers can be co-scheduled (link, volumes-from, net=container on the same host)
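The three placement considerations in the note for slide 28 can be sketched as a small scheduler. This is a simplified illustration and not the scheduler of any real Docker tool: the `Host` class, the `schedule` function, the host names and the labels are hypothetical, and bin packing is approximated as "choose the fitting host with the least free memory".

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical view of a Docker host as seen by a scheduler."""
    name: str
    free_memory_mb: int
    labels: dict = field(default_factory=dict)
    containers: list = field(default_factory=list)


def schedule(hosts, container, memory_mb, constraints=None, affinity=None):
    """Place a container and return the chosen host name, or None if nothing fits."""
    constraints = constraints or {}
    candidates = [
        h for h in hosts
        if h.free_memory_mb >= memory_mb                                  # 1. resources
        and all(h.labels.get(k) == v for k, v in constraints.items())    # 2. label constraints
        and (affinity is None or affinity in h.containers)               # 3. affinity
    ]
    if not candidates:
        return None
    # Simplified bin packing: fill the fullest host that still fits.
    chosen = min(candidates, key=lambda h: h.free_memory_mb)
    chosen.free_memory_mb -= memory_mb
    chosen.containers.append(container)
    return chosen.name


hosts = [
    Host("node-a", 4096, {"storage": "ssd"}),
    Host("node-b", 2048, {"storage": "hdd"}),
]

first = schedule(hosts, "ambari-server", 1024)
second = schedule(hosts, "datanode", 1024, constraints={"storage": "ssd"})
third = schedule(hosts, "sidecar", 512, affinity="ambari-server")
too_big = schedule(hosts, "big", 8192)
print(first, second, third, too_big)
```

Replacing the `min` selection with a random choice among `candidates` would give the "randomly" alternative mentioned in the note.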
  • #45: Best of Hadoop, Docker and OpenStack in a single cloud platform for our joint customers.
      Description      | Texas 3        | GCP
      VM types         | GP2-2Xlarge    | n1-standard-8
      Cores            | 8              | 8
      Memory           | 32 GB          | 30 GB
      Volume size      | 2 x 400GB      | 2 x 400GB
      Volume type      | HDD (magnetic) | generic (magnetic)
      Data nodes count | 10             | 10
      HDFS size        | 8 TB           | 8 TB
      Yarn memory      | 240 GB         | 240 GB
      HDP blueprint    | multinode-hdfs-yarn
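The HDFS size and Yarn memory figures on slide 45 are consistent with the per-node values listed there, as the short check below shows. It assumes decimal units (1 TB = 1000 GB) and treats HDFS size as raw capacity before replication. The per-node YARN figure is derived here and is not stated on the slide.

```python
# Values taken from the slide 45 comparison (identical for both environments).
data_nodes = 10
volumes_per_node = 2
volume_size_gb = 400
yarn_memory_gb = 240

# Raw HDFS capacity: nodes x volumes per node x volume size.
raw_hdfs_gb = data_nodes * volumes_per_node * volume_size_gb
raw_hdfs_tb = raw_hdfs_gb / 1000  # assumes decimal terabytes

# Derived value (not on the slide): YARN memory per data node.
yarn_per_node_gb = yarn_memory_gb / data_nodes

print(raw_hdfs_gb, raw_hdfs_tb, yarn_per_node_gb)
```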
  • #49: We are expanding our Cloud strategy to meet Enterprise customer demand. Look at the top first. We’ve done a great job of taking our platform for Private Cloud and provisioning Enterprise workloads. We’ve done a great job with UCS, with VBlock, with FlexPod. As a matter of fact, we are the leader in converged infrastructure today, and that market is expanding as customers look to Cisco and our Partners to deliver the Enterprise workloads and the benefits of Private Cloud. They’re also asking for DevOps models. They want to create truly native applications for the Public Cloud. They want to harness the value of Hadoop and Big Data Analytics and Hana. And they want to leverage the collaborative platform present today. We are the leader in Private Cloud infrastructure. Along the left-hand side, our Partners have done some amazing things. 3 million seats of HCS, the IaaS platforms that they’ve invested in, small, medium, large, local community-based infrastructure platforms. Some Partners have enabled the PaaS platform. Some Partners are hosting Microsoft applications, like Dimension Data does today, globally around the world. Some Partners have managed to build a Citrix or VMware virtual desktop offer. So what Cisco Cloud Services offer is an engine to generate more services to augment capabilities we’ve invested in, and to do so in a way that only we could do together. You’ll see us leverage the extensions through innovations in the WebEx platform. You’ll see that Meraki is a very powerful model to continue to expand. You’ll hear more about the portfolio of Unified Threat Defense, and comprehensive threat defense that we think only we can bring to the cloud. You’ll see more about analytics, and the Platforms that we have in store. You’ll soon see more about Hana-as-a-Service. And all the capabilities we can bring will be an acceleration of those offers that we can bring to you. 
Why not accelerate all of our capabilities together, using our capabilities in a way that no one else has? And btw, we can’t ignore the big Public Clouds. Let’s use the Intercloud Fabric manager when appropriate to just move a workload out to that Public Cloud. I don’t care if it’s Azure, or Amazon or Google. Only Cisco can do this through some of the innovations that we have. How are we going to do this?
  • #51: Cisco Intercloud Fabric: Solution Overview