1 © Hortonworks Inc. 2011 – 2018. All Rights Reserved
Lessons Learned Running a Container
Cloud on Apache Hadoop YARN
Billie Rinaldi
Software Engineering YARN R&D - Hortonworks
Hadoop, YARN, HDFS, Ambari, Ranger, Atlas, and Apache
are trademarks of the Apache Software Foundation
Overview:
Introduction
Building Blocks for the Container Cloud
Lessons Learned
Q&A
Disclaimer
• Some features discussed today are still experimental and/or under current
development.
• Security is a work in progress and a security assessment should be performed
before implementing these features.
• Many features were released in Apache Hadoop 3.1.0 in April 2018, but some will
not come out until 3.2.0 or 3.1.1.
Introduction
We Build, Test, and Release Open Source Software
• The rapid pace of open source results in
• dozens of releases a year
• tens of thousands of tests per release.
• Many permutations of tests
• Over a dozen supported Linux operating systems.
• Multiple backend databases.
• Nearly thirty open source products in our stack.
• Multiple supported versions of HDP per release.
Addressing the Challenges
• How is the industry addressing these challenges? What are customers asking for?
• How can we reduce overhead to achieve greater density and improve hardware
utilization?
• How can we improve the speed at which tests run?
• How can we reuse packaging and automation?
Solution: Container Cloud
• Containers eliminate the bulk of the virtualization overhead, improving density per
node.
• Containers help reduce image variance through composition and simplified
packaging.
• Container startup is fast, with no real boot sequence.
• Containers naturally fit into YARN and its container model.
• Containers allow us to “use what we ship and ship what we use.”
Container Cloud Architecture
[Architecture diagram: Jenkins (SubmitTest, LaunchTest); Worker (Docker) and HDP (Docker) containers, captioned "Testing HDP and HDF releases in container clusters"; Shared Services: Resource Management (YARN), Management and Monitoring (Ambari), Storage (HDFS), Service Discovery and REST API (YARN Services), Security and Governance (Ranger and Atlas).]
Two years later …
5.8+ million containers and many lessons learned.
Building Blocks for the Container Cloud
• YARN Docker Support – Enables additional container types to make it easier
to onboard new applications and services on YARN.
• YARN Services Framework – Provides AM implementation and NM
improvements that enable long running services on YARN.
• YARN Service Discovery – Allows services running on YARN to discover one
another.
Adding Docker on YARN
• Why Docker?
• Provides a lightweight mechanism for
packaging, distributing, and isolating
processes.
• Currently the most popular containerization
framework.
• Allows YARN developers to focus on
integration instead of container primitives.
• Mostly fits into the YARN Container model.
• Buzzword compliant.
YARN Containers
• What is a YARN container?
• Process.
• Local Resources (scripts, jars, security tokens).
• Resource Requirements (CPU, Memory, I/O).
• AM requests containers.
• RM allocates containers on NMs.
• NM runs containers.
• Container Executor encapsulates platform-specific logic needed to start a YARN container.
New Abstraction: Container Runtimes
• Challenge: The Container Executor approach is cluster-wide; we need more flexibility while still
being able to leverage existing Container Executor features.
• Solution: Runtimes added to LinuxContainerExecutor in YARN-3611 (initially released in
Apache Hadoop 2.8.0; improvements are ongoing).
• DefaultLinuxContainerRuntime – Existing Linux process-based execution.
• DockerLinuxContainerRuntime – Using Docker to run and monitor a container.
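A minimal sketch of the runtime idea, written in Python rather than YARN's Java: one executor stays configured cluster-wide, and each container picks its runtime through its own environment. The function name and the rejection of unknown values are assumptions made for this illustration; only the YARN_CONTAINER_RUNTIME_TYPE variable and the two runtime names come from the slides.

```python
# Illustrative model only; this is not YARN's actual implementation.
RUNTIME_TYPE_ENV = "YARN_CONTAINER_RUNTIME_TYPE"  # variable name used in the slides


def pick_runtime(container_env):
    """Return the name of the runtime a container would be launched with."""
    runtime_type = container_env.get(RUNTIME_TYPE_ENV, "default")
    if runtime_type == "docker":
        return "DockerLinuxContainerRuntime"
    if runtime_type == "default":
        return "DefaultLinuxContainerRuntime"
    # Assumption for this sketch: unknown runtime types are rejected.
    raise ValueError(f"unsupported runtime type: {runtime_type}")
```

The point of the sketch is that two containers on the same node can use different runtimes without any change to the cluster-wide executor setting.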
Distributed Shell and MR on Docker Examples
Environment variables are currently used to set the Container Runtime options.*
*WARNING: might change.
> yarn jar $YARN_EXAMPLES_JAR pi \
  -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" \
  -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" \
  1 40000
> yarn jar $DSHELL_JAR \
  -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker \
  -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7 \
  -shell_command "sleep 120" \
  -jar $DSHELL_JAR \
  -num_containers 1
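Because the map and reduce settings repeat the same comma-separated string, it can help to generate it. The sketch below is a hypothetical helper, not part of Hadoop; it only assembles the arguments seen in the pi example above and assumes the two environment variables keep their current names.

```python
def docker_runtime_env(image):
    """Build the comma-separated KEY=VALUE string passed to mapreduce.*.env."""
    settings = {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }
    return ",".join(f"{key}={value}" for key, value in settings.items())


def pi_job_args(image, maps=1, samples=40000):
    """Return the arguments that follow `yarn jar $YARN_EXAMPLES_JAR` (hypothetical helper)."""
    env = docker_runtime_env(image)
    return [
        "pi",
        f"-Dmapreduce.map.env={env}",
        f"-Dmapreduce.reduce.env={env}",
        str(maps),
        str(samples),
    ]
```

Passing the result as an argument list (for example to subprocess.run) avoids the shell quoting that the command-line form needs.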
Recent Improvements: Container Lifecycle
• Improvements to stopping and cleaning up of containers (YARN-5366)
• Improvements to handling short lived containers (YARN-5366, YARN-7914)
• Container relaunch improvements to reuse existing container (YARN-7973)
• Data in the container’s root filesystem and workdir can be recovered on
the same node
• Support for sending specific signals to the container’s root process (YARN-5366)
• Delayed deletion for debugging (YARN-5366)
Recent Improvements: Container Security
• ACLs for privileged containers, with the ability to disable privileged
containers system wide (YARN-6623)
• Sudo / group check for running privileged containers (YARN-7221)
• Default untrusted mode for running unmodified images out of the box
(YARN-7516)
• Username to UID/GID mapping to ensure privacy (YARN-4266)
• User supplied bind mounts validated against an admin supplied whitelist
(YARN-5534)
• More restrictive YARN mounts to limit host exposure (YARN-7815)
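To illustrate the bind-mount check described above, here is a minimal sketch of validating a user-supplied host path against an admin-supplied list of allowed paths. It is not YARN's implementation: the function name is hypothetical, and the sketch only normalizes paths lexically, so it does not resolve symlinks.

```python
import posixpath


def is_mount_allowed(source, allowed_prefixes):
    """Return True if `source` equals an allowed path or lies beneath one."""
    normalized = posixpath.normpath(source)
    if not normalized.startswith("/"):
        return False  # only absolute host paths are considered
    for prefix in allowed_prefixes:
        prefix = posixpath.normpath(prefix)
        # Compare whole path components so "/database" does not match "/data".
        if normalized == prefix or normalized.startswith(prefix.rstrip("/") + "/"):
            return True
    return False
```

Normalizing before comparing is what stops a request such as /data/../etc/passwd from slipping past an allowed /data entry.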
YARN Services Goals
• Long Running – Simplify the deployment and management of long running
applications on YARN.
• Easily Bring New Applications – Remove tedious process of bringing new
applications to YARN.
• Easy to Manage Applications – REST API and Command Line tools.
• Declarative Configuration – Provide configuration to the applications,
declare resource needs, specify placement policies.
YARN Services Overview
• Apache Slider – incubating at Apache since 2014, designed to make it easier
to run long-running applications on YARN.
• Simplified and first-class support for services in YARN (YARN-4692) – initiated
in 2016.
• Container orchestrator to provision Docker-based or native-process-based containers
(YARN-5079), integrates Slider core into YARN.
• REST API for managing services on YARN (YARN-4793).
• Simplified discovery of services via DNS mechanisms (YARN-4757).
• Released in Apache Hadoop 3.1.0!
YARN Services Docker Httpd Example
{
  "name": "simple-httpd-service",
  "version": "1.0.0",
  "lifetime": "3600",
  "components": [
    {
      "name": "httpd",
      "number_of_containers": 2,
      "launch_command": "/usr/bin/run-httpd",
      "artifact": {
        "id": "centos/httpd-24-centos7:latest",
        "type": "DOCKER"
      },
      "resource": {
        "cpus": 1,
        "memory": "1024"
      },
      ...
> yarn app -launch simple-httpd-service \
  simple-httpd-service.json
YARN Services Docker Httpd Example continued
      "readiness_check": {
        "type": "HTTP",
        "properties": {
          "url": "http://${THIS_HOST}:8080"
        }
      },
      "configuration": {
        "files": [
          {
            "type": "TEMPLATE",
            "dest_file": "/var/www/html/index.html",
            "properties": {
              "content": "<html><body>Hello from ${COMPONENT_INSTANCE_NAME}!</body></html>"
            }
          }
        ]
      }
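When several similar services are needed, a spec like the one above can be generated instead of hand-edited. The sketch below builds the same fields in Python and writes them to a file that could then be passed to yarn app -launch. The helper names are hypothetical, and the configuration/files section is left out for brevity.

```python
import json


def httpd_service_spec(containers=2):
    """Assemble a spec with the fields shown in the slides (configuration omitted)."""
    return {
        "name": "simple-httpd-service",
        "version": "1.0.0",
        "lifetime": "3600",
        "components": [
            {
                "name": "httpd",
                "number_of_containers": containers,
                "launch_command": "/usr/bin/run-httpd",
                "artifact": {"id": "centos/httpd-24-centos7:latest", "type": "DOCKER"},
                "resource": {"cpus": 1, "memory": "1024"},
                "readiness_check": {
                    "type": "HTTP",
                    "properties": {"url": "http://${THIS_HOST}:8080"},
                },
            }
        ],
    }


def write_spec(spec, path):
    """Write the spec as indented JSON so it stays readable in reviews."""
    with open(path, "w") as handle:
        json.dump(spec, handle, indent=2)
```

For example, write_spec(httpd_service_spec(4), "simple-httpd-service.json") would produce a four-container variant of the same service.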
Service assembly
{
  "name": "httpd-proxy-service",
  "version": "1.0.0",
  "components": [
    {
      "artifact": {
        "id": "simple-httpd-service",
        "type": "SERVICE"
      }
    },
    {
      "name": "httpd-proxy",
      "number_of_containers": 1,
      "dependencies": [ "httpd" ],
      "artifact": {
        "id": "centos/httpd-24-centos7",
        "type": "DOCKER"
      }, ...
> yarn app -save simple-httpd-service \
  simple-httpd-service.json
> yarn app -launch httpd-proxy-service \
  httpd-proxy-service.json
Other features in progress
• Container upgrade
– Localize new resource while container is running (YARN-1503) portions in 2.9.0
– Restart container with new resources using same container allocation (YARN-4726)
– Support in service AM and service REST API (YARN-7512) slated for release 3.2.0
• Placement policy support in YARN services (YARN-7142)
• User supplied Docker client configs in YARN services (YARN-7996)
• Entrypoint support (YARN-7654)
• And many more!
YARN Service Registry
• The YARN Service Registry allows deployed applications to register
themselves to allow discovery by other applications (YARN-913).
• Entries are stored in ZooKeeper as the default k/v store, providing HA and
consistency.
• Native Java clients, REST and CLI interfaces exist for accessing the YARN Service
Registry.
Simplified Discovery via DNS
Challenge - Native Java clients and REST interfaces are not ideal
discovery mechanisms for existing applications.
Solution - Exposing YARN Service Registry entries via a more
generic and widely used discovery mechanism: DNS.
The YARN Registry DNS server (YARN-4757) meets these needs.
• Watches the YARN Service Registry for new application and container
registration/deregistration.
• Creates the appropriate DNS records for the container:
  componentInstanceName.serviceName.user.domain
  ctr-e138-1518143905142-215498-01-000007.domain
• Supports zone transfers or zone forwarding.
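The two record formats above can be sketched as simple string builders. The function names and sample values are hypothetical, and the mapping from a YARN container ID to the ctr- hostname is an assumption inferred from the example on the slide.

```python
def component_dns_name(component_instance, service, user, domain):
    """Compose componentInstanceName.serviceName.user.domain."""
    return ".".join([component_instance, service, user, domain.strip(".")])


def container_id_dns_name(container_id, domain):
    """Turn a YARN container ID into a hostname of the ctr-... form.

    Assumption: "container_" becomes "ctr-" and the remaining underscores
    become hyphens, matching the example shown on the slide.
    """
    host = container_id.replace("container_", "ctr-", 1).replace("_", "-")
    return f"{host}.{domain.strip('.')}"
```

With records like these, an existing application only needs an ordinary DNS lookup to find a service instance, with no Java or REST client involved.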
Lessons Learned
Success!
5.8+ million containers, 1.1+ million tests
Huge uptick in adoption
Successes continued
• First full HDP release tested and certified end to end on the container cloud!
• All supported operating systems (CentOS 6/7, SLES, Ubuntu 14/16, Debian)
running in containers on CentOS 7.3 hosts!
Density per node improved by 2.5x!
[Chart: Virtual Machines 14, Containers 35]
…but not without some pain
“No gains without pains.”
- Benjamin Franklin, Poor Richard's Almanack
IP Management
Challenge: YARN does not manage IP addresses directly.
What we did:
• Allocate a pool of IP addresses to the cluster on the same VLAN.
• Use Docker’s bridge networking with the fixed-cidr option.
• Each node in the cluster is allocated 64 IP addresses from the pool.
• Use docker inspect to get the container IP address and add it to the
YARN Service Registry.
• YARN Registry DNS Server registers DNS records for easy lookup.
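As a rough illustration of the per-node allocation, the sketch below splits an address pool into blocks of 64 addresses (a /26 in IPv4) and assigns one block per node. The pool CIDR and node names are made up, and the slides do not say how the real pool was divided; each resulting block is the kind of value a node's fixed-cidr setting would take.

```python
import ipaddress


def per_node_blocks(pool_cidr, node_names, addresses_per_node=64):
    """Assign each node one block of `addresses_per_node` addresses from the pool."""
    pool = ipaddress.ip_network(pool_cidr)
    if addresses_per_node < 1 or addresses_per_node & (addresses_per_node - 1):
        raise ValueError("addresses_per_node must be a power of two")
    host_bits = addresses_per_node.bit_length() - 1  # 64 addresses -> 6 host bits
    blocks = list(pool.subnets(new_prefix=pool.max_prefixlen - host_bits))
    if len(node_names) > len(blocks):
        raise ValueError("pool is too small for the number of nodes")
    return {node: str(block) for node, block in zip(node_names, blocks)}
```

Keeping the blocks disjoint is what lets containers on different nodes share one VLAN without address collisions.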
Docker Storage Driver and Filesystem
Challenge: Many Docker Storage drivers, lots of limitations.
What is the best option for us?
What we did:
• Extensive testing of create, stop, and delete operation timings.
• Eliminate options that require significant modifications to the Linux
OS to ease adoption in enterprises.
• If possible, use a driver deemed production ready by maintainers.
• Ultimately, we landed on DeviceMapper LVM thinpool with ext4
backing filesystem ... or so we thought.
DeviceMapper kernel oops and performance
Challenge: Heavy writes to the container’s root filesystem
causing kernel panics, uninterruptible processes, and high IO wait.
What we did:
• DeviceMapper is the only viable option due to our workloads, so we can’t
switch to a different storage driver.
• Install SSDs and configure Docker’s graph storage to use them to
eliminate high IO wait.
• Test various RAID controller firmware versions, Linux kernels, and backing
filesystems to find a stable combination that doesn’t result in panics
and Docker hangs (most recently testing an upgrade to CentOS 7.4 with
a 4.15 kernel, so we could look at overlay2 now).
User Namespacing
Challenge: YARN provides features for localized resources and log aggregation.
Running containers as the submitting user presents challenges.
What we did:
• Run all containers as the nobody user so that user and application directories
configured by YARN are available to the container.
• Update images so that the nobody UID/GID match between image and host.
• Allow for “vanilla containers” that do not bind mount the YARN directories,
allowing the process in the container to run as any user, but logs can no
longer be aggregated (Docker run override disabled).
• User namespacing in Docker is lacking, as it can only remap a single user. As
this restriction is removed, we expect to migrate to this feature. Recent
improvements allow setting UID:GID pair, but further support is needed.
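The UID/GID alignment mentioned above can be checked before an image is used. This is a hypothetical sketch that compares the nobody entry in two passwd-format texts (one from the host, one extracted from the image); it is not something the slides describe in detail.

```python
def passwd_ids(passwd_text, user):
    """Return (uid, gid) for `user` from passwd-format text, or None if absent."""
    for line in passwd_text.splitlines():
        fields = line.split(":")
        if len(fields) >= 4 and fields[0] == user:
            return int(fields[2]), int(fields[3])
    return None


def ids_match(host_passwd, image_passwd, user="nobody"):
    """True only if the user exists on both sides with the same UID and GID."""
    host_ids = passwd_ids(host_passwd, user)
    return host_ids is not None and host_ids == passwd_ids(image_passwd, user)
```

A mismatch is easy to hit in practice: CentOS 7 uses 99:99 for nobody, while Debian and Ubuntu use 65534:65534.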
Image Management
Challenge: The implicit pull of a large image can lead to task timeouts.
SSD space is at a premium, making image cleanup important.
What we did:
• Run an internal private registry to reduce WAN load.
• Jenkins job that builds and distributes the image to all nodes in the
cluster, avoiding the implicit pull from “docker run”.
• Jenkins job to clean up images that are no longer needed. Full
thinpool can also cause kernel panics – aggressively age off images.
• Reuse base images where possible to reduce bandwidth.
• Work being discussed to provide first-class YARN support for image
management (YARN-3854).
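The age-off step of such a cleanup job could look roughly like the sketch below. It only decides which images are stale; the image names, the last-used timestamps, and the keep list are hypothetical inputs, and the actual removal (for example through the Docker CLI) is left out.

```python
from datetime import datetime, timedelta


def images_to_remove(images, now, max_age, keep=()):
    """Return names of images unused for longer than `max_age`.

    `images` is an iterable of (name, last_used) pairs; names listed in
    `keep` (such as shared base images) are never selected.
    """
    stale = []
    for name, last_used in images:
        if name in keep:
            continue
        if now - last_used > max_age:
            stale.append(name)
    return stale
```

Protecting base images in the keep list fits the advice to reuse them, since removing one would force every dependent image to be transferred again.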
Summary
• Massive density improvements.
• Greatly improved ease of use.
• Many real world lessons learned.
• Widespread internal adoption.
• Improved self service capabilities.
• Internal use of long running services a reality!
Get Involved
• Still plenty of work to go!
• Improve Docker image management
and user handling.
• Networking plugins.
• Security/permissions models.
• Bring Your Own Image challenges.
• Follow Along.
• Try out Apache Hadoop 3.1.0 or check out trunk
and build it! Ansible/Vagrant setups available.
Questions?
billie@hortonworks.com
skumpf@hortonworks.com

More Related Content

What's hot (20)

PPTX
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
DataWorks Summit
 
PPTX
SDLC with Apache NiFi
DataWorks Summit
 
PPTX
SAM—streaming analytics made easy
DataWorks Summit
 
PPTX
Accelerating query processing with materialized views in Apache Hive
DataWorks Summit
 
PDF
Containers and Big Data
DataWorks Summit
 
PPTX
Containers and Big Data
DataWorks Summit
 
PDF
Curing the Kafka Blindness – Streams Messaging Manager
DataWorks Summit
 
PDF
Deep learning 101
DataWorks Summit
 
PPTX
Hadoop Operations - Past, Present, and Future
DataWorks Summit
 
PPTX
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
DataWorks Summit
 
PDF
Data in the Cloud Crash Course
DataWorks Summit
 
PPTX
Using LLVM to accelerate processing of data in Apache Arrow
DataWorks Summit
 
PPTX
Accelerating TensorFlow with RDMA for high-performance deep learning
DataWorks Summit
 
PDF
What's New in Apache Hive 3.0?
DataWorks Summit
 
PDF
Keynote
DataWorks Summit
 
PDF
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
PPTX
Innovation in the Enterprise Rent-A-Car Data Warehouse
DataWorks Summit
 
PPTX
Navigating Idiosyncrasies of IoT Development
DataWorks Summit
 
PDF
Achieving a 360-degree view of manufacturing via open source industrial data ...
DataWorks Summit
 
PDF
Present and future of unified, portable and efficient data processing with Ap...
DataWorks Summit
 
Open source computer vision with TensorFlow, Apache MiniFi, Apache NiFi, Open...
DataWorks Summit
 
SDLC with Apache NiFi
DataWorks Summit
 
SAM—streaming analytics made easy
DataWorks Summit
 
Accelerating query processing with materialized views in Apache Hive
DataWorks Summit
 
Containers and Big Data
DataWorks Summit
 
Containers and Big Data
DataWorks Summit
 
Curing the Kafka Blindness – Streams Messaging Manager
DataWorks Summit
 
Deep learning 101
DataWorks Summit
 
Hadoop Operations - Past, Present, and Future
DataWorks Summit
 
Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub...
DataWorks Summit
 
Data in the Cloud Crash Course
DataWorks Summit
 
Using LLVM to accelerate processing of data in Apache Arrow
DataWorks Summit
 
Accelerating TensorFlow with RDMA for high-performance deep learning
DataWorks Summit
 
What's New in Apache Hive 3.0?
DataWorks Summit
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
Innovation in the Enterprise Rent-A-Car Data Warehouse
DataWorks Summit
 
Navigating Idiosyncrasies of IoT Development
DataWorks Summit
 
Achieving a 360-degree view of manufacturing via open source industrial data ...
DataWorks Summit
 
Present and future of unified, portable and efficient data processing with Ap...
DataWorks Summit
 

Similar to Lessons learned running a container cloud on YARN (20)

PPTX
Running a container cloud on YARN
DataWorks Summit
 
PPTX
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Yahoo Developer Network
 
PPTX
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
DataWorks Summit
 
PPTX
Overview of slider project
Steve Loughran
 
PPTX
YARN and the Docker container runtime
DataWorks Summit/Hadoop Summit
 
PPTX
Apache Hadoop 3 updates with migration story
Sunil Govindan
 
PDF
Accumulo Summit 2016: Apache Accumulo on Docker with YARN Native Services
Accumulo Summit
 
PPTX
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
Yahoo Developer Network
 
PPTX
Running Services on YARN
DataWorks Summit/Hadoop Summit
 
PDF
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
PDF
Apache Hadoop YARN: state of the union
DataWorks Summit
 
PDF
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Hakka Labs
 
PPTX
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 
PDF
Apache Hadoop YARN - Enabling Next Generation Data Applications
Hortonworks
 
PDF
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
PPTX
A Multi Colored YARN
DataWorks Summit/Hadoop Summit
 
PPTX
YARN - Next Generation Compute Platform fo Hadoop
Hortonworks
 
PPTX
Developing YARN Applications - Integrating natively to YARN July 24 2014
Hortonworks
 
PPTX
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
PPTX
MHUG - YARN
Joseph Niemiec
 
Running a container cloud on YARN
DataWorks Summit
 
Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
Yahoo Developer Network
 
YARN Containerized Services: Fading The Lines Between On-Prem And Cloud
DataWorks Summit
 
Overview of slider project
Steve Loughran
 
YARN and the Docker container runtime
DataWorks Summit/Hadoop Summit
 
Apache Hadoop 3 updates with migration story
Sunil Govindan
 
Accumulo Summit 2016: Apache Accumulo on Docker with YARN Native Services
Accumulo Summit
 
April 2016 HUG: The latest of Apache Hadoop YARN and running your docker apps...
Yahoo Developer Network
 
Running Services on YARN
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN: state of the union - Tokyo
DataWorks Summit
 
Apache Hadoop YARN: state of the union
DataWorks Summit
 
Developing Applications with Hadoop 2.0 and YARN by Abhijit Lele
Hakka Labs
 
Apache Hadoop YARN: Past, Present and Future
DataWorks Summit/Hadoop Summit
 
Apache Hadoop YARN - Enabling Next Generation Data Applications
Hortonworks
 
Apache Hadoop YARN: State of the Union
DataWorks Summit
 
A Multi Colored YARN
DataWorks Summit/Hadoop Summit
 
YARN - Next Generation Compute Platform fo Hadoop
Hortonworks
 
Developing YARN Applications - Integrating natively to YARN July 24 2014
Hortonworks
 
Dataworks Berlin Summit 18' - Apache hadoop YARN State Of The Union
Wangda Tan
 
MHUG - YARN
Joseph Niemiec
 
Ad

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
DataWorks Summit
 
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
PPTX
Managing the Dewey Decimal System
DataWorks Summit
 
PPTX
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
PPTX
Security Framework for Multitenant Architecture
DataWorks Summit
 
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
PPTX
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
PPTX
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
PPTX
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
PPTX
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
PPTX
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
PDF
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
PPTX
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Ad

Recently uploaded (20)

PDF
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
PDF
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
PDF
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
PDF
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
PDF
What Makes Contify’s News API Stand Out: Key Features at a Glance
Contify
 
PPTX
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
PPTX
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
PPTX
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
PDF
July Patch Tuesday
Ivanti
 
PDF
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
PDF
Advancing WebDriver BiDi support in WebKit
Igalia
 
PPTX
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
PDF
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
DOCX
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
PDF
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
PDF
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
PDF
Biography of Daniel Podor.pdf
Daniel Podor
 
PDF
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PDF
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
Newgen 2022-Forrester Newgen TEI_13 05 2022-The-Total-Economic-Impact-Newgen-...
darshakparmar
 
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
POV_ Why Enterprises Need to Find Value in ZERO.pdf
darshakparmar
 
What Makes Contify’s News API Stand Out: Key Features at a Glance
Contify
 
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
The Project Compass - GDG on Campus MSIT
dscmsitkol
 
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
July Patch Tuesday
Ivanti
 
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
Advancing WebDriver BiDi support in WebKit
Igalia
 
COMPARISON OF RASTER ANALYSIS TOOLS OF QGIS AND ARCGIS
Sharanya Sarkar
 
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
Cryptography Quiz: test your knowledge of this important security concept.
Rajni Bhardwaj Grover
 
The Rise of AI and IoT in Mobile App Tech.pdf
IMG Global Infotech
 
CIFDAQ Market Wrap for the week of 4th July 2025
CIFDAQ
 
Biography of Daniel Podor.pdf
Daniel Podor
 
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 

Lessons learned running a container cloud on YARN

  • 1. 1 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Lessons Learned Running a Container Cloud on Apache Hadoop YARN Billie Rinaldi Software Engineering YARN R&D - Hortonworks Hadoop, YARN, HDFS, Ambari, Ranger, Atlas, and Apache are trademarks of the Apache Software Foundation
  • 2. 2 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Introduction Building Blocks for the Container Cloud Lessons Learned Q&A Overview:
  • 3. 3 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Disclaimer • Some features discussed today are still experimental and/or under current development. • Security is a work in progress and a security assessment should be performed before implementing these features. • Many features are released in Apache Hadoop 3.1.0 in April 2018, but some will not come out until 3.2.0 or 3.1.1.
  • 4. 4 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Introduction
  • 5. 5 © Hortonworks Inc. 2011 – 2018. All Rights Reserved We Build, Test, and Release Open Source Software • The rapid pace of open source results in • dozens of releases a year • tens of thousands of tests per release. • Many permutations of tests • Over a dozen supported Linux operating systems. • Multiple backend databases. • Nearly thirty open source products in our stack. • Multiple supported versions of HDP per release.
  • 6. 6 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Addressing the Challenges • How is the industry addressing the these challenges? What are customers asking for? • How can we reduce overhead to achieve greater density and improve hardware utilization? • How can we improve the speed at which tests run? • How can we reuse packaging and automation?
  • 7. 7 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Solution: Container Cloud • Containers eliminate a bulk of the virtualization overhead, improving density per node. • Containers help reduce image variance through composition and simplified packaging. • Container startup time is fast, no real boot sequence. • Containers naturally fit into YARN and its container model. • Allow us to “use what we ship and ship what we use.”
  • 8. 8 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Container Cloud Architecture Shared Services Resource Management (YARN) Management and Monitoring (Ambari) Jenkins Worker (Docker) Testing HDP and HDF releases in container clusters HDP (Docker) Worker (Docker) Storage (HDFS) Service Discovery and REST API (YARN Services) Security and Governance (Ranger and Atlas) SubmitTest LaunchTest Worker (Docker) HDP (Docker) HDP (Docker) HDP (Docker)
  • 9. 9 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Two years later … 5.8+ million containers and many lessons learned.
  • 10. 10 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Building Blocks for the Container Cloud
  • 11. 11 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Building Blocks for the Container Cloud • YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN. • YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN. • YARN Service Discovery – Allows services running on YARN to discover one another.
  • 12. 12 © Hortonworks Inc. 2011 – 2018. All Rights Reserved Adding Docker on YARN • Why Docker? • Provides a lightweight mechanism for packaging, distributing, and isolating processes. • Currently the most popular containerization framework. • Allows YARN developers to focus on integration instead of container primitives. • Mostly fits into the YARN Container model. • Buzzword compliant.
  • 13. YARN Containers • What is a YARN container? • Process. • Local Resources (scripts, jars, security tokens). • Resource Requirements (CPU, Memory, I/O). • AM requests containers. • RM allocates containers on NMs. • NM runs containers. • Container Executor encapsulates platform-specific logic needed to start a YARN container. Image: https://www.pepperfry.com/tupperware-mini-rectangular-white-container-850ml-1109991.html
  • 14. New Abstraction: Container Runtimes • Challenge: The Container Executor approach is cluster wide; we need more flexibility while still being able to leverage existing Container Executor features. • Solution: Runtimes added to LinuxContainerExecutor (YARN-3611; initially released in Apache Hadoop 2.8.0, with improvements ongoing). • DefaultLinuxContainerRuntime: existing Linux process-based execution. • DockerLinuxContainerRuntime: uses Docker to run and monitor a container.
  • 15. Distributed Shell and MR on Docker Examples • Environment variables are currently used to set the Container Runtime options. (WARNING: this might change.) • MapReduce: > yarn jar $YARN_EXAMPLES_JAR pi -Dmapreduce.map.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" -Dmapreduce.reduce.env="YARN_CONTAINER_RUNTIME_TYPE=docker,YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7" 1 40000 • Distributed shell: > yarn jar $DSHELL_JAR -shell_env YARN_CONTAINER_RUNTIME_TYPE=docker -shell_env YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=centos:7 -shell_command "sleep 120" -jar $DSHELL_JAR -num_containers 1 Image: https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTaGXMdZCdR6_RUC235TdafDqURxk-KJIptwALUmg5ZmCb3YBW7
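Both commands on this slide pass the same two runtime options, once as a comma-separated list for MapReduce and once as repeated -shell_env flags for distributed shell. A minimal sketch of building those arguments from one dictionary follows. The helper names (docker_runtime_env, mapreduce_env_flags, dshell_env_flags) are hypothetical and not part of Hadoop; only the option names and values come from the slide.

```python
# Hypothetical helpers; only the option names and values come from the slide.

def docker_runtime_env(image):
    """Return the two container runtime options used in the slide's commands."""
    return {
        "YARN_CONTAINER_RUNTIME_TYPE": "docker",
        "YARN_CONTAINER_RUNTIME_DOCKER_IMAGE": image,
    }


def mapreduce_env_flags(env):
    """MapReduce takes one comma-separated KEY=VALUE list per task type."""
    joined = ",".join(f"{key}={value}" for key, value in env.items())
    return [
        f"-Dmapreduce.map.env={joined}",
        f"-Dmapreduce.reduce.env={joined}",
    ]


def dshell_env_flags(env):
    """Distributed shell takes one -shell_env flag per KEY=VALUE pair."""
    flags = []
    for key, value in env.items():
        flags += ["-shell_env", f"{key}={value}"]
    return flags


env = docker_runtime_env("centos:7")
map_flag, reduce_flag = mapreduce_env_flags(env)
shell_flags = dshell_env_flags(env)
```

The resulting lists can be passed as arguments to a process launcher, which avoids the shell quoting that the hand-typed commands need.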
  • 16. Recent Improvements: Container Lifecycle • Improvements to stopping and cleaning up of containers (YARN-5366) • Improvements to handling short lived containers (YARN-5366, YARN-7914) • Container relaunch improvements to reuse existing container (YARN-7973) • Data in the container’s root filesystem and workdir can be recovered on the same node • Support for sending specific signals to the container’s root process (YARN-5366) • Delayed deletion for debugging (YARN-5366)
  • 17. Recent Improvements: Container Security • ACLs for privileged containers, with the ability to disable privileged containers system wide (YARN-6623) • Sudo / group check for running privileged containers (YARN-7221) • Default untrusted mode for running unmodified images out of the box (YARN-7516) • Username to UID/GID mapping to ensure privacy (YARN-4266) • User supplied bind mounts validated against an admin supplied whitelist (YARN-5534) • More restrictive YARN mounts to limit host exposure (YARN-7815)
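The idea of checking user supplied bind mounts against an admin supplied whitelist can be illustrated with a short sketch. This is not the YARN-5534 implementation: the function name is_mount_allowed, the prefix-matching rule, and the example paths are all assumptions made for illustration.

```python
import posixpath


def is_mount_allowed(source, whitelist):
    """Illustrative check: allow a mount source only if it is a whitelisted
    directory or lies underneath one. Assumes whitelist entries are absolute."""
    src = posixpath.normpath(source)  # collapses "." and ".." segments
    if not posixpath.isabs(src):
        return False
    for allowed in whitelist:
        root = posixpath.normpath(allowed)
        # commonpath compares whole path components, so "/etc/hadoop2"
        # does not match a whitelist entry of "/etc/hadoop".
        if posixpath.commonpath([src, root]) == root:
            return True
    return False


admin_whitelist = ["/etc/hadoop", "/var/log/app"]  # hypothetical entries
```

Note that this sketch only normalizes the path text. It does not resolve symlinks, which a real check would also need to consider.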
  • 18. Building Blocks for the Container Cloud • YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN. • YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN. • YARN Service Discovery – Allows services running on YARN to discover one another.
  • 19. YARN Services Goals • Long Running – Simplify the deployment and management of long running applications on YARN. • Easily Bring New Applications – Remove tedious process of bringing new applications to YARN. • Easy to Manage Applications – REST API and Command Line tools. • Declarative Configuration – Provide configuration to the applications, declare resource needs, specify placement policies.
  • 20. YARN Services Overview • Apache Slider – incubating at Apache since 2014, designed to make it easier to run long-running applications on YARN. • Simplified and first-class support for services in YARN (YARN-4692) – initiated in 2016. • Container orchestrator to provision Docker-based or native process-based containers (YARN-5079), integrates Slider core into YARN. • REST API for managing services on YARN (YARN-4793). • Simplified discovery of services via DNS mechanisms (YARN-4757). • Released in Apache Hadoop 3.1.0!
  • 21. YARN Services Docker Httpd Example { "name": "simple-httpd-service", "version": "1.0.0", "lifetime": "3600", "components": [ { "name": "httpd", "number_of_containers": 2, "launch_command": "/usr/bin/run-httpd", "artifact": { "id": "centos/httpd-24-centos7:latest", "type": "DOCKER" }, "resource": { "cpus": 1, "memory": "1024" }, ... > yarn app -launch simple-httpd-service simple-httpd-service.json
  • 22. YARN Services Docker Httpd Example continued "readiness_check": { "type": "HTTP", "properties": { "url": "http://${THIS_HOST}:8080" } }, "configuration": { "files": [ { "type": "TEMPLATE", "dest_file": "/var/www/html/index.html", "properties": { "content": "<html><body>Hello from ${COMPONENT_INSTANCE_NAME}!</body></html>" } } ] }
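The httpd example is split across two slides with part of the JSON elided. A minimal sketch of assembling the visible fields into one spec file and sanity-checking it before launch follows. The placement of readiness_check and configuration on the component is an assumption (the slides elide the surrounding structure), and check_spec is a hypothetical client-side helper, not YARN's own validation.

```python
import json

# Fields copied from the two httpd example slides.
httpd_component = {
    "name": "httpd",
    "number_of_containers": 2,
    "launch_command": "/usr/bin/run-httpd",
    "artifact": {"id": "centos/httpd-24-centos7:latest", "type": "DOCKER"},
    "resource": {"cpus": 1, "memory": "1024"},
    # Assumption: the slides elide where the next two blocks attach;
    # this sketch places them on the component.
    "readiness_check": {
        "type": "HTTP",
        "properties": {"url": "http://${THIS_HOST}:8080"},
    },
    "configuration": {
        "files": [
            {
                "type": "TEMPLATE",
                "dest_file": "/var/www/html/index.html",
                "properties": {
                    "content": "<html><body>Hello from "
                               "${COMPONENT_INSTANCE_NAME}!</body></html>"
                },
            }
        ]
    },
}

spec = {
    "name": "simple-httpd-service",
    "version": "1.0.0",
    "lifetime": "3600",
    "components": [httpd_component],
}


def check_spec(service):
    """Hypothetical pre-submit check: list missing fields seen in the example."""
    problems = []
    for key in ("name", "version", "components"):
        if key not in service:
            problems.append(f"missing top-level field: {key}")
    for index, comp in enumerate(service.get("components", [])):
        for key in ("name", "number_of_containers", "artifact", "resource"):
            if key not in comp:
                problems.append(f"component {index} missing field: {key}")
    return problems


problems = check_spec(spec)
spec_json = json.dumps(spec, indent=2)  # text to save as simple-httpd-service.json
```

Saving spec_json to simple-httpd-service.json gives a file in the shape that the slide's yarn app -launch command takes as its second argument.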
  • 23. Service assembly { "name": "httpd-proxy-service", "version": "1.0.0", "components": [ { "artifact": { "id": "simple-httpd-service", "type": "SERVICE" } }, { "name": "httpd-proxy", "number_of_containers": 1, "dependencies": [ "httpd" ], "artifact": { "id": "centos/httpd-24-centos7", "type": "DOCKER" }, ... > yarn app -save simple-httpd-service simple-httpd-service.json > yarn app -launch httpd-proxy-service httpd-proxy-service.json
  • 24. Other features in progress • Container upgrade: localize new resource while container is running (YARN-1503, portions in 2.9.0); restart container with new resources using same container allocation (YARN-4726); support in service AM and service REST API (YARN-7512), slated for release 3.2.0 • Placement policy support in YARN services (YARN-7142) • User supplied Docker client configs in YARN services (YARN-7996) • Entrypoint support (YARN-7654) • And many more!
  • 25. Building Blocks for the Container Cloud • YARN Docker Support – Enables additional container types to make it easier to onboard new applications and services on YARN. • YARN Services Framework – Provides AM implementation and NM improvements that enable long running services on YARN. • YARN Service Discovery – Allows services running on YARN to discover one another.
  • 26. YARN Service Registry • The YARN Service Registry allows deployed applications to register themselves so that other applications can discover them (YARN-913). • Entries are stored in Zookeeper as the default k/v store, providing HA and consistency. • Native Java clients, REST and CLI interfaces exist for accessing the YARN Service Registry. Image: http://api-university.com/blog/let-developers-try-your-apis-without-registration/
  • 27. Simplified Discovery via DNS • Challenge: Native Java clients and REST interfaces are not ideal discovery mechanisms for existing applications. • Solution: Expose YARN Service Registry entries via a more generic and widely used discovery mechanism: DNS. The YARN Registry DNS server (YARN-4757) meets these needs. • Watches the YARN Service Registry for new application and container registration/deregistration. • Creates the appropriate DNS records for the container, in two forms: componentInstanceName.serviceName.user.domain and ctr-e138-1518143905142-215498-01-000007.domain • Supports zone transfers or zone forwarding. Image: http://www.idownloadblog.com/2016/03/05/how-to-use-custom-dns-settings/
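The two record forms on this slide can be made concrete with a small sketch that composes the names. The function names and example values (httpd-0, alice, example.com) are hypothetical. The container-ID conversion (replacing the "container_" prefix with "ctr-" and underscores with hyphens) is an assumption inferred from the example record shown on the slide.

```python
def component_dns_name(instance, service, user, domain):
    """Compose componentInstanceName.serviceName.user.domain in lowercase."""
    labels = [instance, service, user] + domain.strip(".").split(".")
    for label in labels:
        # DNS labels are limited to 63 characters and must not be empty.
        if not 1 <= len(label) <= 63:
            raise ValueError(f"invalid DNS label: {label!r}")
    return ".".join(labels).lower()


def container_dns_name(container_id, domain):
    """Assumed conversion of a YARN container ID into a DNS-safe host name."""
    host = container_id.replace("container_", "ctr-", 1).replace("_", "-")
    return f"{host}.{domain.strip('.')}".lower()


by_component = component_dns_name(
    "httpd-0", "simple-httpd-service", "alice", "example.com")
by_container = container_dns_name(
    "container_e138_1518143905142_215498_01_000007", "example.com")
```

An existing application could then reach a component instance with an ordinary DNS lookup on the first form, with no registry client required.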
  • 28. Lessons Learned
  • 29. Success! • 5.8+ million containers, 1.1+ million tests. • Huge uptick in adoption.
  • 30. Successes continued • First full HDP release tested and certified end to end on the container cloud! • All supported operating systems (CentOS 6/7, SLES, Ubuntu 14/16, Debian) running in containers on CentOS 7.3 hosts! • Density per node improved by 2.5x! [Chart: 14 virtual machines vs. 35 containers.]
  • 31. …but not without some pain “No gains without pains.” - Benjamin Franklin, Poor Richard's Almanack
  • 32. IP Management Challenge: YARN does not manage IP addresses directly. What we did: • Allocate a pool of IP addresses to the cluster on the same VLAN. • Use Docker’s bridge networking with fixed_cidr option. • Each node in the cluster is allocated 64 IP addresses from the pool. • Use docker inspect to get the container IP address and add it to the YARN Service Registry • YARN Registry DNS Server registers DNS records for easy lookup.
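Carving 64 addresses per node out of a shared pool amounts to splitting the pool into /26 blocks (2^6 = 64). A minimal sketch using Python's standard ipaddress module follows. The pool 10.0.0.0/24, the node names, and the function name allocate_blocks are hypothetical; each resulting block is the kind of value one would hand to Docker's fixed CIDR setting on that node.

```python
import ipaddress


def allocate_blocks(pool_cidr, nodes, addresses_per_node=64):
    """Assign each node one equally sized block from the pool, in order."""
    if addresses_per_node < 1 or addresses_per_node & (addresses_per_node - 1):
        raise ValueError("addresses_per_node must be a power of two")
    pool = ipaddress.ip_network(pool_cidr)
    # 64 addresses need 6 host bits, so an IPv4 block is a /26.
    host_bits = addresses_per_node.bit_length() - 1
    blocks = pool.subnets(new_prefix=pool.max_prefixlen - host_bits)
    allocation = {}
    for node in nodes:
        try:
            allocation[node] = next(blocks)
        except StopIteration:
            raise ValueError("address pool exhausted") from None
    return allocation


allocation = allocate_blocks("10.0.0.0/24", ["node1", "node2", "node3"])
```

With this hypothetical /24 pool there is room for four nodes; a fifth node would raise the "address pool exhausted" error, which is the capacity limit a static scheme like this has to plan for.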
  • 33. Docker Storage Driver and Filesystem Challenge: Many Docker Storage drivers, lots of limitations. What is the best option for us? What we did: • Extensive testing of create, stop, and delete operation timings. • Eliminate options that require significant modifications to the Linux OS to ease adoption in enterprises. • If possible, use a driver deemed production ready by maintainers. • Ultimately, we landed on DeviceMapper LVM thinpool with ext4 backing filesystem ... or so we thought.
  • 34. DeviceMapper kernel oops and performance Challenge: Heavy writes to the container’s root filesystem causing kernel panics, uninterruptible processes, and high IO wait. What we did: • DeviceMapper is the only viable option for our workloads, so we can’t switch to a different storage driver. • Install SSDs and configure Docker’s graph storage to use them to eliminate high IO wait. • Test various RAID controller firmware, Linux kernels, and backing filesystems to find a stable combination that doesn’t result in panics and Docker hangs (most recently testing upgrade to CentOS 7.4 with 4.15 kernel, so we could look at overlay2 now).
  • 35. User Namespacing Challenge: YARN provides features for localized resources and log aggregation. Running containers as the submitting user presents challenges. What we did: • Run all containers as the nobody user so that user and application directories configured by YARN are available to the container. • Update images so that nobody UID/GID match in image and host. • Allow for “vanilla containers” that do not bind mount the YARN directories, allowing the process in the container to run as any user, but logs can no longer be aggregated (Docker run override disabled). • User namespacing in Docker is lacking, as it can only remap a single user. As this restriction is removed, we expect to migrate to this feature. Recent improvements allow setting UID:GID pair, but further support is needed.
  • 36. Image Management Challenge: The implicit pull of a large image can lead to task timeouts. SSD space is at a premium, making image cleanup important. What we did: • Run an internal private registry to reduce WAN load. • Jenkins job that builds and distributes the image to all nodes in the cluster, avoiding the implicit pull from “docker run”. • Jenkins job to clean up images that are no longer needed. A full thinpool can also cause kernel panics, so aggressively age off images. • Reuse base images where possible to reduce bandwidth. • Work being discussed to provide first-class YARN support for image management (YARN-3854).
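The slide does not describe how the cleanup job decides which images to age off. As one possible policy, purely an assumption for illustration, a job could remove any image not used within a fixed window unless it is on a keep list of shared base images. The function name, image names, and timestamps below are hypothetical.

```python
from datetime import datetime, timedelta


def images_to_remove(last_used, now, max_age, keep=frozenset()):
    """Select images whose last use is older than max_age, skipping kept ones.

    last_used maps an image name to the datetime it was last used.
    """
    cutoff = now - max_age
    return sorted(
        name
        for name, used_at in last_used.items()
        if used_at < cutoff and name not in keep
    )


now = datetime(2018, 4, 1, 12, 0, 0)
last_used = {
    "registry.example/hdp-test:old": datetime(2018, 3, 1),
    "registry.example/hdp-test:new": datetime(2018, 3, 31),
    "registry.example/centos7-base": datetime(2018, 2, 1),
}
stale = images_to_remove(
    last_used,
    now,
    max_age=timedelta(days=7),
    keep={"registry.example/centos7-base"},
)
```

Keeping base images on the keep list matches the slide's point about reusing base images to reduce bandwidth, since removing them would force every dependent image to pull them again.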
  • 37. Summary
  • 38. Summary • Massive density improvements. • Greatly improved ease of use. • Many real world lessons learned. • Widespread internal adoption. • Improved self service capabilities. • Internal use of long running services a reality!
  • 39. Get Involved • Still plenty of work to go! • Improve Docker image management and user handling. • Networking plugins. • Security/permissions models. • Bring Your Own Image challenges. • Follow Along. • Try out Apache Hadoop 3.1.0 or check out trunk and build it! Ansible/vagrant setups available.
  • 40. Questions? [email protected] [email protected]