1 © Hortonworks Inc. 2011–2018. All rights reserved
Apache Hadoop YARN:
State of the Union
Sanjay Radia
Founder, Chief Architect Hortonworks
Apache Hadoop PMC
About the Speaker
• Sanjay Radia
• Chief Architect, Founder, Hortonworks
• Apache Hadoop PMC and Committer
• Part of the original Hadoop team at Yahoo! since 2007
• Chief Architect of Hadoop Core at Yahoo!
• Prior
• Data center automation, virtualization, Java, HA, OSs, File Systems
• Startup, Sun Microsystems, INRIA…
• Ph.D., University of Waterloo
Architecting the Future of Big Data
• Introduction
• Past
• State of Union for YARN
Agenda
A Brief Timeline from Past Year: GA Releases
• 2.8.0: Application Priority; Reservations; Node labels improvements
• 2.9.0: YARN Federation; Opportunistic Containers (backported from 3.0); New YARN UI; Timeline Service V2
• 3.0.0: Global Scheduling; Multiple Resource Types; New YARN UI; Timeline Service V2
• 3.1.0: GPU/FPGA support; YARN Native Services; Placement Constraints
Ever-evolving requirements (computation-intensive, larger, services)
GA dates: 22 March ’17, 17 Nov ’17, 13 Dec ‘17, 02 Aug ‘18 · Maintenance releases: 2.8.4, 2.9.1, 3.0.3, 3.1.1
Apache Hadoop 3.0/3.1
Key Themes
Platform Themes: Scale, Scheduling, Usability
Workload Themes: Containers, Resources, Services
• Many customers run clusters with a large number of nodes
• Oath (Yahoo!), Twitter, LinkedIn, Microsoft, Alibaba etc.
• Previous largest clusters: 6K-8K
• Now: 50K nodes in a single cluster of Microsoft [1]
• Roadmap: To 100K and beyond
[1] https://azure.microsoft.com/en-us/blog/how-microsoft-drives-exabyte-analytics-on-the-world-s-largest-yarn-cluster/
Looking at the Scale!
• Enables applications to scale to clusters of 100K nodes and beyond
• Federation divides a large (10–100K nodes) cluster into smaller units called sub-clusters
• Federation negotiates with the sub-clusters’ RMs and provides resources to the application
• Applications can schedule tasks on any node
YARN Federation
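As a rough illustration of the routing idea (not the actual Federation Router code; the hash-based policy below is a hypothetical stand-in for YARN's real routing policies):

```python
# Illustrative sketch: a router transparently maps each application to one
# of several sub-cluster RMs, so clients see a single large cluster.

def pick_subcluster(app_id, subclusters):
    """Deterministically route an application to a sub-cluster using a
    stable hash of its id (a stand-in for load- or locality-aware policies)."""
    return subclusters[sum(app_id.encode()) % len(subclusters)]
```

The same application id always routes to the same sub-cluster, while different applications spread across all of them.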
• YARN-5139
• Problems
• The one-node-at-a-time allocation cycle can lead to suboptimal decisions
• Hurts performance and makes global placement policies hard to implement
• Several coarse-grained locks
• With this work, YARN now
• Looks at several nodes at a time
• Uses fine-grained locks
• Uses multiple allocator threads
• YARN scheduler can allocate 3k+ containers per second ≈ 10 mil allocations / hour!
• 10X throughput gains
• Much better placement decisions
Moving Towards Global & Fast Scheduling
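A toy illustration of the lookahead idea (the scoring policy is a hypothetical placeholder, not YARN's actual multi-node scorer):

```python
# Instead of committing a container to the single node currently offering
# resources (one-node-at-a-time), score a batch of candidate nodes and
# place the container on the best fit.

def best_node(free_by_node, demand):
    """free_by_node: {node: free capacity}; return the node with the most
    headroom that still fits the demand, or None if nothing fits."""
    candidates = {n: free for n, free in free_by_node.items() if free >= demand}
    return max(candidates, key=candidates.get) if candidates else None
```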
Better Placement Strategies (YARN-6592)
• Past
• Supported constraints in form of Node Locality
• Now YARN can support a lot more use cases
• Co-locate the allocations of a job on the same rack (affinity)
• Spread allocations across machines (anti-affinity) to minimize resource interference
• Allow up to a specific number of allocations in a node group (cardinality)
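A toy decision function for the three constraint kinds above (a hypothetical helper for illustration, not YARN's PlacementConstraints API):

```python
# Given how many allocations with the same tag already sit in the target
# node group, decide whether one more may be placed there.

def may_place(kind, existing_in_group, max_cardinality=None):
    if kind == "affinity":        # co-locate with peers already in the group
        return existing_in_group > 0
    if kind == "anti-affinity":   # spread: refuse if a peer is already there
        return existing_in_group == 0
    if kind == "cardinality":     # allow up to N allocations in the group
        return existing_in_group < max_cardinality
    raise ValueError(f"unknown constraint kind: {kind}")
```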
Better Placement Strategies (YARN-6592)
Affinity Anti-affinity
Region
Server
Region
Server
Absolute Resources Configuration in CS – YARN-5881
• The cloud model! “Give me X resources, not X%”
• Gives ability to configure Queue resources as below
<memory=24GB, vcores=20, yarn.io/gpu=2>
• Enables admins to assign different quotas of different resource-types
• No more “Single percentage value limitation for all resource-types”
root (memory=60G, vcores=40, gpu=10)
├─ sales (memory=40G, vcores=20, gpu=6)
├─ marketing (memory=10G, vcores=10, gpu=4)
└─ engineering (memory=10G, vcores=10, gpu=0)
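The queue tree above could be expressed roughly as follows in capacity-scheduler.xml (a sketch only: property names follow the CapacityScheduler convention, memory is in MB, and the exact bracket syntax may differ across Hadoop releases):

```xml
<!-- Absolute-resource queue capacities (values from the example above) -->
<property>
  <name>yarn.scheduler.capacity.root.capacity</name>
  <value>[memory=61440,vcores=40,yarn.io/gpu=10]</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.sales.capacity</name>
  <value>[memory=40960,vcores=20,yarn.io/gpu=6]</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.marketing.capacity</name>
  <value>[memory=10240,vcores=10,yarn.io/gpu=4]</value>
</property>
```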
Useful Scheduling Features from Recent 2.X Releases
• Application Priorities
• Queue Priorities
• Previously: resources went to the least-satisfied queue
• Queues can now have priorities (e.g. production over dev)
Usability: UI (1)
Usability: UI (2)
Timeline Service 2.0
• Application History
• Where did my container run?
• Why is my app slow?
• Why is it failing?
• What happened with my app?
• Cluster History
• Which user had the most utilization?
• Largest app?
• What happened in my cluster?
• …
• Scalable & Robust Service
• Using HBase as backend for better scale
• More robust storage fault tolerance
• Migration and compatibility with v.1.5
Usability: Queue & Logs
• API-based queue management, decentralized (YARN-5734)
• Improved log management (YARN-4904)
• Live application logs
• Better Packaging model
• Lightweight mechanism for packaging and resource isolation
• Popularized and made accessible by Docker
• Native integration ++ in YARN
• Support for “Container Runtimes” in LCE: YARN-3611
• Native process runtime
• Docker runtime
• Many security/usability improvements added in 3.0.x / 3.1.x
Containers
• Run both with and without Docker on the same cluster
• Choose at run-time!
Containers
• Apache Spark applications have a complex set of required software dependencies
• Docker on YARN helps to solve Spark package isolation issues with
• PySpark - Python versions, packages
• R - packages
Spark on Docker in YARN
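For illustration, a PySpark job might select a Docker image through the Docker-on-YARN integration roughly like this (the image name is hypothetical; the environment-variable names follow the Hadoop 3.x Docker container runtime documentation):

```properties
# Illustrative spark-submit settings for Docker on YARN
spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_TYPE=docker
spark.yarn.appMasterEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=myrepo/pyspark-py36:latest
spark.executorEnv.YARN_CONTAINER_RUNTIME_TYPE=docker
spark.executorEnv.YARN_CONTAINER_RUNTIME_DOCKER_IMAGE=myrepo/pyspark-py36:latest
```

Each job can thus pin its own Python/R dependency stack to an image instead of relying on what is installed on the nodes.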
YARN on YARN… Towards a Private Cloud …YCloud
(Diagram: an outer YARN cluster running Hadoop apps — MR, Tez, Spark, TensorFlow — and hosting an inner YARN cluster that itself runs MR, Tez, and Spark)
YCloud: YARN Based Container Cloud We Use for Testing
Testing Hadoop on Hadoop!
Resource Profiles and Custom Resource Types
• YARN supported only memory and CPU
• Now: a generalized vector for all resources
• Admins can add arbitrary resource types!
• Profiles ease the resource-requesting model for apps:

Profile  Memory  CPU       GPU
Small    2 GB    4 cores   0
Medium   4 GB    8 cores   0
Large    16 GB   16 cores  4

(Node Manager resource vector: Memory, CPU, GPU, FPGA)
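A custom resource type is declared cluster-wide before the scheduler can hand it out; a sketch of the resource-types.xml entry (the GPU/FPGA names follow the yarn.io namespace used by Hadoop 3.x):

```xml
<!-- resource-types.xml: declaring custom countable resource types -->
<property>
  <name>yarn.resource-types</name>
  <value>yarn.io/gpu,yarn.io/fpga</value>
</property>
```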
• Why?
• No need to setup separate clusters
• Leverage shared compute!
• Why need isolation?
• Multiple processes using a single GPU are serialized and can easily cause OOM
• GPU isolation on YARN:
• Granularity is per GPU device
• Uses cgroups / Docker to enforce isolation
GPU Support on YARN
(Diagram: Docker container stack — TensorFlow 1.2 on Ubuntu 14.04 over the host OS; GPU base library and CUDA 5.0 library provided via volume mount)
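On the NodeManager side, GPU scheduling is switched on via a resource plugin; a hedged yarn-site.xml sketch (property name per the Hadoop 3.1 GPU documentation):

```xml
<!-- yarn-site.xml: enable the GPU resource plugin on the NodeManager -->
<property>
  <name>yarn.nodemanager.resource-plugins</name>
  <value>yarn.io/gpu</value>
</property>
```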
• FPGA isolation on YARN:
• Granularity is per FPGA device
• Uses cgroups to enforce the isolation
• Currently, only the Intel OpenCL SDK for FPGA is supported
• But the implementation is extensible to other FPGA SDKs
FPGA on YARN
Services Support in YARN
• A native YARN services framework (YARN-4692)
• Native YARN framework layer for services and beyond
• Apache Slider retired from Incubator – lessons and key code carried over to YARN
• Simplified discovery of services via DNS mechanisms: YARN-4757
• regionserver-0.hbase-app-3.hadoop.yarn.site
• Application & Services upgrades
• “Do an upgrade of my HBase app with minimal impact to end-users”
• YARN-4726
• Applications need simple APIs
• Need to be deployable “easily”
• Simple REST API layer fronting YARN
• YARN-4793 Simplified API layer
• Spawn services & Manage them
Simplified APIs for Service Definitions
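For illustration, a minimal service spec in the JSON shape accepted by the YARN services REST API might look like this (field names follow the Hadoop 3.x services API docs; the "sleeper" service itself is a toy):

```json
{
  "name": "sleeper-service",
  "version": "1.0",
  "components": [
    {
      "name": "sleeper",
      "number_of_containers": 2,
      "launch_command": "sleep 900000",
      "resource": {
        "cpus": 1,
        "memory": "256"
      }
    }
  ]
}
```

A spec like this is submitted to the ResourceManager's services endpoint (e.g. POST to /app/v1/services), which then spawns and manages the containers.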
LLAP on YARN
• Apache Hive LLAP is a key long running application
• Used for query processing
• Designed to run on a shared multi-tenant YARN cluster
Application Timeout – YARN-3813
• Controlling running time of workloads in YARN
• Define a lifetime for an application at any time; YARN then manages its termination
• “Give me resources for this app/service but kill it after 15 days”
Reservation-based Scheduling: If You’re Late Don’t Blame Us! – Curino et al., 2015
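The "kill it after 15 days" request maps to the application CLI; an illustrative invocation (the application id is hypothetical; the lifetime is given in seconds, 15 days = 1,296,000 s):

```
yarn application -appId application_1533000000000_0001 -updateLifetime 1296000
```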
Apache Hadoop 3.2 and Beyond
• Every user says “Give me 16GB for my task”, even though it’s only needed at peak
• Each node has some allocated but unutilized capacity. Use such capacity to run opportunistic tasks
• Preempt such tasks when needed
Container Overcommit (YARN-1011)
(Diagram: a node's capacity divides into un-allocated, utilized, and allocated-but-unutilized portions; guaranteed containers are charged against allocated capacity, while opportunistic containers ride on the unutilized slack and are preempted when necessary)
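The picture above can be sketched as a toy headroom calculation (illustrative only, not YARN's actual accounting):

```python
# Guaranteed containers are limited by unallocated capacity; opportunistic
# containers may also use the allocated-but-unutilized slack, and are the
# first to be preempted when a guaranteed container needs resources back.

def headroom(kind, node):
    """node: {'capacity': C, 'allocated': A, 'utilized': U}, one unit."""
    if kind == "guaranteed":
        return node["capacity"] - node["allocated"]
    if kind == "opportunistic":
        return node["capacity"] - node["utilized"]
    raise ValueError(kind)
```

On a node that is 90% allocated but only 40% utilized, opportunistic containers see six times the headroom of guaranteed ones.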
• “Take me to a node with JDK 10”
• Node Partition vs. Node Attribute
• Partition:
• A node belongs to exactly one partition
• ACL
• Shares between queues
• Preemption enforced.
• Attribute:
• For container placement
• No ACL/Shares on attributes
• First-come-first-serve
Node Attributes (YARN-3409)
(Example: Partition 1 (15 nodes) with Queue-A at 40% and Queue-B at 60%; Partition 2 (15 nodes) with queues A–D at 25% each)
Node 1
os.type=ubuntu
os.version=14.10
glibc.version=2.20
JDK.version=8u20
Node 2
os.type=RHEL
os.version=5.1
GPU.type=x86_64
JDK.version=7u20
Node 16
os.type=windows
os.version=7
JDK.version=8u20
Node 17
os.type=SUSE
os.version=12
GPU.type=i686
JDK.version=7u20
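A toy matcher for "take me to a node with JDK 10" over attribute listings like the ones above (a hypothetical helper, not the YARN API):

```python
# Node attributes are free-form key=value pairs consulted at container
# placement time; unlike partitions there are no ACLs, shares, or preemption.

def nodes_matching(nodes, required):
    """nodes: {name: {attr: value}}; return names whose attributes include
    every required key=value pair."""
    return [name for name, attrs in nodes.items()
            if all(attrs.get(k) == v for k, v in required.items())]
```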
Questions?