Hadoop YARN Services 
Steve Loughran, Hortonworks
stevel at hortonworks.com 
@steveloughran 
ApacheCon EU, November 2014
Apache Hadoop + YARN: 
An OS for data
An OS can do more than SQL 
statements
An OS can do more than run 
admin-installed apps
An OS lets you run whatever 
you want!
An OS Offers 
• Persistent Storage 
• Execution of code 
• jobs & services 
• scheduling 
• Communications 
• Security
YARN Services: 
Long-lived applications
within a Hadoop cluster
Background: YARN 
[Diagram: the YARN Resource Manager (“The RM”) and three YARN Node Managers, with HDFS running on every node]
• Servers run YARN Node Managers (NM) 
• NMs heartbeat to the Resource Manager (RM)
• RM schedules work over cluster 
• RM allocates containers to apps 
• NMs start containers 
• NMs report container health
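
The heartbeat relationship in the bullets above can be sketched as a toy liveness table. This is only an illustration of the idea, not how the RM is implemented: `HeartbeatTracker`, its methods, and the timeout value are hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy model of an RM-side liveness table; every name here is hypothetical. */
final class HeartbeatTracker {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Record a heartbeat from a node at the given time. */
    void heartbeat(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    /** Nodes whose latest heartbeat is no older than the timeout, sorted by name. */
    Set<String> liveNodes(long nowMillis) {
        Set<String> live = new TreeSet<>();
        for (Map.Entry<String, Long> e : lastSeen.entrySet()) {
            if (nowMillis - e.getValue() <= timeoutMillis) {
                live.add(e.getKey());
            }
        }
        return live;
    }
}
```

A scheduler built on such a table would only place containers on nodes returned by `liveNodes`.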
Client creates App Master 
[Diagram: Client, YARN Resource Manager (“The RM”), three YARN Node Managers with HDFS, and the Application Master]
“AM” requests containers 
[Diagram: YARN Resource Manager, three YARN Node Managers with HDFS, the Application Master, and three Containers]
Short-lived applications
• failure: clean restart 
• logs: collect at end 
• placement: by data 
• security: Kerberos delegation tokens 
• discovery: launcher app can track
Long-lived services 
• failure: stay available 
• logs: ongoing collection 
• placement: availability, performance 
• security: ?? 
• discovery: ???
YARN-896 
Support for YARN services:
YARN-896 
Log aggregation 
Kerberos token renewal 
Gang scheduling 
Service registration & discovery 
Net & Disk resources 
Windowed failure tracking 
Container reuse 
Anti-affinity placement 
Container resource flexing 
Container signalling 
Labelled nodes & queues 
Applications to continue over AM restart 
REST
Hadoop 2.6 
Log aggregation 
Kerberos token renewal 
Gang scheduling 
Service registration & discovery 
Net & Disk resources 
Windowed failure tracking 
Container reuse 
Anti-affinity placement 
Container resource flexing 
Container signalling 
Labelled nodes & queues 
Applications to continue over AM restart 
(Docker) 
REST
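
One item in the list, windowed failure tracking, lends itself to a short sketch: count only the failures that fall inside a recent time window, so that a service which fails rarely over months is not treated like one that is crash-looping. The class below is a hypothetical illustration, not YARN code; `FailureWindow`, the window length, and the threshold are assumptions, and timestamps are assumed to arrive in non-decreasing order.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sliding-window failure counter; not part of YARN. */
final class FailureWindow {
    private final long windowMillis;
    private final int threshold;
    private final Deque<Long> failures = new ArrayDeque<>();

    FailureWindow(long windowMillis, int threshold) {
        this.windowMillis = windowMillis;
        this.threshold = threshold;
    }

    /**
     * Record a failure at the given time and drop failures older than the window.
     * Returns true when the failures still inside the window exceed the threshold.
     */
    boolean recordFailure(long nowMillis) {
        failures.addLast(nowMillis);
        while (!failures.isEmpty() && nowMillis - failures.peekFirst() > windowMillis) {
            failures.removeFirst();
        }
        return failures.size() > threshold;
    }
}
```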
YARN-913 Service Registry 
$ slider resolve --path ~/services/org-apache-slider/storm1 
{ "type" : "JSONServiceRecord", 
"external" : [ { 
"api" : "http://", 
"addressType" : "uri", 
"protocolType" : "webui", 
"addresses" : [ { 
"uri" : "http://nn.example.com:46132" 
} ] 
}, { 
"api" : "classpath:org.apache.slider.publisher.configurations", 
"addressType" : "uri", 
"protocolType" : "REST", 
"addresses" : [ { 
"uri" : "http://nn.example.com:46132/ws/v1/slider/publisher/slider" 
}] 
} ] }
Internal and external 
"internal" : [ { 
"api" : "classpath:org.apache.slider.agents.secure", 
"addressType" : "uri", 
"protocolType" : "REST", 
"addresses" : [ { 
"uri" : "https://nn.example.com:47749/ws/v1/slider/agents" 
} ] 
} ]
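
A client that has resolved such a record usually picks an endpoint by its `api` value. The sketch below models the record with plain Java classes rather than parsing JSON, so it stays self-contained. `Endpoint`, `ServiceRecord`, and `findUri` are hypothetical names for illustration and are not the registry's client API; the hosts, ports, and `api` strings follow the record shown above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory model of one endpoint entry in a service record. */
final class Endpoint {
    final String api;
    final String protocolType;
    final List<String> uris;

    Endpoint(String api, String protocolType, String... uris) {
        this.api = api;
        this.protocolType = protocolType;
        this.uris = Arrays.asList(uris);
    }
}

/** Hypothetical model of a record with external and internal endpoint lists. */
final class ServiceRecord {
    final List<Endpoint> external;
    final List<Endpoint> internal;

    ServiceRecord(List<Endpoint> external, List<Endpoint> internal) {
        this.external = external;
        this.internal = internal;
    }

    /** First URI of the first endpoint in the list whose api matches, if any. */
    static Optional<String> findUri(List<Endpoint> endpoints, String api) {
        return endpoints.stream()
                .filter(e -> e.api.equals(api) && !e.uris.isEmpty())
                .map(e -> e.uris.get(0))
                .findFirst();
    }

    /** The record from the slides, rebuilt by hand. */
    static ServiceRecord example() {
        List<Endpoint> external = Arrays.asList(
                new Endpoint("http://", "webui", "http://nn.example.com:46132"),
                new Endpoint("classpath:org.apache.slider.publisher.configurations", "REST",
                        "http://nn.example.com:46132/ws/v1/slider/publisher/slider"));
        List<Endpoint> internal = Arrays.asList(
                new Endpoint("classpath:org.apache.slider.agents.secure", "REST",
                        "https://nn.example.com:47749/ws/v1/slider/agents"));
        return new ServiceRecord(external, internal);
    }
}
```

Keeping the two lists separate means a lookup has to say which audience it is acting for: the agents endpoint is found in `internal` but not in `external`.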
Failures 
[Diagram: YARN Resource Manager, three YARN Node Managers with HDFS, the Application Master, and three Containers]
Failures 
[Diagram: YARN Resource Manager, two YARN Node Managers with HDFS, and two Containers]
Failures 
[Diagram: YARN Resource Manager, two YARN Node Managers with HDFS, the Application Master, and two Containers; container 1, container 2, lost: container 3]
Easy: enabling 
// Client 
amLauncher.setKeepContainersOverRestarts(true); 
amLauncher.setMaxAppAttempts(8); 
// Server 
List<Container> liveContainers = 
amRegistrationData.getContainersFromPreviousAttempts();
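
Once the restarted AM has the list of surviving containers, it still has to work out what is missing. The sketch below shows that comparison on plain string IDs. It does not call the YARN API: `RestartPlanner` and its methods are hypothetical, and the list of expected containers is assumed to come from whatever state the AM persisted before it failed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper comparing expected containers with survivors of an AM restart. */
final class RestartPlanner {
    /** IDs that were expected but did not survive, in their original order. */
    static List<String> lost(List<String> expected, List<String> survivors) {
        Set<String> alive = new HashSet<>(survivors);
        List<String> missing = new ArrayList<>();
        for (String id : expected) {
            if (!alive.contains(id)) {
                missing.add(id);
            }
        }
        return missing;
    }

    /** Number of new containers to request to get back to the desired count. */
    static int toRequest(int desired, List<String> survivors) {
        return Math.max(0, desired - survivors.size());
    }
}
```

With three expected containers and two survivors, the AM would note one lost container and ask for one replacement.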
Harder: rebuilding state 
Persisted Rebuilt Transient 
Node Map 
Specification 
Placement History 
Event History 
Component Map Container Queues
Log Aggregation 
<property> 
<name>yarn.log-aggregation-enable</name> 
<value>true</value> 
</property>
Labels 
$ yarn rmadmin 
... 
-addToClusterNodeLabels [label1,label2,label3] 
-removeFromClusterNodeLabels [label1,label2,label3] 
-replaceLabelsOnNode [node1:port,label1,label2] 
-directlyAccessNodeLabelStore 
Labels offer 
• Separation of workloads 
• Separation of service roles 
• Separation of production & dev code 
• Allocation to specific hardware classes
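
The separation that labels offer comes down to restricting placement to nodes that carry a requested label. A minimal sketch of that filter follows. It is an illustration only and not the scheduler's logic: `LabelFilter` is a hypothetical name, and the sketch assumes each node carries a set of labels.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical label filter; not the YARN scheduler's implementation. */
final class LabelFilter {
    /** Nodes whose label set contains the requested label, in map iteration order. */
    static List<String> nodesWithLabel(Map<String, Set<String>> nodeLabels, String label) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : nodeLabels.entrySet()) {
            if (e.getValue().contains(label)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }
}
```

Passing a `LinkedHashMap` keeps the result in insertion order, which makes the output predictable in tests.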
Security 
• Token expiry a core Kerberos feature 
• Token expiry inimical to service 
longevity 
• Specifically: token delegation
Security 
YARN: 
AM/RM token renewal 
NM HDFS access for AM container 
relaunch 
You: embrace keytabs, test lots
…so you can now 
• Write long-lived apps 
• with failure resilience 
• centralised log viewing 
• labelled/isolated placement 
• in secure clusters
Why not just use Mesos?
Hadoop is everywhere!
Hadoop 2.7+ 
Log aggregation 
Kerberos token renewal 
Gang scheduling 
Service registration & discovery 
Docker 
Net & Disk resources 
Windowed failure tracking 
Container reuse 
Anti-affinity placement 
Container resource flexing 
Container signalling 
Labelled nodes & queues 
Applications to continue over AM restart 
REST
Questions? 
http://hadoop.apache.org
