Apache Hadoop YARN: Present and Future
Vinod Kumar Vavilapalli
Hortonworks
© Hortonworks Inc. 2014
Apache Hadoop YARN
Present and Future
Vinod Kumar Vavilapalli
vinodkv [at] apache.org
@tshooter
A quick show of hands..
• Hadoop 2
Architecting the Future of Big Data
Real life Hadoop Logo
Who am I?
• 6.75 Hadoop-years old
• Last thing at school: a two-node Tomcat cluster. Three months later, first thing on the job: brought down an 800-node cluster ;)
• Previously @Yahoo!
• Now @Hortonworks
• Two hats
– Hortonworks: Hadoop MapReduce and YARN Development lead
– Apache: Apache Hadoop YARN lead. Apache Hadoop PMC, Apache Member
• Worked/working on
– YARN, Hadoop MapReduce, HadoopOnDemand, CapacityScheduler, Hadoop security
– Apache Ambari: Kickstarted the project and its first release
– Stinger: High performance data processing with Hadoop/Hive
• Lots of troubleshooting on clusters
• 99% + code in Apache, Hadoop
Agenda
• Apache Hadoop 2 : Overview
• Past
• Present
• Future
Apache Hadoop 2
Next Generation Architecture
What is YARN?
• Resource Management Platform
– MapReduce v2
– Beyond MapReduce with Tez, Storm, Spark; in Hadoop!
– Did I mention Services like HBase, Accumulo on YARN with HoYA/Slider?
• How is it different from Hadoop 1? ..
Hadoop 1 vs Hadoop 2
HADOOP 1.0: Single Use System (Batch Apps)
• HDFS (redundant, reliable storage)
• MapReduce (cluster resource management & data processing)
HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
• HDFS2 (redundant, highly-available & reliable storage)
• YARN (cluster resource management)
• MapReduce (data processing)
• Others
Key Benefits of YARN
• Scale
• New Programming Models & Services
• Improved cluster utilization
• Agility
• To infinity and beyond ..
Why Migrate?
• 2.0 >= 2 * 1.0
– HDFS: Lots of ground-breaking features
– YARN: Next generation architecture
• Return on Investment: 2x throughput on same hardware!
• Ready for improvements in hardware
• Not convinced? Let’s see what others are saying!
Yahoo!
• Leader/Visionary on all things Hadoop!
• On YARN (0.23.x)
• Moving fast to 2.x
http://developer.yahoo.com/blogs/ydn/hadoop-yahoo-more-ever-54421.html
Twitter
eBay
• Has one of the largest Hadoop clusters in the industry, with many petabytes of data
• Migrated production clusters to Hadoop-2
• Go to Mayank’s talk
– “Hadoop-2 @ ebay”!
– Thursday, April 3
– Track: Deployment and Operations
• Should be convinced by now... No?
YARN: the Data Operating System
Present
Apache Hadoop releases: Apache Hadoop 2.2
• 15 October, 2013
• The 1st GA release of Apache Hadoop 2.x
• YARN
– First stable and supported release of YARN
– Binary Compatibility for MapReduce applications built on hadoop-1.x
– YARN level APIs solidified for the future
– Performance
– Scale!
• HDFS
– High Availability for HDFS
– HDFS Federation
– HDFS Snapshots
– NFSv3 access to data in HDFS
• Support for running Hadoop on Microsoft Windows
• Substantial amount of integration testing with rest of projects in the ecosystem
Apache Hadoop releases (contd): Apache Hadoop 2.3
• 24 February, 2014
• First post-GA release for the year 2014
• Alpha features in YARN
– ResourceManager HA
– Application History
– Will cover in the 2.4 content
• HDFS
– Details follow..
• A number of bug fixes and enhancements
HDFS: Heterogeneous Storage
HDFS: DataNode caching
Apache Hadoop releases (contd): Apache Hadoop 2.4
• Very soon!
• YARN
– Details follow..
– ResourceManager restart and fail-over for high availability
– Preemption
– Application History and timeline
• HDFS
– FileSystem ACLs
– Rolling upgrades
ResourceManager Restart and fail-over
ZooKeeper
Capacity Scheduler Preemption
Application History and Timeline
• Few MR specific implementations: History and web-UI
• Not just MR anymore!
• History
– MapReduce specific Job History Server
– Beyond ResourceManager Restart
• Timeline
– Framework specific event collection and UIs
• Run analytics on historical apps!
Future
Future: Operational enhancements
• Rolling upgrades
– No/minimal impact to users
– Ideal: Always rolling!
• HDFS in
• YARN
Future: Enabling more apps
• Beyond MR
• Discussing next
– Long running services
– Isolation
– Multi-dimensional resource scheduling
Future: Long running services
• You can run them already!
• Few enhancements needed
– Logs
– Security
– Management/monitoring
• Resource sharing across workload types
• Project Slider
Fine-grain isolation for multi-tenancy
• Custom memory-monitoring
• Cgroups
• Linux Containers
• VMs
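The first option, custom memory monitoring, can be pictured roughly as below. This is only a sketch under stated assumptions: `ContainerUsage`, `containers_to_kill` and the sample numbers are made up for illustration and are not NodeManager APIs. The 2.1 ratio is an assumed value, chosen to resemble what YARN's `yarn.nodemanager.vmem-pmem-ratio` setting controls.

```python
from typing import NamedTuple


class ContainerUsage(NamedTuple):
    """One memory sample for a container (hypothetical type)."""
    container_id: str
    pmem_used_mb: int
    vmem_used_mb: int
    pmem_limit_mb: int


def containers_to_kill(samples, vmem_pmem_ratio=2.1):
    """Return ids of containers over their physical or virtual memory limit."""
    offenders = []
    for s in samples:
        # The virtual memory limit is derived from the physical limit.
        vmem_limit_mb = s.pmem_limit_mb * vmem_pmem_ratio
        if s.pmem_used_mb > s.pmem_limit_mb or s.vmem_used_mb > vmem_limit_mb:
            offenders.append(s.container_id)
    return offenders


samples = [
    ContainerUsage("container_01", 900, 1500, 1024),   # within both limits
    ContainerUsage("container_02", 1100, 1500, 1024),  # over physical limit
    ContainerUsage("container_03", 800, 2400, 1024),   # over virtual limit
]
print(containers_to_kill(samples))
```

A polling monitor like this only reacts after a container has overshot, which is one reason the slide lists Cgroups, Linux Containers and VMs as stronger forms of isolation.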
Multi-resource scheduling
• Today – memory & CPU
– Physical memory / virtual memory
– CPU cores / virtual cores
• CPU stuff: More bake in
• Disks
– Space
– IOPS
• Network
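To make "multi-resource" concrete, here is a small sketch of what scheduling on memory and CPU together implies. It assumes a simple two-dimensional resource vector. `Resource`, `fits` and `dominant_share` are hypothetical names for illustration, not YARN classes, and the cluster and node sizes are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """A two-dimensional resource vector (hypothetical, not YARN's class)."""
    memory_mb: int
    vcores: int


def fits(request: Resource, available: Resource) -> bool:
    """A request fits only if every dimension fits."""
    return (request.memory_mb <= available.memory_mb
            and request.vcores <= available.vcores)


def dominant_share(used: Resource, cluster: Resource) -> float:
    """Largest fraction of any single cluster resource that `used` takes up."""
    return max(used.memory_mb / cluster.memory_mb,
               used.vcores / cluster.vcores)


cluster = Resource(memory_mb=100_000, vcores=100)
node_free = Resource(memory_mb=8_192, vcores=2)

# Memory-heavy and CPU-heavy requests are limited by different dimensions.
print(fits(Resource(4_096, 1), node_free))   # enough of both
print(fits(Resource(4_096, 4), node_free))   # memory fits, vcores do not
print(dominant_share(Resource(10_000, 40), cluster))
```

Adding disks or network would mean adding fields to the vector. The "every dimension must fit" check and the "largest share wins" comparison stay the same.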
Other features
• Application SLAs
• Node labels
• Node affinity/anti-affinity
• Better online queue-management
YARN Ecosystem
Beyond the core YARN project: Briefly
Eco-system
Applications Powered by YARN
Apache Giraph – Graph Processing
Apache Hama – BSP
Apache Hadoop MapReduce – Batch
Apache Tez – Batch/Interactive
Apache S4 – Stream Processing
Apache Samza – Stream Processing
Apache Storm – Stream Processing
Apache Spark – Iterative applications
HOYA – HBase on YARN
YARN Frameworks
Apache Twill
REEF by Microsoft
Spring support for Hadoop 2
There's an app for that...
YARN App Marketplace!
Apache Tez
• Moving beyond MR
• A data processing framework that can execute a complex DAG of tasks.
• “Apache Tez - A New Chapter in Hadoop Data Processing”
– By Siddharth Seth: YARN & Tez Committer/PMC Member
– Thursday, April 3 (4:20-5:00pm)
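To illustrate what "execute a complex DAG of tasks" means in general, here is a minimal sketch using Python's standard `graphlib`. It is not the Tez API. The task names and the `run_dag` helper are invented for the example, and real tasks would do work rather than just being listed.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks whose output it consumes.
# Task names are made up for illustration.
dag = {
    "scan_orders": set(),
    "scan_users": set(),
    "join": {"scan_orders", "scan_users"},
    "aggregate": {"join"},
    "sort": {"aggregate"},
}


def run_dag(graph):
    """Group tasks into waves; every task in a wave has all its inputs ready."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # these could run in parallel
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(run_dag(dag), start=1):
    print(number, wave)
```

The two scans have no dependencies, so they land in the same wave. A plain chain of MapReduce jobs would have to express the same plan as separate jobs run one after another.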
Recap
• Apache Hadoop 2 is, at least, twice as good!
• Exciting journey with Hadoop for this decade…
– Hadoop is no longer a one-trick pony, err elephant
– Beyond just HDFS & MapReduce
• Architecture for the future
– Centralized data
– Exciting spectrum of application types, workloads and use cases
Couple more things..
The Book is out!
Thank you!
Download Sandbox: Experience Apache Hadoop
Both 2.x and 1.x Versions Available!
http://hortonworks.com/products/hortonworks-sandbox/
Questions Time!

Editor's Notes

  • #33: List of some of the applications which already support YARN, in some form.
    – Graph processing: Giraph, Hama
    – Stream processing: Samza, Storm, Spark, DataTorrent
    – MapReduce
    – Tez: fast query execution
    – Weave/REEF: frameworks to help with writing applications
    Samza, Storm, S4 and DataTorrent are streaming frameworks. Giraph and Hama are graph processing systems. There are some GitHub projects: caching systems, on-demand web-server spin-up. Weave and REEF are frameworks on top of YARN to make writing applications easier.