Page1 © Hortonworks Inc. 2014
Enterprise-Grade Rolling Upgrade for a Live
Hadoop Cluster
Sanjay Radia, Vinod Kumar Vavilapalli, Hortonworks Inc
© Hortonworks Inc. 2013 - Confidential
Agenda
• Introduction
• What is Rolling Upgrade?
• Problem – Several key issues to be addressed
– Wire compatibility and side-by-side installs are not sufficient!!
– Must address: Data safety, Service degradation and disruption
• Enhancements to various components
– Packaging – side-by-side install
– HDFS, Yarn, Hive, Oozie
Hello, my name is Sanjay Radia
• Chief Architect, Founder, Hortonworks
• Part of the Hadoop team at Yahoo! since 2007
– Chief Architect of Hadoop Core at Yahoo!
– Apache Hadoop PMC and Committer
• Prior
– Data center automation, schedulers, virtualization, Java, HA, OSs, File Systems
– (Startup, Sun Microsystems, Inria …)
– Ph.D., University of Waterloo
HDP Upgrade: Two Upgrade Modes
Stop-the-Cluster Upgrade
Shut down the services and the cluster, then upgrade.
Traditionally this was the only way.
Rolling Upgrade
Upgrade the cluster and its services while the cluster is
actively running jobs and applications.
Note: Upgrade time is proportional to # nodes, not data size
Enterprises run critical services and data on a Hadoop cluster.
They need a live cluster upgrade that maintains SLAs without degradation.
But you can Revert to Prior State
Rollback
Revert the bits and state of the cluster and its services back to a
checkpointed state.
Why? This is an emergency procedure.
Downgrade
Downgrade the service and components to the prior version, but
keep any new data and metadata that has been generated.
Why? You are not happy with performance, or app compatibility, etc.
But aren’t wire compatibility and
side-by-side installs sufficient for
Rolling upgrades?
Unfortunately no!! Not if you want to
• Keep data safe
• Keep running jobs/apps running correctly
• Maintain SLAs
• Allow downgrades/rollbacks in case of problems
Issues that need to be addressed (1)
• Data safety
• HDFS’s upgrade checkpoint does not work for rolling upgrade
• Service degradation – note that every daemon is restarted in rolling fashion
• HDFS write pipeline
• YARN ApplicationMasters restart
• NodeManager restart
• Hive server is processing client queries – it cannot restart to a new version without loss
• Clients must not see failures – many components do not have retry logic
BUT Hadoop deals with failures, it will fix pipelines and restart tasks –
what is the big deal!?
Service degradation will be high because every daemon is restarted
Issues that need to be addressed (2)
• Maintaining the job submitter's context (correctness)
• YARN tasks get their context from the local node
– In the past the submitter's and the node's contexts were identical
– But with RU, a node's binaries are being upgraded and hence may be inconsistent with the submitter's
- Half of the job could execute with the old binaries and the other half with the new ones!!
• Persistent state
• Backward compatibility for upgrade (or convert)
• Forward compatibility for downgrade (or convert)
• Wire compatibility
• With clients (forward and backward)
• Internally (Between Masters and Slaves or Peers)
– Note: the upgrade is in a rolling fashion
Component Enhancements
• Packaging – Side-by-side installs
• HDFS Enhancements
• Yarn Enhancements
• Retaining Job/App Context
• Hive Enhancements
Packaging: Side-by-side Installs (1)
• Need side-by-side installs of multiple versions on same node
• Some components are version N, while others are N+1
• For same component, some daemons version N, others N+1 on the same node (e.g. NN and DN)
• HDP's solution: Use the OS-distro standard packaging solution
• Rejected a proprietary packaging solution (no lock-in)
• Want to support RU via Ambari and manually
• Standard packaging solutions like RPMs have useful tools and mechanisms
– Tools to install, uninstall, query, etc.
– Manage dependencies automatically
– Admins do not need to learn new tools and formats
• Side benefit for "stop-the-world" upgrades:
• Can install the new binaries before the shutdown
Packaging: Side-by-side installs (2)
• Layout: side-by-side
• /usr/hdp/2.2.0.0/hadoop
• /usr/hdp/2.2.0.0/hive
• /usr/hdp/2.3.0.0/hadoop
• /usr/hdp/2.3.0.0/hive
• Define what is current for each component’s
daemon and clients
• /usr/hdp/current/hdfs-nn->/usr/hdp/2.3.0.0/hadoop
• /usr/hdp/current/hadoop-client->/usr/hdp/2.2.0.0/hadoop
• /usr/hdp/current/hdfs-dn->/usr/hdp/2.2.0.0/hadoop
• Distro-select helps you manage the version switch
• Our solution: the package name contains the version number:
• E.g. hadoop_2_2_0_0 is the RPM package name itself
– hadoop_2_3_0_0 is a different peer package
• Bin commands point to current:
/usr/bin/hadoop->/usr/hdp/current/hadoop-client/bin/hadoop
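The symlink scheme above can be sketched in a scratch directory; `$ROOT` stands in for /usr/hdp, and the mixed-version mid-upgrade state shown is illustrative:

```shell
# Sketch of the side-by-side layout; $ROOT stands in for /usr/hdp
ROOT="$(mktemp -d)"

# Two stack versions installed side by side
mkdir -p "$ROOT/2.2.0.0/hadoop/bin" "$ROOT/2.3.0.0/hadoop/bin" "$ROOT/current"

# Mid-upgrade state: NN already on 2.3.0.0, DN and clients still on 2.2.0.0
ln -s "$ROOT/2.3.0.0/hadoop" "$ROOT/current/hdfs-nn"
ln -s "$ROOT/2.2.0.0/hadoop" "$ROOT/current/hdfs-dn"
ln -s "$ROOT/2.2.0.0/hadoop" "$ROOT/current/hadoop-client"

# Moving one daemon to the new version is an atomic symlink swap
ln -sfn "$ROOT/2.3.0.0/hadoop" "$ROOT/current/hdfs-dn"
```

Because each daemon resolves its binaries through its own `current` link, different daemons on the same node can run different versions during the upgrade.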
Packaging: Side-by-side installs (3)
• distro-select tool to select current binary
• Per-component, Per-daemon
• Maintain stack consistency – that is what QE tested
• Each component refers to its siblings of same stack version
• Each component knows the “hadoop home” of the same stack
– Wrapper bin-scripts set this up
• Config updates can be optionally synchronized with binary upgrade
• Configs can sit in their old location
• But what if the new binary version requires slightly different config?
• Each binary version has its own config pointer
– /usr/hdp/2.2.0.0/hadoop/conf -> /etc/hadoop/conf
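What a distro-select-style tool does per daemon, together with the per-version config pointer, can be sketched as a small shell function. The `distro_select` helper and the scratch paths below are hypothetical stand-ins for the real tool and the /usr/hdp and /etc/hadoop/conf locations:

```shell
# Hypothetical distro_select helper: flip the "current" link for one
# component/daemon to a chosen stack version
ROOT="$(mktemp -d)"
mkdir -p "$ROOT/hdp/2.2.0.0/hadoop" "$ROOT/hdp/2.3.0.0/hadoop" \
         "$ROOT/hdp/current" "$ROOT/etc/hadoop/conf"

distro_select() {  # usage: distro_select <daemon-link> <stack-version>
  ln -sfn "$ROOT/hdp/$2/hadoop" "$ROOT/hdp/current/$1"
}

# Each binary version has its own config pointer; both can point at the
# shared /etc location until the new version needs a different config
ln -s "$ROOT/etc/hadoop/conf" "$ROOT/hdp/2.2.0.0/hadoop/conf"
ln -s "$ROOT/etc/hadoop/conf" "$ROOT/hdp/2.3.0.0/hadoop/conf"

distro_select hdfs-dn 2.3.0.0   # DN now resolves binaries from 2.3.0.0
```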
HDFS Enhancements (1)
Data safety
• Since 2007, HDFS has supported an upgrade-checkpoint
• Backups of HDFS are not practical – too large
• Protects against bugs in the new HDFS version deleting files
• Standard practice to use for ALL upgrades, even patch releases
• But this only works for a "stop-the-world" full upgrade and does not support downgrade
• Irresponsible to do a rolling upgrade without such a mechanism
HDP 2.2 has enhanced upgrade-checkpoint (HDFS-5535)
• Markers for rollback
• “Hardlinks” to protect against deletes due to bugs in the new version of HDFS code
• Old scheme had hardlinks but we now delay the deletes
• Added downgrade capability
• Protobuf based fsImage for compatible extensibility
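The enhanced checkpoint is driven with `hdfs dfsadmin -rollingUpgrade` subcommands. The sketch below only echoes the commands rather than executing them, since a real run needs a live NameNode:

```shell
# Echo-only sketch of the HDFS-5535 checkpoint flow
run() { echo "would run: $*"; }

run hdfs dfsadmin -rollingUpgrade prepare    # create the rollback image
run hdfs dfsadmin -rollingUpgrade query      # wait until the image is ready
# ... upgrade NameNodes and DataNodes in rolling fashion ...
run hdfs dfsadmin -rollingUpgrade finalize   # commit; delayed deletes proceed
```

Until `finalize` is run, the delayed-delete hardlinks keep a downgrade or rollback possible.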
HDFS Enhancements (2)
Minimize service degradation and retain data safety
• Fast datanode restart (HDFS-5498)
• Write pipeline – every DN will be upgraded and hence many write
pipelines will break and be repaired
• Umbrella Jira HDFS-5535
– Repair it to the same DN during RU (avoid replica data copy)
– Retain same number of replicas in pipeline
• Upgrade HA standby and failover (NN HA available for a long time)
YARN Enhancements: Minimize Service Degradation
• YARN RM retains app/job queue (2013)
• YARN RM HA (2014)
• Note this retains the queues but ALL jobs are restarted
• YARN RM can restart while retaining jobs (2015)
YARN Enhancements: Minimize Service Degradation
• A restarted YARN NodeManager retains existing containers (2015)
• Recall that restarting containers would cause serious SLA degradation
YARN Enhancement: Compatibility
• Versioning of state-stores of RM and NMs
• Compatible evolution of tokens over time
• Wire compatibility between mixed versions of RM
Retaining Job/App context
Previously a job/app used libraries from the local node
• Worked because client-node & compute-nodes had same version
• But during RU, the node manager has multiple versions
• Must use the same version as used by the client when submitting a job
• Solution:
• Framework libraries are now installed in HDFS
• Client-context sent as “distro-version” variable in job config
• Has side benefits
– Frameworks are now installed on a single node and then uploaded to HDFS
• Note Oozie also enhanced to maintain consistent context
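For MapReduce this mechanism can be sketched as follows. The commands are echoed only, the HDFS path and version are illustrative, and `mapreduce.application.framework.path` is the standard Hadoop property for pointing jobs at a framework tarball in HDFS:

```shell
# Echo-only sketch: publish the framework tarball to HDFS once per stack
# version, then pin jobs to it; paths and version are illustrative
run() { echo "would run: $*"; }
VERSION=2.2.0.0

run hdfs dfs -mkdir -p "/hdp/apps/$VERSION/mapreduce"
run hdfs dfs -put mapreduce.tar.gz "/hdp/apps/$VERSION/mapreduce/"

# The submitter's version travels in the job config, so tasks use the
# submitter's framework even on a node whose local binaries were upgraded
run hadoop jar app.jar MyJob \
  -Dmapreduce.application.framework.path="/hdp/apps/$VERSION/mapreduce/mapreduce.tar.gz#mr-framework"
```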
Hive Enhancements
• Fast restarts + client-side reconnection
• Hive metastore and Hive client
• Hive-server2: a stateful server that submits the client's query
• Need to keep it running until the old queries complete
• Solution:
• Allow multiple Hive-servers to run, each registered in Zookeeper
• New client requests go to new servers
• Old server completes old queries but does not receive any new ones
– Old-server is removed from Zookeeper
• Side benefits
• HA + Load balancing solution for Hiveserver2
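Clients reach whichever HiveServer2 instances are currently registered by connecting through ZooKeeper service discovery rather than to a fixed host. An echo-only sketch (the ZooKeeper hostnames are illustrative):

```shell
# Echo-only sketch: a Beeline client resolves a live HiveServer2 via the
# instances registered in ZooKeeper; hostnames are illustrative
run() { echo "would run: $*"; }

run beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"
```

A server that has been deregistered stops receiving new connections but keeps serving its in-flight queries.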
Automated Rolling Upgrade
Via Ambari
Via Your own cluster management scripts
HDP Rolling Upgrades Runbook
Pre-requisites: HA, configs
Prepare: install bits, DB backups, HDFS checkpoint
Rolling Upgrade → Finalize
Rolling Downgrade – also rolling
Rollback – NOT rolling; shut down all services
Note: Upgrade time is proportional to # nodes, not data size
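The runbook phases can be sketched as a driver-script skeleton (echo-only; in a real run each phase would fan out into the per-service steps described on the following slides):

```shell
# Echo-only skeleton of the runbook phases
phase() { echo "phase: $*"; }

phase prereqs           # verify HA and upgrade-ready configs
phase prepare           # install bits side-by-side, DB backups, HDFS checkpoint
phase rolling-upgrade   # per-service, in the prescribed order
phase finalize          # or rolling-downgrade / rollback instead
```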
Both Manual and Automated Rolling Upgrade
• Ambari supports fully automated upgrades
• Verifies prerequisites
• Performs HDFS upgrade-checkpoint, prompts for DB backups
• Performs rolling upgrade
• All the components, in the right order
• Smoke tests at each critical stage
• Opportunities for Admin verification at critical stages
• Downgrade if you change your mind
• We have published the runbook for those who do not use Ambari
• You can do it manually or automate your own process
Runbook: Rolling Upgrade
Ambari has an automated process for Rolling Upgrades.
Services are switched over to the new version in rolling fashion;
any components not installed on the cluster are skipped.
Upgrade order:
Zookeeper → Ranger → Core Masters (HDFS, YARN, HBase) → Core Slaves →
Hive → Oozie → Falcon → Clients (HDFS, YARN, MR, Tez, HBase, Pig,
Hive, Phoenix, Mahout) → Kafka → Knox → Storm → Slider → Flume →
Hue → Finalize
Runbook: Rolling Downgrade
Downgrade proceeds in rolling fashion over the same components:
Zookeeper → Ranger → Core Masters → Core Slaves → Hive → Oozie →
Falcon → Clients → Kafka → Knox → Storm → Slider → Flume → Hue →
Downgrade → Finalize
Summary
• Enterprises run critical services and data on a Hadoop cluster.
• Need a live cluster upgrade that maintains SLAs without degradation
• We enhanced Hadoop components for enterprise-grade rolling upgrade
• Non-proprietary packaging using OS-standard solutions (RPMs, Debs, …)
• Data safety
– HDFS checkpoints and write-pipelines
• Maintain SLAs – solve a number of service degradation problems
– HDFS write pipelines, Yarn RM, NM state recovery, Hive, …
• Jobs/apps continue to run correctly with the right context
• Allow downgrade/rollbacks in case of problems
• All enhancements truly open source and pushed back to Apache?
• Yes of course – that is how Hortonworks does business …
Backup slides
Why didn't you use alternatives?
• Alternatives generally keep one version active, not two
• We need to move some services as a pack (clients)
• We need to support managing confs and binaries together and
separately
• Maybe we could have done it, but it was getting complex …..
More Related Content

What's hot (20)

PPTX
Unbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groups
serge luca
 
PDF
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
Lars Platzdasch
 
PPTX
Storage and-compute-hdfs-map reduce
Chris Nauroth
 
PPTX
Apache Ambari - What's New in 2.0.0
Hortonworks
 
ODP
Zero Downtime JEE Architectures
Alexander Penev
 
PPT
SharePoint Backup And Disaster Recovery with Joel Oleson
Joel Oleson
 
PPTX
SQL 2014 AlwaysOn Availability Groups for SharePoint Farms - SPS Sydney 2014
Michael Noel
 
PPTX
Improvements in Hadoop Security
DataWorks Summit
 
PPTX
SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 - AUSPC2012
Michael Noel
 
PDF
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
Lars Platzdasch
 
PDF
33616611930205162156 upgrade internals_19c
Locuto Riorama
 
PPTX
Best Practices for Virtualizing Hadoop
DataWorks Summit
 
PPTX
Siebel server cloning
Jeroen Burgers
 
PDF
An Overview of Ambari
Chicago Hadoop Users Group
 
PDF
Simplifying systems management with Dell OpenManage on 13G Dell PowerEdge ser...
Principled Technologies
 
PDF
20618782218718364253 emea12 vldb
Locuto Riorama
 
PPTX
SAP_HANA_Infra_V_1.1
Sandeep Mahindra
 
PDF
ukoug-soa-sig-june-2016 v0.5
Bruno Alves
 
PPTX
Using oracle cloud to speed up e business suite 12.2 upgrade
vasuballa
 
PDF
6212883126866262792 performance testing_cloud
Locuto Riorama
 
Unbreakable SharePoint 2016 with SQL Server 2016 Always On Availability groups
serge luca
 
SQL Server Alwayson for SharePoint HA/DR Step by Step Guide
Lars Platzdasch
 
Storage and-compute-hdfs-map reduce
Chris Nauroth
 
Apache Ambari - What's New in 2.0.0
Hortonworks
 
Zero Downtime JEE Architectures
Alexander Penev
 
SharePoint Backup And Disaster Recovery with Joel Oleson
Joel Oleson
 
SQL 2014 AlwaysOn Availability Groups for SharePoint Farms - SPS Sydney 2014
Michael Noel
 
Improvements in Hadoop Security
DataWorks Summit
 
SQL 2012 AlwaysOn Availability Groups for SharePoint 2010 - AUSPC2012
Michael Noel
 
SQL AlwaysON for SharePoint HA/DR on Azure Global Azure Bootcamp 2017 Eisenac...
Lars Platzdasch
 
33616611930205162156 upgrade internals_19c
Locuto Riorama
 
Best Practices for Virtualizing Hadoop
DataWorks Summit
 
Siebel server cloning
Jeroen Burgers
 
An Overview of Ambari
Chicago Hadoop Users Group
 
Simplifying systems management with Dell OpenManage on 13G Dell PowerEdge ser...
Principled Technologies
 
20618782218718364253 emea12 vldb
Locuto Riorama
 
SAP_HANA_Infra_V_1.1
Sandeep Mahindra
 
ukoug-soa-sig-june-2016 v0.5
Bruno Alves
 
Using oracle cloud to speed up e business suite 12.2 upgrade
vasuballa
 
6212883126866262792 performance testing_cloud
Locuto Riorama
 

Viewers also liked (9)

PPTX
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Cloudera, Inc.
 
DOCX
Upgrading hadoop
Shashwat Shriparv
 
PPTX
Slider: Applications on YARN
Steve Loughran
 
PDF
Yahoo!ブラウザーにおける市場環境の分析と戦略化
Yahoo!デベロッパーネットワーク
 
PPTX
Kafkaを活用するためのストリーム処理の基本
Sotaro Kimura
 
PPTX
Apache Slider
Shivaji Dutta
 
PDF
Twitterのリアルタイム分散処理システム「Storm」入門
AdvancedTechNight
 
PDF
Automation of Rolling Upgrade of Hadoop Cluster without Data Lost and Job Fai...
Yahoo!デベロッパーネットワーク
 
PDF
最近のストリーム処理事情振り返り
Sotaro Kimura
 
Upgrade Without the Headache: Best Practices for Upgrading Hadoop in Production
Cloudera, Inc.
 
Upgrading hadoop
Shashwat Shriparv
 
Slider: Applications on YARN
Steve Loughran
 
Yahoo!ブラウザーにおける市場環境の分析と戦略化
Yahoo!デベロッパーネットワーク
 
Kafkaを活用するためのストリーム処理の基本
Sotaro Kimura
 
Apache Slider
Shivaji Dutta
 
Twitterのリアルタイム分散処理システム「Storm」入門
AdvancedTechNight
 
Automation of Rolling Upgrade of Hadoop Cluster without Data Lost and Job Fai...
Yahoo!デベロッパーネットワーク
 
最近のストリーム処理事情振り返り
Sotaro Kimura
 
Ad

Similar to Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster (20)

PPTX
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 
PPTX
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
DataWorks Summit/Hadoop Summit
 
PDF
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks
 
PPTX
Managing Enterprise Hadoop Clusters with Apache Ambari
Jayush Luniya
 
PPTX
Managing Enterprise Hadoop Clusters with Apache Ambari
Hortonworks
 
PPTX
What's new in Ambari
DataWorks Summit
 
PPTX
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
Ian Lumb
 
PPTX
Hadoop operations-2014-strata-new-york-v5
Chris Nauroth
 
PPTX
Keep your Hadoop Cluster at its Best
DataWorks Summit/Hadoop Summit
 
PPTX
Keep your hadoop cluster at its best! v4
Chris Nauroth
 
PDF
Discover.hdp2.2.ambari.final[1]
Hortonworks
 
PPTX
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Hortonworks
 
PPTX
Hadoop Operations - Best Practices from the Field
DataWorks Summit
 
PPTX
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
 
PPTX
A First-Hand Look at What's New in HDP 2.3
DataWorks Summit
 
PDF
Keep your Hadoop cluster at its best!
Sheetal Dolas
 
PDF
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks
 
PPTX
Hadoop: today and tomorrow
Steve Loughran
 
PDF
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks
 
PPTX
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
DataWorks Summit/Hadoop Summit
 
Migrating your clusters and workloads from Hadoop 2 to Hadoop 3
DataWorks Summit
 
Apache Ambari - HDP Cluster Upgrades Operational Deep Dive and Troubleshooting
DataWorks Summit/Hadoop Summit
 
Hortonworks Technical Workshop: What's New in HDP 2.3
Hortonworks
 
Managing Enterprise Hadoop Clusters with Apache Ambari
Jayush Luniya
 
Managing Enterprise Hadoop Clusters with Apache Ambari
Hortonworks
 
What's new in Ambari
DataWorks Summit
 
How to Upgrade Your Hadoop Stack in 1 Step -- with Zero Downtime
Ian Lumb
 
Hadoop operations-2014-strata-new-york-v5
Chris Nauroth
 
Keep your Hadoop Cluster at its Best
DataWorks Summit/Hadoop Summit
 
Keep your hadoop cluster at its best! v4
Chris Nauroth
 
Discover.hdp2.2.ambari.final[1]
Hortonworks
 
Streamline Apache Hadoop Operations with Apache Ambari and SmartSense
Hortonworks
 
Hadoop Operations - Best Practices from the Field
DataWorks Summit
 
Hadoop operations-2015-hadoop-summit-san-jose-v5
Chris Nauroth
 
A First-Hand Look at What's New in HDP 2.3
DataWorks Summit
 
Keep your Hadoop cluster at its best!
Sheetal Dolas
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks
 
Hadoop: today and tomorrow
Steve Loughran
 
Hortonworks Technical Workshop: Interactive Query with Apache Hive
Hortonworks
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
DataWorks Summit/Hadoop Summit
 
Ad

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
DataWorks Summit
 
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
PPTX
Managing the Dewey Decimal System
DataWorks Summit
 
PPTX
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
PPTX
Security Framework for Multitenant Architecture
DataWorks Summit
 
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
PPTX
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
PPTX
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
PPTX
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
PPTX
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
PPTX
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
PDF
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
PPTX
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

Recently uploaded (20)

PPTX
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
PPTX
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
PDF
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
PDF
New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
PDF
Learn Computer Forensics, Second Edition
AnuraShantha7
 
PDF
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
PPTX
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
PDF
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
PPTX
MSP360 Backup Scheduling and Retention Best Practices.pptx
MSP360
 
PDF
Predicting the unpredictable: re-engineering recommendation algorithms for fr...
Speck&Tech
 
PDF
Transcript: New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
PDF
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
PDF
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
PDF
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
PDF
The Builder’s Playbook - 2025 State of AI Report.pdf
jeroen339954
 
PDF
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
PPTX
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
PDF
Chris Elwell Woburn, MA - Passionate About IT Innovation
Chris Elwell Woburn, MA
 
PDF
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
PDF
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
"Autonomy of LLM Agents: Current State and Future Prospects", Oles` Petriv
Fwdays
 
AUTOMATION AND ROBOTICS IN PHARMA INDUSTRY.pptx
sameeraaabegumm
 
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
Learn Computer Forensics, Second Edition
AnuraShantha7
 
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
CIFDAQ Weekly Market Wrap for 11th July 2025
CIFDAQ
 
MSP360 Backup Scheduling and Retention Best Practices.pptx
MSP360
 
Predicting the unpredictable: re-engineering recommendation algorithms for fr...
Speck&Tech
 
Transcript: New from BookNet Canada for 2025: BNC BiblioShare - Tech Forum 2025
BookNet Canada
 
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
CIFDAQ Market Insights for July 7th 2025
CIFDAQ
 
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
The Builder’s Playbook - 2025 State of AI Report.pdf
jeroen339954
 
Smart Trailers 2025 Update with History and Overview
Paul Menig
 
Q2 FY26 Tableau User Group Leader Quarterly Call
lward7
 
Chris Elwell Woburn, MA - Passionate About IT Innovation
Chris Elwell Woburn, MA
 
Windsurf Meetup Ottawa 2025-07-12 - Planning Mode at Reliza.pdf
Pavel Shukhman
 
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 

Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster

  • 1. Page1 © Hortonworks Inc. 2014 Enterprise-Grade Rolling Upgrade for a Live Hadoop Cluster Sanjay Radia, Vinod Kumar Vavilapalli, Hortonworks Inc Page 1
  • 2. Page2 © Hortonworks Inc. 2014 © Hortonworks Inc. 2013 - Confidential Agenda •Introduction •What is Rolling Upgrade? •Problem – Several key issues to be addressed –Wire compatibility and side-by-side installs are not sufficient!! –Must Address: Data safety, Service degradation and disruption •Enhancements to various components –Packaging – side-by-side install –HDFS, Yarn, Hive, Oozie Page 2
  • 3. Page3 © Hortonworks Inc. 2014 © Hortonworks Inc. 2013 - Confidential Hello, my name is Sanjay Radia •Chief Architect, Founder, Hortonworks •Part of the Hadoop team at Yahoo! since 2007 –Chief Architect of Hadoop Core at Yahoo! –Apache Hadoop PMC and Committer • Prior –Data center automation, schedulers, virtualization, Java, HA, OSs, File Systems – (Startup, Sun Microsystems, Inria …) –Ph.D., University of Waterloo Page 3
  • 4. Page4 © Hortonworks Inc. 2014 HDP Upgrade: Two Upgrade Modes Stop the Cluster Upgrade Shutdown services and cluster and then upgrade. Traditionally this was the only way Rolling Upgrade Upgrade cluster and its services while cluster is actively running jobs and applications Note: Upgrade time is proportional to # nodes, not data size Enterprises run critical services and data on a Hadoop cluster. Need live cluster upgrade that maintains SLAs without degradation
  • 5. Page5 © Hortonworks Inc. 2014 © Hortonworks Inc. 2013 - Confidential But you can Revert to Prior State Rollback Revert bits and state of cluster and its services back to a checkpoint’d state. Why? This is an emergency procedure. Downgrade Downgrade the service and component to prior version, but keep any new data and metadata that has been generated Why? You are not happy with performance, or app compatibility, ….
  • 6. Page6 © Hortonworks Inc. 2014 But aren’t wire compatibility and side-by-side installs sufficient for Rolling upgrades? Unfortunately No!! Not if you want • Data safety • Keep running jobs/apps continue to run correctly • Maintain SLAs • Allow downgrade/rollbacks in case of problems Page 6
  • 7. Page7 © Hortonworks Inc. 2014 Issues that need to be addressed (1) • Data safety • HDFS’s upgrade checkpoint does not work for rolling upgrade • Service degradation – note every daemon is restarted in rolling fashion • HDFS write pipeline • Yarn App masters restart • Node manager restart • Hive server is processing client queries – it cannot restart to new version without loss • Client must not see failures – many components do not have retry BUT Hadoop deals with failures, it will fix pipelines, restart tasks – what is the big deal!! Service degradation will be high because every daemon is restarted
  • 8. Page8 © Hortonworks Inc. 2014 Issues that need to be addressed (2) • Maintaining the job submitters context (correctness) • Yarn tasks get their context from the local node – In the past the submitters and node’s context were identical – But with RU, a node’s binaries are being upgraded and hence may be inconsistent with submitter - Half of the job could execute with old binaries and the other with the new one!! • Persistent state • Backward compatibility for upgrade (or convert) • Forward compatibility for downgrade (or convert) • Wire compatibility • With clients (forward and backward) • Internally (Between Masters and Slaves or Peers) – Note: the upgrade is in a rolling fashion
  • 9. Page9 © Hortonworks Inc. 2014 Component Enhancements • Packaging – Side-by-side installs • HDFS Enhancements • Yarn Enhancements • Retaining Job/App Context • Hive Enhancements
  • 10. Page10 © Hortonworks Inc. 2014 Packaging: Side-by-side Installs (1) • Need side-by-side installs of multiple versions on same node • Some components are version N, while others are N+1 • For same component, some daemons version N, others N+1 on the same node (e.g. NN and DN) • HDP’s solution: Use OS-distro standard packaging solution • Rejected a proprietary packing solution (no lock-in) • Want to support RU via Ambari and Manually • Standard packaging solutions like RPMs have useful tools and mechanisms – Tools to install, uninstall, query, etc – Manage dependencies automatically – Admins do not need to learn new tools and formats • Side benefits for ‘stop-the-world” upgrade: • Can install the new binaries before the shutdown
  • 11. Page11 © Hortonworks Inc. 2014 Packaging: Side-by-side installs (2) • Layout: side-by-side • /usr/hdp/2.2.0.0/hadoop • /usr/hdp/2.2.0.0/hive • /usr/hdp/2.3.0.0/hadoop • /usr/hdp/2.3.0.0/hive • Define what is current for each component’s daemon and clients • /usr/hdp/current/hdfs-nn->/usr/hdp/2.3.0.0/hadoop • /usr/hdp/current/hadoop-client->/usr/hdp/2.2.0.0/hadoop • /usr/hdp/current/hdfs-dn->/usr/hdp/2.2.0.0/hadoop • Distro-select helps you manage the version switch • Our solution: the package name contains the version number: • E.g hadoop_2_2_0_0 is the RPM package name itself – Hadoop_2_3_0_0 is different peer package • Bin commands point to current: /usr/bin/hadoop->/usr/hdp/current/hadoop-client/bin/hadoop
  • 12. Page12 © Hortonworks Inc. 2014 Packaging: Side-by-side installs (3) • distro-select tool to select current binary • Per-component, Per-daemon • Maintain stack consistency – that is what QE tested • Each component refers to its siblings of same stack version • Each component knows the “hadoop home” of the same stack – Wrapper bin-scripts set this up • Config updates can be optionally synchronized with binary upgrade • Configs can sit in their old location • But what if the new binary version requires slightly different config? • Each binary version has its own config pointer – /usr/hdp/2.2.0.0/hadoop/conf -> /etc/hadoop/conf
  • 13. Page13 © Hortonworks Inc. 2014 Component Enhancements • Packaging – Side-by-side installs • HDFS Enhancements • Yarn Enhancements • Retaining Job/App Context • Hive Enhancements
  • 14. Page14 © Hortonworks Inc. 2014 HDFS Enhancements (1) Data safety • Since version 2007, HDFS supported an upgrade-checkpoint • Backups of HDFS not practical – too large • Protects against HDFS bugs in new version deleting files • Standard practice to use for ALL upgrade even patch releases • But this only works for “stop-the-world” full upgrade and does not support downgrade • Irresponsible to do rolling upgrade without such a mechanism HDP 2.2 has enhanced upgrade-checkpoint (HDFS-5535) • Markers for rollback • “Hardlinks” to protect against deletes due to bugs in the new version of HDFS code • Old scheme had hardlinks but we now delay the deletes • Added downgrade capability • Protobuf based fsImage for compatible extensibility
  • 15. Page15 © Hortonworks Inc. 2014 HDFS Enhancements (2) Minimize service degradation and retain data safety • Fast datanode restart (HDFS-5498) • Write pipeline – every DN will be upgraded and hence many write pipelines will break and repaired • Umbrella Jira HDFS-5535 – Repair it to the same DN during RU (avoid replica data copy) – Retain same number of replicas in pipeline • Upgrade HA standby and failover (NN HA available for a long time)
Page16 © Hortonworks Inc. 2014
Component Enhancements
•Packaging – Side-by-side installs
•HDFS Enhancements
•Yarn Enhancements
•Retaining Job/App Context
•Hive Enhancements
Page17 © Hortonworks Inc. 2014
YARN Enhancements: Minimize Service Degradation
•YARN RM retains the app/job queue (2013)
•YARN RM HA (2014)
–Note: this retains the queues, but ALL jobs are restarted
•YARN RM can restart while retaining jobs (2015)
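In Apache Hadoop terms, RM restart with job retention is the "work-preserving RM restart" feature, enabled in yarn-site.xml. A minimal fragment follows; the state-store choice shown is one of several options (the ZooKeeper store is the one required for RM HA):

```xml
<!-- yarn-site.xml: let the RM recover queues and running apps on restart -->
<property>
  <name>yarn.resourcemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- one possible state store; ZKRMStateStore is required for RM HA -->
  <name>yarn.resourcemanager.store.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
```

With work-preserving recovery on, running containers keep running across an RM restart and ApplicationMasters re-sync with the new RM instead of being killed.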
Page18 © Hortonworks Inc. 2014
YARN Enhancements: Minimize Service Degradation
•A restarted YARN NodeManager retains existing containers (2015)
–Recall: restarting containers will cause serious SLA degradation
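NodeManager restart with container retention is likewise configuration-driven. A sketch of the relevant yarn-site.xml settings; the recovery directory path below is a placeholder, and pinning `yarn.nodemanager.address` to a fixed port is needed so the restarted NM comes back at the same address its containers know:

```xml
<!-- yarn-site.xml: NodeManager restart without killing its containers -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- local dir where the NM checkpoints container state (example path) -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/log/hadoop-yarn/nm-recovery</value>
</property>
<property>
  <!-- fixed port, not ephemeral, so the NM restarts at the same address -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:45454</value>
</property>
```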
Page19 © Hortonworks Inc. 2014
YARN Enhancement: Compatibility
•Versioning of the state-stores of RM and NMs
•Compatible evolution of tokens over time
•Wire compatibility between mixed versions of RM
Page20 © Hortonworks Inc. 2014
Component Enhancements
•Packaging – Side-by-side installs
•HDFS Enhancements
•Yarn Enhancements
•Retaining Job/App Context
•Hive Enhancements
Page21 © Hortonworks Inc. 2014
Retaining Job/App Context
Previously, a job/app used libraries from the local node
•Worked because the client-node & compute-nodes had the same version
•But during RU, the NodeManager’s node has multiple versions
•Must use the same version the client used when submitting the job
Solution:
•Framework libraries are now installed in HDFS
•Client context is sent as a “distro-version” variable in the job config
•Side benefits
–Frameworks are now installed on a single node and then uploaded to HDFS
•Note: Oozie was also enhanced to maintain a consistent context
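In Apache Hadoop terms, "framework libraries in HDFS" maps to the MapReduce framework-tarball settings in mapred-site.xml. A sketch of the HDP-style configuration — `${hdp.version}` is the variable substituted from the client's stack version at submission time, and the classpath value is abbreviated here (real installs list more entries):

```xml
<!-- mapred-site.xml: run MR jobs against a framework tarball in HDFS,
     pinned to the stack version recorded at job-submission time -->
<property>
  <name>mapreduce.application.framework.path</name>
  <value>/hdp/apps/${hdp.version}/mapreduce/mapreduce.tar.gz#mr-framework</value>
</property>
<property>
  <!-- abbreviated; real configs enumerate the full framework classpath -->
  <name>mapreduce.application.classpath</name>
  <value>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/common/*</value>
</property>
```

Because the tarball path carries the version, a job submitted against 2.1 keeps running on 2.1 libraries even while the node it lands on is being upgraded to 2.2.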
Page22 © Hortonworks Inc. 2014
Component Enhancements
•Packaging – Side-by-side installs
•HDFS Enhancements
•Yarn Enhancements
•Retaining Job/App Context
•Hive Enhancements
Page23 © Hortonworks Inc. 2014
Hive Enhancements
•Fast restarts + client-side reconnection
–Hive metastore and Hive client
•HiveServer2: a stateful server that submits the client’s query
–Need to keep it running until the old queries complete
•Solution:
–Allow multiple Hive servers to run, each registered in ZooKeeper
–New client requests go to new servers
–Old server completes old queries but does not receive any new ones
–Old server is removed from ZooKeeper
•Side benefits
–HA + load-balancing solution for HiveServer2
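This ZooKeeper-based registration is HiveServer2's dynamic service discovery. A sketch of the hive-site.xml side; the ZooKeeper hostnames are placeholders:

```xml
<!-- hive-site.xml: register each HiveServer2 instance in ZooKeeper so
     clients discover live servers instead of a fixed host -->
<property>
  <name>hive.server2.support.dynamic.service.discovery</name>
  <value>true</value>
</property>
<property>
  <name>hive.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
<property>
  <name>hive.server2.zookeeper.namespace</name>
  <value>hiveserver2</value>
</property>
```

Clients then connect through the quorum rather than a server, e.g. `jdbc:hive2://zk1.example.com:2181,zk2.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2`, so a draining old server simply drops out of the namespace while new queries land on the new one.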
Page24 © Hortonworks Inc. 2014
Automated Rolling Upgrade
•Via Ambari
•Via your own cluster-management scripts
Page25 © Hortonworks Inc. 2014
HDP Rolling Upgrades Runbook
•Pre-requisites
–HA
–Configs
•Prepare
–Install bits
–DB backups
–HDFS checkpoint
•Rolling Upgrade
•Finalize
•If needed: Rolling Downgrade
•If needed: Rollback – NOT rolling; shutdown all services
Note: Upgrade time is proportional to # nodes, not data size
Page28 © Hortonworks Inc. 2014
Both Manual and Automated Rolling Upgrade
•Ambari supports fully automated upgrades
–Verifies prerequisites
–Performs the HDFS upgrade-checkpoint, prompts for DB backups
–Performs the rolling upgrade
–All the components, in the right order
–Smoke tests at each critical stage
–Opportunities for admin verification at critical stages
–Downgrade if you change your mind
•We have published the runbook for those who do not use Ambari
–You can do it manually or automate your own process
Page29 © Hortonworks Inc. 2014
Runbook: Rolling Upgrade
Ambari has an automated process for Rolling Upgrades
•Services are switched over to the new version in rolling fashion
•Any components not installed on the cluster are skipped
Order: Zookeeper → Ranger → Core Masters (HDFS, YARN, HBase) →
Core Slaves (HDFS, YARN, HBase) → Hive → Oozie → Falcon →
Clients (HDFS, YARN, MR, Tez, HBase, Pig, Hive, Phoenix, Mahout) →
Kafka → Knox → Storm → Slider → Flume → Hue → Finalize
Page30 © Hortonworks Inc. 2014
Runbook: Rolling Downgrade
Components covered: Zookeeper, Ranger, Core Masters, Core Slaves, Hive,
Oozie, Falcon, Clients, Kafka, Knox, Storm, Slider, Flume, Hue;
then Downgrade Finalize
Page31 © Hortonworks Inc. 2014
Summary
•Enterprises run critical services and data on a Hadoop cluster
–Need a live cluster upgrade that maintains SLAs without degradation
•We enhanced Hadoop components for enterprise-grade rolling upgrade
–Non-proprietary packaging using OS-standard mechanisms (RPMs, Debs, …)
–Data safety – HDFS checkpoints and write pipelines
–Maintain SLAs – solved a number of service-degradation problems
–HDFS write pipelines, Yarn RM, NM state recovery, Hive, …
–Jobs/apps continue to run correctly with the right context
–Allow downgrades/rollbacks in case of problems
•All enhancements truly open source and pushed back to Apache?
–Yes, of course – that is how Hortonworks does business …
Page32 © Hortonworks Inc. 2014
Backup slides
Page33 © Hortonworks Inc. 2014
Why didn’t you use “alternatives”?
•The alternatives mechanism generally keeps one version active, not two
•We need to move some services as a pack (clients)
•We need to support managing configs and binaries together and separately
•Maybe we could have done it, but it was getting complex …

Editor's Notes

  • #8:
–HDFS write pipeline – slows down writes, risks data
–Yarn App Masters restart – app failure if the App Master does not have persistent state
–NodeManager restart – tasks fail and restart, SLAs degrade
–The Hive server is processing client queries – it cannot restart into a new version
–Clients must not see failures – many components do not have retry