I accidentally the Namenode
HDFS reliability at Facebook


Andrew Ryan
Facebook
April 2012
The HDFS Namenode: SPOF by design

▪  Single Point of Failure by design
▪  All metadata operations go through Namenode
▪  Early designers made tradeoffs: features & performance first

[Diagram: Simplified HDFS Architecture: Namenode as SPOF. Labels: Namenode, Secondary Namenode, Datanode, Clients, Data]
HDFS major use cases at Facebook
Data Warehouse and Facebook Messages

                                 Data Warehouse                    Facebook Messages
   # of clusters                 <10                               10’s
   Size of clusters              Large (100’s – 1000’s of nodes)   Small (~100 nodes)
   Processing workload           MapReduce batch jobs              HBase transactions
   Namenode load                 Very heavy                        Very light
   End-user downtime impact      None                              Users without Messages
HDFS at Facebook: 2009-2012
Some things have changed…


                                          2009         2012
  # HDFS clusters                         1            >100
  Largest HDFS cluster size (capacity)    600TB        >100PB
  Largest HDFS cluster size (# files)     10 million   200 million
  HDFS cluster types                      MapReduce    MapReduce, HBase, MySQL backups, +more
HDFS at Facebook: 2009-2012
…and some things have not


                                          2009                  2012
  Single points of failure in HDFS        Namenode              Namenode
  HDFS cluster restart time               60 minutes            60 minutes
  Namenode failover method                Manual, complicated   Manual, complicated
  SPOF Namenode as a cause of downtime    Unknown               Unknown
Data Warehouse

▪  Storage and querying of structured log data using Hive and Hadoop MapReduce
▪  Composed of dozens of tools/components
▪  A “vigorous and creative” user population

[Diagram: Data Warehouse stack, top to bottom: UI Tools; Workflow (Nocron); Query (Hive); Compute (MapReduce); Storage (HDFS). Label: Hadoop]
Data Warehouse: all incidents
41% are HDFS-related
Data Warehouse: SPOF Namenode incidents
10% are SPOF Namenode
Facebook Messages
[Diagram: Facebook Messages architecture. Labels: Clients (www, chat, MTA, etc.), User Directory Service, Messages Cell, Application Server, HBase/HDFS/ZK, Mail, Anti-spam, Mail Servers, Outbound Mail, Haystack]
Messages: all incidents
16% are HDFS-related
Messages: SPOF Namenode incidents
10% are SPOF Namenode
What would happen if…
Instead of this…
[Diagram: Simplified HDFS Architecture: Namenode as SPOF. Labels: Namenode, Secondary Namenode, Datanode, Clients, Data]
What would happen if…
We had this!
[Diagram: Simplified HDFS Architecture: Highly Available Namenode. Labels: Primary Namenode, Standby Namenode, Datanode, Clients, Data]
AvatarNode is our solution

[Diagrams: AvatarNode client view; AvatarNode datanode view]
AvatarNode is…
▪  A two-node, highly available Namenode with manual failover
▪  In production today at Facebook
▪  Open-sourced, based on Hadoop 0.20: https://github.com/facebook/hadoop-20
AvatarNode does not…
▪  Eliminate the dependency on shared storage for image/edits
▪  Provide instant failover (failover takes ~1 second per million blocks+files)
▪  Provide automated failover
▪  Guarantee I/O fencing for Primary/Standby (although precautions are taken)
▪  Require ZooKeeper at all times for normal operation (it is required only for failover)
▪  Allow for >2 Namenodes to participate in an HA cluster
▪  Have any special network requirements
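
As a rough illustration of the "~1 second per million blocks+files" figure above, the following Python sketch estimates failover time from namespace size. The function name and the block count in the example are assumptions made for illustration; only the per-million rate and the 200 million file count come from the slides.

```python
def estimated_failover_seconds(num_files, num_blocks, seconds_per_million=1.0):
    """Rough failover estimate at ~1 second per million blocks+files (rate from the slide)."""
    if num_files < 0 or num_blocks < 0:
        raise ValueError("counts must be non-negative")
    return (num_files + num_blocks) / 1_000_000 * seconds_per_million


if __name__ == "__main__":
    files = 200_000_000   # largest cluster file count quoted for 2012
    blocks = 300_000_000  # hypothetical block count, not from the slides
    seconds = estimated_failover_seconds(files, blocks)
    print(f"Estimated failover: {seconds:.0f} s (~{seconds / 60:.1f} min)")
```

Under these assumed counts the estimate comes out to minutes rather than the 60-minute cluster restart time quoted earlier, but the real figure depends on the actual number of blocks.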
Wrapping up…
▪  The SPOF Namenode is a weak link in HDFS’s design
▪  In our services which use HDFS, we estimate we could eliminate:
  ▪  10% of service downtime from unscheduled outages
  ▪  20-50% of downtime from scheduled maintenance
▪  AvatarNode is Facebook’s solution for Hadoop 0.20, available today
▪  Other Namenode HA solutions are being worked on in HDFS trunk (HDFS-1623)
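
To make the downtime estimate above concrete, here is a small Python sketch that applies the quoted percentages (10% of unscheduled downtime, 20-50% of scheduled downtime) to a service's downtime hours. The function name and the example hours are hypothetical; the slides give only the percentages.

```python
def avoidable_downtime_hours(unscheduled_hours, scheduled_hours):
    """Return (low, high) hours of downtime an HA Namenode might eliminate.

    Uses the percentages from the slide: 10% of unscheduled downtime
    and 20-50% of scheduled-maintenance downtime.
    """
    unscheduled_saved = 0.10 * unscheduled_hours
    low = unscheduled_saved + 0.20 * scheduled_hours
    high = unscheduled_saved + 0.50 * scheduled_hours
    return low, high


if __name__ == "__main__":
    # Hypothetical yearly downtime figures, chosen only for illustration.
    low, high = avoidable_downtime_hours(unscheduled_hours=10.0, scheduled_hours=20.0)
    print(f"Avoidable downtime: {low:.1f} to {high:.1f} hours")
```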
Questions?