Plugging the Holes:
Security and Compatibility
       Owen O’Malley
    Yahoo! Hadoop Team
    owen@yahoo-inc.com
Who Am I?

  •  Software Architect working on Hadoop since Jan 2006
     –  Before Hadoop, worked on Yahoo Search’s WebMap
     –  My first patch on Hadoop was Nutch-197
     –  First Yahoo Hadoop committer
     –  Most prolific contributor to Hadoop (by patch count)
     –  Won the 2008 1TB and 2009 Minute and 100TB Sort
        Benchmarks
  •  Apache VP of Hadoop
     –  Chair of the Hadoop Project Management Committee
     –  Quarterly reports on the state of Hadoop for Apache Board



Hadoop World NYC - 2009
What are the Problems?

  •  Our shared clusters increase:
     –  Developer and operations productivity
     –  Hardware utilization
     –  Access to data
  •  Yahoo! wants to put customer and financial data on our
     Hadoop clusters.
     –  Great for providing access to all of the parts of Yahoo!
     –  Need to make sure that only the authorized people have
        access.
  •  Rolling out new versions of Hadoop is painful
     –  Clients need to change and recompile their code

Hadoop Security

  •  Currently, the Hadoop servers trust the users to declare
     who they are.
     –  Identities are very easy to spoof, especially with open source.
     –  For private clusters, non-secure operation will remain an option.
  •  We need to ensure that users are who they claim to be.
  •  All access to HDFS (and therefore MapReduce) must
     be authenticated.
  •  The standard distributed authentication service is
     Kerberos (including Active Directory).
  •  User code isn’t affected, since the security happens in
     the RPC layer.
HDFS Security

  •  Hadoop security is grounded in HDFS security.
     –  Other services such as MapReduce store their state in HDFS.
  •  Use of Kerberos allows single sign-on: the Hadoop
     commands pick up and use the user’s tickets.
  •  The framework authenticates the user to the Name
     Node using Kerberos before any operations.
  •  The Name Node is also authenticated to the user.
  •  Client can request an HDFS Access Token to get
     access later without going through Kerberos again.
     –  Prevents authorization storms as MapReduce jobs launch!
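The access-token idea above can be sketched as an HMAC-signed payload. This is an illustrative toy, not Hadoop's actual Token classes: the Name Node signs the user name and an expiry with a secret key, and later verifies the token without another Kerberos exchange.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Minimal sketch of an HDFS-style access token. The class and token
// format are illustrative, not Hadoop's implementation.
public class AccessTokenSketch {
    private final SecretKeySpec key;

    public AccessTokenSketch(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA1");
    }

    // Issue a token binding the user name to an expiry timestamp.
    public String issue(String user, long expiryMillis) throws Exception {
        String payload = user + ":" + expiryMillis;
        return payload + ":" + sign(payload);
    }

    // Verify the signature and the expiry; no Kerberos exchange needed.
    public boolean verify(String token, long nowMillis) throws Exception {
        int last = token.lastIndexOf(':');
        String payload = token.substring(0, last);
        String mac = token.substring(last + 1);
        long expiry = Long.parseLong(
            payload.substring(payload.lastIndexOf(':') + 1));
        return mac.equals(sign(payload)) && nowMillis < expiry;
    }

    private String sign(String payload) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA1");
        mac.init(key);
        return Base64.getEncoder().encodeToString(
            mac.doFinal(payload.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because verification only needs the shared secret, thousands of tasks can authenticate at once without hammering the Kerberos KDC.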


Accessing a File

  •  User uses Kerberos (or a HDFS Access Token) to
     authenticate to the Name Node.
  •  They request to open a file X.
  •  If they have permission to file X, the Name Node
     returns a token for reading the blocks of X.
  •  The user uses these tokens when communicating with
     the Data Nodes to show they have access.
  •  There are also tokens for writing blocks when the file is
     being created.



MapReduce Security

  •  Framework authenticates user to Job Tracker before
     they can submit, modify, or kill jobs.
  •  The Job Tracker authenticates itself to the user.
  •  Job’s logs (including stdout) are only visible to the user.
  •  Map and Reduce tasks actually run as the user.
  •  Tasks’ working directories are protected from others.
  •  The Job Tracker’s system directory is no longer
     readable and writable by everyone.
  •  Only the reduce tasks can get the map outputs.
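The per-user protection of task working directories can be sketched with POSIX permissions. The class name and directory layout here are illustrative, not the TaskTracker's actual scheme:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermissions;

// Sketch: create a task working directory readable only by its owner,
// in the spirit of the per-user task isolation described above.
public class TaskDirSketch {
    public static Path createPrivateWorkDir(Path base, String jobId)
            throws Exception {
        // rwx------ : owner has full access, group and others have none
        return Files.createDirectory(
            base.resolve(jobId),
            PosixFilePermissions.asFileAttribute(
                PosixFilePermissions.fromString("rwx------")));
    }

    public static boolean isOwnerOnly(Path dir) throws Exception {
        return Files.getPosixFilePermissions(dir)
                    .equals(PosixFilePermissions.fromString("rwx------"));
    }
}
```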


Interactions with HDFS

  •  MapReduce jobs need to read and write HDFS files as
     the user.
  •  Currently, we store the user name in the job.
  •  With security enabled, we will store HDFS Access
     Tokens in the job.
  •  The job needs a token for each HDFS cluster.
  •  The tokens will be renewed by the Job Tracker so they
     don’t expire for long-running jobs.
  •  When the job completes, the tokens will be cancelled.
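The token lifecycle above can be sketched as a simple registry: the Job Tracker renews a job's tokens while it runs and cancels them at completion. This is an in-memory illustration, not Hadoop's actual renewer:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of token renewal and cancellation; illustrative only.
public class TokenLifecycleSketch {
    // token id -> expiry time in millis
    private final Map<String, Long> tokens = new HashMap<>();

    public void register(String tokenId, long expiry) {
        tokens.put(tokenId, expiry);
    }

    // Renewal pushes the expiry forward so long-running jobs keep access.
    public void renew(String tokenId, long newExpiry) {
        tokens.computeIfPresent(tokenId, (id, old) -> Math.max(old, newExpiry));
    }

    // On job completion the token is cancelled and can no longer be used.
    public void cancel(String tokenId) {
        tokens.remove(tokenId);
    }

    public boolean isValid(String tokenId, long now) {
        Long expiry = tokens.get(tokenId);
        return expiry != null && now < expiry;
    }
}
```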



Interactions with Higher Layers

  •  Yahoo uses a workflow manager named Oozie to
     submit MapReduce jobs on behalf of the user.
  •  We could store the user’s credentials with a modifier
     (oom/oozie) in Oozie to access Hadoop as the user.
  •  Or we could create token-granting tokens for HDFS
     and MapReduce and store those in Oozie.
  •  In either case, such proxies are a potential source of
     security problems, since they store large numbers of
     users’ access credentials.



Web UIs

  •  Hadoop, and especially MapReduce, makes heavy use
     of the Web UIs.
  •  These also need to be authenticated.
  •  Fortunately, there is a standard solution for Kerberos
     and HTTP, named SPNEGO.
  •  SPNEGO is supported by all of the major browsers.
  •  All of the servlets will use SPNEGO to authenticate the
     user and enforce permissions appropriately.
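The server side of the SPNEGO handshake has a simple shape: a request without credentials is challenged with a 401 and "WWW-Authenticate: Negotiate"; a request carrying a Negotiate token is then handed to the Kerberos GSS-API layer. The sketch below stubs out the GSS validation and is illustrative only:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Sketch of the SPNEGO challenge/response shape; not a real
// implementation (token validation is stubbed out).
public class SpnegoSketch {
    public static int respond(Optional<String> authorizationHeader,
                              Map<String, String> responseHeaders) {
        if (!authorizationHeader.isPresent()
                || !authorizationHeader.get().startsWith("Negotiate ")) {
            // Challenge the browser to start the SPNEGO exchange.
            responseHeaders.put("WWW-Authenticate", "Negotiate");
            return 401;
        }
        // A real server validates the GSS token here; we accept any token.
        return 200;
    }
}
```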




Remaining Security Issues

  •  We are not encrypting on the wire.
     –  It will be possible within the framework, but not in 0.22.
  •  We are not encrypting on disk.
     –  For either HDFS or MapReduce.
  •  Encryption is expensive in terms of CPU and IO speed.
  •  Our current threat model is that the attacker has access
     to a user account, but not root.
     –  They can’t sniff the packets on the network.




Backwards Compatibility

  •  API
  •  Protocols
  •  File Formats
  •  Configuration




API Compatibility

  •  Need to mark APIs with
     –  Audience: Public, Limited Private, Private
     –  Stability: Stable, Evolving, Unstable
     @InterfaceAudience.Public
     @InterfaceStability.Stable
     public class Xxxx {…}
     –  Developers need to ensure that 0.22 is backwards compatible
        with 0.21
  •  Defined new APIs designed to be future-proof:
     –  MapReduce – Context objects in org.apache.hadoop.mapreduce
     –  HDFS – FileContext in org.apache.hadoop.fs
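The audience/stability markers used above can be declared as ordinary Java annotations and inspected at runtime. These are simplified stand-ins for Hadoop's own annotations, kept in one class so the sketch is self-contained:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Sketch of audience/stability annotations; Hadoop's real ones are
// separate top-level annotations in their own package.
public class ApiAnnotations {
    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Public {}

    @Documented @Retention(RetentionPolicy.RUNTIME)
    public @interface Evolving {}

    // An API class marked as public-facing but still evolving.
    @Public @Evolving
    public static class ExampleApi {}

    // Tools can check the markers at runtime (or at build time) to
    // flag incompatible changes to public, stable classes.
    public static boolean isPublic(Class<?> c) {
        return c.isAnnotationPresent(Public.class);
    }
}
```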

Protocol Compatibility

  •  Currently all clients of a server must be the same
     version (0.18, 0.19, 0.20, 0.21).
  •  Want to enable forward and backward compatibility
  •  Started work on Avro
     –  Includes the schema of the information as well as the data
     –  Can support different schemas on the client and server
     –  Still need to make the code tolerant of version differences
     –  Avro provides the mechanisms
  •  Avro will be used for file version compatibility too
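The schema-evolution idea behind Avro can be sketched as a toy resolver: the reader resolves each record against its own expected schema, filling in defaults for fields the writer's (older) schema lacked and dropping fields it does not know. This is an illustration of the principle, not Avro's actual resolution algorithm:

```java
import java.util.HashMap;
import java.util.Map;

// Toy schema resolver: records are maps, and the reader's schema is a
// map of expected field names to default values. Illustrative only.
public class SchemaResolverSketch {
    public static Map<String, Object> resolve(Map<String, Object> record,
                                              Map<String, Object> readerDefaults) {
        Map<String, Object> out = new HashMap<>();
        for (Map.Entry<String, Object> field : readerDefaults.entrySet()) {
            // Take the writer's value if present, else the reader's default.
            out.put(field.getKey(),
                    record.getOrDefault(field.getKey(), field.getValue()));
        }
        return out; // fields unknown to the reader are dropped
    }
}
```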



Configuration

  •  Configuration in Hadoop is a string-to-string map.
  •  Maintaining backwards compatibility of configuration
     knobs was done case by case.
  •  Now we have standard infrastructure for declaring old
     knobs deprecated.
  •  Also have cleaned up a lot of the names in 0.21.
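The deprecated-knob infrastructure described above amounts to a mapping from old configuration names to new ones, so jobs written against the old keys keep working. The key names and class below are illustrative, not Hadoop's Configuration class:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of configuration-key deprecation: reads and writes of an old
// key are transparently redirected to its renamed replacement.
public class ConfSketch {
    private static final Map<String, String> DEPRECATED = new HashMap<>();
    static {
        // illustrative old-name -> new-name mapping
        DEPRECATED.put("mapred.job.name", "mapreduce.job.name");
    }

    private final Map<String, String> values = new HashMap<>();

    public void set(String key, String value) {
        values.put(DEPRECATED.getOrDefault(key, key), value);
    }

    public String get(String key) {
        return values.get(DEPRECATED.getOrDefault(key, key));
    }
}
```

Either name reaches the same value, so old job configurations survive the rename.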




Questions?

  •  Thanks for coming!
  •  Mailing lists:
     –  common-user@hadoop.apache.org
     –  hdfs-user@hadoop.apache.org
     –  mapreduce-user@hadoop.apache.org
  •  Slides posted on the Hadoop wiki page
     –  http://wiki.apache.org/hadoop/HadoopPresentations




