© Hortonworks Inc. 2015
Hadoop and Kerberos: The Madness Beyond the Gate
Steve Loughran
stevel@hortonworks.com
@steveloughran
2015

Me: Before Kerberos

Me: After Kerberos

HP Lovecraft                              Kerberos
Evil lurking in New England               MIT Project Athena
Ancient, inhuman deities                  Key Distribution Center (KDC)
Manuscripts to drive the reader insane    IETF RFC 4120
Entities never spoken of aloud            UserGroupInformation
Doomed explorers of darkness              You

Leave now if you want to retain your life of naïve innocence

export HADOOP_USER_NAME="root"

Modern Hadoop clusters are locked down through Kerberos

Discover Kerberos before Kerberos discovers you

Kerberos: the gateway to hell

This is not a metaphor
Art: Andrés Álvarez Iglesias

Kerberos is the gateway
Principal P: user@REALM or user/hostname@REALM
Key Distribution Center (KDC): Authentication Service (AS) + Ticket Granting Service (TGS)
Keys: KP is the principal's secret key, KTGS the TGS's key, KS the service's key; KP.TGS and KP.S are session keys

1. Client -> AS:   (P, TGS, n1)
2. AS -> Client:   {KP.TGS, n1}KP, {ticket(P,TGS)}KTGS
3. Client -> TGS:  {auth(P)}KP.TGS, {ticket(P,TGS)}KTGS, S, n2
4. TGS -> Client:  {KP.S, n2}KP.TGS, {ticket(P,S)}KS

ticket(P,TGS) = (TGS, P, tstart, tend, KP.TGS)
{auth(P)}KP.TGS = {P, time}KP.TGS

Every service is a principal
alice@REALM
bob@REALM
oozie/ooziehost@REALM
namenode/nn1@REALM
hdfs/_HOST@REALM
hdfs/r04s12@REALM
hdfs/r04s13@REALM
yarn/_HOST@REALM
yarn/r04s12@REALM
HTTP/_HOST@REALM
short names:
alice
bob
oozie
namenode
hdfs
yarn
HTTP
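
The mapping from full principals to short names can be sketched in a few lines of Java. This is a simplified illustration, not Hadoop's implementation: the class PrincipalNames and both methods are hypothetical, and a real cluster derives short names through its configured auth_to_local rules, which can do far more than strip the instance and realm.

```java
import java.util.Locale;

/** Illustrative helper only; PrincipalNames is a hypothetical class, not part of Hadoop. */
final class PrincipalNames {
    private PrincipalNames() {}

    /** Replaces the _HOST placeholder with the lower-cased host name of this node. */
    static String expandHost(String principalPattern, String hostName) {
        return principalPattern.replace(
                "/_HOST@", "/" + hostName.toLowerCase(Locale.ROOT) + "@");
    }

    /** Returns the primary component: everything before the first '/' or '@'. */
    static String shortName(String principal) {
        int end = principal.length();
        int slash = principal.indexOf('/');
        int at = principal.indexOf('@');
        if (slash >= 0) {
            end = Math.min(end, slash);
        }
        if (at >= 0) {
            end = Math.min(end, at);
        }
        return principal.substring(0, end);
    }

    public static void main(String[] args) {
        String dn = expandHost("hdfs/_HOST@REALM", "R04S12");
        System.out.println(dn + " -> " + shortName(dn));
        System.out.println("alice@REALM -> " + shortName("alice@REALM"));
    }
}
```

The point of _HOST is that one configuration file can be shared by every node while each process still logs in as its own per-host principal.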

Entering the darkness

HDFS Bootstrap: Kerberos Login
[Diagram: namenode/nn@REALM and each datanode/_HOST@REALM log in to Kerberos from a shared keytab in /etc/hadoop and obtain tickets for the TGS.]

HDFS Bootstrap: DNs register with NN
[Diagram: each datanode/_HOST@REALM, using the shared keytab in /etc/hadoop, requests a ticket for namenode/nn@REALM, presents it during DN registration, and receives ExportedBlockKeys.]

Hadoop Tokens
• Issued and tracked by individual services (HDFS, WebHDFS, Timeline Server, YARN RM, …)
• Grant some form of access: Block tokens, Delegation Tokens
• Can be passed on to other processes
• Renewable via service APIs (RPC, HTTP)
• Revocable in server via service APIs
read: O'Malley 2009, Hadoop Security Architecture

HDFS IO: Block Tokens
[Diagram: alice@REALM obtains a ticket for namenode/nn@REALM and calls open("file"); the NameNode hands back a BlockToken, which the client then presents for block access.]
BlockToken: userId, (BlockPoolId, BlockId), keyId, expiryDate, access-modes
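
A toy model of why a block token can be checked without a round trip to Kerberos: the issuer signs the identifier fields with a key it shares with the verifier, and the verifier recomputes the signature and checks the expiry. This sketch assumes an HMAC-SHA256 signature and a '|'-separated identifier; both are illustrative choices, and BlockTokenSketch is a hypothetical class that does not reproduce Hadoop's real token format.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Toy model of a signed, expiring token; NOT Hadoop's real wire format. */
final class BlockTokenSketch {
    private BlockTokenSketch() {}

    /** Joins the identifier fields from the slide into one string (hypothetical encoding). */
    static String identifier(String userId, String blockPoolId, long blockId,
                             int keyId, long expiryMillis, String accessModes) {
        return String.join("|", userId, blockPoolId, Long.toString(blockId),
                Integer.toString(keyId), Long.toString(expiryMillis), accessModes);
    }

    /** Computes the token "password": an HMAC of the identifier under the shared key. */
    static byte[] sign(byte[] sharedKey, String identifier) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(sharedKey, "HmacSHA256"));
            return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HMAC unavailable", e);
        }
    }

    /** Accepts the token only if the signature matches and it has not expired. */
    static boolean verify(byte[] sharedKey, String identifier, byte[] password, long nowMillis) {
        // Constant-time comparison; only parse the identifier once it is authenticated.
        if (!MessageDigest.isEqual(sign(sharedKey, identifier), password)) {
            return false;
        }
        long expiryMillis = Long.parseLong(identifier.split("\\|")[4]);
        return nowMillis < expiryMillis;
    }

    public static void main(String[] args) {
        byte[] key = "key-shared-by-issuer-and-verifier".getBytes(StandardCharsets.UTF_8);
        long now = System.currentTimeMillis();
        String id = identifier("alice", "BP-1", 42L, 7, now + 60_000L, "READ");
        byte[] password = sign(key, id);
        System.out.println("valid now: " + verify(key, id, password, now));
    }
}
```

Because the verifier needs only the shared key, a stolen token is limited by its expiry and access modes rather than by anything Kerberos knows about.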

Delegation Tokens delegate access
[Diagram: alice@REALM obtains a ticket for namenode/nn@REALM and requests an HDFS Delegation Token. She hands the token to service/host@REALM, which presents it to the NameNode and receives BlockTokens on her behalf.]

YARN app launch
[Diagram: alice@REALM obtains tickets for namenode/nn@REALM and resourcemanager/rm@REALM and requests delegation tokens. The HDFS Delegation Token travels in the Launch Context to resourcemanager/rm@REALM, which adds an AM/RM Token. nodemanager/_HOST@REALM launches the application as alice with the HDFS and AM/RM tokens. The AM/RM token is later refreshed (AM/RM').]

That which must not be named: UGI
if (!UserGroupInformation.isSecurityEnabled()) {
  stayInALifeOfNaiveInnocence();
} else {
  sufferTheEternalPainOfKerberos();
}
// checkTGTAndReloginFromKeytab() is an instance method: call it on the login user
UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
UserGroupInformation.getLoginUser();   // principal logged in as
UserGroupInformation.getCurrentUser(); // principal acting as

UGI.doAs()
// conf is an existing (effectively final) Hadoop Configuration
UserGroupInformation bob =
    UserGroupInformation.createProxyUser("bob",
        UserGroupInformation.getLoginUser());
FileSystem userFS = bob.doAs(
    new PrivilegedExceptionAction<FileSystem>() {
      public FileSystem run() throws Exception {
        return FileSystem.get(FileSystem.getDefaultUri(conf), conf);
      }
    });

Hadoop RPC
// serverPrincipal names the configuration key that holds the principal, not the principal itself
@KerberosInfo(serverPrincipal = "my.kerberos.principal")
public interface MyRpc extends VersionedProtocol { … }

public class MyRpcPolicyProvider extends PolicyProvider {
  public Service[] getServices() {
    return new Service[] {
      new Service("my.protocol.acl", MyRpc.class)
    };
  }
}

public class MyRpcSecurityInfo extends SecurityInfo { … }

File META-INF/services/org.apache.hadoop.security.SecurityInfo contains:
org.example.rpc.MyRpcSecurityInfo

IPC Server: get the current user identity
Messages.KillResponse killContainer(Messages.KillRequest request) {
  UserGroupInformation callerUGI;
  try {
    callerUGI = UserGroupInformation.getCurrentUser();
  } catch (IOException ie) {
    LOG.info("Error getting UGI ", ie);
    AuditLogger.logFailure("UNKNOWN", "Error getting UGI");
    throw RPCUtil.getRemoteException(ie);
  }
  …

IPC Server: Authorize
String user = callerUGI.getShortUserName();
if (!checkAccess(callerUGI, MODIFY)) {
  AuditLog.unauthorized(user,
      KILL_CONTAINER_REQUEST,
      "User doesn't have permissions to " + MODIFY);
  throw RPCUtil.getRemoteException(
      new AccessControlException(
          user + " lacks access "
          + MODIFY_APP.name()));
}
AuditLog.authorized(user, KILL_CONTAINER_REQUEST);

SASL: RFC 4422

REST: SPNEGO (+ Delegation tokens)
• Jersey + java.net
• httpclient? “if lucky it'll work”
HADOOP-11825: Move timeline client Jersey+Kerberos+UGI support into a public implementation

Testing

Error messages to fear
Art: Andrés Álvarez Iglesias
Failure unspecified at GSS-API level (Checksum failed)
No valid credentials provided (Failed to find any Kerberos tgt)
Server not found in Kerberos database
Clock skew too great
Principal not found
No valid credentials provided (Illegal key size)
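
Of these, "Clock skew too great" is the easiest to reason about: Kerberos timestamps are only accepted when the two clocks agree to within a tolerance. The sketch below assumes a five-minute tolerance, which is the usual default in MIT Kerberos but is configurable per site; ClockSkewCheck is a hypothetical helper, not a Hadoop or JDK class.

```java
import java.time.Duration;
import java.time.Instant;

/** Illustrative check of the clock-skew rule; the five-minute limit is an assumed default. */
final class ClockSkewCheck {
    static final Duration MAX_SKEW = Duration.ofMinutes(5);

    private ClockSkewCheck() {}

    /** Returns true when the two clocks differ by no more than MAX_SKEW, in either direction. */
    static boolean withinSkew(Instant clientTime, Instant serverTime) {
        Duration skew = Duration.between(clientTime, serverTime).abs();
        return skew.compareTo(MAX_SKEW) <= 0;
    }

    public static void main(String[] args) {
        Instant server = Instant.now();
        Instant driftedClient = server.minus(Duration.ofMinutes(7));
        System.out.println("client accepted: " + withinSkew(driftedClient, server));
    }
}
```

In practice the fix is operational rather than in code: keep every host, including test VMs, synchronised with the same time source as the KDC.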

Topics Avoided Not Covered
• ZooKeeper
• JAAS
• Trying to use HTTPS in a YARN application
• Trying to use Full REST in a YARN application
• System properties to debug Kerberos & SPNEGO
• Group management
• HADOOP_PROXY_USER

gitbook.com/@steveloughran
Questions?
Art: Andrés Álvarez Iglesias

ZooKeeper
• SASL to negotiate security:
  System.setProperty("zookeeper.sasl.client", "true");
• Permissions are not transitive down the tree

List<ACL> perms = new ArrayList<>();
if (UserGroupInformation.isSecurityEnabled()) {
  perms.add(new ACL(ZooDefs.Perms.ALL, ZooDefs.Ids.AUTH_IDS));
  perms.add(new ACL(ZooDefs.Perms.READ, ZooDefs.Ids.ANYONE_ID_UNSAFE));
} else {
  perms.add(new ACL(ZooDefs.Perms.ALL, ZooDefs.Ids.ANYONE_ID_UNSAFE));
}
zk.createPath(path, null, perms, CreateMode.PERSISTENT);

System Properties for debugging
-Dsun.security.krb5.debug=true
-Dsun.security.spnego.debug=true
export HADOOP_JAAS_DEBUG=true
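
When the JVM command line is out of reach (for example in a test harness), the two system properties can also be set from code. This is a minimal sketch under the assumption that it runs before any Kerberos or SPNEGO classes are loaded, since the JDK may read these flags only once; KerberosDebug is a hypothetical helper. HADOOP_JAAS_DEBUG is an environment variable, so it still has to be exported before the process starts.

```java
/** Hypothetical helper that switches on the JDK's Kerberos and SPNEGO debug output. */
final class KerberosDebug {
    private KerberosDebug() {}

    /** Call as early as possible, ideally first thing in main(). */
    static void enable() {
        System.setProperty("sun.security.krb5.debug", "true");
        System.setProperty("sun.security.spnego.debug", "true");
    }

    public static void main(String[] args) {
        enable();
        System.out.println("krb5 debug: " + System.getProperty("sun.security.krb5.debug"));
        System.out.println("spnego debug: " + System.getProperty("sun.security.spnego.debug"));
    }
}
```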
Services
• RPC authentication via annotations & metadata in JAR
• YARN Web UIs: rely on RM proxy for authentication
• Authentication != Authorization
• Add audit logs on service endpoints
• YARN services: come up with a token refresh strategy: keytab everywhere; keytab in AM; update from client

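
Whichever refresh strategy a YARN service picks, it needs to decide when to renew before a ticket or token expires. One common approach, sketched here, is to renew after a fixed fraction of the lifetime has elapsed. RenewalSchedule is a hypothetical helper and the 0.8 fraction in main() is only an illustrative choice; the real lifetimes come from the KDC or the issuing service.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical helper that picks a renewal time part-way through a credential's lifetime. */
final class RenewalSchedule {
    private RenewalSchedule() {}

    /**
     * Returns issued + fraction * (expires - issued).
     * The fraction must lie strictly between 0 and 1 so renewal happens before expiry.
     */
    static Instant nextRenewal(Instant issued, Instant expires, double fraction) {
        if (fraction <= 0.0 || fraction >= 1.0) {
            throw new IllegalArgumentException("fraction must be between 0 and 1");
        }
        if (!expires.isAfter(issued)) {
            throw new IllegalArgumentException("expiry must be after issue time");
        }
        long lifetimeMillis = Duration.between(issued, expires).toMillis();
        return issued.plusMillis((long) (lifetimeMillis * fraction));
    }

    public static void main(String[] args) {
        Instant issued = Instant.now();
        Instant expires = issued.plus(Duration.ofHours(10));
        System.out.println("renew at: " + nextRenewal(issued, expires, 0.8));
    }
}
```

Renewing early leaves room for retries if the KDC or the token-issuing service is briefly unavailable.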
JAAS
• Java Authentication and Authorization Service
• Core Kerberos classes and types (Principal)
• Text files to configure
  – Different for different JVMs
  – Need to double escape \ for Windows paths
• UGI handles setting up a JAAS context & logging in

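
For code that cannot rely on UGI, the text-file pitfalls can be sidestepped by building the JAAS configuration in Java. The sketch below assumes an Oracle/OpenJDK-style JVM, where the Kerberos login module is com.sun.security.auth.module.Krb5LoginModule; other JVMs ship a different module with different option names. KeytabJaasConfig, the principal and the keytab path are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Hypothetical in-memory JAAS configuration for a keytab login on an OpenJDK-style JVM. */
final class KeytabJaasConfig extends Configuration {
    private final String principal;
    private final String keytabPath;

    KeytabJaasConfig(String principal, String keytabPath) {
        this.principal = principal;
        this.keytabPath = keytabPath;
    }

    /** Returns the same single Kerberos entry for every application name. */
    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true");
        return new AppConfigurationEntry[] {
            new AppConfigurationEntry(
                "com.sun.security.auth.module.Krb5LoginModule",
                LoginModuleControlFlag.REQUIRED,
                options)
        };
    }

    public static void main(String[] args) {
        Configuration conf =
            new KeytabJaasConfig("service/host.example.org@REALM", "C:\\keytabs\\service.keytab");
        AppConfigurationEntry entry = conf.getAppConfigurationEntry("Client")[0];
        System.out.println(entry.getLoginModuleName() + " " + entry.getOptions());
    }
}
```

Here the Windows path needs only ordinary Java string escaping; the extra level of escaping applies when the same path is written into a JAAS text file.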
Glossary
• SASL: Simple Authentication and Security Layer
• GSSAPI: Generic Security Service Application Program Interface (RFC 2743 + others)
• JAAS: Java Authentication and Authorization Service
• SPNEGO: Simple and Protected GSSAPI Negotiation Mechanism

Editor's Notes

  • #4: Th
  • #7: Enough people like Dunkin' Donuts' decaf coffee that you can buy it for home use, and supermarkets will stock it next to the McDonald's coffee.
  • #8: This is your get out clause. Turn off encryption. Users are who they claim to be; the environment variable HADOOP_USER_NAME can change it on a whim.
  • #9: ...which is why production clusters are all locked down with Kerberos. Callout: this doesn't cover authorization/access control (exception: Hadoop IPC ACLs), wire encryption, HTTPS or data encryption.
  • #10: So you can't ignore Kerberos. You only get a choice about when to encounter it: early on in your coding and testing, during final integration tests, or in late-night support calls.
  • #12: Photo: https://www.flickr.com/photos/doctorserone/4635167170/ Andrés Álvarez Iglesias
  • #13: The KDC is managed by the enterprise security team. They are either paranoid about security, or your organisation is 0wned by everyone from Anonymous to North Korea. They don't trust you, they don't trust Hadoop, and make the rest of the network ops people seem welcoming. You will need to work with these people.
  • #30: AuthenticatedURL, DelegationTokenAuthenticatedURL, org.apache.hadoop.hdfs.web.URLConnectionFactory, and org/apache/spark/deploy/history/yarn/rest in SPARK-1537
  • #31: There is a mini KDC, "MiniKDC", in the Hadoop codebase. I've used this in the YARN-913 registry work; it's good for verifying that you got through the permissions logic, and for learning various acronyms. And at the end of the run you get tests that Jenkins can run every build. But I've embraced testing against kerberized VMs, where you do the work of creating keytabs, filling in the configuration files, requiring SPNEGO-authed web browsers, having your command line account kinit in regularly, services having tokens expire, etc. Why? Because it's what the real world is like.
  • #32: Error messages with UGI are usually a sign of trouble. Photo: https://www.flickr.com/photos/doctorserone/4635167170/ Andrés Álvarez Iglesias
  • #34: Photo: https://www.flickr.com/photos/doctorserone/4635167170/ Andrés Álvarez Iglesias