1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
BEST PRACTICES FOR ENTERPRISE USER MANAGEMENT IN HADOOP ENVIRONMENT
DataWorks Summit 2017 Munich
Sailaja Polavarapu, Sr. Software Engineer, Hortonworks
Don Bosco Durai, Cofounder & Chief Security Architect, Privacera
Don Bosco Durai
⬢ Cofounder and Chief Security Architect at Privacera
⬢ Committer on Apache Ranger and Apache Ambari
⬢ Contributor to most Apache projects for security
Sailaja Polavarapu
⬢ Apache Ranger contributor since 2015
⬢ Apache Ranger Committer
⬢ Contributed major improvements to the UserSync module in Ranger
⬢ Currently working on the Hortonworks Security Team
⬢ Contact: spolavarapu@apache.org
Agenda
◆ Authentication and Users in Hadoop
◆ Integrating Ranger with AD/LDAP
◆ Common Use cases
◆ LDAP connection check tool
◆ Best practices
◆ Demo
Most commonly asked question
If I have Ranger, do I need Kerberos?
Why Authenticate Users?
Authentication
Authorization
Auditing
Service Types
Infrastructure: HDFS, Oozie, Storm, YARN, Hive Server, HBase, ZooKeeper, Kafka
Apps: Zeppelin, Ambari Views, Ambari Admin, Ranger, Atlas, LogSearch
Infrastructure - Kerberos
[Diagram: Users connect to a Master Node hosting the YARN Resource Manager, Hive Server, and HDFS Name Node. Node 1 and Node 2 each host a YARN Node Manager and an HDFS Data Node, each with Linux processes. Numbered labels 1 to 6 mark the interactions.]
Apps - Username & Password
[Diagram labels: Portals, Notebooks/Viewer, Hive Server2, Zeppelin, Ambari Views, HDFS, Ambari, Atlas, Ranger, BI Tools, Spark]
Knox - Gateway & SSO
Ambari, WebHDFS (HDFS), Templeton (HCatalog), Stargate (HBase), Oozie, Hive/JDBC, Yarn RM, Storm
Services UIs: Name Node UI, Job History UI, Oozie UI, HBase UI, Yarn UI, Spark UI, Ambari UI, Ranger Admin Console
Authentication and User Source
Hive JDBC: LDAP/Kerberos
Web Apps (Zeppelin, Ranger, Ambari, Atlas): LDAP
CLI/API (HDFS, Hive Beeline, HBase, etc.): Kerberos
User/Group Synchronization in Ranger
[Diagram: Ranger UserSync syncs users/groups from AD/LDAP to Ranger Admin, which is backed by a database.]
User sources
⬢ AD/LDAP
– Syncs users and groups from LDAP Organizational Units (OUs)
⬢ Unix Native Users
– Syncs users and groups from the /etc/passwd and /etc/group files
⬢ File Sources
– Syncs users and groups from a file specified in the configuration
– Supports several file formats, such as CSV and JSON
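As a rough illustration of the Unix source, the sketch below derives each user's groups from text in the standard /etc/passwd and /etc/group formats. It is not Ranger UserSync's implementation; the function names and sample entries are hypothetical, and the input is passed as strings so the example does not depend on the local system.

```python
def parse_passwd(text):
    """Map user name -> primary gid from /etc/passwd-style lines."""
    users = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, _pw, _uid, gid = line.split(":")[:4]
        users[name] = int(gid)
    return users


def parse_group(text):
    """Map group name -> (gid, member list) from /etc/group-style lines."""
    groups = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        name, _pw, gid, members = line.split(":")[:4]
        groups[name] = (int(gid), [m for m in members.split(",") if m])
    return groups


def user_groups(passwd_text, group_text):
    """Combine primary-group and supplementary-group membership per user."""
    users = parse_passwd(passwd_text)
    groups = parse_group(group_text)
    result = {user: set() for user in users}
    for group_name, (gid, members) in groups.items():
        for user, primary_gid in users.items():
            if primary_gid == gid or user in members:
                result[user].add(group_name)
    return {user: sorted(names) for user, names in result.items()}


# Hypothetical sample data, for illustration only.
SAMPLE_PASSWD = (
    "jdoe:x:1001:2001:John Doe:/home/jdoe:/bin/bash\n"
    "bhall:x:1002:2002:Bob Hall:/home/bhall:/bin/bash"
)
SAMPLE_GROUP = (
    "hdp_admin:x:2001:\n"
    "hdp_testing:x:2002:jdoe\n"
    "dev_ops:x:2003:jdoe,bhall"
)

membership = user_groups(SAMPLE_PASSWD, SAMPLE_GROUP)
print(membership)
```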
Integrating Ranger with AD/LDAP
⬢ Understanding your deployment
– What kind of directory server is in use: Active Directory, OpenLDAP, etc.?
– Is the communication between the Hadoop cluster and the directory server secure or unsecure?
– Do you have at least a read-only LDAP user for binding?
– Are there any firewall restrictions on communication between Hadoop and the directory server?
– Is Centrify being used as an LDAP proxy?
– Does your AD have spaces or special characters in usernames?
Requirements for User Management
⬢ Gathering details of the directory server structure
– AD/LDAP URL and bind credentials
– Are there specific OU(s) for Hadoop users and groups?
– How many users and groups are in the domain and/or in the OUs?
– What filters should be configured for user search and/or group search to limit the users and groups synced to Hadoop?
– What attributes are available on the directory server for users and groups, such as uid, sAMAccountName, memberof, objectclass, etc.?
– Will authorization policies be configured at the user level or the group level?
Sample Active Directory Server Structure
DC=ad01,DC=hadoop,DC=com
  OU=Hadoop Users
    sAMAccountName=jdoe, cn=John Doe
    sAMAccountName=bhall, cn=Bob Hall
    sAMAccountName=asmith, cn=Andy Smith
    sAMAccountName=acaroll, cn=Ashley Caroll
  OU=Hadoop Groups
    cn=hdp_testing
    cn=dev_ops
    cn=hdp_admin
Filter shown in the diagram:
(|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
Use Case
⬢ Sync all users that belong to the groups “hdp_testing”, “hdp_admin”, or “dev_ops”
User based Search
⬢ Filter based on the “memberof” attribute of the user
(|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
User name attribute: sAMAccountName
User search filter: (|(memberof=cn=hdp_testing,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=hdp_admin,ou=Hadoop Groups,dc=hortonworks,dc=com)(memberof=cn=dev_ops,ou=Hadoop Groups,dc=hortonworks,dc=com))
User search base: OU=Hadoop Users,dc=hortonworks,dc=com
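Writing a long OR filter by hand invites misplaced parentheses and stray spaces. The sketch below shows one way to generate the user search filter above from a list of group names. The helper names are hypothetical and this is not part of Ranger; the escaping follows the RFC 4515 rules for filter values.

```python
# RFC 4515 escapes for characters that are special inside a filter value.
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\0": r"\00",
}


def escape_filter_value(value):
    """Escape a string so it can be used as a literal LDAP filter value."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def member_of_filter(group_cns, groups_base_dn):
    """Build a filter matching users whose memberof is any of the given groups."""
    if not group_cns:
        raise ValueError("at least one group name is required")
    clauses = [
        "(memberof=" + escape_filter_value(f"cn={cn},{groups_base_dn}") + ")"
        for cn in group_cns
    ]
    # A single clause needs no OR wrapper.
    if len(clauses) == 1:
        return clauses[0]
    return "(|" + "".join(clauses) + ")"


user_search_filter = member_of_filter(
    ["hdp_testing", "hdp_admin", "dev_ops"],
    "ou=Hadoop Groups,dc=hortonworks,dc=com",
)
print(user_search_filter)
```

The generated string contains no whitespace between clauses, which avoids depending on how leniently a given directory server parses filters.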
Group based Search
⬢ Filter based on the group name or “cn” attribute of the group
(|(cn=hdp_*)(cn=dev_*))
Group name attribute: cn
Group search base: OU=Hadoop Groups,dc=hortonworks,dc=com
Group member attribute: member
Group search filter: (|(cn=dev_*)(cn=hdp_*))
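To check in advance which group names a wildcard filter like the one above would select, the prefix matching can be approximated locally. This is a minimal sketch, assuming case-insensitive matching on cn. It uses Python's `fnmatch` patterns, which also treat `?` and `[` as special, so it only mirrors LDAP substring filters for plain `*` patterns. The group "finance" is a hypothetical non-matching example.

```python
from fnmatch import fnmatchcase


def matches_group_filter(cn, patterns=("hdp_*", "dev_*")):
    """Return True if the group name matches any of the wildcard patterns."""
    # Lowercase both sides to approximate case-insensitive matching on cn.
    return any(fnmatchcase(cn.lower(), pattern.lower()) for pattern in patterns)


groups = ["hdp_testing", "dev_ops", "hdp_admin", "finance"]
synced = [group for group in groups if matches_group_filter(group)]
print(synced)
```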
LDAP connection check tool
⬢ Command line tool
⬢ Used for
– Discovering various LDAP attributes
– Validating the LDAP settings in Ranger, Ambari, or HDFS LDAP Group Mapping
– Retrieving the total number of users and/or groups
⬢ Available as part of the Ranger installation
⬢ Requires basic information such as the LDAP URL and bind credentials, supplied through either
– the command line interface
– a template properties file updated with the values specific to the setup
Tool usage
⬢ usage: run.sh
-a ignore authentication properties
-d <arg> {all|users|groups}
-h show help
-i <arg> input file name
-o <arg> output directory
-r <arg> {all|users|groups}
⬢ All of the above parameters are optional
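When the tool is driven from a script, it helps to assemble and validate the argument list before running it. The sketch below builds a command line from the options listed above. The slide does not spell out what `-d` and `-r` do, so the code only checks that their argument is one of `all`, `users`, or `groups`. The function name, the script path, and the file names are assumptions for illustration.

```python
import shlex

VALID_SCOPES = {"all", "users", "groups"}


def build_command(script="./run.sh", input_file=None, output_dir=None,
                  d_scope=None, r_scope=None, ignore_auth=False):
    """Assemble an argument list for the LDAP connection check tool."""
    cmd = [script]
    if ignore_auth:
        cmd.append("-a")
    for flag, scope in (("-d", d_scope), ("-r", r_scope)):
        if scope is None:
            continue
        if scope not in VALID_SCOPES:
            raise ValueError(f"{flag} expects one of {sorted(VALID_SCOPES)}")
        cmd += [flag, scope]
    if input_file:
        cmd += ["-i", input_file]
    if output_dir:
        cmd += ["-o", output_dir]
    return cmd


# Hypothetical file names; the result can be passed to subprocess.run(cmd).
cmd = build_command(d_scope="users", input_file="input.properties",
                    output_dir="output")
print(shlex.join(cmd))
```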
CLI option for the LDAP tool
⬢ The CLI prompts for the following values when no input file is specified:
Ldap url [ldap://ldap.example.com:389]:
Bind DN [cn=admin,ou=users,dc=example,dc=com]:
Bind Password:
User Search Base [ou=users,dc=example,dc=com]:
User Search Filter [cn=user1]:
Sample Authentication User [user1]:
Sample Authentication Password:
Demo
Best practices and Strategies
⬢ Use LDAP/AD for application service authentication
⬢ Use Ranger for authorization
⬢ When SSL is used, verify that the truststore certificates are updated across the system
⬢ Use the LDAP connection check tool to
– discover LDAP configuration attributes
– verify the number of users and groups to be synced to Ranger
⬢ Verify that case conversion and special characters in user and group names are handled uniformly across the Hadoop environment
– The same matching rules must be used in core-site.xml and in Ranger
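The last point can be checked before rollout by comparing how each side would transform a sample of names. The sketch below is a hypothetical helper, not a Ranger or Hadoop API. It assumes each side applies one of three case rules ("lower", "upper", or "none") and reports the names that would end up different. It does not read core-site.xml or the Ranger configuration.

```python
def normalize(name, case_rule):
    """Apply a case conversion rule to a user or group name."""
    if case_rule == "lower":
        return name.lower()
    if case_rule == "upper":
        return name.upper()
    if case_rule == "none":
        return name
    raise ValueError(f"unknown case rule: {case_rule}")


def find_mismatches(names, hadoop_rule, ranger_rule):
    """Return names that the two sides would resolve to different strings."""
    return [
        name for name in names
        if normalize(name, hadoop_rule) != normalize(name, ranger_rule)
    ]


# Hypothetical sample names with mixed case.
sample_names = ["JDoe", "bhall", "ASmith"]
mismatched = find_mismatches(sample_names, hadoop_rule="lower", ranger_rule="none")
print(mismatched)
```

An empty result for a representative sample suggests the two rules agree. Any name listed is one whose policies may not match the identity Hadoop resolves.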
user@ranger.apache.org
