Hadoop Security: Today and Tomorrow
Vinay Shukla
Hortonworks
© Hortonworks Inc. 2014
Hadoop Security Today & Tomorrow
Amsterdam, April 3rd, 2014
Vinay Shukla
Twitter: @NeoMythos
Agenda
• What is Hadoop Security?
– 4 Security Pillars & Rings of Defense
• What security elements exist today?
– Authentication
– Authorization
– Audit
– Data Protection
• What is on the security roadmap?
– Coming soon
– Longer term projects
• Securing Hadoop with Apache Knox Gateway
– Knox overview
– Demo
• How to get involved
What is Apache Hadoop Security?
Security in Apache Hadoop is
defined by four key pillars:
authentication, authorization,
accountability, and data protection.
Two Reasons for Security in Hadoop
1. Hadoop Contains Sensitive Data
– As Hadoop adoption grows, so does the variety of data that organizations look to store. This data is often proprietary or personal, and it must be protected.
– In this context, Hadoop is governed by the same security requirements as any data center platform.
2. Hadoop Is Subject to Compliance Requirements
– Organizations are often required to comply with regulations such as HIPAA, PCI DSS and FISMA, which require protection of personal information.
– Hadoop deployments must also adhere to other corporate security policies.
Security: Rings of Defense
Perimeter Level Security
• Network Security (e.g. firewalls)
• Apache Knox (e.g. gateways)
Data Protection
• Core Hadoop
• Partners
Authentication
• Kerberos
OS Security
Authorization
• MR ACLs
• HDFS Permissions
• HDFS ACLs
• Hive ATZ-NG
• HBase ACLs
• Accumulo Label Security
Authentication in Hadoop Today…
• Authentication: Who am I? Prove it. Control access to the cluster.
– Kerberos in native Apache Hadoop
– Perimeter security with Apache Knox Gateway
• Authorization: Restrict access to explicit data.
• Audit: Understand who did what.
• Data Protection: Encrypt data at rest and in motion.
Kerberos Authentication in Hadoop
For more than 20 years, Kerberos has been the de facto standard for strong authentication.
…no other option exists.
The design and implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworks' Owen O'Malley in 2010.
What does Kerberos do?
– Establishes identity for clients, hosts and services
– Prevents impersonation; passwords are never sent over the wire
– Integrates with enterprise identity management tools such as LDAP and Active Directory
– Enables more granular auditing of data access and job execution
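Kerberos identifies clients, hosts and services by principals of the form primary/instance@REALM. Hadoop then maps each principal to a local short user name through its hadoop.security.auth_to_local rules. The following is a minimal sketch of the core behaviour of the built-in DEFAULT rule only, assuming one default realm. The function name short_name and the example realms and hosts are illustrative, and real rule sets can be far richer.

```python
import re
from typing import Optional

# primary[/instance][@REALM], e.g. "nn/nn1.example.com@EXAMPLE.COM"
PRINCIPAL = re.compile(r"^([^/@]+)(?:/([^/@]+))?(?:@([^/@]+))?$")


def short_name(principal: str, default_realm: str) -> Optional[str]:
    """Simplified take on Hadoop's DEFAULT auth_to_local rule (illustrative only).

    Returns the local user name, or None when no rule matches; a real cluster
    would then need an explicit auth_to_local rule for that principal.
    """
    match = PRINCIPAL.match(principal)
    if match is None:
        raise ValueError(f"malformed principal: {principal!r}")
    primary, instance, realm = match.groups()
    if realm is None:
        # A bare one-component name is already a local user name.
        return primary if instance is None else None
    if realm == default_realm:
        # Users and services in the cluster's own realm map to their first component.
        return primary
    return None  # principal from a foreign realm


if __name__ == "__main__":
    for p in ["alice@EXAMPLE.COM", "nn/nn1.example.com@EXAMPLE.COM", "bob@PARTNER.ORG"]:
        print(p, "->", short_name(p, "EXAMPLE.COM"))
```

This mapping lets a service principal such as nn/host@REALM and a user principal such as alice@REALM both resolve to plain names. HDFS permissions and ACLs are then evaluated against those names.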
Perimeter Security with Apache Knox
Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security.
Single, simple point of access for a cluster
• Single Hadoop access point
• REST API hierarchy
• Consolidated API calls
• Multi-cluster support
• Eliminates SSH "edge node"
Central controls ensure consistency across one or more clusters
• Central API management
• Central audit control
• Simple service-level authorization
Integrated with existing systems to simplify identity maintenance
• SSO integration: SiteMinder, API Key*, OAuth* & SAML*
• LDAP & AD integration
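Knox achieves the single access point and multi-cluster support by exposing each cluster as a named topology under one URL prefix. The demo later in this deck uses URLs of the form /gateway/sandbox/webhdfs/... and /gateway/sandbox/templeton/.... The sketch below shows only that routing idea: it resolves /gateway/<topology>/<service>/<rest> to an internal service URL. The TOPOLOGIES table, host names and route function are hypothetical, and a real Knox topology is an XML file with providers and services rather than a Python dict.

```python
from typing import Dict
from urllib.parse import urlsplit, urlunsplit

# Hypothetical topologies: one per cluster, mapping a service name to the
# internal base URL of that service. Host names are made up for illustration.
TOPOLOGIES: Dict[str, Dict[str, str]] = {
    "cluster1": {
        "webhdfs": "http://nn1.internal:50070/webhdfs",
        "templeton": "http://webhcat1.internal:50111/templeton",
    },
    "cluster2": {
        "webhdfs": "http://nn2.internal:50070/webhdfs",
    },
}


def route(external_url: str, topologies: Dict[str, Dict[str, str]] = TOPOLOGIES) -> str:
    """Map /gateway/<topology>/<service>/<rest> to the internal service URL."""
    parts = urlsplit(external_url)
    segments = parts.path.lstrip("/").split("/")
    if len(segments) < 3 or segments[0] != "gateway":
        raise ValueError(f"not a gateway URL: {external_url}")
    topology, service, rest = segments[1], segments[2], segments[3:]
    try:
        base = urlsplit(topologies[topology][service])
    except KeyError:
        raise LookupError(f"service {service!r} not exposed by topology {topology!r}") from None
    path = base.path.rstrip("/") + "/" + "/".join(rest)
    # Keep the client's query string (e.g. op=LISTSTATUS) unchanged.
    return urlunsplit((base.scheme, base.netloc, path, parts.query, ""))
```

Clients only ever see the gateway host and the topology name. Adding or moving a cluster therefore changes the gateway's tables, not every client.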
Authorization & Audit in Hadoop Today…
• Authentication: Who am I? Prove it. Control access to the cluster.
– Kerberos in native Apache Hadoop
– Perimeter security with Apache Knox Gateway
• Authorization: Restrict access to explicit data.
– Native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions
– Cell-level access control in Apache Accumulo
• Audit: Understand who did what.
– Native in Apache Hadoop: process execution audit trail
• Data Protection: Encrypt data at rest and in motion.
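Accumulo's cell-level security attaches a visibility expression to each cell, for example (admin|audit)&pii. A scan returns only the cells whose expression is satisfied by the reader's authorizations. The evaluator below is a minimal sketch of that idea, not Accumulo's ColumnVisibility class. It supports labels, & and | and parentheses, and it rejects mixing & and | without parentheses, as Accumulo does. It does not handle quoted labels. The names can_see, _tokenize and _Evaluator are hypothetical.

```python
import re
from typing import List, Optional, Set

# Labels may use letters, digits and _ - : . /; operators are & | ( ).
_TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def _tokenize(expr: str) -> List[str]:
    tokens, pos, expr = [], 0, expr.strip()
    while pos < len(expr):
        match = _TOKEN.match(expr, pos)
        if match is None:
            raise ValueError(f"unexpected character in {expr!r} at {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class _Evaluator:
    def __init__(self, tokens: List[str], auths: Set[str]):
        self.tokens, self.pos, self.auths = tokens, 0, auths

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self) -> Optional[str]:
        token = self.peek()
        self.pos += 1
        return token

    def term(self) -> bool:
        token = self.take()
        if token == "(":
            value = self.expression()
            if self.take() != ")":
                raise ValueError("missing ')'")
            return value
        if token is None or token in ("&", "|", ")"):
            raise ValueError(f"expected a label, got {token!r}")
        return token in self.auths  # a label is satisfied if the reader holds it

    def expression(self) -> bool:
        values, op = [self.term()], None
        while self.peek() in ("&", "|"):
            this_op = self.take()
            if op is not None and this_op != op:
                # Like Accumulo, reject A&B|C: mixed operators need parentheses.
                raise ValueError("mixing & and | requires parentheses")
            op = this_op
            values.append(self.term())
        return any(values) if op == "|" else all(values)


def can_see(visibility: str, authorizations: Set[str]) -> bool:
    """True if a reader holding `authorizations` may see a cell labelled `visibility`."""
    if not visibility.strip():
        return True  # an empty visibility is readable by everyone
    evaluator = _Evaluator(_tokenize(visibility), authorizations)
    result = evaluator.expression()
    if evaluator.peek() is not None:
        raise ValueError(f"unexpected token {evaluator.peek()!r}")
    return result
```

For example, can_see("(admin|audit)&pii", {"audit", "pii"}) is true, while a reader holding only {"audit"} cannot see the cell.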
Authorization: Who can do what in Hadoop?
• Access control services exist for each of the Hadoop components:
–HDFS has file permissions
–YARN, MapReduce and HBase have Access Control Lists (ACLs)
–Accumulo provides more granular label/cell-level security
• Improvements to these services are being led by the Hortonworks team:
–HDFS improvements: extended ACLs, which add flexibility by allowing multiple policies on the same file or directory
–Hive improvements: a Hortonworks initiative called Hive ATZ-NG provides better integration, supports familiar SQL/database syntax (GRANT/REVOKE) and allows more clients (including partner integrations) to connect securely.
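HDFS extended ACLs follow the POSIX ACL model. A file has owner, owning-group and other entries, plus optional named-user and named-group entries whose effective permissions are limited by a mask. The sketch below models that evaluation order in simplified form. It omits the superuser, execute checks on parent directories, default ACLs and the sticky bit. The Inode class and is_allowed function are hypothetical names, not HDFS APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

READ, WRITE, EXECUTE = 4, 2, 1  # same bit values as rwx in chmod


@dataclass
class Inode:
    owner: str
    group: str
    owner_perm: int                 # e.g. 7 == rwx
    group_perm: int                 # entry for the owning group
    other_perm: int
    named_users: Dict[str, int] = field(default_factory=dict)   # e.g. user:bob:rw-
    named_groups: Dict[str, int] = field(default_factory=dict)  # e.g. group:auditors:r--
    mask: Optional[int] = None      # only present once an extended ACL is set


def is_allowed(inode: Inode, user: str, user_groups: Set[str], wanted: int) -> bool:
    def grants(bits: int) -> bool:
        return (bits & wanted) == wanted

    # With no extended ACL there is no mask, so group entries are used as-is.
    mask = inode.mask if inode.mask is not None else 7
    if user == inode.owner:
        return grants(inode.owner_perm)             # the mask never applies to the owner
    if user in inode.named_users:
        return grants(inode.named_users[user] & mask)
    group_entries = [p for g, p in inode.named_groups.items() if g in user_groups]
    if inode.group in user_groups:
        group_entries.append(inode.group_perm)
    if group_entries:
        # Any matching group entry may grant access; "other" is not consulted.
        return any(grants(p & mask) for p in group_entries)
    return grants(inode.other_perm)                 # the mask never applies to other
```

This is why a named-user entry of rw- combined with a mask of r-x yields read-only access. The mask lets an administrator cap every ACL entry at once without rewriting each one.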
Data Protection in Hadoop Today…
• Authentication: Who am I? Prove it. Control access to the cluster.
– Kerberos in native Apache Hadoop
– Perimeter security with Apache Knox Gateway
• Authorization: Restrict access to explicit data.
– Native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions
– Cell-level access control in Apache Accumulo
• Audit: Understand who did what.
– Native in Apache Hadoop: process execution audit trail
• Data Protection: Encrypt data at rest and in motion.
– Wire encryption in native Apache Hadoop
– Orchestrated encryption with 3rd-party tools
Data Protection in Hadoop
Data protection must be applied at three different layers in Apache Hadoop:
• Storage: encrypt data while it is at rest. Direct data flows "into" and "out of" 3rd-party encryption tools and/or rely upon hardware-specific techniques (e.g. drive-level encryption).
• Transmission: encrypt data while it is in motion. Native Apache Hadoop 2.0 provides wire encryption.
• Upon access: apply restrictions when data is accessed. Direct data flows "into" and "out of" 3rd-party encryption tools.
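To make the "upon access" layer concrete, the sketch below applies a restriction at read time. Authorized readers get clear text, and everyone else gets tokens in place of sensitive columns. The Python standard library ships no cipher, so this substitutes keyed-hash pseudonymization (HMAC-SHA256). That is a one-way tokenization technique, not encryption, and it stands in for whatever the 3rd-party tools actually do. The function names, the column set and the key handling are all illustrative assumptions.

```python
import hashlib
import hmac
from typing import Dict, Set


def pseudonymize(value: str, key: bytes) -> str:
    """One-way keyed token: equal inputs give equal tokens, so joins and
    group-bys still work, but the token cannot be decrypted back."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def read_row(row: Dict[str, str], sensitive: Set[str],
             clear_text_allowed: bool, key: bytes) -> Dict[str, str]:
    """Apply the restriction at access time: authorized readers see clear text,
    everyone else sees tokens in the sensitive columns."""
    if clear_text_allowed:
        return dict(row)
    return {col: pseudonymize(val, key) if col in sensitive else val
            for col, val in row.items()}
```

Because the tokens are deterministic for a given key, analysts without clear-text rights can still count or join on a masked column. Anyone who needs the original values must be authorized to read them in clear text.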
Data Protection: Details (Today)
• Encryption of Data at Rest
–Option 1: OS- or hardware-level encryption (out of the box)
–Option 2: Custom development
–Option 3: Certified partners
–Work is underway to add encryption to Hive, HDFS and HBase as core platform capabilities.
• Encryption of Data on the Wire
–All wire protocols can be encrypted by the HDP platform (2.x). Wire-level encryption enhancements are led by the Hortonworks (HWX) team.
• Column-Level Encryption
–No out-of-the-box support in Hadoop today.
–Certified partners provide these capabilities.
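In Hadoop 2.x, wire encryption is switched on through site configuration properties. The snippet below uses the standard settings as we understand them: RPC privacy, encrypted DataNode block transfer, HTTPS-only web endpoints and encrypted shuffle. It renders them in Hadoop's XML format. Treat the property list as a starting point to check against your distribution's documentation. RPC privacy assumes Kerberos (SASL) is already enabled, and the HTTPS and shuffle settings also need keystores configured in ssl-server.xml and ssl-client.xml, which are not shown.

```python
import xml.etree.ElementTree as ET
from typing import Dict

WIRE_ENCRYPTION: Dict[str, Dict[str, str]] = {
    # Encrypt Hadoop RPC (SASL quality of protection "privacy").
    "core-site.xml": {"hadoop.rpc.protection": "privacy"},
    "hdfs-site.xml": {
        "dfs.encrypt.data.transfer": "true",  # encrypt DataNode block transfers
        "dfs.http.policy": "HTTPS_ONLY",      # NN/DN web UIs and WebHDFS over TLS
    },
    # Encrypt MapReduce shuffle traffic between nodes.
    "mapred-site.xml": {"mapreduce.shuffle.ssl.enabled": "true"},
}


def render_site_xml(props: Dict[str, str]) -> str:
    """Render properties as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in sorted(props.items()):
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for filename, props in WIRE_ENCRYPTION.items():
        print(f"<!-- {filename} -->")
        print(render_site_xml(props))
```

Generating the fragments from one table keeps the wire-encryption settings for all three files in one reviewable place.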
What can be done today?
• Authentication: Who am I? Prove it. Control access to the cluster.
– Kerberos in native Apache Hadoop
– Perimeter security with Apache Knox Gateway
• Authorization: Restrict access to explicit data.
– Native in Apache Hadoop: MapReduce Access Control Lists, HDFS Permissions
– Cell-level access control in Apache Accumulo
– Service-level authorization with Knox
• Audit: Understand who did what.
– Native in Apache Hadoop: process execution audit trail
– Access audit with Knox
• Data Protection: Encrypt data at rest and in motion.
– Wire encryption in native Apache Hadoop
– Wire encryption with Knox
– Orchestrated encryption with 3rd-party tools
Hadoop Security
Hortonworks is Delivering Secure Hadoop for the Enterprise
Security for Hadoop must be addressed within every layer of the stack and integrated into existing frameworks.
For a full description of what is available in Enterprise Hadoop today across Authentication, Authorization, Accountability and Data Protection, please visit our Security Labs page.
HDP 2.1 stack: Governance & Integration, Security, Operations, Data Access, Data Management
New: Apache Knox
Perimeter security for Hadoop
• A common place to perform authentication across Hadoop and all related projects
• Integrated with LDAP and AD
• Currently supports WebHDFS, WebHCat, Oozie, Hive & HBase
• Broad community effort, incubated with Microsoft, with a broad set of developers involved
Security Investments
Security Phase 3:
• Audit event correlation and audit viewer
• Data encryption in HDFS, Hive & HBase
• Knox for HDFS HA, Ambari & Falcon
• Support for token-based AuthN beyond Kerberos
Security Phase 2:
• ACLs for HDFS
• Knox: Hadoop REST API security
• SQL-style Hive AuthZ (GRANT, REVOKE)
• SSL support for HiveServer2
• SSL for DN/NN UI & WebHDFS
• PAM support for Hive
Security Phase 1:
• Strong AuthN with Kerberos
• HBase, Hive, HDFS basic AuthZ
• Encryption with SSL for NN, JT, etc.
• Wire encryption for shuffle, HDFS, JDBC
Hadoop Security: Phase 2
HDP 2.1 Features
Release Theme: REST API Security, Improved AuthZ, Wire Encryption
Specific Features:
• Hadoop REST API Security with Apache Knox
– Eliminates SSH edge node
– Single Hadoop access point
– LDAP- and AD-based authentication
– Service-level authorization
– Audit support for REST access
• SQL-style Hive authorization with fine-grained access
• HDFS Access Control Lists
• SSL support in HiveServer2
• SSL support in NN/DN UI & WebHDFS
• Pluggable Authentication Module (PAM) in Hive
Included Components: Apache Knox, Hive, HDFS
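SQL-style Hive authorization means privileges are managed with statements such as GRANT SELECT ON TABLE sales TO USER alice and the matching REVOKE ... FROM. The class below is a toy, in-memory model of those semantics so the grant, check and revoke cycle is easy to follow. It is not Hive's implementation. It understands only single-privilege GRANT/REVOKE on tables for users and roles, and the class and method names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Iterable

_STATEMENT = re.compile(
    r"^\s*(GRANT|REVOKE)\s+(\w+)\s+ON\s+(?:TABLE\s+)?([\w.]+)\s+(TO|FROM)\s+"
    r"(?:(USER|ROLE)\s+)?(\w+)\s*;?\s*$",
    re.IGNORECASE,
)


class PrivilegeStore:
    """Toy in-memory model of SQL-style GRANT/REVOKE on tables."""

    def __init__(self) -> None:
        # (principal kind, principal name, table) -> privileges such as {"SELECT"}
        self._grants = defaultdict(set)

    def execute(self, statement: str) -> None:
        match = _STATEMENT.match(statement)
        if match is None:
            raise ValueError(f"unsupported statement: {statement!r}")
        verb, priv, table, prep, kind, name = match.groups()
        verb, prep = verb.upper(), prep.upper()
        if (verb, prep) not in (("GRANT", "TO"), ("REVOKE", "FROM")):
            raise ValueError(f"malformed statement: {statement!r}")
        key = ((kind or "USER").upper(), name, table.lower())
        if verb == "GRANT":
            self._grants[key].add(priv.upper())
        else:
            self._grants[key].discard(priv.upper())

    def is_allowed(self, user: str, table: str, priv: str, roles: Iterable[str] = ()) -> bool:
        """A user is allowed if the privilege was granted to them or to one of their roles."""
        principals = [("USER", user)] + [("ROLE", r) for r in roles]
        return any(
            priv.upper() in self._grants.get((kind, name, table.lower()), set())
            for kind, name in principals
        )
```

Granting to roles rather than to individual users is what keeps this manageable at scale: membership changes happen once, in the role, instead of in every table's grants.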
Why Knox?
From fb.com/hadoopmemes
Apache Knox Gateway
• REST/HTTP API security for
Hadoop
• Eliminates SSH edge node
• Single REST API access point
• Centralized Authentication,
Authorization, and Audit for
Hadoop REST/HTTP services
• LDAP/AD Authentication,
Service Authorization, Audit etc.
Knox eliminates the client's need for intimate knowledge of the cluster topology.
Knox Deployment with Hadoop Cluster
[Figure: Knox deployment. Components shown: Application Tier, Web Tier, a DMZ with a load balancer (LB) and Knox, Hadoop CLIs, and the Hadoop cluster racks (Rack 1: master nodes NN and SNN; Racks 2 to N: slave nodes with DNs), connected through switches.]
Hadoop REST API Security: Drill-Down
[Figure: A REST client reaches Knox gateway (GW) instances over HTTP. The gateways sit behind a load balancer (LB) in a DMZ between two firewalls. Knox authenticates users against the enterprise identity provider (LDAP/AD) and forwards HTTP requests to Hadoop Cluster 1 and Hadoop Cluster 2 (masters: RM, NN, WebHCat, Oozie, HS2; slaves: DN, NM; plus HBase). An edge node running the Hadoop CLIs connects to the clusters over RPC.]
What is Knox? Client > Knox > Hadoop Cluster
Knox Gateway request flow: REST Client > Protocol Listener > Service Selector > Service-Specific Filter Chain > Hadoop Service
• Protocol Listener: listens for requests on the appropriate protocols (e.g. HTTP/HTTPS).
• Service Selector: selects the appropriate service filter chain based on request URL mapping rules.
• Service-Specific Filter Chain: composed and configured at deployment time by service-specific plugins.
– AuthN Filter: challenges the client for credentials and authenticates against the Enterprise Identity Provider, or validates an SSO token from an Enterprise/Cloud SSO Provider.
– Identity Asserter Filter: enforces propagation of the authenticated identity to Hadoop by modifying the request.
– Rewrite Filter: translates URLs in the request and response between external and internal URLs based on service-specific rules.
– Dispatch: streams the request and response to and from the Hadoop service based on the rewritten URLs.
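The filter chain described above can be sketched as ordinary function composition. An AuthN filter checks credentials, an identity asserter adds the authenticated user to the request, a rewrite filter maps external URLs to internal ones (and maps Location headers back), and a dispatch step calls the service. Several details here are assumptions for illustration only: the DIRECTORY dict stands in for LDAP/AD, the two URL prefixes are made up, and the doAs query parameter is an assumed name. Knox's real filters are Java servlet filters configured by topology.

```python
import base64
from dataclasses import dataclass, field
from functools import partial
from typing import Callable, Dict, List, Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed URL pair for one service; a real gateway derives these from its topology.
EXTERNAL = "https://knox.example.com:8443/gateway/sandbox/webhdfs"
INTERNAL = "http://nn.internal:50070/webhdfs"
DIRECTORY = {"guest": "guest-password"}  # stand-in for LDAP/AD


@dataclass
class Request:
    url: str
    headers: Dict[str, str] = field(default_factory=dict)
    user: Optional[str] = None  # set once the AuthN filter succeeds


@dataclass
class Response:
    status: int
    headers: Dict[str, str] = field(default_factory=dict)


Handler = Callable[[Request], Response]
Filter = Callable[[Request, Handler], Response]


def authn_filter(req: Request, nxt: Handler) -> Response:
    """Challenge the client unless it presents valid HTTP Basic credentials."""
    auth = req.headers.get("Authorization", "")
    if auth.startswith("Basic "):
        user, _, password = base64.b64decode(auth[6:]).decode("utf-8").partition(":")
        if DIRECTORY.get(user) == password:
            req.user = user
            return nxt(req)
    return Response(401, {"WWW-Authenticate": 'Basic realm="gateway"'})


def identity_asserter(req: Request, nxt: Handler) -> Response:
    """Propagate the authenticated identity; drop any identity the client supplied."""
    parts = urlsplit(req.url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k.lower() not in ("doas", "user.name")]
    query.append(("doAs", req.user))
    req.url = urlunsplit(parts._replace(query=urlencode(query)))
    return nxt(req)


def rewrite_filter(req: Request, nxt: Handler) -> Response:
    """Translate external -> internal URLs on the way in, and back on the way out."""
    req.url = req.url.replace(EXTERNAL, INTERNAL, 1)
    resp = nxt(req)
    location = resp.headers.get("Location")
    if location and location.startswith(INTERNAL):
        resp.headers["Location"] = EXTERNAL + location[len(INTERNAL):]
    return resp


def build_chain(filters: List[Filter], dispatch: Handler) -> Handler:
    """Compose filters so that filters[0] runs first and dispatch runs last."""
    handler = dispatch
    for f in reversed(filters):
        handler = partial(f, nxt=handler)
    return handler
```

Stripping any client-supplied doAs or user.name before appending the authenticated one is the point of the identity asserter: a client cannot claim to be someone else. Rewriting the Location header keeps internal host names hidden from clients.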
Knox Gateway in action
Submit MR job via Knox
HDFS & MR Operations with Knox
(The -k flag skips TLS certificate verification; use it only against a sandbox with a self-signed certificate.)
• Create a few directories
curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS&permission=777"
curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input?op=MKDIRS&permission=777"
curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib?op=MKDIRS&permission=777"
• Upload files
curl -iku guest:guest-password -L -T samples/hadoop-examples.jar -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib/hadoop-examples.jar?op=CREATE"
curl -iku guest:guest-password -L -T README -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input/README?op=CREATE"
• Run MR job
curl -iku guest:guest-password -X POST -d arg=/user/guest/test/input -d arg=/user/guest/test/output -d jar=/user/guest/test/lib/hadoop-examples.jar -d class=org.apache.hadoop.examples.WordCount "https://localhost:8443/gateway/sandbox/templeton/v1/mapreduce/jar"
• Query the jobs for a user
curl -iku guest:guest-password "https://localhost:8443/gateway/sandbox/templeton/v1/queue"
• Query the status of a given job
curl -iku guest:guest-password "https://localhost:8443/gateway/sandbox/templeton/v1/queue/<job_id>"
• Read the output file
curl -iku guest:guest-password -L -X GET "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/output/part-r-00000?op=OPEN"
• Remove a directory
curl -iku guest:guest-password -X DELETE "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=DELETE&recursive=true"
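The same gateway calls can be issued from Python with only the standard library. The sketch below builds WebHDFS requests against the sandbox gateway URL and guest credentials used in the curl demo (assumed values). The send helper mirrors curl -k by disabling certificate checks, so it is suitable for a sandbox only. The helper names are hypothetical. One caveat: urllib does not follow a 307 redirect for PUT, so the two-step WebHDFS CREATE would need the Location header handled by hand, unlike curl -L.

```python
import base64
import ssl
import urllib.request
from typing import Optional, Tuple
from urllib.parse import urlencode

# Assumed values taken from the curl demo (sandbox gateway, guest account).
GATEWAY = "https://localhost:8443/gateway/sandbox"
USER, PASSWORD = "guest", "guest-password"


def webhdfs_request(path: str, op: str, method: str = "GET",
                    data: Optional[bytes] = None, **params: str) -> urllib.request.Request:
    """Build a WebHDFS request routed through the gateway, with HTTP Basic auth."""
    query = urlencode({"op": op, **params})
    url = f"{GATEWAY}/webhdfs/v1{path}?{query}"
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(url, data=data, method=method,
                                  headers={"Authorization": f"Basic {token}"})


def send(req: urllib.request.Request) -> Tuple[int, bytes]:
    """Send a request without certificate verification (sandbox only, like curl -k)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(req, context=ctx) as resp:
        return resp.status, resp.read()


if __name__ == "__main__":
    # Equivalent of the first "Create a few directories" curl call.
    print(send(webhdfs_request("/user/guest/test", "MKDIRS", method="PUT", permission="777")))
```

Only the gateway URL and credentials appear in the client. This is the property that lets Knox replace SSH access to an edge node.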
How to Get Involved
• Security Labs: http://hortonworks.com/labs/security/
• Security Blogs: http://hortonworks.com/blog/category/innovation/security/
• Apache Knox Tutorial: http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/
• Need help? http://hortonworks.com/community/forums/forum/security/ or vshukla@hortonworks.com
Thank you! Amsterdam, April 3rd, 2014
Vinay Shukla
Twitter: @NeoMythos
Hadoop Security Today and Tomorrow

More Related Content

What's hot (20)

PDF
Ansible
Knoldus Inc.
 
PPTX
Hadoop Security Today & Tomorrow with Apache Knox
Vinay Shukla
 
PPTX
Kafka Security
DataWorks Summit/Hadoop Summit
 
PPTX
Terraform
Phil Wilkins
 
PDF
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Databricks
 
PDF
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
PDF
Docker 101: Introduction to Docker
Docker, Inc.
 
PPTX
Big Data and Hadoop
Flavio Vit
 
PDF
Use case and integration of ClickHouse with Apache Superset & Dremio
Altinity Ltd
 
PPTX
Terraform
Pathum Fernando ☁
 
PPTX
Introduction to Kubernetes
rajdeep
 
PPTX
Ambari: Agent Registration Flow
Hortonworks
 
PPTX
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 
PPTX
Kafka Tutorial: Kafka Security
Jean-Paul Azar
 
PDF
Building Event Driven (Micro)services with Apache Kafka
Guido Schmutz
 
PDF
Super Sizing Youtube with Python
didip
 
PPTX
Vault Open Source vs Enterprise v2
Stenio Ferreira
 
PPTX
Kubernetes #1 intro
Terry Cho
 
PDF
Introduction to Apache Hive
Avkash Chauhan
 
Ansible
Knoldus Inc.
 
Hadoop Security Today & Tomorrow with Apache Knox
Vinay Shukla
 
Terraform
Phil Wilkins
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Databricks
 
Amazon S3 Best Practice and Tuning for Hadoop/Spark in the Cloud
Noritaka Sekiyama
 
Docker 101: Introduction to Docker
Docker, Inc.
 
Big Data and Hadoop
Flavio Vit
 
Use case and integration of ClickHouse with Apache Superset & Dremio
Altinity Ltd
 
Introduction to Kubernetes
rajdeep
 
Ambari: Agent Registration Flow
Hortonworks
 
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 
Kafka Tutorial: Kafka Security
Jean-Paul Azar
 
Building Event Driven (Micro)services with Apache Kafka
Guido Schmutz
 
Super Sizing Youtube with Python
didip
 
Vault Open Source vs Enterprise v2
Stenio Ferreira
 
Kubernetes #1 intro
Terry Cho
 
Introduction to Apache Hive
Avkash Chauhan
 

Viewers also liked (20)

PDF
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Hortonworks
 
PPTX
Hdp security overview
Hortonworks
 
PDF
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
Hortonworks
 
PPTX
Securing Hadoop with Apache Ranger
DataWorks Summit
 
PDF
Implementing a Data Lake with Enterprise Grade Data Governance
Hortonworks
 
PDF
CIS13: Managing the Keys to the Kingdom: Next-Gen Role-based Access Control a...
CloudIDSummit
 
PPTX
Protecting Enterprise Data in Apache Hadoop
DataWorks Summit/Hadoop Summit
 
PDF
San Francisco Best Places to Work Roadshow | Centrify
Glassdoor
 
PDF
Talend Open Studio and Hortonworks Data Platform
Hortonworks
 
PPTX
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Kevin Minder
 
PPTX
Streamline Hadoop DevOps with Apache Ambari
DataWorks Summit/Hadoop Summit
 
PPTX
Simplified Cluster Operation & Troubleshooting
DataWorks Summit/Hadoop Summit
 
PPTX
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Artem Ervits
 
PDF
Hadoop Security: Overview
Cloudera, Inc.
 
PPTX
Curb your insecurity with HDP
DataWorks Summit/Hadoop Summit
 
PPTX
Talend Big Data Capabilities Overview
Rajan Kanitkar
 
PDF
Hadoop Security
Timothy Spann
 
PDF
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Etu Solution
 
PPTX
Ansible + Hadoop
Michael Young
 
PPTX
Real-Time Data Flows with Apache NiFi
Manish Gupta
 
Simplify and Secure your Hadoop Environment with Hortonworks and Centrify
Hortonworks
 
Hdp security overview
Hortonworks
 
HDP Advanced Security: Comprehensive Security for Enterprise Hadoop
Hortonworks
 
Securing Hadoop with Apache Ranger
DataWorks Summit
 
Implementing a Data Lake with Enterprise Grade Data Governance
Hortonworks
 
CIS13: Managing the Keys to the Kingdom: Next-Gen Role-based Access Control a...
CloudIDSummit
 
Protecting Enterprise Data in Apache Hadoop
DataWorks Summit/Hadoop Summit
 
San Francisco Best Places to Work Roadshow | Centrify
Glassdoor
 
Talend Open Studio and Hortonworks Data Platform
Hortonworks
 
Securing Hadoop's REST APIs with Apache Knox Gateway Hadoop Summit June 6th, ...
Kevin Minder
 
Streamline Hadoop DevOps with Apache Ambari
DataWorks Summit/Hadoop Summit
 
Simplified Cluster Operation & Troubleshooting
DataWorks Summit/Hadoop Summit
 
Security and Governance on Hadoop with Apache Atlas and Apache Ranger by Srik...
Artem Ervits
 
Hadoop Security: Overview
Cloudera, Inc.
 
Curb your insecurity with HDP
DataWorks Summit/Hadoop Summit
 
Talend Big Data Capabilities Overview
Rajan Kanitkar
 
Hadoop Security
Timothy Spann
 
Big Data Taiwan 2014 Track2-2: Informatica Big Data Solution
Etu Solution
 
Ansible + Hadoop
Michael Young
 
Real-Time Data Flows with Apache NiFi
Manish Gupta
 
Ad

Similar to Hadoop Security Today and Tomorrow (20)

PDF
August 2014 HUG : Comprehensive Security for Hadoop
Yahoo Developer Network
 
PDF
2014 sept 4_hadoop_security
Adam Muise
 
PPTX
Hadoop security
Shivaji Dutta
 
PPTX
Improvements in Hadoop Security
DataWorks Summit
 
PPTX
Improvements in Hadoop Security
Chris Nauroth
 
PDF
Hadoop & Security - Past, Present, Future
Uwe Printz
 
PDF
Curb your insecurity with HDP - Tips for a Secure Cluster
ahortonworks
 
PDF
TriHUG October: Apache Ranger
trihug
 
PDF
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
huguk
 
PPTX
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Pardeep Kumar Mishra (Big Data / Hadoop Consultant)
 
PDF
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Hortonworks
 
PPTX
Treat your enterprise data lake indigestion: Enterprise ready security and go...
DataWorks Summit
 
PDF
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks
 
PPTX
Securing Data in Hadoop at Uber
DataWorks Summit
 
PPTX
Hadoop and Data Access Security
Cloudera, Inc.
 
PDF
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
Cloudera, Inc.
 
PPTX
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
DataWorks Summit
 
PPTX
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
DataWorks Summit
 
PPTX
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
DataWorks Summit
 
PPTX
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
August 2014 HUG : Comprehensive Security for Hadoop
Yahoo Developer Network
 
2014 sept 4_hadoop_security
Adam Muise
 
Hadoop security
Shivaji Dutta
 
Improvements in Hadoop Security
DataWorks Summit
 
Improvements in Hadoop Security
Chris Nauroth
 
Hadoop & Security - Past, Present, Future
Uwe Printz
 
Curb your insecurity with HDP - Tips for a Secure Cluster
ahortonworks
 
TriHUG October: Apache Ranger
trihug
 
Apache Argus - How do I secure my entire Hadoop cluster? Olivier Renault @ Ho...
huguk
 
Curb Your Insecurity - Tips for a Secure Cluster (with Spark too)!!
Pardeep Kumar Mishra (Big Data / Hadoop Consultant)
 
Discover Enterprise Security Features in Hortonworks Data Platform 2.1: Apach...
Hortonworks
 
Treat your enterprise data lake indigestion: Enterprise ready security and go...
DataWorks Summit
 
Discover HDP 2.1: Apache Hadoop 2.4.0, YARN & HDFS
Hortonworks
 
Securing Data in Hadoop at Uber
DataWorks Summit
 
Hadoop and Data Access Security
Cloudera, Inc.
 
HBaseCon 2012 | HBase Security for the Enterprise - Andrew Purtell, Trend Micro
Cloudera, Inc.
 
Security needs in Hadoop’s Current and Future – How Apache Ranger can help?
DataWorks Summit
 
Fortifying Multi-Cluster Hybrid Cloud Data Lakes using Apache Knox
DataWorks Summit
 
Bridle your Flying Islands and Castles in the Sky: Built-in Governance and Se...
DataWorks Summit
 
Introduction to the Hadoop EcoSystem
Shivaji Dutta
 
Ad

More from DataWorks Summit (20)

PPTX
Data Science Crash Course
DataWorks Summit
 
PPTX
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
PPTX
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
PDF
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
PPTX
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
PPTX
Managing the Dewey Decimal System
DataWorks Summit
 
PPTX
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
PPTX
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
PPTX
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
PPTX
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
PPTX
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
PPTX
Security Framework for Multitenant Architecture
DataWorks Summit
 
PDF
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
PPTX
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
PPTX
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
PPTX
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
PPTX
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
PPTX
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
PDF
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
PPTX
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 
Data Science Crash Course
DataWorks Summit
 
Floating on a RAFT: HBase Durability with Apache Ratis
DataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
DataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
DataWorks Summit
 
Managing the Dewey Decimal System
DataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
DataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
DataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
DataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
DataWorks Summit
 
Security Framework for Multitenant Architecture
DataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
DataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
DataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
DataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
DataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
DataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
DataWorks Summit
 

Recently uploaded (20)

PDF
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
PDF
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
PPTX
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
PDF
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
PDF
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
PDF
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
PDF
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
PDF
Blockchain Transactions Explained For Everyone
CIFDAQ
 
PDF
Chris Elwell Woburn, MA - Passionate About IT Innovation
Chris Elwell Woburn, MA
 
PDF
Jak MŚP w Europie Środkowo-Wschodniej odnajdują się w świecie AI
dominikamizerska1
 
PDF
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
PPTX
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
PDF
The Builder’s Playbook - 2025 State of AI Report.pdf
jeroen339954
 
PPTX
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
PDF
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
PPTX
UiPath Academic Alliance Educator Panels: Session 2 - Business Analyst Content
DianaGray10
 
PDF
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
PDF
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
PDF
Log-Based Anomaly Detection: Enhancing System Reliability with Machine Learning
Mohammed BEKKOUCHE
 
PPTX
✨Unleashing Collaboration: Salesforce Channels & Community Power in Patna!✨
SanjeetMishra29
 
Exolore The Essential AI Tools in 2025.pdf
Srinivasan M
 
DevBcn - Building 10x Organizations Using Modern Productivity Metrics
Justin Reock
 
Webinar: Introduction to LF Energy EVerest
DanBrown980551
 
Achieving Consistent and Reliable AI Code Generation - Medusa AI
medusaaico
 
Agentic AI lifecycle for Enterprise Hyper-Automation
Debmalya Biswas
 
SWEBOK Guide and Software Services Engineering Education
Hironori Washizaki
 
[Newgen] NewgenONE Marvin Brochure 1.pdf
darshakparmar
 
Blockchain Transactions Explained For Everyone
CIFDAQ
 
Chris Elwell Woburn, MA - Passionate About IT Innovation
Chris Elwell Woburn, MA
 
Jak MŚP w Europie Środkowo-Wschodniej odnajdują się w świecie AI
dominikamizerska1
 
NewMind AI - Journal 100 Insights After The 100th Issue
NewMind AI
 
WooCommerce Workshop: Bring Your Laptop
Laura Hartwig
 
The Builder’s Playbook - 2025 State of AI Report.pdf
jeroen339954
 
OpenID AuthZEN - Analyst Briefing July 2025
David Brossard
 
Using FME to Develop Self-Service CAD Applications for a Major UK Police Force
Safe Software
 
UiPath Academic Alliance Educator Panels: Session 2 - Business Analyst Content
DianaGray10
 
HCIP-Data Center Facility Deployment V2.0 Training Material (Without Remarks ...
mcastillo49
 
Fl Studio 24.2.2 Build 4597 Crack for Windows Free Download 2025
faizk77g
 
Log-Based Anomaly Detection: Enhancing System Reliability with Machine Learning
Mohammed BEKKOUCHE
 
✨Unleashing Collaboration: Salesforce Channels & Community Power in Patna!✨
SanjeetMishra29
 

Hadoop Security Today and Tomorrow

  • 1. Hadoop Security: Today and Tomorrow Vinay Shukla Hortonworks
  • 2. © Hortonworks Inc. 2014 Hadoop Security Today & Tomorrow Amsterdam - April3rd, 2014 Vinay Shukla Twitter: @NeoMythos
  • 3. © Hortonworks Inc. 2014 Agenda • What is Hadoop Security? – 4 Security Pillars & Rings of Defense • What security elements exists today? – Authentication – Authorization – Audit – Data Protection • What is on the security roadmap? – Coming soon – Longer term projects • Securing Hadoop with Apache Knox Gateway – Knox overview – Demo • How to get involved
  • 4. © Hortonworks Inc. 2014 What is Apache Hadoop Security? Security in Apache Hadoop is defined by four key pillars: authentication, authorization, accountability, and data protection.
  • 5. © Hortonworks Inc. 2014 Two Reasons for Security in Hadoop Hadoop Contains Sensitive Data –As Hadoop adoption grows so too has the types of data organizations look to store. Often the data is proprietary or personal and it must be protected. –In this context, Hadoop is governed by the same security requirements as any data center platform. Hadoop is subject to Compliance adherence –Organizations are often subject to comply with regulations such as HIPPA, PCI DSS, FISAM that require protection of personal information. –Adherence to other Corporate security policies. 1 2
  • 6. © Hortonworks Inc. 2014 Security: Rings of Defense Perimeter Level Security • Network Security (i.e. Firewalls) • Apache Knox (i.e. Gateways) Data Protection • Core Hadoop • Partners Authentication • Kerberos OS Security Authorization • MR ACLs • HDFS Permissions • HDFS ACLs • HiveATZ-NG • HBase ACLs • Accumulo Label Security Page 6
  • 7. © Hortonworks Inc. 2014 Authentication in Hadoop Today… Authentication Who am I/prove it? Control access to cluster. Authorization Restrict access to explicit data Audit Understand who did what Data Protection Encrypt data at rest & motion Kerberos in native Apache Hadoop Perimeter Security with Apache Knox Gateway
  • 8. © Hortonworks Inc. 2014 Kerberos Authentication in Hadoop For more than 20 years, Kerberos has been the de-facto standard for strong authentication. …no other option exists. The design and implementation of Kerberos security in native Apache Hadoop was delivered by Hortonworker Owen O’Malley in 2010. What does Kerberos Do? – Establishes identity for clients, hosts and services – Prevents impersonation/passwords are never sent over the wire – Integrates w/ enterprise identity management tools such as LDAP & Active Directory – More granular auditing of data access/job execution
  • 9. © Hortonworks Inc. 2014 • Single Hadoop access point • REST API hierarchy • Consolidated API calls • Multi-cluster support • Eliminates SSH “edge node” • Central API management • Central audit control • Simple Service level Authorization • SSO Integration – Siteminder, API Key*, OAuth* & SAML* • LDAP & AD integration Perimeter Security with Apache Knox Integrated with existing systems to simplify identity maintenance Incubated and led by Hortonworks, Apache Knox provides a simple and open framework for Hadoop perimeter security. Single, simple point of access for a cluster Central controls ensure consistency across one or more clusters
  • 10. © Hortonworks Inc. 2014 Authentication & Audit in Hadoop today… Authorization Restrict access to explicit data Audit Understand who did what Data Protection Encrypt data at rest & motion Kerberos in native Apache Hadoop Perimeter Security with Apache Knox Gateway Native in Apache Hadoop • MapReduce Access Control Lists • HDFS Permissions • Process Execution audit trail Cell level access control in Apache Accumulo Authentication Who am I/prove it? Control access to cluster.
  • 11. © Hortonworks Inc. 2014 Authorization: Who can do what in Hadoop? • Access Control Services exist for each of the Hadoop components –HDFS has file Permissions –YARN, MapReduce, HBase has Access Control Lists (ACL) –Accumulo Proves more granular label/cell level security • Improvements to these services are being led by Hortonworks Team: –HDFS Improvements – Extended ACL, more flexible via multiple policies on the same file or directory –Hive Improvements – Hortonworks initiative called Hive ATZ-NG, better integration allows familiar SQL/database syntax (GRANT/REVOKE) and allows more clients (including partner integrations) to be secure.
  • 12. Data Protection in Hadoop today…
    – Authentication (Who am I/prove it? Control access to cluster): Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway
    – Authorization (Restrict access to explicit data): native in Apache Hadoop (MapReduce Access Control Lists, HDFS permissions); cell-level access control in Apache Accumulo
    – Audit (Understand who did what): native in Apache Hadoop (process execution audit trail)
    – Data Protection (Encrypt data at rest & in motion): wire encryption in native Apache Hadoop; orchestrated encryption with 3rd party tools
  • 13. Data Protection
    Data protection in Hadoop must be applied at three different layers in Apache Hadoop:
    – Storage: encrypt data while it is at rest. Direct data flows “into” and “out of” 3rd party encryption tools and/or rely upon hardware-specific techniques (e.g. drive-level encryption).
    – Transmission: encrypt data while it is in motion. Native Apache Hadoop 2.0 provides wire encryption.
    – Upon access: apply restrictions when data is accessed. Direct data flows “into” and “out of” 3rd party encryption tools.
  • 14. Data Protection – Details – Today
    • Encryption of data at rest
      – Option 1: OS or hardware-level encryption (out of the box)
      – Option 2: Custom development
      – Option 3: Certified partners
      – Work is underway for encryption in Hive, HDFS and HBase as core platform capabilities.
    • Encryption of data on the wire
      – All wire protocols can be encrypted by the HDP platform (2.x). Wire-level encryption enhancements are led by the Hortonworks (HWX) team.
    • Column-level encryption
      – No current out-of-the-box support in Hadoop.
      – Certified partners provide these capabilities.
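"All wire protocols can be encrypted" in practice means setting one property per channel. The Python sketch below audits a flattened dictionary of `*-site.xml` properties for the Hadoop 2.x settings that switch on encryption for RPC, DataNode block transfer, HTTP (web UIs and WebHDFS) and the MapReduce shuffle. Verify the property names and accepted values against your release. The check is deliberately strict: it does not accept comma-separated QOP lists in `hadoop.rpc.protection`, and it does not cover HiveServer2 or JDBC SSL.

```python
from typing import Dict, List

# Property -> value that enables encryption on that channel (Hadoop 2.x names).
WIRE_ENCRYPTION_SETTINGS: Dict[str, str] = {
    "hadoop.rpc.protection": "privacy",       # core-site.xml: RPC (SASL QOP)
    "dfs.encrypt.data.transfer": "true",      # hdfs-site.xml: DataNode data transfer
    "dfs.http.policy": "HTTPS_ONLY",          # hdfs-site.xml: NN/DN UI and WebHDFS
    "mapreduce.shuffle.ssl.enabled": "true",  # mapred-site.xml: shuffle
}


def missing_wire_encryption(conf: Dict[str, str]) -> List[str]:
    """Return the properties whose current value does not enable encryption."""
    return [
        key for key, wanted in WIRE_ENCRYPTION_SETTINGS.items()
        if conf.get(key, "").strip().lower() != wanted.lower()
    ]
```

A cluster that has only enabled RPC privacy and HTTPS would be reported as still lacking encrypted block transfer and shuffle.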
  • 15. What can be done today?
    – Authentication (Who am I/prove it? Control access to cluster): Kerberos in native Apache Hadoop; perimeter security with Apache Knox Gateway
    – Authorization (Restrict access to explicit data): native in Apache Hadoop (MapReduce Access Control Lists, HDFS permissions); cell-level access control in Apache Accumulo; service-level authorization with Knox
    – Audit (Understand who did what): native in Apache Hadoop (process execution audit trail); access audit with Knox
    – Data Protection (Encrypt data at rest & in motion): wire encryption in native Apache Hadoop; wire encryption with Knox; orchestrated encryption with 3rd party tools
  • 16. Hadoop Security: Hortonworks is Delivering Secure Hadoop for the Enterprise
    Security for Hadoop must be addressed within every layer of the stack and integrated into existing frameworks. For a full description of what is available in Enterprise Hadoop today across Authentication, Authorization, Accountability and Data Protection, please visit our security labs page.
    [Diagram: HDP 2.1 stack: Governance & Integration, Security, Operations, Data Access, Data Management]
    – New: Apache Knox, perimeter security for Hadoop
      • A common place to perform authentication across Hadoop and all related projects
      • Integrated with LDAP and AD
      • Currently supports: WebHDFS, WebHCat, Oozie, Hive & HBase
      • Broad community effort, incubated with Microsoft, broad set of developers involved
    – Security Investments
      • Phase 3: audit event correlation and audit viewer; data encryption in HDFS, Hive & HBase; Knox for HDFS HA, Ambari & Falcon; support for token-based AuthN beyond Kerberos
      • Phase 2: ACLs for HDFS; Knox: Hadoop REST API security; SQL-style Hive AuthZ (GRANT, REVOKE); SSL support for HiveServer2; SSL for DN/NN UI & WebHDFS; PAM support for Hive
      • Phase 1: strong AuthN with Kerberos; HBase, Hive, HDFS basic AuthZ; encryption with SSL for NN, JT, etc.; wire encryption with Shuffle, HDFS, JDBC
  • 17. Hadoop Security: Phase 2 (HDP 2.1 Features)
    – Release theme: REST API security, improved AuthZ, wire encryption
    – Specific features:
      • Hadoop REST API security with Apache Knox: eliminates SSH edge node; single Hadoop access point; LDAP, AD based authentication; service-level authorization; audit support for REST access
      • SQL-style Hive authorization with fine-grained access
      • HDFS Access Control Lists
      • SSL support in HiveServer2
      • SSL support in NN/DN UI & WebHDFS
      • Pluggable Authentication Module (PAM) in Hive
    – Included components: Apache Knox, Hive, HDFS
  • 18. Why Knox? (image from fb.com/hadoopmemes)
    Apache Knox Gateway
    • REST/HTTP API security for Hadoop
    • Eliminates SSH edge node
    • Single REST API access point
    • Centralized authentication, authorization, and audit for Hadoop REST/HTTP services
    • LDAP/AD authentication, service authorization, audit, etc.
    Knox eliminates the client’s need for intimate knowledge of cluster topology.
  • 19. Knox Deployment with Hadoop Cluster
    [Diagram: Web Tier and Application Tier; DMZ containing the load balancer (LB), Knox and Hadoop CLIs; master nodes (NN, SNN) in Rack 1 and slave nodes (DN) in Racks 2 … N, connected through switches]
  • 20. Hadoop REST API Security: Drill-Down
    [Diagram: REST client → firewall → DMZ with load balancer (LB) and Knox Gateway instances (GW, GW), which authenticate against the enterprise identity provider (LDAP/AD) over LDAP → firewall → Hadoop Cluster 1 and Hadoop Cluster 2 over HTTP; each cluster has masters (RM, NN, WebHCat, Oozie, HS2, HBase) and slaves (DN, NM, HBase); the edge node/Hadoop CLIs reach the cluster over RPC]
  • 21. What is Knox? Client > Knox > Hadoop Cluster
    Knox Gateway request path, from REST client to Hadoop service:
    – Protocol Listener: listens for requests on the appropriate protocols (e.g. HTTP/HTTPS)
    – Service Selector: selects the appropriate service filter chain based on request URL mapping rules
    – Service-specific filter chain:
      • AuthN Filter: challenges the client for credentials and authenticates, or validates an SSO token, against the Enterprise Identity Provider or Enterprise/Cloud SSO Provider
      • Identity Asserter Filter: enforces propagation of the authenticated identity to Hadoop by modifying the request
      • Rewrite Filter: translates URLs in request and response between external and internal URLs based on service-specific rules
      • Dispatch: streams request and response to and from the Hadoop service based on rewritten URLs
    Service filter chains are composed and configured at deployment time by service-specific plugins.
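The filter chain above can be mimicked with a few plain functions. This Python sketch is a toy model, not Knox code, and it rests on several assumptions. The `USERS` table stands in for LDAP/AD. The `SERVICES` map, with its internal hostnames and ports, is hypothetical. The identity is asserted by appending a `doAs` query parameter for illustration only; the exact mechanism Knox uses depends on the service and security mode. Response rewriting, such as the `Location` header of a WebHDFS redirect, is not modeled.

```python
import base64
from dataclasses import dataclass, field
from typing import Dict
from urllib.parse import parse_qsl, urlencode, urlsplit


@dataclass
class GatewayRequest:
    url: str
    headers: Dict[str, str] = field(default_factory=dict)


# Hypothetical topology: external path prefix -> internal service base URL.
SERVICES = {
    "/gateway/sandbox/webhdfs": "http://nn.internal:50070/webhdfs",
    "/gateway/sandbox/templeton": "http://webhcat.internal:50111/templeton",
}
USERS = {"guest": "guest-password"}  # stand-in for LDAP/AD


def authenticate(req: GatewayRequest) -> str:
    """AuthN filter: validate HTTP Basic credentials, return the user name."""
    scheme, _, token = req.headers.get("Authorization", "").partition(" ")
    if scheme != "Basic":
        raise PermissionError("401: credentials required")
    user, _, password = base64.b64decode(token).decode().partition(":")
    if USERS.get(user) != password:
        raise PermissionError("401: bad credentials")
    return user


def select_service(path: str) -> str:
    """Service selector: find the mapped prefix that matches the path."""
    for prefix in SERVICES:
        if path == prefix or path.startswith(prefix + "/"):
            return prefix
    raise LookupError("404: no service mapped for " + path)


def handle(req: GatewayRequest) -> str:
    """Run the chain; return the internal URL the dispatcher would call."""
    user = authenticate(req)
    parts = urlsplit(req.url)
    prefix = select_service(parts.path)
    # Identity asserter: discard client-supplied identities, assert our own.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in ("doAs", "user.name")]
    query.append(("doAs", user))
    # Rewrite: swap the external prefix for the internal service base URL.
    return SERVICES[prefix] + parts.path[len(prefix):] + "?" + urlencode(query)
```

The client-supplied `doAs=hdfs` is discarded and replaced by the authenticated user, which is the point of running identity assertion after authentication.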
  • 22. Knox Gateway in action: Submit MR job via Knox
  • 23. HDFS & MR Operations with Knox
    • Create a few directories
      curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=MKDIRS&permission=777"
      curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input?op=MKDIRS&permission=777"
      curl -iku guest:guest-password -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib?op=MKDIRS&permission=777"
    • Upload files
      curl -iku guest:guest-password -L -T samples/hadoop-examples.jar -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/lib/hadoop-examples.jar?op=CREATE"
      curl -iku guest:guest-password -L -T README -X PUT "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/input/README?op=CREATE"
    • Run MR job
      curl -iku guest:guest-password -X POST -d arg=/user/guest/test/input -d arg=/user/guest/test/output -d jar=/user/guest/test/lib/hadoop-examples.jar -d class=org.apache.hadoop.examples.WordCount "https://localhost:8443/gateway/sandbox/templeton/v1/mapreduce/jar"
    • Query the jobs for a user
      curl -iku guest:guest-password "https://localhost:8443/gateway/sandbox/templeton/v1/queue"
    • Query the status of a given job
      curl -iku guest:guest-password "https://localhost:8443/gateway/sandbox/templeton/v1/queue/<job_id>"
    • Read the output file
      curl -iku guest:guest-password -L -X GET "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test/output/part-r-00000?op=OPEN"
    • Remove a directory
      curl -iku guest:guest-password -X DELETE "https://localhost:8443/gateway/sandbox/webhdfs/v1/user/guest/test?op=DELETE&recursive=true"
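For readers who prefer to script the demo in Python rather than curl, this sketch builds the same WebHCat `mapreduce/jar` submission with only the standard library, without sending it. curl's `-u` becomes an HTTP Basic `Authorization` header, and each `-d` becomes a form field, with the repeated `arg` keys preserved. Unlike curl's `-d`, `urlencode` percent-encodes the slashes in the paths; standard form decoding on the server should treat both forms the same. The gateway URL and credentials are the sandbox values from the demo.

```python
import base64
from urllib.parse import urlencode
from urllib.request import Request

GATEWAY = "https://localhost:8443/gateway/sandbox"  # sandbox topology from the demo


def basic_auth(user: str, password: str) -> str:
    """Equivalent of curl -u user:password."""
    return "Basic " + base64.b64encode(f"{user}:{password}".encode()).decode()


def wordcount_job_request(user: str, password: str) -> Request:
    """Build (but do not send) the WordCount jar submission from the demo."""
    form = [
        ("arg", "/user/guest/test/input"),
        ("arg", "/user/guest/test/output"),
        ("jar", "/user/guest/test/lib/hadoop-examples.jar"),
        ("class", "org.apache.hadoop.examples.WordCount"),
    ]
    return Request(
        GATEWAY + "/templeton/v1/mapreduce/jar",
        data=urlencode(form).encode(),  # a list of pairs keeps both "arg" keys
        headers={"Authorization": basic_auth(user, password),
                 "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

To actually submit it, pass the request to `urllib.request.urlopen` together with `context=ssl.create_default_context(cafile="gateway.pem")`, where `gateway.pem` is a hypothetical path to the gateway's certificate. That is a safer habit than the demo's `-k`, which disables certificate verification and is only reasonable against a local sandbox.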
  • 24. How to get Involved
    – Security Labs: http://hortonworks.com/labs/security/
    – Security Blogs: http://hortonworks.com/blog/category/innovation/security/
    – Apache Knox Tutorial: http://hortonworks.com/hadoop-tutorial/securing-hadoop-infrastructure-apache-knox/
    – Need help? http://hortonworks.com/community/forums/forum/security/ or [email protected]
  • 25. Thank you! Amsterdam - April 3rd, 2014. Vinay Shukla, Twitter: @NeoMythos

Editor's Notes

  • #19: Background
    – Hortonworks-led initiative
    – Useful for connecting to Hadoop from outside the cluster
    – When more client language flexibility is required, i.e. a Java binding is not an option
    – Not intended for RPC calls
    – Call it a REST API Gateway for Hadoop
    – Don’t call it a firewall: firewalls are at the network layer
    – Don’t call it perimeter security: perimeter security is getting discredited as an incomplete security solution
  • #21: Note that the arrows to the Hadoop cluster are simplifications. Actually there will be multiple arrows, one per port open between Knox and the Hadoop services it supports (WebHDFS, WebHCat, HiveServer2, HBase, Oozie) & more in the future.
  • #22: Functions as an HTTP reverse proxy. Rewrites URLs to protect the internal network topology. Knox Gateway embeds a Jetty container. Reads/writes HTTP.