© Cloudera, Inc. All rights reserved.
SECURING DATA IN HYBRID ENVIRONMENTS
USING APACHE RANGER
Don Bosco Durai, Privacera
Apache Ranger PMC
Madhan Neethiraj, Cloudera
Apache Ranger PMC, Apache Atlas PMC
DISCLAIMER
• This document may contain product features and technology directions that are under
development, may be under development in the future or may ultimately not be developed.
• Project capabilities are based on information that is publicly available within the Apache
Software Foundation project websites ("Apache"). Progress of the project capabilities can be
tracked from inception to release through Apache, however, technical feasibility, market
demand, user feedback and the overarching Apache Software Foundation community
development process can all affect timing and final delivery.
• This document’s description of these features and technology directions does not represent a
contractual commitment, promise or obligation from Cloudera and Privacera to deliver these
features in any generally available product.
• Product features and technology directions are subject to change, and must not be included in
contracts, purchase orders, or sales agreements of any kind.
• Since this document contains an outline of general product development plans, customers
should not rely upon it when making purchasing decisions.
ABOUT PRIVACERA
[Diagram: Privacera offerings (Cloud Access Manager, Cloud Discovery, Cloud Anonymization) across storage, SQL, NoSQL, streaming, serverless, and ML services]
AGENDA
Apache Ranger overview
Security Challenges in Hybrid Deployments
Implementing Hybrid Security using Ranger
New Features: Security Zones, Role Based Access Control, Conditions at Policy Scope
Demo
Questions
APACHE RANGER: OVERVIEW - HISTORY
• May 2014: XASecure acquisition
• Jul 2014: Enters incubation
• Nov 2014: Ranger 0.4.0
• Jun 2015: Ranger 0.5.0
• Jan 2016: Ranger 0.5.1
• Jul 2016: Ranger 0.6.0
• Aug 2016: Ranger 0.6.1
• Nov 2016: Ranger 0.6.2
• Jan 2017: Ranger TLP graduation!
• Feb 2017: Ranger 0.7.0
• Jun 2017: Ranger 0.7.1
• Mar 2018: Ranger 1.0.0
• Jul 2018: Ranger 1.1.0
• Oct 2018: Ranger 1.2.0
• ~May 2019: Ranger 2.0.0

• Committers: 29
• Contributors from: eBay, MSFT, Huawei, Pandora, Accenture, ING, Talend, ZTE

Ranger 0.7.x
• Tag based masking
• Export/import of policies
• $User and macros
• User Sync nested LDAP support
• Plugin status tab
• “Show columns” and “describe extended” support
• Incremental LDAP sync

Ranger 1.1.0
• Time based policies
• Metadata security
• Audit only (compliance) role
• Hive UDF usage authorization
• Show Hive query in audits
• Policy labels
• Audit enhancements

Ranger 2.0.0
• Hadoop3 version updates
• Security zones
• Policy level custom conditions
• Role based authorization
• DB schema optimization for faster policy CRUD
• Hadoop trusted-proxy authentication
APACHE RANGER: OVERVIEW – FEATURES
• Centralized policy administration
• Centralized auditing
• Dynamic row filtering
• Dynamic data masking
• Tag based authorization and data-masking policies
• Rich & extendable policy enforcement engine
• Key Management System (KMS)
• New Feature: Security Zones
• New Feature: Support for Role Based Access Control
• New Feature: Conditions at policy scope
APACHE RANGER: OVERVIEW – CENTRALIZED AUTHORIZATION
SECURITY IN HYBRID ENVIRONMENT
HYBRID DEPLOYMENT: OVERVIEW
[Diagram: an on-premise cluster (HDFS, Hive, Kafka, Spark) and cloud deployments (HDInsight, EMR, with Hive, Spark and Presto), Ranger instances and a Ranger DB, administered by security admins and data stewards]
HYBRID DEPLOYMENT: SECURITY CHALLENGES
• Every environment has a different security model
• Access policies need to be set in each environment
• Policies need to be consistent across environments
• The granularity of access control differs between environments
• Policies can quickly go out of sync
• Regulatory and compliance requirements govern what data can be copied to the cloud and whether it should be encrypted or de-identified
Option #1: Restrict data from on-premise
Option #2: Centralized Ranger
HYBRID DEPLOYMENT: OPTION #1
• Filter & redact data copied to the cloud
• Use Hive to export data to S3
• Apply Ranger row-level filtering and column masking to the ETL user (e.g. s3etl)
• Set up cloud-native access policies for the copied data
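The row-filter and column-masking policies for the ETL user could be described roughly as follows. This is a minimal sketch, not the presenters' configuration: the service, database, table and column names are hypothetical, and the field names are meant to mirror Ranger's policy JSON (policy type 1 for masking, 2 for row filtering) but should be checked against the Ranger version in use. Nothing is sent to Ranger here; the code only builds the payloads.

```python
import json


def row_filter_policy(service, database, table, user, filter_expr):
    """Build a row-filter policy payload (assumed Ranger policy JSON layout)."""
    return {
        "service": service,
        "name": f"{table}-row-filter-{user}",
        "policyType": 2,  # assumed: 2 = row-filter policy
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
        },
        "rowFilterPolicyItems": [{
            "users": [user],
            "accesses": [{"type": "select", "isAllowed": True}],
            "rowFilterInfo": {"filterExpr": filter_expr},
        }],
    }


def mask_policy(service, database, table, column, user, mask_type="MASK"):
    """Build a column-masking policy payload (assumed Ranger policy JSON layout)."""
    return {
        "service": service,
        "name": f"{table}-{column}-mask-{user}",
        "policyType": 1,  # assumed: 1 = data-masking policy
        "resources": {
            "database": {"values": [database]},
            "table": {"values": [table]},
            "column": {"values": [column]},
        },
        "dataMaskPolicyItems": [{
            "users": [user],
            "accesses": [{"type": "select", "isAllowed": True}],
            "dataMaskInfo": {"dataMaskType": mask_type},
        }],
    }


# Hypothetical names: service "hive_onprem", database "sales", table "customers".
policies = [
    row_filter_policy("hive_onprem", "sales", "customers", "s3etl", "consent = 'Y'"),
    mask_policy("hive_onprem", "sales", "customers", "tax_id", "s3etl"),
    mask_policy("hive_onprem", "sales", "customers", "email", "s3etl", "MASK_HASH"),
]

print(json.dumps(policies[0], indent=2))
```

Because the policies are bound to the ETL user only, other on-premise users keep their normal view of the table while the exported copy is filtered and masked.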
APACHE RANGER: ROW-FILTER, COLUMN-MASKING POLICIES
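Source table (on premise):
ID | CONSENT | TAX_ID | NAME | EMAIL
1 | Y | 123456789 | John | john@acme.com
2 | Y | 987654321 | Jane | jane@acme.com
3 | N | 789654123 | Mary | mary@acme.com
4 | Y | 321789654 | David | david@acme.com
5 | N | 456321789 | Max | max@acme.com

Result after row filtering (CONSENT = Y) and column masking (TAX_ID, EMAIL):
ID | CONSENT | TAX_ID | NAME | EMAIL
1 | Y | xxxxxxxxxx | John | dkrx@acme.com
2 | Y | xxxxxxxxxx | Jane | yafe@acme.com
4 | Y | xxxxxxxxxx | David | aumd2@acme.com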
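The effect of the two policy types on the sample rows can be imitated in a few lines. This is only an illustration of the idea, assuming a row filter of CONSENT = 'Y' and simple stand-in masking functions; Ranger's actual masking transforms produce different values from the ones generated here.

```python
import hashlib

ROWS = [
    {"id": 1, "consent": "Y", "tax_id": "123456789", "name": "John", "email": "john@acme.com"},
    {"id": 2, "consent": "Y", "tax_id": "987654321", "name": "Jane", "email": "jane@acme.com"},
    {"id": 3, "consent": "N", "tax_id": "789654123", "name": "Mary", "email": "mary@acme.com"},
    {"id": 4, "consent": "Y", "tax_id": "321789654", "name": "David", "email": "david@acme.com"},
    {"id": 5, "consent": "N", "tax_id": "456321789", "name": "Max", "email": "max@acme.com"},
]


def row_filter(row):
    """Row-filter stand-in: keep only rows where consent was given."""
    return row["consent"] == "Y"


def mask_tax_id(value):
    """Masking stand-in: replace every character with 'x'."""
    return "x" * len(value)


def mask_email(value):
    """Masking stand-in: hash the local part, keep the domain."""
    local, _, domain = value.partition("@")
    digest = hashlib.sha256(local.encode("utf-8")).hexdigest()
    return f"{digest[:len(local)]}@{domain}"


def apply_policies(rows):
    """Return the rows as the ETL user would see them."""
    visible = []
    for row in rows:
        if not row_filter(row):
            continue  # filtered rows never leave the premises
        masked = dict(row)
        masked["tax_id"] = mask_tax_id(row["tax_id"])
        masked["email"] = mask_email(row["email"])
        visible.append(masked)
    return visible


exported = apply_policies(ROWS)
for row in exported:
    print(row)
```

Rows 3 and 5 are dropped before export, and the remaining rows keep ID, CONSENT and NAME unchanged while TAX_ID and EMAIL are obscured.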
HYBRID DEPLOYMENT: OPTION #1 – PROS AND CONS
• Advantages
  • Simple to implement
  • Fine-grained policies are enforced on premise using filtering, redaction and transformation
  • Cloud security policies are used for coarse-grained access control
  • Data is accessible to services that Ranger does not support, such as AWS Redshift, AWS Athena and SageMaker
• Limitations
  • Not real-time
  • If policies change, the data needs to be recopied to the cloud
  • Policies need to be managed on both sides
HYBRID DEPLOYMENT: OPTION #2 - CENTRALIZED SECURITY
[Diagram: an on-premise cluster (HDFS, Hive, Kafka, Spark) and cloud deployments (HDInsight, EMR, with Hive, Spark and Presto), with Ranger and a Ranger DB administered by security admins and data stewards]
HYBRID DEPLOYMENT: OPTION #2
• Common Ranger Admin or Ranger database for all environments
• A single Ranger manages the policies for all environments
• If resources use the same names (e.g. database, table and column names), the same policy is used by all environments
• Tag-based policies can also be used to authorize access to cloud-specific data
• Use new Ranger features under development to support central policy management
  • Security Zones
  • Scoped policies
  • Roles in Ranger
HYBRID DEPLOYMENT: OPTION #2 – PROS AND CONS
• Advantages
  • Centrally manage security policies for all environments
  • Policy changes are applied in real time in all environments
  • Leverage tag-based policies for consistent behavior
  • Increasing support for Ranger from third-party vendors: Privacera, StarBurst, Dremio, Microsoft, EMC Isilon, etc.
• Limitations
  • Needs reliable and secure network connectivity between on-premise and cloud (site-to-site VPN)
  • Not all cloud components may be supported by open source Ranger
  • Ranger integration for cloud environments is not supported by the community and requires additional setup in the cloud services/deployments
PRIVACERA EXTENSION TO APACHE RANGER
DEMO
SECURITY ZONES
APACHE RANGER: SECURITY ZONES - INTRODUCTION
• Partition resources for easier administration of security policies
• Policies in a zone apply only to resources included in that zone. For example:
  • a landing zone policy for db=* applies only to the resources of the landing zone; it does not affect other resources, such as db=marketing
• Policy administration for each zone can be delegated to specific users/groups
Zone | HDFS | Hive | HBase | Kafka
landing | /landing/ | db=*landing | |
staging | /staging/ | db=*staging | table=*staging |
marketing | /marketing | db=marketing | table=marketing | topic=mktg_campaign
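The zone behaviour described above can be sketched as a two-step check: first resolve which zone a resource belongs to, then evaluate only that zone's policies. This is a simplified model under stated assumptions, not Ranger's engine: the `ZONES` and `POLICIES` structures, the user names, and the matching rules (prefix match for HDFS paths, wildcard match for other names) are all hypothetical.

```python
from fnmatch import fnmatchcase

# Zone definitions taken from the table above (HBase and Kafka omitted for brevity).
ZONES = {
    "landing": {"hdfs": ["/landing/"], "hive": ["*landing"]},
    "staging": {"hdfs": ["/staging/"], "hive": ["*staging"]},
    "marketing": {"hdfs": ["/marketing"], "hive": ["marketing"]},
}

# Hypothetical policies; each one belongs to exactly one zone.
POLICIES = [
    {"zone": "landing", "service": "hive", "resource": "*", "users": {"etl"}},
    {"zone": "marketing", "service": "hive", "resource": "marketing", "users": {"analyst"}},
]


def zone_of(service, resource):
    """Return the name of the zone that contains the resource, or None."""
    for zone, services in ZONES.items():
        for pattern in services.get(service, []):
            if service == "hdfs":
                if resource.startswith(pattern):
                    return zone
            elif fnmatchcase(resource, pattern):
                return zone
    return None


def is_allowed(user, service, resource):
    """Evaluate only the policies of the zone the resource belongs to."""
    zone = zone_of(service, resource)
    for policy in POLICIES:
        if policy["zone"] != zone or policy["service"] != service:
            continue
        if fnmatchcase(resource, policy["resource"]) and user in policy["users"]:
            return True
    return False


print(is_allowed("etl", "hive", "raw_landing"))    # landing policy db=* applies
print(is_allowed("etl", "hive", "marketing"))      # landing policy does not reach marketing
print(is_allowed("analyst", "hive", "marketing"))  # marketing zone policy applies
```

Even though the landing policy names db=*, it never grants access to db=marketing, because that database resolves to a different zone.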
APACHE RANGER: SECURITY ZONES - INTRODUCTION
• Audit logs include the zone name, which allows accesses to the resources of a zone to be filtered quickly
• REST API for Security Zone administration
• Example use cases:
  • ‘on-prem’ zone for resources that should only be accessible from on-prem clusters
  • ‘test-data’ zone for resources that can be used for test purposes by a wider set of users/groups, without impacting production data
APACHE RANGER: SECURITY ZONES - ADMINISTRATION
APACHE RANGER: SECURITY ZONES - POLICY ADMINISTRATION
• Users see only the zones in which they have admin privileges
• Zone support extends to access, data-masking, row-filter and tag-based policies
APACHE RANGER: SECURITY ZONES – AUDIT LOGS
• Shows the zone of the accessed resource
• Audits can be filtered by zone
• Only policies in the zone of the accessed resource are used to authorize the access
ROLE BASED ACCESS CONTROL
APACHE RANGER: ROLE BASED ACCESS CONTROL - INTRODUCTION
• Ranger policy model extended to support roles
• RBAC is widely used in enterprise applications & cloud environments
• Roles can be used in
• resource-based authorization policies
• tag-based authorization policies
• data-masking policies
• row-filtering policies
• Role management REST API
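How roles fit into policy evaluation can be sketched as follows: a policy item names roles instead of individual users, and a user's roles are resolved at request time. This is a minimal model, not Ranger's implementation. The role names, users, groups, and the assumption that a role can list both users and groups as members are all hypothetical.

```python
# Hypothetical role definitions: each role lists member users and groups.
ROLES = {
    "data_steward": {"users": {"alice"}, "groups": {"governance"}},
    "etl": {"users": {"s3etl"}, "groups": set()},
}

# Hypothetical user-to-group membership, as a user sync process might provide it.
USER_GROUPS = {
    "alice": set(),
    "bob": {"governance"},
    "carol": {"sales"},
    "s3etl": set(),
}

# Policy items grant accesses to roles rather than to users directly.
POLICY_ITEMS = [
    {"roles": {"data_steward"}, "accesses": {"select", "update"}},
    {"roles": {"etl"}, "accesses": {"select"}},
]


def roles_of(user):
    """Resolve the roles a user holds, directly or through a group."""
    groups = USER_GROUPS.get(user, set())
    return {
        name
        for name, members in ROLES.items()
        if user in members["users"] or groups & members["groups"]
    }


def allowed_accesses(user):
    """Collect the access types granted to the user through their roles."""
    user_roles = roles_of(user)
    granted = set()
    for item in POLICY_ITEMS:
        if item["roles"] & user_roles:
            granted |= item["accesses"]
    return granted


for user in ("alice", "bob", "carol", "s3etl"):
    print(user, sorted(allowed_accesses(user)))
```

Changing who holds a role changes what they can access without touching any policy, which is the main administrative benefit of the role model.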
APACHE RANGER: ROLE BASED ACCESS CONTROL – ROLE ADMIN
APACHE RANGER: ROLE BASED ACCESS CONTROL - POLICY
CONDITIONS AT POLICY SCOPE
APACHE RANGER: CONDITIONS AT POLICY SCOPE - INTRODUCTION
• Conditions can now be set at policy scope, in addition to policy-item scope
• Simplifies use of conditions in policies
• Example use cases:
  • Policies specific to the access cluster, e.g. on-prem, cloud
  • Multiple policies for a given tag, for different tag-attribute values, e.g. PII: type=email, PII: type=ccn
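The difference between the two scopes can be sketched like this: a policy-scope condition is checked once and gates every item in the policy, while an item-scope condition only gates its own item. This is a simplified model with hypothetical names (`cluster_type`, `tag_attrs`, the policy names and users); conditions are plain Python callables here rather than Ranger condition expressions.

```python
def conditions_hold(conditions, request):
    """True when every condition accepts the request (no conditions means True)."""
    return all(condition(request) for condition in conditions)


def evaluate(policies, request):
    """Return the name of the first policy that allows the request, or None."""
    for policy in policies:
        # Policy-scope conditions gate every item in the policy.
        if not conditions_hold(policy.get("conditions", []), request):
            continue
        for item in policy["items"]:
            if request["user"] not in item["users"]:
                continue
            if request["access"] not in item["accesses"]:
                continue
            # Item-scope conditions gate only this item.
            if conditions_hold(item.get("conditions", []), request):
                return policy["name"]
    return None


# Two policies for the same PII tag, separated by tag-attribute value.
POLICIES = [
    {
        "name": "pii-email",
        "conditions": [lambda r: r["tag_attrs"].get("type") == "email"],
        "items": [{"users": {"support"}, "accesses": {"select"}}],
    },
    {
        "name": "pii-ccn",
        "conditions": [
            lambda r: r["tag_attrs"].get("type") == "ccn",
            lambda r: r["cluster_type"] == "on-prem",
        ],
        "items": [{"users": {"billing"}, "accesses": {"select"}}],
    },
]


def make_request(user, tag_type, cluster_type):
    """Build a hypothetical access request for a PII-tagged column."""
    return {
        "user": user,
        "access": "select",
        "tag_attrs": {"type": tag_type},
        "cluster_type": cluster_type,
    }


print(evaluate(POLICIES, make_request("support", "email", "cloud")))
print(evaluate(POLICIES, make_request("billing", "ccn", "cloud")))
print(evaluate(POLICIES, make_request("billing", "ccn", "on-prem")))
```

With the condition at policy scope, it is written once per policy instead of being repeated on every policy item.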
APACHE RANGER: CONDITIONS AT POLICY SCOPE - SAMPLE
Access cluster type: cloud
tagAttr.type == 'ccn'
tagAttr.type == 'email'
THANK YOU
