1
Open Source Security
Tools For Big Data
Rommel Garcia
@rommelgarcia
Hortonworks
2
# whoami
• Global Security SME Lead @hortonworks
• Senior Solutions Engineer @hortonworks
• Book Author – Virtualizing Hadoop
• Co-organizer of Atlanta Hadoop User Group
• Regular Speaker at Big Data Conferences
Big Data Landscape
4
DATA – More Volume and More Types
[Figure: increasing data variety and complexity as volume grows from gigabytes to terabytes, petabytes and exabytes. Tiers shown: ERP, CRM, WEB, BIG DATA.]
Data types labelled in the figure: user generated content, mobile web, SMS/MMS, sentiment, external demographics, HD video, speech to text, product/service logs, social network, business data feeds, user click stream, web logs, offer history, dynamic pricing, A/B testing, affiliate networks, search marketing, behavioral targeting, dynamic funnels, payment record, support contacts, customer touches, purchase detail, purchase record, segmentation, offer details.
5
Big Data Ecosystem
[Diagram: data sources feed a Big Data Platform that sits alongside data repositories and analysis & visualization applications.]
Big Data Platform: YARN : Data Operating System on top of HDFS (Hadoop Distributed File System), nodes 1 to N; engines: Script, SQL, NoSQL, Stream, Search, In-Mem, Others; cross-cutting services: Security, Operations, Governance & Integration
Use cases: Risk modeling, Fraud detection, Compliance (AML, KYC), Bank 3.0, Information security, Single view of customer, Trading applications, Market data management
Traditional sources and repositories: EDW, OLAP, Datamarts, Column Databases, CRM, RDBMS; Lending, Markets, Trades, Compliance data, Credit card, Cash & equity, Finance & GL, Risk data
Emerging & non-traditional sources: Server logs, Call center, Emails, Word documents, Location data, Sensor data, Customer sentiment, Research reports
6
Compliance Adherences
• HIPAA - Health Insurance Portability and Accountability Act of 1996
• HITECH - The Health Information Technology for Economic and Clinical Health Act
• PCI DSS - Payment Card Industry Data Security Standard
• SOX - The Sarbanes-Oxley Act of 2002
• ISO - International Organization for Standardization
• COBIT - Control Objectives for Information and Related Technology
• Corporate Security Policies
Big Data Security
8
5 Pillars of Security
• Authentication
• Authorization
• Audit
• Data at rest/in-motion Encryption
• Centralized Administration
9
Big Data Ecosystem
[Same platform diagram as on the earlier Big Data Ecosystem slide, annotated with five numbered security components.]
1 Knox
2 Kerberos
3 Ranger
4 HDFS Enc.
5 LDAP
10
5 Pillars of Security
• Authentication ->
• Authorization ->
• Audit ->
• Data Protection ->
• Centralized Administration ->
11
Knox
12
Why Knox?
Simplified Access
• Kerberos encapsulation
• Extends API reach
• Single access point
• Multi-cluster support
• Single SSL certificate
Centralized Control
• Central REST API auditing
• Service-level authorization
• Alternative to SSH “edge node”
Enterprise Integration
• LDAP integration
• Active Directory integration
• SSO integration
• Apache Shiro extensibility
• Custom extensibility
Enhanced Security
• Protect network details
• Partial SSL for non-SSL services
• WebApp vulnerability filter
13
Knox Deployment with Hadoop Cluster
[Diagram: Knox deployed in front of a multi-rack Hadoop cluster. Labels: Web Tier, Application Tier, DMZ, LB, Knox, Hadoop CLIs, Switch.]
Rack 1: Master Nodes (NN, SNN, …)
Rack 2 … Rack N: Slave Nodes (DN, DN, …)
14
What does Perimeter Security really mean?
[Diagram components: User, Firewall, Gateway, Firewall, Hadoop Services (REST API) on Hive Host, HBase Hosts and WebHDFS.]
• Firewall required at perimeter (today)
• Knox Gateway controls all Hadoop REST API access through firewall
• Hadoop cluster mostly unaffected
• Firewall only allows connections through specific ports from Knox host
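As a rough illustration of the single access point idea, the sketch below builds the URL a client would call when WebHDFS is reached through Knox instead of directly on a cluster host. The host name, the topology name `default`, port 8443 and the `gateway` context path are assumptions made for illustration (they mirror commonly seen Knox defaults, but each deployment sets its own), and `knox_webhdfs_url` is a hypothetical helper, not part of Knox.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, hdfs_path, op, topology="default",
                     port=8443, gateway_path="gateway"):
    """Build a WebHDFS URL that is routed through a Knox gateway.

    Hypothetical helper: the client only ever addresses the gateway host,
    so NameNode and DataNode addresses stay hidden behind the firewall.
    """
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    base = f"https://{gateway_host}:{port}/{gateway_path}/{topology}/webhdfs/v1"
    return f"{base}{quote(hdfs_path)}?{urlencode({'op': op})}"


url = knox_webhdfs_url("knox.example.com", "/tmp/hive", "LISTSTATUS")
print(url)
```

Only one host and one port appear in the URL, which is what lets the firewall rule be as narrow as the slide describes.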
15
Kerberos
16
Why Kerberos?
Provides Strong Authentication
Establishes identity for users, services and hosts
Prevents unauthorized users from impersonating other accounts
Supports token delegation model
Works with existing directory services
Basis for Authorization
17
Don’t be afraid of Kerberos…..
18
Security Implications
$ whoami
baduser
$ hadoop fs -ls /tmp
Found 2 items
drwx-wx-wx - ambari-qa hdfs 0 2015-07-14 18:38 /tmp/hive
drwx------ - hdfs hdfs 0 2015-07-14 20:33 /tmp/secure
$ hadoop fs -ls /tmp/secure
ls: Permission denied: user=baduser, access=READ_EXECUTE,
inode="/tmp/secure":hdfs:hdfs:drwx------
Good right?
19
Security Implications (continued)
Good, right? Look again! Same baduser session, same /tmp/secure directory:
$ HADOOP_USER_NAME=hdfs hadoop fs -ls /tmp/secure
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2015-07-14 20:35 /tmp/secure/blah
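The listing above works because, without Kerberos, Hadoop's simple authentication takes the user name from the client. The permission check is fine; it is just applied to a claimed identity. The toy model below sketches that idea only. `INODES`, `effective_user` and `can_list` are hypothetical names and do not correspond to Hadoop internals.

```python
# Toy permission table modelled on the listing above: path -> (owner, mode).
INODES = {"/tmp/secure": ("hdfs", 0o700)}


def effective_user(env, os_user):
    """Simple (non-Kerberos) auth: a client-declared name overrides the OS user."""
    return env.get("HADOOP_USER_NAME", os_user)


def can_list(user, path):
    """Owner-only read+execute check, enough for a drwx------ directory."""
    owner, mode = INODES[path]
    return user == owner and (mode & 0o500) == 0o500


honest = effective_user({}, "baduser")
spoofed = effective_user({"HADOOP_USER_NAME": "hdfs"}, "baduser")
print(can_list(honest, "/tmp/secure"))   # False
print(can_list(spoofed, "/tmp/secure"))  # True
```

With Kerberos enabled, the identity comes from a ticket the client cannot mint for itself, so the second call has no equivalent.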
20
Kerberos Primer
Participants: Client (with its Kerberos ticket cache), KDC, NameNode (NN), DataNode (DN)
1. kinit - Login and get Ticket Granting Ticket (TGT)
2. Client stores TGT in Ticket Cache
3. Get NameNode Service Ticket (NN-ST)
4. Client stores NN-ST in Ticket Cache
5. Read/write file given NN-ST and file name; returns block locations, block IDs and Block Access Tokens if access permitted
6. Read/write block given Block Access Token and block ID
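The first steps of the exchange can be sketched as a toy simulation. This is not Kerberos: real tickets are encrypted, carry timestamps and session keys, and the password never travels to the KDC. Here an HMAC stands in for "only the KDC (or the service) could have produced this", and `ToyKDC`, `sign` and `namenode_accepts` are hypothetical names introduced for the sketch.

```python
import hashlib
import hmac


def sign(key: bytes, message: str) -> str:
    """HMAC-SHA256 tag, standing in for a ticket sealed with `key`."""
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


class ToyKDC:
    def __init__(self, users, service_keys):
        self.users = users                # principal -> password (toy only)
        self.service_keys = service_keys  # service -> key shared with the KDC
        self.tgs_key = b"kdc-internal-key"

    def kinit(self, principal, password):
        """Step 1: authenticate and hand back a TGT."""
        if self.users.get(principal) != password:
            raise PermissionError("bad credentials")
        return (principal, sign(self.tgs_key, principal))

    def service_ticket(self, tgt, service):
        """Step 3: exchange a valid TGT for a service ticket."""
        principal, tag = tgt
        if not hmac.compare_digest(tag, sign(self.tgs_key, principal)):
            raise PermissionError("invalid TGT")
        return (principal, sign(self.service_keys[service], principal))


def namenode_accepts(nn_key, ticket):
    """Step 5: the NameNode verifies the ticket using only its own key."""
    principal, tag = ticket
    return hmac.compare_digest(tag, sign(nn_key, principal))


nn_key = b"nn-secret"
kdc = ToyKDC({"alice": "pw"}, {"nn": nn_key})
cache = {}                                     # steps 2 and 4: ticket cache
cache["tgt"] = kdc.kinit("alice", "pw")
cache["nn"] = kdc.service_ticket(cache["tgt"], "nn")
print(namenode_accepts(nn_key, cache["nn"]))         # True
print(namenode_accepts(nn_key, ("hdfs", "forged")))  # False
```

The point to take away is that the NameNode never sees a password and never has to ask the KDC: it can validate the ticket on its own, and a client that merely claims to be `hdfs` has nothing valid to present.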
21
Ranger
22
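Apache Ranger authZ Architecture
[Diagram: each secured component embeds a Ranger plugin that communicates with the central Ranger services.]
Components with a plugin: HDFS, HBase, Hive, YARN, Knox, Storm, Solr, Kafka
Ranger services: Administration Portal, REST APIs, Policy Server, Audit Server
Other labels in the diagram: DB, Solr, HDFS, KMS, Log4j, LDAP/AD user/group sync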
23
Sample Simplified Workflow - HDFS
Components: Policy Manager, NameNode with Plugin, Audit Database, users and applications
1. Admin sets policies for HDFS files/folders
2. Users reach HDFS: a data scientist runs a MapReduce job, users access HDFS data through an application, IT users access HDFS through the CLI
3. NameNode uses the Plugin for authorization
4. NameNode provides resource access to the user/client
5. Audit logs are pushed to the DB
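The authorize-then-audit loop in this workflow can be sketched with a few lines of toy code. This is a simplified stand-in, not Ranger's plugin API: `Policy`, `ToyPlugin` and `covers` are hypothetical names, policies here are allow-only, and the fallback to native HDFS permissions is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    path: str            # HDFS folder the policy applies to (recursively)
    users: set
    access_types: set


def covers(policy_path, path):
    """True if `path` is the policy folder itself or lies underneath it."""
    prefix = policy_path.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


@dataclass
class ToyPlugin:
    policies: list
    audit_log: list = field(default_factory=list)

    def is_access_allowed(self, user, path, access):
        allowed = any(
            covers(p.path, path) and user in p.users and access in p.access_types
            for p in self.policies
        )
        # Every decision is audited, whether access was allowed or denied.
        self.audit_log.append((user, path, access, allowed))
        return allowed


plugin = ToyPlugin([Policy("/secure1", {"analyst"}, {"read"})])
print(plugin.is_access_allowed("analyst", "/secure1/sal.txt", "read"))   # True
print(plugin.is_access_allowed("analyst", "/secure1/sal.txt", "write"))  # False
print(len(plugin.audit_log))                                             # 2
```

Because the decision happens inside the NameNode process, the same policy applies whether the request arrives from a MapReduce job, an application or the CLI.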
24
Ranger Stacks
• Apache Ranger v0.5 supports a stack model to enable easier onboarding of new components, without requiring code changes in Apache Ranger.
Ranger Side Changes: Define Service-type
• Create a JSON file with the following details:
- Resources
- Access types
- Config to connect
• Load the JSON into Ranger.
Secured Components Side Changes: Develop Ranger Authorization Plugin
• Include plugin library in the secure component.
• During initialization of the service: Init RangerBasePlugin & RangerDefaultAuditHandler class.
• To authorize access to a resource: Use RangerBasePlugin.isAccessAllowed() with a RangerAccessRequest
• To support resource lookup: Implement RangerBaseService.lookupResource() & RangerBaseService.validateConfig()
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
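To make the "create a JSON file" step concrete, here is a minimal sketch that assembles a service-type definition for a made-up component and serializes it. Everything about `myservice` is hypothetical, including its `implClass`. The key names (`resources`, `accessTypes`, `configs`, `itemId` and so on) follow my recollection of the service definitions bundled with Ranger and should be checked against the wiki page above before use. The `validate` helper is also an illustration, not a Ranger function.

```python
import json

# Hypothetical service-type definition for an imaginary component.
service_def = {
    "name": "myservice",
    "implClass": "com.example.ranger.RangerServiceMyService",  # hypothetical class
    "label": "My Service",
    "description": "Illustrative component onboarded through the stack model",
    # Resources: what can be protected.
    "resources": [
        {"itemId": 1, "name": "path", "type": "path", "level": 10,
         "mandatory": True, "lookupSupported": True, "label": "Resource Path"},
    ],
    # Access types: what can be done to a resource.
    "accessTypes": [
        {"itemId": 1, "name": "read", "label": "Read"},
        {"itemId": 2, "name": "write", "label": "Write"},
    ],
    # Config to connect: what Ranger needs for resource lookup.
    "configs": [
        {"itemId": 1, "name": "url", "type": "string", "mandatory": True,
         "label": "Service URL"},
    ],
}


def validate(defn):
    """Check that the three sections named on the slide are present and non-empty."""
    missing = [k for k in ("name", "resources", "accessTypes", "configs")
               if not defn.get(k)]
    if missing:
        raise ValueError(f"service definition is missing: {missing}")
    return True


validate(service_def)
payload = json.dumps(service_def, indent=2)
print(payload)
```

The resulting `payload` is what would be loaded into Ranger; how it is loaded (REST call or otherwise) is described on the wiki page and is not shown here.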
25
HDFS Encryption
26
Data Protection
Hadoop allows you to apply data protection policy at two different layers across the Hadoop stack
• Layer: Storage
– What? Encrypt data on disk
– How? Volume level: LUKS (Linux), BitLocker (Windows)
– How? OS level encryption
– How? Native in Hadoop: HDFS Encryption
– How? Partners: Voltage, Protegrity, DataGuise, Vormetric
• Layer: Transmission
– What? Encrypt data as it moves
– How? Native in Hadoop: SSL & SASL
– How? AES 256 for SSL & DTP with SASL
27
Data at rest Encryption Protection
Volume Level Encryption (Open Source - LUKS, DMCrypt)
OS File Level Encryption (Open Source - eCryptfs)
Hadoop Level Encryption (HDFS TDE*, Hive CLE**, HBase** )
28
HDFS Encryption – How it works
[Diagram: an HDFS client in the data access layer reads and writes HDFS through a crypto stream; the Name Node and the client each reach the Key Management System (KMS) through the KeyProvider API.]
• Encryption Zone (attributes - EZKey ID, version): HDFS-6134
• Encrypted File (attributes - EDEK, IV)
• Name Node: uses the KeyProvider API, handles EDEKs only
• HDFS Client: uses the KeyProvider API to turn an EDEK into a DEK; Crypto Stream (r/w with DEK)
• Key Management System (KMS): HADOOP-10433; holds the EZKs and serves DEKs
• KeyProvider API: HADOOP-10141
Acronym: Description
EZ: Encryption Zone (an HDFS directory)
EZK: Encryption Zone Key; master key associated with all files in an EZ
DEK: Data Encryption Key; unique key associated with each file. The EZK is used to encrypt the DEK, producing the EDEK
EDEK: Encrypted DEK; the Name Node only has access to the encrypted DEK
IV: Initialization Vector
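The key hierarchy above is an instance of envelope encryption, and the sketch below walks through it with toy components. The cipher is a SHA-256 counter keystream XORed with the data, chosen only so the example runs with the standard library. It is not AES and must not be used to protect anything. `ToyKMS` and its methods are hypothetical names, not the Hadoop KMS API.

```python
import hashlib
import os


def keystream_xor(key: bytes, iv: bytes, data: bytes) -> bytes:
    """Toy stream cipher (NOT AES): XOR data with a SHA-256 counter keystream.

    Applying it twice with the same key and IV returns the original data.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + iv + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)


class ToyKMS:
    """Holds the EZKs and never hands them out."""

    def __init__(self):
        self._ezks = {}

    def create_key(self, name):
        self._ezks[name] = os.urandom(16)

    def generate_edek(self, ezk_name):
        """New random DEK, returned only in encrypted form (EDEK) with its IV."""
        dek, iv = os.urandom(16), os.urandom(16)
        return keystream_xor(self._ezks[ezk_name], iv, dek), iv

    def decrypt_edek(self, ezk_name, edek, iv):
        """Called by an authorized client to recover the DEK."""
        return keystream_xor(self._ezks[ezk_name], iv, edek)


kms = ToyKMS()
kms.create_key("key1")

# Name Node side: stores only the EDEK and IV as file attributes.
edek, iv = kms.generate_edek("key1")

# Client side, write: recover the DEK from the KMS and encrypt the file data.
dek = kms.decrypt_edek("key1", edek, iv)
ciphertext = keystream_xor(dek, iv, b"salary data")

# Client side, read: same EDEK from the Name Node, same DEK, data decrypts.
plaintext = keystream_xor(kms.decrypt_edek("key1", edek, iv), iv, ciphertext)
print(plaintext)
```

The separation is the design point: the Name Node and the Data Nodes only ever hold the EDEK and ciphertext, while the EZK stays inside the KMS.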
29
HDFS Encryption – Common Commands
As HDFS Admin:
• Run KMS Server
– ./kms.sh run
• Create Encryption Key
– hadoop key create key1 -size 128
– # Key size can be 128, 192 or 256. 256 requires the unlimited strength JCE policy files.
• List all Encryption Keys
– hadoop key list -metadata
• As an Admin (hdfs user) create an Encryption Zone
– hdfs crypto -createZone -keyName key1 -path /secure1
– Point to an existing & empty directory
• List all Encryption Zones
– hdfs crypto -listZones
As HDFS End-user:
• Read/Write to HDFS unchanged (run this as a user not in the HDFS admin role)
– hdfs dfs -copyFromLocal /tmp/vinay.txt /secure1
– hdfs dfs -cat /securehive/sal.txt
30
Encrypting Data In-Motion
• Protocol: REST
– Communication point: WebHDFS (Client to Cluster); Client to Knox
– Encryption mechanism: REST over SSL; Knox Gateway SSL; SPNEGO - provides a mechanism for extending Kerberos to Web applications through the standard HTTP protocol
• Protocol: HTTP
– Communication point: NameNode/JobTracker UI; MapReduce Shuffle
– Encryption mechanism: HTTPS; Encrypted MapReduce Shuffle (MAPREDUCE-4117)
• Protocol: RPC
– Communication point: Hadoop Client (Client to Cluster, Intra-Cluster)
– Encryption mechanism: SASL – The Hadoop RPC system implements SASL which provides different QoP including encryption
• Protocol: JDBC/ODBC
– Communication point: HiveServer2
– Encryption mechanism: SSL
• Protocol: TCP/IP
– Communication point: Data Transfer (Client to Cluster, Intra-Cluster)
– Encryption mechanism: Encrypted DataTransfer Protocol available in Hadoop; Adding SASL support to the DataTransferProtocol
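Most of the mechanisms listed above are switched on through configuration properties. The sketch below collects a few of them per config file and renders the usual `<configuration>` XML fragment. The property names are the ones I know from the Hadoop and Hive documentation, but names and accepted values vary by version, and SSL also needs keystores and truststores that are not shown. `WIRE_ENCRYPTION` and `render` are hypothetical names for this illustration.

```python
import xml.etree.ElementTree as ET

# One entry per protocol discussed above; verify against your version's docs.
WIRE_ENCRYPTION = {
    "core-site.xml": {"hadoop.rpc.protection": "privacy"},       # RPC via SASL QoP
    "hdfs-site.xml": {
        "dfs.encrypt.data.transfer": "true",                      # DataTransferProtocol
        "dfs.http.policy": "HTTPS_ONLY",                          # web UIs and WebHDFS
    },
    "mapred-site.xml": {"mapreduce.shuffle.ssl.enabled": "true"}, # encrypted shuffle
    "hive-site.xml": {"hive.server2.use.SSL": "true"},            # JDBC/ODBC
}


def render(properties):
    """Render a name/value dict as a Hadoop-style <configuration> fragment."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


for filename, props in WIRE_ENCRYPTION.items():
    print(filename)
    print(render(props))
```

Setting `hadoop.rpc.protection` to `privacy` selects the SASL quality of protection that includes encryption; the other two values, `authentication` and `integrity`, do not encrypt the payload.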
Real-world Implementation
32
Data Sources
Data
Sources
33
Thank You !
