© Hortonworks Inc. 2011 – 2017 All Rights Reserved Hortonworks Confidential. For Internal Use Only.
AUTOMATIC DETECTION, CLASSIFICATION, AND
AUTHORIZATION OF SENSITIVE PERSONAL DATA
IMPACTED BY GDPR
Srikanth Venkat – Senior Director, Product Management, Hortonworks
Subra Ramesh – VP, Products & Engineering, Dataguise
© Hortonworks Inc., and Dataguise Inc. 2011 – 2017. All Rights Reserved
Agenda
• GDPR Overview
• GDPR Personal Data – what it requires
• GDPR – Controller vs. Processor Requirements
• Addressing GDPR requirements
– DgSecure: Detection, Element-level Protection, Monitoring
– Hortonworks HDP: Apache Ranger (Security & Privacy) and Apache Atlas (Data Inventory/Classification)
• Integration of DgSecure Detection with Atlas-Ranger for Automatic Authorization Control over GDPR Personal Data
• Demo
General Data Protection Regulation
Framework for the digital transformation economy
– Data = business asset, new currency, innovation accelerator
– Personal data leveraged throughout connected ecosystems
GDPR harmonizes and extends EU Data Protection Directive 95/46/EC
Expands the definition of protected data
Expands data subject rights
Overview of GDPR Framework
[Diagram: actors and relationships in the GDPR framework]
Actors: Data Protection Authority (supervising authority); Data Controller (organisations); Data Subject (individuals); Data Processor; Third Countries; Third Parties; European Data Protection Board (consistency mechanism); EU Courts; National Courts
Relationship labels: Duties; Rights; Inform?; Disclosure?; Is Data Handling Secure?; Guarantees?; Advisory and Enforcement; Complaint/Resolution
GDPR Data Privacy
Sources:
1. ec.europa.eu/justice/data-protection/reform/files/regulation_oj_en.pdf
2. http://www.consilium.europa.eu/en/infographics/data-protection-regulation-infographics/
Rights & Obligations under GDPR
• Controller Obligations
– Clear Consent
– Clear, Detailed Privacy Notices
– Breach Notification (72 hours)
– Appointment of Data Protection Officer (250+, or high risk processing)
– Privacy by Design & Other Considerations
    – Lawful basis, Fair processing, & Specify Purposes
    – Adequate, relevant, not excessive
    – Data Accuracy, Retention, and Appropriate Security
– International Transfer adequacy
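As a rough illustration of the 72-hour breach notification obligation listed above, the following sketch computes the reporting deadline from the moment a controller becomes aware of a breach. The function names are hypothetical, and the sketch assumes the clock starts at the moment of awareness.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 72-hour window runs from the moment of awareness.
BREACH_NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(became_aware_at: datetime) -> datetime:
    """Return the latest time by which the breach should be reported."""
    return became_aware_at + BREACH_NOTIFICATION_WINDOW


def hours_remaining(became_aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    remaining = notification_deadline(became_aware_at) - now
    return remaining.total_seconds() / 3600


if __name__ == "__main__":
    aware = datetime(2018, 5, 25, 9, 0, tzinfo=timezone.utc)
    print(notification_deadline(aware).isoformat())
    print(hours_remaining(aware, aware + timedelta(hours=24)))
```

Using timezone-aware timestamps avoids ambiguity when the incident team and the supervisory authority sit in different time zones.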
Rights & Obligations under GDPR
• Individual Rights
– Access to data
– Remedy from supervisory body/court
    – Compensation for Damage
    – Compensation for Distress
    – Rectification
– Objection (for direct marketing)
– Erasure (right to be forgotten)
– Data Portability
– Restrict data processing (put on hold)
– Automated decisions and profiling
Broad Scope of GDPR
NOT ONLY data controllers or processors that are within the European Union
BUT ALSO
– ANY processing of ANY personal data belonging to EU citizens when the processing relates to the offering of goods or services, or monitoring behavior that takes place within the EU
Source: ec.europa.eu/justice/data-protection/reform/files/regulation_oj_en.pdf
Apache Ranger: Comprehensive Extensible Authorization
⬢ Comprehensive coverage across Hadoop ecosystem components
⬢ Plugins run resident within each component
⬢ Extensible plugin model: plugins for authorizing other sources can be built
Apache Ranger - Intuitive and Granular Policy Management
⬢ Simple, intuitive UI for policy editing and setup
⬢ Fine-grained specificity by resource type, user context, tags, and operation
⬢ Supports Access, Tag-Based, Dynamic Data Masking, and Row Filtering policy types
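To make the policy types above concrete, here is a minimal in-memory sketch of how access, tag-based masking, and row-filtering rules might combine when a query result is returned. It does not use Ranger's API; every class, field, and tag name (`Policy`, `apply_policies`, `PII`, and so on) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Set

Row = Dict[str, str]


@dataclass
class Policy:
    """Hypothetical policy model; not Ranger's actual schema."""
    allowed_groups: Set[str]   # access policy: groups that may query
    masked_tags: Set[str]      # tag-based masking: tags whose columns are redacted
    row_filter: Optional[Callable[[Row], bool]] = None  # row-filtering predicate


def mask(value: str) -> str:
    """Redact all but the last four characters.

    Values of four characters or fewer are returned unchanged in this toy version.
    """
    return "*" * max(len(value) - 4, 0) + value[-4:]


def apply_policies(rows: List[Row], column_tags: Dict[str, Set[str]],
                   group: str, policy: Policy) -> List[Row]:
    # Access policy: deny outright if the caller's group is not allowed.
    if group not in policy.allowed_groups:
        raise PermissionError(f"group {group!r} may not read this table")
    result = []
    for row in rows:
        # Row filtering: skip rows the predicate rejects.
        if policy.row_filter and not policy.row_filter(row):
            continue
        # Masking: redact any column that carries a masked tag.
        result.append({
            col: mask(val) if column_tags.get(col, set()) & policy.masked_tags else val
            for col, val in row.items()
        })
    return result


if __name__ == "__main__":
    rows = [
        {"name": "Alice", "country": "DE", "ssn": "123-45-6789"},
        {"name": "Bob", "country": "US", "ssn": "987-65-4321"},
    ]
    policy = Policy(allowed_groups={"analysts"}, masked_tags={"PII"},
                    row_filter=lambda r: r["country"] == "DE")
    print(apply_policies(rows, {"ssn": {"PII"}}, "analysts", policy))
```

Keying the masking rule on a tag instead of a column name means the same rule covers any column that later receives that tag.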
Apache Ranger Audits - Data Access
⬢ Comprehensive scalable audit logging
⬢ Audits for:
– Resource access events with user context
– Policy edits/creation/deletion
– User session information
– Component plugin policy sync operations
Atlas: Metadata Truth in Hadoop
[Diagram labels: Structured; Traditional RDBMS; Metadata; MPP Appliances; Kafka; Storm; Sqoop; Hive; Atlas Metadata; Falcon; Ranger; Custom; Partners]
Metadata-driven governance services for Hadoop and enterprise big data ecosystems
Data Lineage/Provenance
• Along the entire data lifecycle, with integrated cross-component lineage
Data Classification
• Supports classification of data assets using tags (e.g. PII, PHI, PCI) and attributes
Metadata Catalog Search
• Free text search on metadata
• Advanced search using DSL
Integrations across the Hadoop ecosystem, through a common metadata store
• Out-of-the-box (OOtB) real-time metadata and lineage ingestion with Hive, Sqoop, Storm/Kafka
• APIs for custom metadata ingestion
• Apache Ranger integration for classification-based security
Key Benefits: Modern data lakes need new ways to govern because:
• Cost – Traditional staff ratio to data size not possible
• Diversity – Only way to manage velocity of new datasets
• Agility – Quick change based on tags / taxonomy
HDP – Security & Governance
Industry First: Dynamic Tag-based Security Policies
[Diagram labels]
Ranger (Manage Access Policies and Audit Logs): Classification, Prohibition, Time, and Location Policies; PDP; Resource Cache
Atlas (Track Metadata and Lineage): Metastore holding Tags, Assets, Entities
Atlas Client: Subscribers to Topic; Gets Metadata Updates
Entities in Data Lake: Streams, Pipelines, Feeds, Hive Tables, HDFS Files, HBase Tables
Dataguise: Company Background
[Timeline, 2007 to 2017]
• 2007-2010: “Breakthrough” Masking Technology
• 2011-2013: Pioneers of Hadoop Data Protection
• 2014: The “Essential” Solution for Data Protection in Hadoop
• 2015: Magic Quadrant “Visionary” for Data Masking
• 2015: Recommended for Data-Centric Security
• 2015: Recommended for Protecting Big Data in Hadoop
• 2016: Cloud Platform Coverage
• 2017: Gartner Market Guide for Data Masking
DGSECURE PRODUCT
DgSecure Operation Sequence
1. Define the Policy
2. Discover the Sensitive Data
3. Secure Data
4. Monitoring and Reporting
Visualization: Enterprise-wide Data Security Posture
Enable Access Control based on Sensitivity Classification
• Set up DgSecure to run on a periodic basis to scan for sensitive data and generate classification information
– DgSecure continuously updates Atlas with tags whenever it finds sensitive information.
• Set up Ranger policies based on the sensitivity tags
• Ranger policies take effect whenever a user tries to access the data, for example in a Hive query
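The steps above can be sketched end to end with plain Python stand-ins: a scanner that detects sensitive columns, a tag store playing the role of Atlas, and a tag-keyed policy check playing the role of Ranger. None of this uses the DgSecure, Atlas, or Ranger APIs; the regular expression, the `SENSITIVE_SSN` tag, the group names, and all function names are assumptions made for illustration.

```python
import re
from typing import Dict, List, Set

# Hypothetical detector: US SSN-like values. Real detection is far richer.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")

# Stand-in for the Atlas tag store: "table.column" -> set of tags.
tag_store: Dict[str, Set[str]] = {}

# Stand-in for Ranger tag-based policies: tag -> groups allowed to read.
tag_policies: Dict[str, Set[str]] = {"SENSITIVE_SSN": {"privacy_officers"}}


def scan_table(table: str, columns: Dict[str, List[str]]) -> None:
    """Periodic scan step: tag each column whose sampled values all look like SSNs."""
    for column, samples in columns.items():
        if samples and all(SSN_PATTERN.match(v) for v in samples):
            tag_store.setdefault(f"{table}.{column}", set()).add("SENSITIVE_SSN")


def can_read(group: str, table: str, column: str) -> bool:
    """Access-time step: allow only if the group is permitted for every tag on the column."""
    tags = tag_store.get(f"{table}.{column}", set())
    return all(group in tag_policies.get(tag, set()) for tag in tags)


if __name__ == "__main__":
    scan_table("customers", {
        "name": ["Alice", "Bob"],
        "ssn": ["123-45-6789", "987-65-4321"],
    })
    print(can_read("analysts", "customers", "ssn"))
    print(can_read("analysts", "customers", "name"))
    print(can_read("privacy_officers", "customers", "ssn"))
```

Because the policy is written once against the tag, a column detected in a later scan is covered as soon as it is tagged, with no policy edit needed.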
DgSecure – Atlas/Ranger Integration Flow
1. DgSecure Detection
2. Atlas Populated with Sensitivity Tags
3. Ranger Policies Based on Tags
4. Access Control Based on Sensitivity
DgSecure Integration with Atlas/Ranger
[Diagram labels: DgSecure with DgSecure Repository; Detection; Data Store (Hadoop, Hive, S3, Blob Storage); Atlas; Atlas Tags; Ranger; ACL Enforcement]
Demo – DgSecure + Atlas/Ranger
Key Takeaways: DgSecure + HDP can help with GDPR
• Detection of Sensitive Data
– Structured and unstructured data, use of context information, machine learning capabilities
• Protection of Sensitive Data at Element Level
– Masking or Encryption options in Hadoop
– At Rest Protection (Masking or Encryption)
• Monitoring – Raise Alerts on (Attempted) Access to Sensitive Data
– Breach Notification Requirement
• Access Control Integration
– Via Atlas/Ranger integration, Ranger Tag-Based Policies
• Reporting – Visualization of Enterprise-Level Data Exposure
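As a small illustration of detection followed by element-level protection, the sketch below finds email addresses and SSN-like numbers in free text and masks only those elements, leaving the rest of the record readable. The patterns and function names are illustrative assumptions; they do not reflect how DgSecure detects or masks data.

```python
import re
from typing import List, Tuple

# Hypothetical detectors; production detection would also use context and ML.
DETECTORS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text: str) -> List[Tuple[str, int, int]]:
    """Return (label, start, end) for each sensitive element, ordered by position."""
    hits = []
    for label, pattern in DETECTORS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end()))
    return sorted(hits, key=lambda hit: hit[1])


def mask_elements(text: str) -> str:
    """Replace each detected element with a same-length run of 'X'."""
    chars = list(text)
    for _, start, end in detect(text):
        chars[start:end] = "X" * (end - start)
    return "".join(chars)


if __name__ == "__main__":
    record = "Contact jane@example.com, SSN 123-45-6789."
    print(detect(record))
    print(mask_elements(record))
```

Masking in place keeps the record length and surrounding text intact, which is one reason element-level protection can be less disruptive to downstream jobs than dropping or encrypting whole files.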
Thank You
DgSecure Policy
DgSecure Hive Task
DgSecure Detection Results (Hive)
Sensitive Data Tags in Atlas
Ranger Tag-Based Policies
Check out other relevant sessions:
• Apache Atlas: Governance for your data, 4:10p, Wednesday, April 5th 2017
• Bridle Your Flying Islands And Castles In The Sky: Built-in Governance And Security For The Cloud, 11:30am, Thursday, April 6th 2017
• BoF sessions – Security and Governance, 5:50p, Thursday, April 6th 2017
For more information check out:
Hortonworks: www.hortonworks.com
Dataguise: www.dataguise.com


Editor's Notes

  • #4: The GDPR represents a fundamental change in how data is processed. Companies must look at what steps they are taking to protect the rights of data subjects based on the uses of data they are making. Companies must have protective mechanisms in place and show that they are giving controls to data subjects and that they are respecting data subjects’ rights. This requires new technical measures – Privacy by design did not exist prior to the GDPR. Changes starting May 25, 2018 include technical requirements + increased fines.