1© 2018 All rights reserved.
Distributed Database Architecture for GDPR
Karthik Ranganathan
Co-Founder & CTO
Feb, 2019
Introduction
Karthik Ranganathan
Co-Founder & CTO, YugaByte
Nutanix ♦ Facebook ♦ Microsoft
IIT-Madras, University of Texas-Austin
@karthikr
WHAT IS YUGABYTE DB?
YugaByte DB is a modern NewSQL database
• High Performance
• Cloud-Native
• Distributed SQL + NoSQL
Design Principles
TRANSACTIONAL
• Single Shard & Distributed ACID Txns
• Document-Based, Strongly Consistent Storage
HIGH PERFORMANCE
• Low Latency, Tunable Reads
• High Throughput
PLANET-SCALE
• Auto Sharding & Rebalancing
• Global Data Distribution
CLOUD NATIVE
• Built For The Container Era
• Self-Healing, Fault-Tolerant
OPEN SOURCE
• Apache 2.0
• Popular APIs Extended: Apache Cassandra, Redis and PostgreSQL (BETA)
WHAT IS GDPR?
GDPR: General Data Protection Regulation
People in the EU can control how businesses share and protect their personal data.
Personal Data, similar to PII (Personally Identifiable Information)
• User name
• Email address
• Date of birth
• Bank details
• Location details
• Computer IP address
Control over personal data
Database concerns
• Consent & data location
• Data privacy and safety
• Right to be forgotten
• Data access on demand
Application concerns
• Notify on data breach
• Data portability
• Ability to fix errors in data
• Restrict processing
#1 USER CONSENT AND DATA LOCATION
Data must be stored in EU by default. Businesses
need explicit user consent to move it outside.
Why is this hard?
• EU user data must live in that region
• Other countries have their own compliance regulations – more geos
• Public clouds may not have coverage – hybrid deployments
• Architecture depends on the data – a service may need more than one
Think global deployments first!
Example – online ecommerce site
• Products table needs global replication – not PII data
[Figure: Global Replication with YugaByte DB. Non-PII data is replicated globally and served from read replicas.]
Example – online ecommerce site
• Users, orders and shipments need locality – PII data
• Product locations table needs scale – may be PII
[Figure: Geo-Partitioning with YugaByte DB. Primary PII data stays in the EU; non-EU data lives in other regions.]
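The ecommerce example above can be sketched as a small placement policy. This is only an illustration of the idea, not YugaByte DB configuration: the `PLACEMENT` policy names, the `COUNTRY_TO_REGION` mapping and the `regions_for` helper are all hypothetical.

```python
# Hypothetical placement policy: which tables replicate everywhere and
# which are pinned to the region of the user who owns the row.
PLACEMENT = {
    "products": "global",           # not PII, replicate to every region
    "users": "by_user_region",      # PII, keep in the user's home region
    "orders": "by_user_region",
    "shipments": "by_user_region",
}

# Illustrative subset only; a real mapping would cover every country served.
COUNTRY_TO_REGION = {"DE": "eu", "FR": "eu", "US": "us", "IN": "apac"}


def regions_for(table, user_country=None):
    """Return the regions in which a row of `table` may be stored."""
    policy = PLACEMENT[table]
    if policy == "global":
        return sorted(set(COUNTRY_TO_REGION.values()))
    if user_country is None:
        raise ValueError(f"{table} holds PII and needs the user's country")
    return [COUNTRY_TO_REGION[user_country]]


print(regions_for("products"))
print(regions_for("users", "DE"))
```

Keeping the policy per table rather than per service reflects the point that one service may need several placement strategies.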
Replicate data on demand to other geos
• User may be ok with replicating data
• Read replicas on demand (for remote, low-latency reads)
• Change data capture (for analytics)
[Figure: Read Replicas with YugaByte DB. Primary PII data stays in the EU; read replicas are placed in other regions.]
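A minimal sketch of consent-gated replication, assuming each user row carries a consent flag. The `UserRow` type, its `replicate_outside_home` field and the `rows_for_replica` helper are hypothetical names for illustration; they are not part of any YugaByte DB API.

```python
from dataclasses import dataclass


@dataclass
class UserRow:
    user_id: str
    home_region: str
    replicate_outside_home: bool  # hypothetical flag: user consented to replication


def rows_for_replica(rows, replica_region):
    """Keep rows that already live in the replica's region or whose owner consented."""
    return [
        row for row in rows
        if row.home_region == replica_region or row.replicate_outside_home
    ]


rows = [
    UserRow("u1", "eu", replicate_outside_home=False),
    UserRow("u2", "eu", replicate_outside_home=True),
    UserRow("u3", "us", replicate_outside_home=False),
]
print([row.user_id for row in rows_for_replica(rows, "us")])
```

The same filter could sit in front of a change data capture stream that feeds analytics.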
#2 DATA PRIVACY AND SAFETY
Data must be secured using best practices by default. Users must be notified of a breach.
Implement end-to-end encryption on day #1
Encrypt All Network Communication
• Use TLS Encryption
  • Between client and server for app interaction
  • Between database servers for replication
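On the client side, a TLS context with certificate and hostname verification can be built from Python's standard `ssl` module. This is a sketch under assumptions: the `make_client_tls_context` helper and its `ca_file` parameter are hypothetical, and how the context is handed to a database driver depends on the driver.

```python
import ssl


def make_client_tls_context(ca_file=None):
    """Build a client-side TLS context that verifies the server certificate."""
    # create_default_context enables certificate and hostname verification.
    # With ca_file=None the system's default CA certificates are used.
    ctx = ssl.create_default_context(cafile=ca_file)
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


ctx = make_client_tls_context()
print(ctx.verify_mode, ctx.check_hostname)
```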
[Figure: TLS Encryption. Covers traffic between the user and the database cluster, and server-to-server communication inside the cluster.]
Encrypt All Storage
• Encryption at rest
• Integrate with external Key Management Systems
• Ability to rotate keys on demand
Use app-level encryption if needed. Keep a key-value table that maps each id to its cipher key, and encrypt PII data with that cipher key for fine-grained control. More in the next section.
[Figure: Encryption at Rest. The database cluster encrypts data on disk using keys from a Key Management Service.]
#3 RIGHT TO BE FORGOTTEN
Data must be erased on explicit request or when data
is no longer relevant to original intent.
Use Encryption of Data Attributes
• Have a key-value table with id to cipher key
• Encrypt PII data with the cipher key on write
• Decrypt PII data on access
• Delete cipher key to forget PII data
Example - Storing User Profile Data
SET email=foo@bar.com FOR USER ID=XXX
Get encryption key for user → Encrypt PII data → Store encrypted data
SET email=ENCRYPTED FOR USER ID=XXX
• Reads require decryption
• Data not accessible without key
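The per-user cipher key flow can be sketched with in-memory dictionaries standing in for the two tables. Everything here is an assumption for illustration: the table names, the function names, and above all the cipher, which is a toy HMAC-SHA256 keystream chosen only so the example runs on the standard library. It provides no integrity protection; a real system would use a vetted authenticated cipher such as AES-GCM from a cryptography library.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory stand-ins for two database tables.
cipher_keys = {}  # user id -> per-user cipher key
profiles = {}     # user id -> {"email": (nonce, ciphertext)}


def _keystream(key, nonce, length):
    """Toy keystream: HMAC-SHA256 in counter mode. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def _xor(data, stream):
    return bytes(a ^ b for a, b in zip(data, stream))


def set_email(user_id, email):
    """Encrypt the email with the user's cipher key on write."""
    key = cipher_keys.setdefault(user_id, secrets.token_bytes(32))
    nonce = secrets.token_bytes(16)  # fresh nonce for every write
    plaintext = email.encode("utf-8")
    ciphertext = _xor(plaintext, _keystream(key, nonce, len(plaintext)))
    profiles[user_id] = {"email": (nonce, ciphertext)}


def get_email(user_id):
    """Decrypt on access; returns None once the key has been deleted."""
    key = cipher_keys.get(user_id)
    if key is None or user_id not in profiles:
        return None
    nonce, ciphertext = profiles[user_id]["email"]
    return _xor(ciphertext, _keystream(key, nonce, len(ciphertext))).decode("utf-8")


def forget_user(user_id):
    """Delete only the cipher key; the stored ciphertext becomes unreadable."""
    cipher_keys.pop(user_id, None)


set_email("XXX", "foo@bar.com")
print(get_email("XXX"))
forget_user("XXX")
print(get_email("XXX"))
```

Deleting one small key row is what makes this attractive in a distributed database: the encrypted values may sit in replicas and backups, but none of them can be read once the key is gone.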
Use Anonymization of Data Attributes
• In many cases the actual value is not needed
• Anonymize PII data with one-way hash functions
• Use hashed ids in the data warehouse
• There is no PII data if hashed ids are used!
Example – Website Analytics
USER=foo@bar.com CHECKED OUT PRODUCT=X, CATEGORY=Gadget
→ One-way hash of user id
USER=HASHED_VAL CHECKED OUT PRODUCT=X, CATEGORY=Gadget
→ Analytics
Example – Website Analytics
• User no longer identifiable
• Hashed data still useful!
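A sketch of the hashing step for the analytics event above. It assumes a keyed hash (HMAC-SHA256 with a secret `PEPPER`) rather than a bare hash, because email addresses are guessable and an unkeyed hash of one can be recomputed by anyone. The `PEPPER` value, the event field names and both function names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice it would come from a key store, not source code.
PEPPER = b"replace-with-a-secret-from-your-key-store"


def anonymize_user(user):
    """Map a user identifier to a stable, one-way hashed id."""
    normalized = user.strip().lower()
    return hmac.new(PEPPER, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_event(event):
    """Return a copy of the event with the user field replaced by its hashed id."""
    out = dict(event)
    out["user"] = anonymize_user(event["user"])
    return out


event = {"user": "foo@bar.com", "action": "CHECKED OUT",
         "product": "X", "category": "Gadget"}
print(anonymize_event(event))
```

Because the same user always maps to the same hashed id, per-user counts and joins in the data warehouse still work.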
#4 DATA ACCESS ON DEMAND
Ability to inform a user about what data is being used,
for what purpose and where it is stored.
Tag Tables and Columns with PII
• Store in a separate information architecture table
• Make tagging a part of the process
• Easy to find what PII data is stored on demand
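One possible shape for such an information architecture table, sketched as a list of tuples. The column layout (table, column, PII flag, purpose), the sample rows and the `pii_columns` helper are assumptions for illustration.

```python
# Hypothetical rows of the tagging table: (table, column, is_pii, purpose).
PII_TAGS = [
    ("users", "email", True, "account login"),
    ("users", "date_of_birth", True, "age verification"),
    ("orders", "shipping_address", True, "delivery"),
    ("products", "title", False, "catalog"),
]


def pii_columns(tags):
    """Group PII columns by table, keeping the purpose each one is stored for."""
    result = {}
    for table, column, is_pii, purpose in tags:
        if is_pii:
            result.setdefault(table, []).append((column, purpose))
    return result


for table, columns in pii_columns(PII_TAGS).items():
    print(table, columns)
```

Recording the purpose next to each tag is what lets a data access request be answered with both what is stored and why.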
Run Continuous Compliance Checks
• Ensure PII columns are encrypted
• Ensure non-PII columns do not have sensitive data
• Use Spark/Presto to perform scans periodically
• Run scans on a read replica to avoid impacting production
[Figure: Tag PII columns → Ensure PII columns are encrypted → Ensure no PII data in other columns]
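The two checks can be sketched as a scan over rows. This is plain Python standing in for a periodic Spark or Presto job, and it rests on stated assumptions: encrypted values are stored as bytes, plaintext as strings, and "looks like PII" is approximated by two simple regular expressions. The `scan` function and the sample rows are hypothetical.

```python
import re

# Rough patterns for values that look like personal data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def scan(rows, pii_columns):
    """Return (row index, column, problem) for every violation found."""
    findings = []
    for index, row in enumerate(rows):
        for column, value in row.items():
            if column in pii_columns:
                # Assumption: an encrypted value is stored as bytes.
                if not isinstance(value, (bytes, bytearray)):
                    findings.append((index, column, "PII column not encrypted"))
            elif isinstance(value, str) and (
                EMAIL_RE.search(value) or IPV4_RE.search(value)
            ):
                findings.append((index, column, "possible PII in non-PII column"))
    return findings


rows = [
    {"email": b"\x01\x02", "note": "ok"},
    {"email": "foo@bar.com", "note": "contact foo@bar.com"},
]
for finding in scan(rows, {"email"}):
    print(finding)
```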
PUTTING IT ALL TOGETHER
GDPR Reference Architecture
• Primary Cluster (in EU): app clients, reads & writes, encrypted
• Read Replica Clusters (anywhere in the world): analytics clients, read only, encrypted
• Encrypted async replication from the primary cluster to the read replica clusters
• At-rest encryption for all nodes in both the primary and the read replica clusters
• PII columns encrypted w/ cipher key
• Tag PII columns
• Ensure PII columns are encrypted
• Ensure no PII data in other columns
Thank You!
Try it at docs.yugabyte.com/latest/quick-start

YugaByte DB - "Designing a Distributed Database Architecture for GDPR Compliance" webinar slides
