Inexpensive
Datamasking for
MySQL with
ProxySQL
René Cannaò
Who we are
René Cannaò
Founder of ProxySQL
MySQL SRE at Dropbox
thanks to:
Frédéric Descamps
MySQL Community Manager
Other Sessions
273. ProxySQL, MaxScale, MySQL Router and other database traffic
managers / Petr Zaitsev (Percona)
155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)
Agenda
● Database overview
● What is ProxySQL
● Features overview
● Data masking
● Rules
● Masking rules
● Obfuscation with mysqldump
● Examples
Overview of ProxySQL
Application and Database layers
APPLICATIONS
DATABASES
Main motivations
empower the DBAs
Improves manageability
understand and improve performance
High performance and High Availability
create a proxy layer to shield the database
Database as a Service (layered)
APPLICATIONS
DATABASES + MANAGER(s)
DAAS – REVERSE PROXY
What is ProxySQL?
The MySQL data stargate
How to deploy
ProxySQL Features (short list)
High Availability and Scalability
seamless failover
firewall
query throttling
query timeout
query mirroring
runtime reconfiguration
Scheduler
Support for Galera/PXC and
Group Replication
on-the-fly rewrite of queries
caching reads outside the database
connection pooling and multiplexing
complex query routing and r/w split
load balancing
real time statistics
monitoring
Data masking
Multiple instances on same ports
Native Clustering
Support for ClickHouse
Data Masking
Data masking, or data obfuscation, is the process of hiding original
data behind random characters or other data.
The main reason for masking a data field is to protect data that is
classified as personally identifiable, personally sensitive, or
commercially sensitive. The data must nevertheless remain usable
for the purposes of undertaking valid test cycles.
Why use ProxySQL as a data masking
solution?
Open source and free as in beer
Other solutions are expensive or do not work
No worse than the other solutions, as currently none is perfect
The best solution would be to have this feature implemented in the
server, just after the handler API
Query Rules
instructions to "program" ProxySQL behavior
matching criteria
actions
flow control and chains
Query Rewrite
Dynamically rewrite queries sent by the application/client
without the client being aware
on the fly
using ProxySQL query rules
rules defined using regular expressions, s/match/replace/
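As a rough sketch of how such rules are managed (not taken from the slides): rules are rows in the `mysql_query_rules` table of the ProxySQL admin interface, and edits there only take effect once loaded to runtime. The statements below assume an admin session, which by default listens on port 6032.

```sql
-- Run on the ProxySQL admin interface, not on the MySQL backend.
-- List the configured rules in the order they are evaluated.
SELECT rule_id, active, username, schemaname, flagIN,
       match_pattern, replace_pattern, error_msg, flagOUT, apply
FROM mysql_query_rules
ORDER BY rule_id;

-- Edited rules are inactive until they are loaded to runtime.
LOAD MYSQL QUERY RULES TO RUNTIME;

-- Persist the rules so they survive a ProxySQL restart.
SAVE MYSQL QUERY RULES TO DISK;

-- Check how often each rule has matched since it was loaded.
SELECT rule_id, hits FROM stats_mysql_query_rules ORDER BY rule_id;
```

The columns map onto the three ideas listed for query rules: `username`, `schemaname` and `match_pattern` are matching criteria, `replace_pattern` and `error_msg` are actions, and `flagIN`, `flagOUT` and `apply` control the flow through the chain.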
The concept
We use regular expressions to modify the client's SQL statements
and replace the column(s) we want to hide with masking characters or
generated fake data.
We split our solution into two use cases:
● Provide developers with access to the database
● Generate a dump to populate a database that can be shared
Only the statements of the defined users (a developer, in our
example) are modified.
The concept (2)
We will also distinguish two categories:
• data masking
• data obfuscating
Data Masking
Here we simply mask the full value of the column, or part of it,
with a generic character.
Data Obfuscation
Here we replace the value of the column with random
characters of the same type; in other words, we create fake data.
Access
Create a user for masking:
INSERT INTO mysql_users
(username, password, active, default_hostgroup)
VALUES ('devel','devel',1,1);
Create a user for backups:
INSERT INTO mysql_users
(username, password, active, default_hostgroup)
VALUES ('backup','dumpme',1,1);
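The two inserts into `mysql_users` only change ProxySQL's in-memory configuration. A minimal sketch of the follow-up steps on the admin interface, assuming the same `devel` and `backup` accounts also exist on the MySQL backend with matching passwords:

```sql
-- Activate the new users in the running proxy.
LOAD MYSQL USERS TO RUNTIME;

-- Persist them so they survive a ProxySQL restart.
SAVE MYSQL USERS TO DISK;

-- Verify what is configured.
SELECT username, active, default_hostgroup FROM mysql_users;
```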
Rules
Avoid SELECT *
for the developer, we need to create some rules to block any
SELECT * variant on the table
if the column is part of many tables, we need to do so for each
of them
Rules (2)
Mask or obfuscate the field
when the field is selected in the columns we need:
● to replace the column by showing the first 2 characters followed by a
certain number of Xs, or by generating a random string
● to keep the column name
● for mysqldump, to allow SELECT * but mask and/or
obfuscate sensitive values
Rules overview
rule_id: 1
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: `*first_name*`
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: first_name
apply: 0
Rule #1
rule_id: 2
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: (\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: \1CONCAT(LEFT(\2first_name,2),REPEAT('X',10))\3 first_name\4
apply: 0
Rule #2
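Rules #1 and #2 could be loaded through the admin interface roughly as follows. This is a sketch, not the exact statements from the talk: the patterns are written with their regex backslashes (`\(`, `\w`, `\.`, `\n`, `\1` to `\4`), on the assumption that the admin interface stores backslashes literally, and the single quotes around `X` are doubled to keep the string literal valid.

```sql
-- Rule #1: strip backticks around the column name so rule #2 sees a plain name.
INSERT INTO mysql_query_rules
  (rule_id, active, username, schemaname, flagIN,
   match_pattern, re_modifiers, flagOUT, replace_pattern, apply)
VALUES
  (1, 1, 'devel', 'employees', 0,
   '`*first_name*`', 'caseless,global', NULL, 'first_name', 0);

-- Rule #2: keep the first 2 characters, pad with 10 X, keep the column name.
INSERT INTO mysql_query_rules
  (rule_id, active, username, schemaname, flagIN,
   match_pattern, re_modifiers, flagOUT, replace_pattern, apply)
VALUES
  (2, 1, 'devel', 'employees', 0,
   '(\(?)(`?\w+`?\.)?first_name(\)?)([ ,\n])', 'caseless,global', NULL,
   '\1CONCAT(LEFT(\2first_name,2),REPEAT(''X'',10))\3 first_name\4', 0);

LOAD MYSQL QUERY RULES TO RUNTIME;

-- Query sent by devel:
--   SELECT emp_no, last_name, first_name FROM employees;
-- Query forwarded to MySQL after rule #2 (hand-traced, not captured output):
--   SELECT emp_no, last_name, CONCAT(LEFT(first_name,2),REPEAT('X',10)) first_name FROM employees;
```

Both rules use `apply: 0`, so evaluation continues with the later rules after the rewrite.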
Rule #2 - obfuscating
Let's imagine we want to provide fake numbers for the `salaries`.`salary` column.
Instead of the previous rule, we could use this one:
rule_id: 158
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: (\(?)(`?\w+`?\.)?salary(\)?)([ ,\n])
negate_match_pattern: 0
re_modifiers: CASELESS,GLOBAL
flagOUT: NULL
replace_pattern: \1CONCAT( floor(rand() * 50000) + 10000,'')\3 salary\4
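To make the effect of the salary rule concrete, here is a hand-traced before and after. It assumes the pattern is read with its regex backslashes, `(\(?)(`?\w+`?\.)?salary(\)?)([ ,\n])`, and the `SELECT` itself is an illustrative query rather than one from the slides.

```sql
-- Query sent by devel through ProxySQL:
SELECT emp_no, salary FROM salaries;

-- Statement the backend would receive after the rewrite.
-- The table name "salaries" is untouched because it does not contain "salary".
SELECT emp_no, CONCAT( floor(rand() * 50000) + 10000,'') salary FROM salaries;
```

Since `RAND()` returns a value in [0, 1), every row gets a fake salary between 10000 and 59999, returned as a string under the original column name.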
rule_id: 3
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: \)(\)?) first_name\s+(\w),
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: )\1 \2,
apply: 1
Rule #3
rule_id: 4
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: \)(\)?) first_name\s+(.*)\s+from
re_modifiers: caseless,global
flagOUT: NULL
replace_pattern: )\1 \2 from
apply: 1
Rule #4
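To see why rules #3 and #4 are needed, it can help to trace one aliased query through the chain. The sketch below is a hand trace, assuming the patterns are read with their regex backslashes (`\)`, `\s+`, `\w`) and that rule #2 has already run; it is not output captured from ProxySQL.

```sql
-- 1. Query sent by devel:
select emp_no, first_name as fred from employees;

-- 2. After rule #2 the column carries two aliases, which is not valid SQL:
--    select emp_no, CONCAT(LEFT(first_name,2),REPEAT('X',10)) first_name as fred from employees;

-- 3. Rule #4 matches ")) first_name as fred from" and drops the extra
--    "first_name", so the client's own alias is kept:
select emp_no, CONCAT(LEFT(first_name,2),REPEAT('X',10)) as fred from employees;
```

A query without an alias is left alone by rule #4, because nothing stands between `first_name` and `from` for the pattern to capture. Rule #3 plays the same role when the aliased column is followed by a comma instead of `from`.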
rule_id: 5
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+\*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive
information, please contact dba@acme.com
apply: 0
Rule #5
rule_id: 6
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+employees\.\*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive
information, please contact dba@acme.com
apply: 0
Rule #6
rule_id: 7
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECT\s+(\w+)\.\*.*FROM.*employees\s+(as\s+)?(\1)
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive
information, please contact dba@acme.com
apply: 0
Rule #7
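The blocking rules differ from the rewrite rules in that they set `error_msg` instead of `replace_pattern`: when such a rule matches, ProxySQL returns the message to the client and does not forward the query. A minimal sketch of loading the plain `SELECT *` rule, with the pattern written with its regex backslashes as an assumption:

```sql
-- Block "SELECT * ... FROM ... employees" for the devel user.
INSERT INTO mysql_query_rules
  (rule_id, active, username, schemaname,
   match_pattern, re_modifiers, error_msg, apply)
VALUES
  (5, 1, 'devel', 'employees',
   '^SELECT\s+\*.*FROM.*employees', 'caseless,global',
   'Query not allowed due to sensitive information, please contact dba@acme.com',
   0);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```

The other two blocking rules follow the same shape and cover the `employees.*` form and the aliased `a.*` form, where the back-reference `\1` ties the alias used in the column list to the alias after the table name.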
Rules for mysqldump
To provide a dump that might be used by developers, Q/A or
support, we need to:
● generate valid data
● obfuscate sensitive information
● rewrite SQL statements issued by mysqldump
● only for tables and columns with sensitive data
mysqldump rules
rule_id: 8
active: 1
user: backup
schema: employees
flagIN: 0
match: ^/\*!40001 SQL_NO_CACHE \*/ \* FROM `salaries`
replace: SQL_NO_CACHE emp_no,
ROUND(RAND()*100000), from_date, to_date
FROM salaries
flagOUT: NULL
apply: 1
Rule #8
mysqldump rules
rule_id: 9
active: 1
user: backup
schema: employees
flagIN: 0
match: \* FROM `employees`
replace: emp_no, CONCAT(LEFT(birth_date,2),
FLOOR(RAND()*50)+10,
RIGHT(birth_date,6)) birth_date,
CONCAT(LEFT(first_name,2),
REPEAT('x',LENGTH(first_name)-2)) first_name,
CONCAT(LEFT(last_name,3),
REPEAT('x',LENGTH(last_name)-3)) last_name,
gender, hire_date FROM employees
flagOUT: NULL
apply: 1
Rule #9
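Before relying on rule #9 for a real dump, the substituted column list can be previewed directly on the MySQL server. The query below simply reuses the expressions from the rule's replace text; the `LIMIT 5` is an addition for the preview, and it assumes the `employees` sample schema used throughout the examples.

```sql
-- Preview of the column list that rule #9 substitutes for "*".
SELECT emp_no,
       -- keep the century, randomise the two-digit year (10 to 59), keep -MM-DD
       CONCAT(LEFT(birth_date,2), FLOOR(RAND()*50)+10, RIGHT(birth_date,6)) birth_date,
       -- keep 2 characters of the first name, pad with x to the original length
       CONCAT(LEFT(first_name,2), REPEAT('x',LENGTH(first_name)-2)) first_name,
       -- keep 3 characters of the last name, pad with x to the original length
       CONCAT(LEFT(last_name,3), REPEAT('x',LENGTH(last_name)-3)) last_name,
       gender, hire_date
FROM employees
LIMIT 5;
```

One thing worth checking in such a preview is date validity: a 29 February birth date combined with a random non-leap year would produce a string that is not a valid date when the dump is reloaded.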
Limitations
● better support in ProxySQL >= 1.4.x
○ RE2 and PCRE regexes
● all fields with the same name will be masked, whatever the
name of the table, within the same schema
● the regexps may never be sufficient to cover every query
● block any query not matching whitelisted SQL statements
● the dump via ProxySQL solution seems to be the best
Make it easy
This is not really easy, is it?
You can use this small bash script
(https://github.com/lefred/maskit) to generate them:
# ./maskit.sh -c first_name -t employees -d employees
column: first_name
table: employees
schema: employees
let's add the rules...
Examples
Easy ones:
SELECT * FROM employees;
SELECT emp_no, last_name, first_name FROM employees;
Examples (2)
More difficult:
select emp_no, concat(first_name), last_name from
employees;
select emp_no, first_name, first_name from
employees.employees
select emp_no, `first_name` from employees;
select emp_no, first_name
-> from employees; (*)
Examples (3)
More difficult:
select t1.first_name from employees.employees as t1;
select emp_no, first_name as fred from employees;
select emp_no, first_name rene from employees;
select emp_no, first_name `as` from employees;
select first_name as `as`, last_name from employees;
select `t1`.`first_name` from employees.employees as t1;
Examples (4)
More difficult:
select first_name fred, last_name from employees;
select emp_no, first_name /* first_name */ from
employees.employees;
/* */ select last_name, first_name from employees;
select CUSTOMERS.* from myapp.CUSTOMERS;
select a.* from employees.employees a;
We need you!
Thank you!
Questions?
E: rene@proxysql.com

Inexpensive Datamasking for MySQL with ProxySQL — Data Anonymization for Developers / Rene Cannao (ProxySQL)
