Secure Software Design
For Data Privacy
Narudom Roongsiriwong, CISSP
OWASP Thailand Chapter Meeting, July 25, 2019
WhoAmI
● Lazy Blogger
– Japan, Security, FOSS, Politics, Christian
– http://narudomr.blogspot.com
● Information Security since 1995
● Web Application Development since 1998
● SVP, Head of IT Security, Kiatnakin Bank PLC (KKP)
● Committee Member, Thailand Banking Sector CERT (TB-CERT)
● Consultant, OWASP Thailand Chapter
● Committee Member, Cloud Security Alliance (CSA), Thailand Chapter
● Committee Member, National Digital ID Project, Technical Team
● Contact: narudom@owasp.org
How can we survive data protection regulation penalties
if our systems are compromised
or data are breached?
Privacy By Design
The 7 Foundational Principles
● Proactive not Reactive; Preventative not Remedial
● Privacy as the Default
● Privacy Embedded into Design
● Full Functionality – Positive-Sum, not Zero-Sum
● End-to-End Security – Lifecycle Protection
● Visibility and Transparency
● Respect for User Privacy
Source: Privacy By Design – The 7 Foundational Principles, Ann Cavoukian, Ph.D.,
Information & Privacy Commissioner, Ontario, Canada
Data Privacy Ground Rules
● If you don’t need it, don’t collect it.
● If you need to collect it for processing only, collect it only
after you have informed the user that you are collecting their
information and they have consented, but don’t store it.
● If you need to collect it for processing and storage, then
collect it, with user consent, and store it only for an explicit
retention period that is compliant with organizational policy
and/or regulatory requirements.
● If you need to collect it and store it, don’t archive it once the
data has outlived its usefulness and there is no retention
requirement.
Fundamental Security Concepts
[Diagram] Core: Confidentiality, Integrity, Availability; Authentication, Authorization, Accountability.
Design: Need to Know, Least Privilege, Separation of Duties, Defense in Depth, Fail Safe / Fail Secure, Economy of Mechanisms, Complete Mediation, Open Design, Least Common Mechanisms, Psychological Acceptability, Weakest Link, Leveraging Existing Components.
Security in Privacy Design
[Diagram] The same core and design elements viewed from a privacy perspective: Confidentiality, Integrity, Availability; Authentication, Authorization, Accountability.
Design: Need to Know, Least Privilege, Separation of Duties, Defense in Depth, Fail Safe / Fail Secure, Economy of Mechanisms, Complete Mediation, Open Design, Least Common Mechanisms, Psychological Acceptability, Weakest Link, Leveraging Existing Components.
Privacy vs Integrity
● Most data protection acts (such as GDPR) state that
“organizations must take necessary and reasonable steps to
ensure the accuracy of personal data collected from data
subjects”
● Some privacy design approaches use referential integrity
across datasets
● But some privacy design approaches use data distortion
techniques
● Conclusion
– Data as “Source of Truth” → Integrity is a must
– Data in use → Integrity depends on utility
Confidentiality Controls for Data Privacy
Types of Confidentiality Controls
Confidentiality Control
● Masking
● Secret Writing
– Overt: Encryption, Hashing
– Covert: Steganography, Digital Watermarking
Real World Cryptography Implementation
● Cryptographic algorithms and parameters
– Symmetric: 3DES, AES / Asymmetric: RSA, ECC
– Key size
– Padding
– Symmetric algorithm specific parameters
● Mode: ECB, CBC, CFB, etc.
● Initialization Vector (IV) / Starting Variable (SV) / Nonce
● Key controls and key management
● Key change/exchange procedures
● Cryptographic toolkits
● Random number/seed generators
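A minimal sketch touching several items on this checklist (key size, mode, IV, padding, random generation), assuming the third-party Python cryptography package; the key handling shown is illustrative only, real keys must come from proper key management.

```python
import os
from cryptography.hazmat.primitives import padding
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(32)   # AES-256 key size; in practice, obtain from key management
iv = os.urandom(16)    # fresh random Initialization Vector per encryption

# PKCS7 padding so the plaintext fits the AES block size
padder = padding.PKCS7(algorithms.AES.block_size).padder()
padded = padder.update(b"personal data of a data subject") + padder.finalize()

# AES in CBC mode (not ECB) with an explicit IV
encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
ciphertext = encryptor.update(padded) + encryptor.finalize()

decryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).decryptor()
unpadder = padding.PKCS7(algorithms.AES.block_size).unpadder()
plaintext = unpadder.update(decryptor.update(ciphertext) + decryptor.finalize()) + unpadder.finalize()
print(plaintext)
```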
Hashing
● Condenses an arbitrary message to a fixed size
– h = H(M)
● Usually assume the hash function is public
● Hash is used to detect changes to a message
● Well-known hash functions: MD5, SHA-1, SHA-2
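A minimal sketch of change detection with a public hash function, assuming Python and SHA-256 (a member of the SHA-2 family); the sample messages are illustrative only.

```python
import hashlib

def digest(message: bytes) -> str:
    # h = H(M): condense an arbitrary-length message to a fixed-size value
    return hashlib.sha256(message).hexdigest()

original = b"transfer 100 THB to account 123-4-56789-0"
stored_hash = digest(original)

# Any later change to the message yields a different digest
tampered = b"transfer 900 THB to account 123-4-56789-0"
print(digest(original) == stored_hash)   # True  -> unchanged
print(digest(tampered) == stored_hash)   # False -> change detected
```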
Authorization Design for Data Privacy
● Role-based access control
● Separation of duties
● Least privilege
Accountability Design for Data Privacy: Audit/Logging
● Log data should include the who, what, where, and when
aspects of software operations
● Design decisions to retain, archive, and dispose of logs should
not contradict external regulatory or internal retention
requirements
● Sensitive data (direct and/or indirect identifiers) should never
be logged in plaintext form
● Automatically log the authenticated principal and system
timestamp
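A hedged sketch of this guidance using Python's standard logging module: it records who (authenticated principal), what and where (action and resource), and when (timestamp added automatically), while masking the direct identifier instead of logging it in plaintext. The field names and masking rule are illustrative assumptions.

```python
import logging

logging.basicConfig(
    format="%(asctime)s principal=%(principal)s %(message)s",  # when + who
    level=logging.INFO,
)
log = logging.getLogger("audit")

def mask(citizen_id: str) -> str:
    # Never log direct identifiers in plaintext; keep only the last 4 digits
    return "*" * (len(citizen_id) - 4) + citizen_id[-4:]

def log_access(principal: str, action: str, resource: str, citizen_id: str) -> None:
    # what + where, plus a masked reference to the data subject
    log.info("action=%s resource=%s subject=%s", action, resource, mask(citizen_id),
             extra={"principal": principal})

log_access("alice", "READ", "customer_profile", "1103700123456")
```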
Design Principle for Privacy: Need To Know
● Need to know limits information access to the information that an
individual requires to carry out his or her job responsibilities
(business needs)
● When access to covered data is broader than what is required for
legitimate purposes, there is unnecessary risk of an attacker gaining
access to the data
“Need to Know” Implementation
Name Address Phone
Jim Demetriou 4290 Cheval Circle, Stow, OH 44224 330-805-4211
Gary Furlong 24 Steeple Drive, Hillsborough, NJ 08844 908-359-1754
Maria Herring 8096 Wild Lemon Lane, Manlius, NY 13104 315-682-4453
John Sacksteder 2480 Pendower Lane, Keswick, VA 22947 240-994-6728
John Mantel 23 College Street, South Hadley, MA 01075 413-532-5562
Dan Okray W1748 Circle Drive, Sullivan, WI 53178 262-593-5004
● Which attributes do these groups of people need to know, and
which do they not?
– Outbound telesales
– Marketing researchers on geographic analysis
– Direct mail marketers
Design Principle for Privacy: Least Privilege
● When designing access controls, each user or system
component should be allocated the minimum privilege
required to perform an action for the minimum amount of
time
● To limit damage that can be caused by an accident, error, or
unauthorized act
● Benefits of the principle include:
– Better system stability
– Better system security
– Ease of deployment
“Least Privilege” Implementation
● Ensure that only a minimal set of users have
root/administrator/sa/sysadmin access
● Unless a permission is granted explicitly, the user or process
should not be able to access the protected resource.
● For data protection regulations, only those who need access
to certain data are allowed to have it; otherwise, it should be
completely inaccessible to other internal people within the
business
Design Principle for Privacy: Separation of Duties
● The concept of having more than one person required to
complete a task
● Primary objective: the prevention of fraud and errors
● Example: the requirement of two signatures on a cheque
“Separation of Duties” Implementation
● Identification of a requirement (or change request); e.g. a
business person
● Authorization and approval; e.g. an IT governance board or
manager
● Design and development; e.g. a developer
● Review, inspection and approval; e.g. another developer or
architect
● Implementation in production; typically a software change or
system administrator
Privacy Design
Terminology
● Data Attribute:
– A data field, data column, or variable; a piece of information that can be found across the data
records in a data set
● Dataset:
– A set of data records, conceptually similar to a table in a conventional database or spreadsheet,
having records (rows) and attributes (columns)
● Direct Identifier:
– A data attribute that on its own identifies an individual (e.g. fingerprint) or has been assigned to
an individual (e.g. Citizen ID)
● Indirect identifier or Quasi-Identifiers:
– A data attribute that, by itself/on its own, does not identify an individual, but may identify an
individual when combined with other information
● De-identification:
– The process used to prevent someone's personal identity from being revealed
● Re-identification:
– Identifying a person from an anonymized dataset
Link: A Visual Guide to Practical Data De-Identification
https://fpf.org/wp-content/uploads/2016/04/FPF_Visual-Guide-to-Practical-Data-DeID.pdf
[Figures] Source: A Visual Guide to Practical Data De-Identification, Future of Privacy Forum
Practical Data De-Identification (1/2)
Source: A Visual Guide to Practical Data De-Identification, Future of Privacy Forum
Practical Data De-Identification (2/2)
Source: A Visual Guide to Practical Data De-Identification, Future of Privacy Forum
Pseudonymization (1/2)
● Decoupling identifiable data from the dataset, usually by
means of identifier key references
● Pseudonym (aka Token) may represent one or more
attributes
● Pseudonyms can be
– Reversible (by the owner(s) of the original data), where the
original values are securely kept but can be retrieved and
linked back to the pseudonyms
– Irreversible, where the original values are properly disposed
and the pseudonymization was done in a non-repeatable
fashion
Pseudonymization (2/2)
● Pseudonyms persistence
– Persistent – Same pseudonym values represent the same
individual across different datasets
– Non-persistent – Different pseudonyms represent the same
individual in different datasets to prevent linking of the
different datasets
● Pseudonyms generation
– Random (Ex. UUID, GUID)
– Deterministic (Ex. Hashing, Encryption, PCI DSS Tokenization)
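A minimal sketch of the two generation approaches, assuming Python: a random pseudonym via UUID and a deterministic pseudonym via keyed hashing (HMAC-SHA256). The secret key shown is a placeholder and would have to be managed securely.

```python
import hashlib
import hmac
import uuid

SECRET_KEY = b"replace-with-a-securely-managed-key"   # placeholder, not a real key

def random_pseudonym() -> str:
    # Random generation (e.g. UUID): unlinkable unless a mapping table is kept
    return uuid.uuid4().hex

def deterministic_pseudonym(value: str) -> str:
    # Deterministic generation (keyed hashing): the same input always yields the
    # same pseudonym, which preserves referential integrity across datasets
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

print(random_pseudonym())
print(deterministic_pseudonym("Jim Demetriou"))
```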
Pseudonymization – Example#1 (1/2)
Before Anonymization:
Name Address Phone
Jim Demetriou 4290 Cheval Circle, Stow, OH 44224 330-805-4211
Gary Furlong 24 Steeple Drive, Hillsborough, NJ 08844 908-359-1754
Maria Herring 8096 Wild Lemon Lane, Manlius, NY 13104 315-682-4453
John Sacksteder 2480 Pendower Lane, Keswick, VA 22947 240-994-6728
John Mantel 23 College Street, South Hadley, MA 01075 413-532-5562
Dan Okray W1748 Circle Drive, Sullivan, WI 53178 262-593-5004
After Pseudonymizing the Name Attribute:
Name Address Phone
LAU5B90A 4290 Cheval Circle, Stow, OH 44224 330-805-4211
1YXHL5K0 24 Steeple Drive, Hillsborough, NJ 08844 908-359-1754
KOTACI4U 8096 Wild Lemon Lane, Manlius, NY 13104 315-682-4453
SDM1VHX3 2480 Pendower Lane, Keswick, VA 22947 240-994-6728
UJQXYU27 23 College Street, South Hadley, MA 01075 413-532-5562
9NG6Y5VF W1748 Circle Drive, Sullivan, WI 53178 262-593-5004
Pseudonymization – Example#1 (2/2)
Identity Database
Pseudonym Name
LAU5B90A Jim Demetriou
1YXHL5K0 Gary Furlong
KOTACI4U Maria Herring
SDM1VHX3 John Sacksteder
UJQXYU27 John Mantel
9NG6Y5VF Dan Okray
Pseudonymization – Example#2
[Diagram] Identity + Non-Identifiable Data = Full Data
Full record: First Name: Narudom; Last Name: Roongsiriwong; Age: 18; Gender: Male; Nationality: Thai; Blood Type: O; Occupation: Engineer
Pseudonymization Guideline
● When to use
– The data values need to be unique and there is no need to keep the original attribute values
● How to use:
– Replace the respective attribute values with made-up values
– The made-up values should be unique and should have no relationship to the original values
● Tips
– Do not use running numbers as pseudonyms
– This should be a key part of your Privacy by Design strategy
– Ensure not to re-use pseudonyms that have already been utilized
– Persistent pseudonyms are usually better for maintaining referential integrity across data
sets
– For reversible pseudonyms, the mapping tables, functions, or secret encryption keys
should be securely kept and usable only by the organization
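One way to read the reversibility and re-use tips: keep the pseudonym-to-original mapping in a separate, access-controlled store so only the data owner can link back, and never hand out the same pseudonym twice. A minimal in-memory Python sketch; a real design would persist the mapping in a protected identity database.

```python
import secrets

class PseudonymVault:
    """Reversible, persistent pseudonyms; the mapping never leaves the vault."""

    def __init__(self) -> None:
        self._forward: dict[str, str] = {}   # original value -> pseudonym
        self._reverse: dict[str, str] = {}   # pseudonym -> original value

    def pseudonymize(self, value: str) -> str:
        if value not in self._forward:
            token = secrets.token_hex(4).upper()
            while token in self._reverse:              # never re-use a pseudonym
                token = secrets.token_hex(4).upper()
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]                    # persistent: same value, same token

    def reidentify(self, token: str) -> str:
        return self._reverse[token]                    # owner-only operation

vault = PseudonymVault()
token = vault.pseudonymize("Maria Herring")
print(token, vault.reidentify(token))
```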
Data Anonymization
● Anonymization is the process of removing private
information from the data
● Anonymized data cannot be linked to any one individual
account
What You Need to Be Aware of When Anonymizing
● Purpose of the anonymization and its utility
● Characteristics of each anonymization technique
● Information that can be inferred after implementation
● Expertise with the subject matter
● Competency in the anonymization process and techniques
● Recipients
Anonymization Techniques
● Suppression: Attribute Suppression, Record Suppression
● Generalization: Recoding
● Modification: Character Masking, Swapping or Shuffling, Perturbation
● Others: Synthetic Data, Data Aggregation
Attribute Suppression
● The removal of an entire part of data (“column” in a database) from a data set
Before Anonymization:
Name Address Phone
Jim Demetriou 4290 Cheval Circle, Stow, OH 44224 330-805-4211
Gary Furlong 24 Steeple Drive, Hillsborough, NJ 08844 908-359-1754
Maria Herring 8096 Wild Lemon Lane, Manlius, NY 13104 315-682-4453
John Sacksteder 2480 Pendower Lane, Keswick, VA 22947 240-994-6728
John Mantel 23 College Street, South Hadley, MA 01075 413-532-5562
Dan Okray W1748 Circle Drive, Sullivan, WI 53178 262-593-5004
After Suppressing the “Address” Attribute:
Name Phone
Jim Demetriou 330-805-4211
Gary Furlong 908-359-1754
Maria Herring 315-682-4453
John Sacksteder 240-994-6728
John Mantel 413-532-5562
Dan Okray 262-593-5004
Attribute Suppression Guideline
● When to use
– The attribute is not required in the anonymized dataset, or the attribute
cannot otherwise be suitably anonymized with another technique
● How to use:
– Delete (i.e. remove) the attribute(s); do not just hide them
– If the structure of the data set needs to be maintained, clear the data (and
possibly the header)
● Tips
– This is the strongest type of anonymization technique, because there is no
way of recovering any information from such an attribute
– A less sensitive derived attribute may be created to replace the original
attribute(s), e.g. a “Usage Duration” attribute based on the “Check-In” and
“Check-Out” date and time attributes
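A short sketch of both points, assuming pandas: the sensitive attributes are actually deleted (not hidden), and a less sensitive derived attribute replaces them. The column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Jim", "Gary"],
    "check_in": pd.to_datetime(["2019-07-01 09:00", "2019-07-01 10:30"]),
    "check_out": pd.to_datetime(["2019-07-01 17:00", "2019-07-01 12:00"]),
})

# Derive the less sensitive attribute first, then delete (not hide) the originals
df["usage_duration_hours"] = (df["check_out"] - df["check_in"]).dt.total_seconds() / 3600
df = df.drop(columns=["check_in", "check_out"])
print(df)
```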
Record Suppression
● The removal of an entire record in a data set
Name Address Phone
3BRYAYN8 Highlands Farm Woodchurch, Ashford, TN26 3RJ 2087726222
3O7T78EZ St Elizabeths, Much Hadham, SG10 6EW 2083435600
3WVYDLCN 10 Downing St, Westminster, London SW1A 2AA 1322341162
6SSC98FX Hermitage Court, Hermitage, Kent, ME16 9NT 2086887666
9CSYE673 Grimsby Road, Cleethorpes, North East Lincolnshire, DN35 7LB 1908262860
9DIHFAQ9 14 High Street, Brompton, Gillingham, ME7 5AE 2089440110
Can anyone guess who this person might be?
Record Suppression Guideline
● When to use
– Records that are so unique or are outliers that they can lead to easy re-
identification
● How to use:
– Delete the entire record; do not just hide the row
● Tips
– The removal of a record can impact the data set, e.g. for
statistical analysis
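A hedged pandas sketch of suppressing an outlier record that would invite re-identification; the age threshold is an arbitrary illustrative rule, the real criterion depends on the dataset.

```python
import pandas as pd

df = pd.DataFrame({"pseudonym": ["A1", "B2", "C3"], "age": [34, 41, 102]})

# The 102-year-old record is a unique outlier: delete the whole record,
# do not just hide the row
df = df[df["age"] <= 90].reset_index(drop=True)
print(df)
```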
Character Masking
● The change of the characters of a data
value, e.g. by using a constant symbol
(e.g. “*” or “x”)
● Masking is typically partial, i.e. applied
only to some characters in the attribute
Character Masking Guideline
● When to use
– The data value is a string of characters and hiding some part is
sufficient to provide anonymity
● How to use:
– Replace the appropriate characters with a chosen symbol
● Fixed number of characters (e.g. for credit card numbers)
● Variable number of characters (e.g. for email address)
● Tips
– Subject matter knowledge of each data type to be masked is
needed to ensure the right characters are masked
– The data owners are meant to recognize their own data
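A minimal sketch of the two masking variants mentioned above: a fixed number of visible characters for card numbers and a variable number for e-mail addresses. The exact rules are assumptions chosen for illustration.

```python
def mask_card(pan: str) -> str:
    # Fixed rule: mask all but the last four digits
    return "*" * (len(pan) - 4) + pan[-4:]

def mask_email(email: str) -> str:
    # Variable rule: keep only the first character of the local part
    local, _, domain = email.partition("@")
    return local[0] + "*" * (len(local) - 1) + "@" + domain

print(mask_card("4111111111111111"))     # ************1111
print(mask_email("narudom@owasp.org"))   # n******@owasp.org
```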
Recoding
● A deliberate reduction in the precision of data
● Example:
– Converting a person’s age into an age range
– Converting a precise location into a less precise location
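A minimal sketch of the age-range conversion mentioned above, assuming Python; the 10-year bucket width is an illustrative choice.

```python
def recode_age(age: int, width: int = 10) -> str:
    # Reduce precision: an exact age becomes a range, e.g. 34 -> "30-39"
    low = (age // width) * width
    return f"{low}-{low + width - 1}"

print(recode_age(34))   # 30-39
print(recode_age(18))   # 10-19
```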
Recoding – Example
Before Anonymization:
Name Address Phone
LAU5B90A 4290 Cheval Circle, Stow, OH 44224 330-805-4211
1YXHL5K0 24 Steeple Drive, Hillsborough, NJ 08844 908-359-1754
KOTACI4U 8096 Wild Lemon Lane, Manlius, NY 13104 315-682-4453
SDM1VHX3 2480 Pendower Lane, Keswick, VA 22947 240-994-6728
UJQXYU27 23 College Street, South Hadley, MA 01075 413-532-5562
9NG6Y5VF W1748 Circle Drive, Sullivan, WI 53178 262-593-5004
After Recoding the Address Attribute:
Name Address Phone
LAU5B90A Stow, OH 330-805-4211
1YXHL5K0 Hillsborough, NJ 908-359-1754
KOTACI4U Manlius, NY 315-682-4453
SDM1VHX3 Keswick, VA 240-994-6728
UJQXYU27 South Hadley, MA 413-532-5562
9NG6Y5VF Sullivan, WI 262-593-5004
Recoding Guideline
● When to use
– The data values can be recoded and still be useful for the
intended purpose
● How to use:
– Design appropriate data categories and rules for translating the data
– Consider suppressing any records that still stand out after the
translation (see record suppression)
● Tips
– Design the data ranges with appropriate sizes
● Too large a data range may modify the data too much
● Too small a data range may make records easy to re-identify
Shuffling
● Rearranging data in the data set so that the individual
attribute values are still represented in the data set but
generally no longer correspond to the original records
Shuffling – Example
Before Anonymization:
Name Address Phone
Jim Demetriou 4290 Cheval Circle, Stow, OH 44224 330-805-4211
Gary Furlong 24 Steeple Drive, Hillsborough, NJ 08844 908-359-1754
Maria Herring 8096 Wild Lemon Lane, Manlius, NY 13104 315-682-4453
John Sacksteder 2480 Pendower Lane, Keswick, VA 22947 240-994-6728
John Mantel 23 College Street, South Hadley, MA 01075 413-532-5562
Dan Okray W1748 Circle Drive, Sullivan, WI 53178 262-593-5004
After Shuffling:
Name Address Phone
Jim Demetriou 23 College Street, South Hadley, MA 01075 262-593-5004
Gary Furlong 2480 Pendower Lane, Keswick, VA 22947 315-682-4453
Maria Herring 24 Steeple Drive, Hillsborough, NJ 08844 413-532-5562
John Sacksteder 8096 Wild Lemon Lane, Manlius, NY 13104 908-359-1754
John Mantel W1748 Circle Drive, Sullivan, WI 53178 330-805-4211
Dan Okray 4290 Cheval Circle, Stow, OH 44224 240-994-6728
Shuffling Guideline
● When to use
– Subsequent analysis only needs to look at aggregated data and
there is no need for analysis of relationships between
attributes at the record level
● How to use:
– Identify which attributes to shuffle then shuffle or reassign the
attribute values to any record in the data set
● Tips
– Assess and decide which attributes need to be shuffled
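A hedged pandas sketch of shuffling one attribute so that its values remain in the dataset but no longer line up with their original records; which columns to shuffle is a per-dataset decision.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Jim", "Gary", "Maria"],
    "phone": ["330-805-4211", "908-359-1754", "315-682-4453"],
})

# Shuffle only the chosen attribute: every value is still in the data set,
# but it generally no longer corresponds to its original record
df["phone"] = df["phone"].sample(frac=1, random_state=42).to_numpy()
print(df)
```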
Perturbation
● Modifying values so that they are slightly different from the
original data set
● Two main techniques
– Probability distribution: data replacement from the same
distribution sample or from the distribution itself
– Value distortion: modification by multiplicative or additive
noise, or other randomized processes (more effective)
Perturbation – Example
Before Anonymization:
Person Height (cm) Weight (kg) Age (years) Smokes? Disease A? Disease B?
198740 160 50 30 No No No
287402 177 70 36 No No Yes
398747 158 46 20 Yes Yes No
498732 173 75 22 No No No
598772 169 82 44 Yes Yes Yes
Perturbation Rules Using Base-X Rounding:
Attribute Anonymization Technique
Height (in cm) Base-5 rounding (5 is chosen to be somewhat proportionate to the typical height value of, e.g., 120 to 190 cm)
Weight (in kg) Base-3 rounding (3 is chosen to be somewhat proportionate to the typical weight value of, e.g., 40 to 100 kg)
Age (in years) Base-3 rounding (3 is chosen to be somewhat proportionate to the typical age value of, e.g., 10 to 100 years)
(the remaining attributes) Nil, due to being non-numerical and difficult to modify without substantial change in value
Perturbation – Example
Before Anonymization:
Person Height (cm) Weight (kg) Age (years) Smokes? Disease A? Disease B?
198740 160 50 30 No No No
287402 177 70 36 No No Yes
398747 158 46 20 Yes Yes No
498732 173 75 22 No No No
598772 169 82 44 Yes Yes Yes
After Perturbation:
Person Height (cm) Weight (kg) Age (years) Smokes? Disease A? Disease B?
198740 160 51 30 No No No
287402 175 69 36 No No Yes
398747 160 45 18 Yes Yes No
498732 175 75 21 No No No
598772 170 81 42 Yes Yes Yes
Perturbation Guideline
● When to use
– Quasi-identifiers (typically numbers and dates) which may
potentially be identifying when combined with other data
sources, and slight changes in value are acceptable.
– Should not be used where data accuracy is important
● How to use:
– Depends on the exact data perturbation technique used
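A minimal Python sketch of value distortion by base-X rounding, following the per-attribute bases from the example slides; rounding to the nearest multiple is an assumption about how those numbers were produced.

```python
def base_round(value: float, base: int) -> int:
    # Value distortion by rounding to the nearest multiple of `base`
    return int(base * round(value / base))

record = {"height_cm": 177, "weight_kg": 70, "age_years": 36}
perturbed = {
    "height_cm": base_round(record["height_cm"], 5),   # base-5 for height
    "weight_kg": base_round(record["weight_kg"], 3),   # base-3 for weight
    "age_years": base_round(record["age_years"], 3),   # base-3 for age
}
print(perturbed)   # {'height_cm': 175, 'weight_kg': 69, 'age_years': 36}
```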
Other Techniques
● Synthetic Data
● Data Aggregation
Synthetic Data
● Description:
– Synthetic datasets are created separately from the original data,
instead of modifying the original dataset
● When to use:
– A large amount of data is required for system testing, but the
actual data cannot be used; the data should still be “realistic” in
format, relationships among attributes, etc.
– The actual data may not cover all test cases
● How to use:
– Study the patterns from the original dataset (i.e. the actual data)
and apply the patterns when creating the “anonymised” dataset
(i.e. the synthetic data)
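A minimal sketch of generating realistic-looking test records that follow the format and value ranges of the original data without copying any actual record; the formats below are illustrative assumptions.

```python
import random
import string

def synthetic_record() -> dict:
    # Keep the *format* of the real data (name-like string, US-style phone,
    # plausible age) without copying any actual person's values
    name = "".join(random.choices(string.ascii_uppercase, k=8))
    phone = f"{random.randint(200, 999)}-{random.randint(200, 999)}-{random.randint(1000, 9999)}"
    age = random.randint(18, 80)
    return {"name": name, "phone": phone, "age": age}

test_data = [synthetic_record() for _ in range(5)]
print(test_data)
```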
Data Aggregation
● Description:
– Converting a dataset from a list of records to summarized
values.
● When to use:
– Individual records are not required and aggregated data is
sufficient
● How to use:
– Use totals or averages, etc.
– Discuss the expected utility with the data recipient
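A brief pandas sketch of replacing individual records with summarized values; grouping by a recoded region and reporting counts and averages is one illustrative choice of summary.

```python
import pandas as pd

df = pd.DataFrame({
    "state": ["OH", "OH", "NJ", "NY"],
    "age":   [34, 41, 29, 52],
})

# Replace individual records with per-group totals and averages
summary = df.groupby("state").agg(customers=("age", "size"), avg_age=("age", "mean"))
print(summary)
```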
Conclusion: Select the Right Anonymization
● Purpose of the anonymization and its utility
● Characteristics of each anonymization technique
● Information that can be inferred after implementation
● Expertise with the subject matter
● Competency in the anonymization process and techniques
● Recipients
Example Design:
E-Commerce on the Cloud
[Diagram] Pseudonymization: Client, E-Commerce Front-End Web Server, Transaction DB, Personal Identifiable Information Service (Web API), PII DB.
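A hedged sketch of the idea behind this design: the transaction store keeps only a pseudonym, while the PII lives in a separate service with its own database. The PiiService class below is a hypothetical stand-in for the PII Service Web API in the diagram, not its actual interface.

```python
import secrets

class PiiService:
    """Hypothetical stand-in for the Personal Identifiable Information Service."""

    def __init__(self) -> None:
        self._pii_db = {}                      # PII DB: the only place PII lives

    def register(self, pii: dict) -> str:
        token = secrets.token_hex(4).upper()   # pseudonym, not a running number
        self._pii_db[token] = pii
        return token

pii_service = PiiService()
transaction_db = []                            # Transaction DB holds no direct identifiers

token = pii_service.register({"name": "Jim Demetriou", "phone": "330-805-4211"})
transaction_db.append({"customer": token, "order_total": 1250.00})
print(transaction_db)
```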
Example Design:
Personalized Marketing
[Diagram] Pseudonymization: Application, Data Warehouse (data for analytics without the direct identifier and/or quasi-identifier), Business Intelligence Tool, Personalized Info, Direct Identifier (and/or Quasi-Identifier), Marketing Campaigns, Personalized Marketing Campaigns.
Example Design:
PCI-DSS 3.2
Requirement 3: Protect stored cardholder data
Protection methods such as
● encryption (→ Pseudonymization),
● truncation (→ Recoding),
● masking (→ Character Masking),
● and hashing (→ Pseudonymization)
are critical components of cardholder data protection. If an
intruder circumvents other security controls and gains access to
encrypted data, without the proper cryptographic keys, the data
is unreadable and unusable to that person.