SESSION ID: PDAC-W02
#RSAC
Applying Auto-Data Classification Techniques for Large Data Sets
Anchit Arora
Program Manager
InfoSec, Cisco
The proliferation of data and increase in complexity
Timeline (1995, 2006, 2014, 2020): 9 to 5 in the office; Emergence of Internet & mobility; The Human Network; BYOD & Externalization; The Internet of Everything
Complexity
• Complex work models: always accessible, remote & mobile workers
• Definition of perimeter: Cloud, Customer & partners
• Users choose devices (BYOD)
Pace
• Enterprise data collection to increase 40 to 60% per year*
• Experts predict the amount of data generated annually to increase 4300% by 2020*
Volume
• Big data architectures, low storage cost, increase of data retention
• 80% of data generated today is unstructured
• Data generated worldwide will reach 44 zettabytes by 2020*
* Numbers and statistics from Gartner, Gigaom Research, CSC, Seagate
Auto-classification: The why and what
Desired business outcome: At Cisco we want to provide additional sensitivity context to structured and unstructured data, so that controls can be applied more effectively
Scope: Our aim is an automated classification capability for all structured data systems, plus the ability to better govern and control the unstructured data created by exports from structured data systems, using a label/field association on each record set
Use-case: From structured to unstructured
[Diagram components: Structured data system (SoR), SoE, Indexer, API, Classification Engine (algorithms and dictionaries), UI, Export (E) & tag]
• Index: all existing and newly written data is indexed and classified based on the algorithm and dictionary defined for the SoR
• Classification: provide classification information to the user, or an access policy based on class to the application
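The flow on this slide (index and classify records in the SoR, then tag them on export) can be sketched roughly as below. This is a minimal illustration, not Cisco's engine: the dictionary terms, the "Unlabeled" fallback, the record fields, and the function names are all assumptions made for the example.

```python
import csv
import io

# Hypothetical dictionary: label -> terms that trigger it (not from the slides).
DICTIONARY = {
    "Restricted": ["network topology"],
    "Highly Confidential": ["customer"],
}
# Labels are checked from most to least sensitive.
LABEL_ORDER = ["Restricted", "Highly Confidential"]
FALLBACK_LABEL = "Unlabeled"  # placeholder for records that match nothing


def classify(record: dict) -> str:
    """Return the most sensitive label whose dictionary terms appear in the record."""
    text = " ".join(str(value) for value in record.values()).lower()
    for label in LABEL_ORDER:
        if any(term in text for term in DICTIONARY[label]):
            return label
    return FALLBACK_LABEL


def build_index(records: list) -> dict:
    """Index every record by id and store its classification."""
    return {record["id"]: classify(record) for record in records}


def export_tagged(records: list, index: dict) -> str:
    """Export records as CSV with the classification attached to each row."""
    buffer = io.StringIO()
    fieldnames = list(records[0].keys()) + ["classification"]
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    for record in records:
        writer.writerow({**record, "classification": index[record["id"]]})
    return buffer.getvalue()


records = [
    {"id": "B1", "summary": "Crash exposes customer network topology"},
    {"id": "B2", "summary": "Typo in help text"},
    {"id": "B3", "summary": "Customer reported slow login"},
]
index = build_index(records)
exported = export_tagged(records, index)
```

Because the label travels with each exported row, a downstream system holding the unstructured copy can still apply controls by class.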
An unstructured data use-case: box.com
Box.com is an external cloud platform used by Cisco for collaboration and storage of data
Security questions to ask:
• What is this data?
• What’s the source of the data?
• Who owns this data?
• What’s the sensitivity of the data?
• Is all data equally sensitive (this is the essence for optimal security)?
• What’s the level of security required?
Should we ask the user to govern security?
Can we expect the user to make the right security decision with all the complexity involved in the decision?
The user needs to be very knowledgeable to make the right decision
The answer is no. Yet many systems are designed to have users govern security:
• Recognize data categories in systems with unstructured data
• Classify data in any data system
• Set data security policy
• Securely export data out of the system
Making the shift from user governed to data owner governed
How to make the shift to a data owner model?
[Diagram: from Governance of Data by End User to Governance of Data by Data Owner]
• Data Taxonomy
• Recognize Data Type
• Classify Sensitivity
• Tag
• Data Management
• Policy Enforcement
• Data Protection capabilities
• Data Intelligence & monitoring capabilities
Across various data types: Engineering, Customer, Finance, HR
Conceptual approach
Discover, Recognize, Classify
• Discover: find data objects in large unstructured generic data repositories (classification mostly unknown) and in structured data systems (SoR)
• Recognize: identified data
• Classify: Data Sensitivity 1, Data Sensitivity 2, Data Sensitivity 3, Data Sensitivity 4
Structured data case study: Engineering & Customer data
protection in context of bug Information
A case study: Bug information
Millions of bugs + product bugs, 3 approaches available to protect them:
1. Treat all bugs equally, and apply ‘very strict’ controls on all bugs
• In heterogeneous data models, most data is ‘over’-protected
• Limits business ability and user experience
2. Treat all bugs equally, and apply ‘loose’ controls on all bugs
• Results in ‘under’-protected data
3. Apply the right amount of protection on a bug, based on sensitivity
• Balanced security and cost applied: just the right amount of security!
Setting the foundation for auto-class
A sensitive software bug in CDETS:
• Category: Is a bug
• Product development lifecycle: Sustaining
• Severity: Sev1
• Status: Open
• Found by Customer
• Customer network topology
• Belongs to hardware
Inventory Process
Identify
• Identify the most sensitive IP and IP’s appropriate owner(s)
Define
• Define data use and access rules for the most sensitive IP
Translate
• Translate rules into IT enforceable policies
The inventory process engages the business to build out the data taxonomy and a model of the sensitivity
The proof is in the numbers!!
Manual approach
Parameter                                Value
Average time to classify a single bug    5 minutes
Total number of bugs                     7 million
Time to classify                         35 million minutes
Cost/min of SME analyst                  $0.83/min
Cost to classify                         $29 million
Additional costs to consider for manual:
• Training: for consistent user behavior
• Change to business: cleaning legacy
• Change to applications and infrastructure
Auto-Classification approach
Parameter                                                               Value
Average time to classify a single bug*                                  0.002 minutes
Total number of bugs                                                    7 million
Time to classify                                                        14,000 minutes
Estimated cost for infrastructure and resources required to classify    $0.25 million
Accuracy results: 83%
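The arithmetic behind the two tables can be reproduced in a few lines. All inputs below are the slide's own figures; only the variable names are made up. The slide's $29 million is the rounded value of 35 million minutes × $0.83 = $29.05 million, which is roughly 116 times the estimated auto-classification cost.

```python
# Figures taken from the slide's two tables.
TOTAL_BUGS = 7_000_000
MANUAL_MIN_PER_BUG = 5
ANALYST_COST_PER_MIN = 0.83   # USD per minute of SME analyst time
AUTO_MIN_PER_BUG = 0.002
AUTO_COST = 250_000           # USD, estimated infrastructure and resources

# Manual approach: time and cost scale linearly with the number of bugs.
manual_minutes = TOTAL_BUGS * MANUAL_MIN_PER_BUG
manual_cost = manual_minutes * ANALYST_COST_PER_MIN

# Auto-classification approach: only the time scales with the number of bugs.
auto_minutes = TOTAL_BUGS * AUTO_MIN_PER_BUG

print(f"Manual: {manual_minutes:,} minutes, ${manual_cost / 1e6:.2f} million")
print(f"Auto:   {auto_minutes:,.0f} minutes, ${AUTO_COST / 1e6:.2f} million")
print(f"Time ratio: {manual_minutes / auto_minutes:,.0f}x")
print(f"Cost ratio: {manual_cost / AUTO_COST:.0f}x")
```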
The most sensitive data is just a small portion
< 1% Restricted
2.5% Highly Confidential
How did we execute the methodology?
AS-IS: New SoR integration for Auto-Class
A 6 step workflow, for structured data (SoR): Engage, Attribute, Develop, Validate, Integrate, Protect
#  Phase      Scope
1  Engage     Identify the SoR and engage stakeholders to communicate expectations and R&R; identify the data workflow (user stories) and data categories; plan and establish the scope of the SoR integration
2  Attribute  Analyze the data, database fields, and records, and build a data sensitivity model / algorithm to classify the data
3  Develop    Develop the attribution and scoring algorithm in the classification engine and index the datasets
4  Validate   Validate and tune the classification results of the classification engine to ensure accuracy of the output
5  Integrate  Integrate the classification data with the source system
6  Protect    Plan and implement protective measures in the source system for sensitive data classes
Building an attribution model
Inputs:
• All available source system built-in attributes (Attribute A, Attribute B, Attribute C, ...)
• Selected attributes and values (Attribute L, Attribute M, Attribute N, ...)
• Extracted entities from free-text fields and attachments (Attribute X, Attribute Y, Attribute Z, ...)
Attribution model:
• Weights
• Scoring equation
• Values and scores
• Classification rules
• Data freshness
• Contextual information
• Extracted entities
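One way to read the weights, scoring equation, and classification rules is as a weighted sum over selected attribute values that is then compared against thresholds. The sketch below assumes exactly that; the slides do not publish the real weights, value scores, or thresholds, so every number here is hypothetical, as is the "Other" fallback label. Integer points are used so the scores are exact.

```python
# Hypothetical weights per selected attribute (not from the slides).
WEIGHTS = {"severity": 4, "status": 2, "found_by": 4}

# Hypothetical score (0-10) for each attribute value.
VALUE_SCORES = {
    "severity": {"Sev1": 10, "Sev2": 6, "Sev3": 3},
    "status": {"Open": 10, "Resolved": 3},
    "found_by": {"Customer": 10, "Internal": 2},
}

# Classification rules as (minimum score, label), checked from highest to lowest.
RULES = [(80, "Restricted"), (50, "Highly Confidential")]
FALLBACK_LABEL = "Other"  # placeholder for everything below the lowest threshold


def sensitivity_score(record: dict) -> int:
    """Scoring equation: sum of weight x value score; unknown values score 0."""
    return sum(
        weight * VALUE_SCORES[attr].get(record.get(attr), 0)
        for attr, weight in WEIGHTS.items()
    )


def classify(record: dict) -> str:
    """Apply the threshold rules to the record's sensitivity score."""
    score = sensitivity_score(record)
    for threshold, label in RULES:
        if score >= threshold:
            return label
    return FALLBACK_LABEL


bug = {"severity": "Sev1", "status": "Open", "found_by": "Customer"}
print(sensitivity_score(bug), classify(bug))
```

Data freshness, contextual information, and extracted entities could enter the same equation as further weighted terms; they are left out here to keep the sketch short.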
How to create a similar solution for your organization?
Engage
• System identification
• Stakeholder identification
• Source system data fields
• Field analysis
• Field type analysis
• Data record analysis
• Define dictionary
• Candidate fields
• Feasibility
• Socialization
Attribute
• Field value assignment
• Field correlation
• Weight scoring
• Sensitivity scoring
Develop
• Classification engine infrastructure setup
• Classification engine configuration
• Coding of classification algorithm
Validate
• Sample size scoping
• Sample size indexing
• Validation of sample set
• Statistical validation of sample set
• Tune
• Result socialization
Integrate
• Design
• User stories
• Source system tagging (application tagging)
• Stakeholder socialization
Protect
• Access control
• Behavior monitoring
• Source system secure design
• Source system compliance
• Export control
• Import control
• Data Loss
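For the Validate phase, sample size scoping and statistical validation of the sample set could look something like the sketch below. The slides do not say which statistical method was used; this example assumes a standard sample-size formula for a proportion and a Wilson score interval, and the sample counts (332 correct out of 400) are made up to mirror an 83% accuracy figure.

```python
import math


def sample_size(margin: float, p: float = 0.5, z: float = 1.96) -> int:
    """Records to review by hand for a given margin of error.

    p = 0.5 is the most conservative guess for the accuracy;
    z = 1.96 corresponds to a 95% confidence level.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple:
    """Wilson score confidence interval for the classifier's accuracy."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical validation run: SMEs agree with the engine on 332 of 400 sampled bugs.
needed = sample_size(margin=0.05)
low, high = wilson_interval(correct=332, n=400)
print(f"Review at least {needed} records; accuracy is between {low:.3f} and {high:.3f}")
```

If the interval is too wide or sits below the target accuracy, that is the signal to tune the model and validate again on a fresh sample.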
Now what? - Prevent, Detect and Educate
Data Visibility: Prevent, Detect, Educate
• Restrict access to the application and through search
• Fine grain access based on data classification
• Tag source systems and docs w/ classification metadata
• Focus on most sensitive data
• Integration with DLP solutions
• Data science
Policy Driven, Context-Based Access Control: Access, Visibility, Control
Restricted
Why
• Bug Status: Open
• Bug Severity: Critical
• Keywords: Customer:
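The "Restricted / Why" panel suggests each classification carries the reasons behind it, which an access policy can then act on. A rough sketch of that idea follows; the rule that all three signals together make a bug Restricted, the keyword list, and the "restricted-readers" group name are assumptions for illustration only.

```python
from dataclasses import dataclass

KEYWORDS = ("customer",)  # hypothetical keyword dictionary


@dataclass
class Classification:
    label: str
    why: list  # human-readable reasons shown to the user


def classify_with_reasons(bug: dict) -> Classification:
    """Collect the signals that fired and derive a label from them."""
    why = []
    if bug.get("status") == "Open":
        why.append("Bug Status: Open")
    if bug.get("severity") == "Critical":
        why.append("Bug Severity: Critical")
    text = bug.get("text", "").lower()
    hits = [keyword for keyword in KEYWORDS if keyword in text]
    if hits:
        why.append("Keywords: " + ", ".join(hits))
    # Hypothetical rule: all three signals together make the bug Restricted.
    label = "Restricted" if len(why) == 3 else "Not restricted"
    return Classification(label, why)


def can_view(user_groups: set, classification: Classification) -> bool:
    """Policy check: Restricted bugs need membership in a dedicated group."""
    if classification.label != "Restricted":
        return True
    return "restricted-readers" in user_groups


bug = {"status": "Open", "severity": "Critical", "text": "Outage at customer site"}
result = classify_with_reasons(bug)
print(result.label, result.why, can_view({"engineering"}, result))
```

Returning the reasons alongside the label supports the Educate goal: a user who is denied access can see why the bug is considered sensitive.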
Q&A
Anchit Arora
Program Manager
InfoSec, Data Security Analytics Team
ancarora@cisco.com
