From Threat Intelligence to Defense Cleverness: A Data Science Approach (#tidatasci)
Alex Pinto
Chief Data Scientist – Niddel / MLSec Project
@alexcpsec
@MLSecProject

() { :; }; whoami
Alex Pinto
• That guy that started MLSec Project
• Chief Data Scientist at Niddel
• Machine Learning Researcher focused on Security Data
• Network security and incident response aficionado
• Tortured by Log Management / SIEMs as a child
• A BIG FAN of Attack Maps #pewpew
• Does not know who hacked Sony
• Has nothing to do with attribution in “Operation Capybara”
  
Agenda
• Cyber War… Threat Intel – What is it good for?
• Combine and TIQ-test
• Using TIQ-test
  • Novelty Test
  • Overlap Test
  • Population Test
  • Aging Test
  • Uniqueness Test
• Use case: Feed Comparison
  
What is TI good for anyway?
• 1) Attribution
• 2) Cyber Threat Maps (https://github.com/hrbrmstr/pewpew)
• 3) How about actual defense?
  • Use it as blacklists? As research data?
  • Thing is RAW DATA is hard to work with
  
(Semi-)Required Reading
• #tiqtest slides: http://bit.ly/tiqtest
• RPubs Page: http://bit.ly/tiqtest-rpubs
  
Combine and TIQ-Test
• Combine (https://github.com/mlsecproject/combine)
  • Gathers TI data (ip/host) from Internet and local files
  • Normalizes the data and enriches it (AS / Geo / pDNS)
  • Can export to CSV, “tiq-test format” and CRITs
  • Coming soon: CybOX / STIX (ty @kylemaxwell)
• TIQ-Test (https://github.com/mlsecproject/tiq-test)
  • Runs statistical summaries and tests on TI feeds
  • Generates charts based on the tests and summaries
  • Written in R (because you should learn a stat language)
  
Using TIQ-TEST
• Available tests and statistics:
  • NOVELTY – How often do they update themselves?
  • OVERLAP – How do they compare to what you got?
  • POPULATION – How does this population distribution compare to another one?
  • AGING – How long does an indicator sit on a feed?
  • UNIQUENESS – How many indicators are found in only one feed?
  
Using TIQ-TEST
• New dataset!
• https://github.com/mlsecproject/tiq-test-Winter2015
  
Using TIQ-TEST – Feeds Selected
• Dataset was separated into “inbound” and “outbound”
  
Using TIQ-TEST – Data Prep
• Extract the “raw” information from indicator feeds
• Both IP addresses and hostnames were extracted
  
Using TIQ-TEST – Data Prep
• Convert the hostname data to IP addresses:
  • Active IP addresses for the respective date (“A” query)
  • Passive DNS from Farsight Security (DNSDB)
• For each IP record (including the ones from hostnames):
  • Add asnumber and asname (from MaxMind ASN DB)
  • Add country (from MaxMind GeoLite DB)
  • Add rhost (again from DNSDB) – most popular “PTR”
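
The enrichment step above can be sketched roughly as follows. This is a minimal illustration, not the Combine or TIQ-test code: the field names asnumber, asname, country and rhost come from the slide, while `EnrichedIndicator`, `lookup_network`, `enrich` and the small in-memory tables are hypothetical stand-ins for the MaxMind databases and DNSDB.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class EnrichedIndicator:
    entity: str
    asnumber: Optional[int]
    asname: Optional[str]
    country: Optional[str]
    rhost: Optional[str]


def lookup_network(addr, table):
    """Return the value of the first network in `table` that contains `addr`."""
    for network, value in table:
        if addr in ipaddress.ip_network(network):
            return value
    return None


def enrich(ip, asn_table, geo_table, ptr_table):
    """Attach AS, country and reverse-host data to one IP indicator.

    The tables are hypothetical in-memory substitutes for the real
    ASN / GeoIP databases and passive DNS lookups.
    """
    addr = ipaddress.ip_address(ip)
    asnumber, asname = lookup_network(addr, asn_table) or (None, None)
    return EnrichedIndicator(
        entity=ip,
        asnumber=asnumber,
        asname=asname,
        country=lookup_network(addr, geo_table),
        rhost=ptr_table.get(ip),
    )


# Made-up reference data using documentation address ranges.
ASN_TABLE = [("192.0.2.0/24", (64500, "EXAMPLE-AS"))]
GEO_TABLE = [("192.0.2.0/24", "ZZ")]
PTR_TABLE = {"192.0.2.10": "host.example.net"}

print(enrich("192.0.2.10", ASN_TABLE, GEO_TABLE, PTR_TABLE))
```

Indicators with no match in a table simply keep `None` in that field, so missing enrichment stays visible instead of being guessed.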
  
Using TIQ-TEST – Data Prep Done
  
Novelty Test – measuring added and dropped indicators

Novelty Test - Inbound
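
A minimal sketch of the idea behind the Novelty Test, assuming each feed is available as one set of indicators per day. The function name `novelty` and the snapshot layout (ISO date string mapped to a set) are illustrative assumptions, not the TIQ-test implementation, which is written in R.

```python
def novelty(snapshots):
    """Compare consecutive daily snapshots of one feed.

    snapshots: dict mapping an ISO date string to the set of indicators
    present on the feed that day (an assumed layout for this sketch).
    Returns one dict per day-to-day transition.
    """
    days = sorted(snapshots)  # ISO dates sort chronologically as strings
    results = []
    for prev_day, curr_day in zip(days, days[1:]):
        before, after = snapshots[prev_day], snapshots[curr_day]
        added = after - before      # indicators new on curr_day
        dropped = before - after    # indicators gone since prev_day
        results.append({
            "date": curr_day,
            "added": len(added),
            "dropped": len(dropped),
            "added_ratio": len(added) / len(after) if after else 0.0,
            "dropped_ratio": len(dropped) / len(before) if before else 0.0,
        })
    return results


# Made-up example feed with documentation-range addresses.
feed = {
    "2015-01-01": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "2015-01-02": {"192.0.2.2", "192.0.2.3", "198.51.100.7", "198.51.100.8"},
}
for row in novelty(feed):
    print(row)
```

A feed whose added and dropped ratios stay near zero is rarely updated; very high ratios suggest heavy churn.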
  
Overlap Test – More data is better, but make sure it is not the same data

Overlap Test - Inbound

Overlap Test - Outbound
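
One way to picture the Overlap Test is as a pairwise matrix: for every pair of feeds, what share of the first feed's indicators also appears in the second. The sketch below assumes each feed is a plain set of indicators; `overlap_matrix` and the feed names are hypothetical.

```python
def overlap_matrix(feeds):
    """Pairwise overlap between feeds.

    feeds: dict mapping a feed name to its set of indicators.
    Returns a dict keyed by (a, b) holding the fraction of feed a's
    indicators that are also present in feed b.
    """
    matrix = {}
    for name_a, indicators_a in feeds.items():
        for name_b, indicators_b in feeds.items():
            if name_a == name_b:
                continue
            shared = indicators_a & indicators_b
            matrix[(name_a, name_b)] = (
                len(shared) / len(indicators_a) if indicators_a else 0.0
            )
    return matrix


# Made-up feeds for illustration.
feeds = {
    "feed_a": {"192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"},
    "feed_b": {"192.0.2.3", "192.0.2.4", "198.51.100.9"},
}
for pair, share in overlap_matrix(feeds).items():
    print(pair, round(share, 2))
```

The matrix is not symmetric: a small feed can be almost fully contained in a large one while covering only a sliver of it.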
  
Population Test
• Let us use the ASN and GeoIP databases that we used to enrich our data as a reference of the “true” population.
• But, but, human beings are unpredictable! We will never be able to forecast this!
  
	
  
	
  
Is your sampling poll as random as you think?
  
Can we get a better look?
• Statistical inference-based comparison models (hypothesis testing)
  • Exact binomial tests (when we have the “true” pop)
  • Chi-squared proportion tests (similar to independence tests)
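
To make the exact binomial test concrete, here is a small standard-library sketch. It asks: if a country or AS holds a known share of the “true” population, how surprising is the number of feed indicators that fall in it? The two-sided p-value sums the probabilities of all outcomes no more likely than the observed one. The function names and the numbers in the example are made up, and this direct computation only suits small samples; a statistics library would be the practical choice for real feeds.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_binomial_test(successes, trials, expected_share):
    """Two-sided exact binomial p-value.

    Sums the probability of every outcome that is at most as likely as
    the observed one under the expected share.
    """
    observed = binom_pmf(successes, trials, expected_share)
    threshold = observed * (1 + 1e-7)  # tolerance for floating-point ties
    p_value = sum(
        pmf
        for pmf in (binom_pmf(i, trials, expected_share) for i in range(trials + 1))
        if pmf <= threshold
    )
    return min(1.0, p_value)


# Hypothetical numbers: 12 of 50 feed indicators sit in a country that
# holds 5% of the reference population.
print(exact_binomial_test(12, 50, 0.05))
```

A very small p-value says the feed's share for that country or AS differs from the reference population by more than chance would explain; it does not say why.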
  
	
  
Aging Test – Is someone cleaning this up eventually?
INBOUND
OUTBOUND
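
A rough sketch of how indicator age could be measured, assuming daily snapshots of a feed keyed by ISO date. Age here is the span in days from first to last sighting, inclusive; that definition, like the name `indicator_ages`, is an assumption for illustration and may differ from what TIQ-test computes.

```python
from datetime import date


def indicator_ages(snapshots):
    """Days between first and last sighting of each indicator, inclusive.

    snapshots: dict mapping an ISO date string to the set of indicators
    present on the feed that day (assumed layout).
    """
    first_seen, last_seen = {}, {}
    for day in sorted(snapshots):
        for indicator in snapshots[day]:
            first_seen.setdefault(indicator, day)  # keep the earliest day
            last_seen[indicator] = day             # overwrite with the latest
    return {
        indicator: (
            date.fromisoformat(last_seen[indicator])
            - date.fromisoformat(first_seen[indicator])
        ).days + 1
        for indicator in first_seen
    }


# Made-up snapshots for illustration.
feed = {
    "2015-01-01": {"192.0.2.1", "192.0.2.2"},
    "2015-01-02": {"192.0.2.1"},
    "2015-01-03": {"192.0.2.1", "198.51.100.5"},
}
print(indicator_ages(feed))
```

Plotting a histogram of these ages per feed shows whether old indicators are ever cleaned out.
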
Uniqueness Test
• “Domain-based indicators are unique to one list between 96.16% and 97.37%”
• “IP-based indicators are unique to one list between 82.46% and 95.24% of the time”
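
The uniqueness figure can be illustrated with a short count: out of all distinct indicators seen across the feeds, what fraction appears in exactly one of them. The sketch assumes each feed is a set of indicators; `uniqueness` is a hypothetical name and the feeds are made up.

```python
from collections import Counter


def uniqueness(feeds):
    """Fraction of distinct indicators that appear in exactly one feed.

    feeds: dict mapping a feed name to its set of indicators.
    """
    # Each feed is a set, so an indicator is counted once per feed.
    feed_counts = Counter(
        indicator for indicators in feeds.values() for indicator in indicators
    )
    if not feed_counts:
        return 0.0
    unique = sum(1 for count in feed_counts.values() if count == 1)
    return unique / len(feed_counts)


# Made-up feeds for illustration.
feeds = {
    "feed_a": {"192.0.2.1", "192.0.2.2"},
    "feed_b": {"192.0.2.2", "198.51.100.3"},
    "feed_c": {"203.0.113.4"},
}
print(uniqueness(feeds))
```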
  
Intermission
• Some of you are probably like:
  • “You Data Scientists and your algorithms, how quaint.”
  • “Why aren’t you doing some useful research like nation-state attribution?”

OPTION 1: Cool Story, Bro!

OPTION 2: How can I use this awesomeness on my data?

Use Case: Comparing Private Feeds
• How about using TIQ-TEST to evaluate a private intel feed?
• Trying stuff before you buy is usually a good idea. Just sayin’
• Let’s compare a new feed, “private1”, against our combined outbound indicators
  
Population Test
  
Aging Test
Mostly DGA Related Churn
• I guess most DGAs rotate every 24 hours, right?
• Rotation means the private data is still “fresh”, from research or DGA generation procedures
  
A++ WOULD THREAT INTEL AGAIN

MLSec Project
• Both projects are available as GPLv3 by MLSec Project
• Doing ML research on Security? Let us know!
• Putting together a trust group to share experiences and develop open-source tools to help with data gathering and analysis
• Liked TIQ-TEST? We can help benchmark your private feeds using these and other techniques
• Visit https://www.mlsecproject.org, message @MLSecProject or just e-mail me.

Don’t want to do all this work?
• Come talk to me about Niddel!
• Private Beta of Magnet, the Machine Learning-powered Threat Intelligence Platform.
  • Our models extrapolate the knowledge of existing threat intelligence feeds as experienced analysts would.
  • Models make use of the same data an analyst would have.
  • Automatically triages and hunts on pivots of enriched information
  
Take Aways
• Analyze your data. Extract value from it!
• Try before you buy! Different test results mean different things to different orgs.
• Try the sample data, replicate the experiments:
  • https://github.com/mlsecproject/tiq-test-Winter2015
  • http://rpubs.com/alexcpsec/tiq-test-Winter2015
  
Greets and Thanks!
• @kylemaxwell, @paul4pc for helping with COMBINE
• @bfist for his work on http://sony.attributed.to
• @hrbrmstr for IPEW and chart revisions for TIQ-TEST
• All the MLSec Project community peps!
  
Thanks!
• Q&A?
• Feedback!
“The measure of intelligence is the ability to change.” - Albert Einstein
Alex Pinto
@alexcpsec
@MLSecProject
@NiddelCorp
  
