Simseer.com
Malware Similarity and Clustering
Made Easy

Silvio Cesare <silvio@ruxcon.org.au>
Introduction
• Simseer.com is a set of web services to analyse
  malware using program structure as a signature.
  Why?

• AV String signatures not very robust.

• Can’t detect ‘approximate’ matches.

• Hard to generate signature for an entire family.

• Program structure improves signature-based
  methods.
Who am I?
• Ph.D. Student at Deakin University.

• Presented at Ruxcon, Black Hat, AusCERT, etc.

• Published in academia.

• Book author.

• Recently relocated to Canberra.
Outline
1. Introduction

2. Simseer.com’s Malware Services

3. Supporting Infrastructure

4. Other Services

5. Conclusion
Signatures
• Covered in detail in my other presentations.
• Signature is based on ‘set of control flow graphs’
Signature Extraction
• Transform ‘set of control flow graphs’ into a
  ‘feature vector’

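• Decompilation + N-Grams

  [Figure: a control flow graph with nodes L_0 to L_7 (edges labelled ‘true’) is decompiled to the structured code below, which is encoded as the string W|IEH}R and split into the 4-grams W|IE, |IEH, IEH}, EH}R.]

  proc(){
  L_0:
    while (v1 || v2) {
  L_1:
      if (v3) {
  L_2:
      } else {
  L_4:
      }
  L_5:
    }
  L_7:
    return;
  }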
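
The n-gram step on this slide can be sketched in Python. This is a minimal illustration, not the service's actual code: it assumes each decompiled procedure has already been encoded as a string such as W|IEH}R, that the window length is 4 (inferred from the n-grams shown on the slide), and that the feature vector is a simple count of each n-gram. The function names `ngrams` and `feature_vector` are hypothetical.

```python
from collections import Counter


def ngrams(s, n=4):
    """Return every contiguous substring of length n, in order."""
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def feature_vector(strings, n=4):
    """Count n-grams over all per-procedure strings of one program.

    Assumption: the feature vector is a sparse mapping from n-gram to count.
    """
    counts = Counter()
    for s in strings:
        counts.update(ngrams(s, n))
    return counts


# The string from the slide, standing for one decompiled procedure.
print(ngrams("W|IEH}R"))  # ['W|IE', '|IEH', 'IEH}', 'EH}R']

# Two procedures that share one 4-gram.
vec = feature_vector(["W|IEH}R", "W|IE"])
print(vec["W|IE"])  # 2
```

Keeping the vector sparse means only n-grams that actually occur take up space, which matters once many procedures from many samples are counted.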
Simseer
• Begin demo...

• A revamp of my existing
  http://www.FooCodeChu.com service.

• Submit an archive of malware samples.

• Results
  ▫ A similarity matrix comparing samples.
  ▫ An evolutionary tree showing relationships.
Submission Page
Results
Simseer
• Demo complete...

• Use ‘distance between vectors’ to show
  similarity.

• Visualize using phylogenetics software.
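
A rough sketch of how a pairwise matrix could be built from feature vectors follows. The slides do not say which distance Simseer uses or how distance is turned into a similarity score, so this example assumes Euclidean distance over sparse n-gram counts and the mapping 1 / (1 + d). The sample names and vectors are made up for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))


def distance_matrix(vectors):
    """Return (names, matrix) where matrix[i][j] is the distance between samples i and j."""
    names = sorted(vectors)
    matrix = [[euclidean(vectors[a], vectors[b]) for b in names] for a in names]
    return names, matrix


def to_similarity(d):
    """Assumed mapping: distance 0 gives similarity 1.0, larger distances approach 0."""
    return 1.0 / (1.0 + d)


# Hypothetical samples: a.exe and b.exe are identical, c.exe is unrelated.
samples = {
    "a.exe": {"W|IE": 2, "|IEH": 1},
    "b.exe": {"W|IE": 2, "|IEH": 1},
    "c.exe": {"EH}R": 3},
}
names, dist = distance_matrix(samples)
for name, row in zip(names, dist):
    print(name, [round(to_similarity(d), 3) for d in row])
```

A symmetric distance matrix of this kind is also the usual input for distance-based tree building, which is presumably how the evolutionary tree is drawn.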
SimseerCluster
• Begin demo...

• A new service.

• Submit an archive of malware samples.

• Define the number of clusters.

• Results
  ▫ Samples grouped into clusters.
  ▫ Cross checking samples with AV.
  ▫ Identification of families.
Submission Page
Results
SimseerCluster
• Demo complete...

• Use ‘similarity matrix’ and ‘cosine similarity’.

• Pass to ‘cluster analysis software’ – The Weka
  Machine Learning Toolkit.

• Use Hierarchical clustering.
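
The service hands this step to Weka; the sketch below does not use Weka and only illustrates the idea in plain Python. It assumes sparse count vectors, cosine similarity between them, and agglomerative (bottom-up) hierarchical clustering with average linkage that stops when the requested number of clusters is reached. The linkage choice and all names are assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(x * v.get(k, 0) for k, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def cluster(vectors, k):
    """Merge the two most similar clusters repeatedly until k clusters remain."""
    names = sorted(vectors)
    sim = {(a, b): cosine_similarity(vectors[a], vectors[b])
           for a in names for b in names}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        # Average linkage: mean similarity over all cross-cluster pairs.
        return sum(sim[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    while len(clusters) > max(k, 1):
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = max(pairs, key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical samples: a and b share n-grams, as do c and d.
samples = {
    "a": {"W|IE": 2, "|IEH": 1},
    "b": {"W|IE": 1, "|IEH": 1},
    "c": {"EH}R": 3},
    "d": {"EH}R": 2, "IEH}": 1},
}
print(cluster(samples, 2))  # [['a', 'b'], ['c', 'd']]
```

Cross checking each resulting group against AV labels, as the results list describes, would then be a lookup per cluster member.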
SimseerSearch
• Begin demo...

• A new service.

• Submit a malware sample.

• Specify threshold of similarity.

• Results
  ▫ All samples in database similar to query.
  ▫ An AV report.
  ▫ Heuristics to detect obfuscations (packing).
Submission Page
Results
SimseerSearch
[Figure: range search around a query point p with radius r, showing the distance d(p,q) to a sample q. Labels: Query, Malware, Query Benign, Query Malicious.]

• Demo complete...

• Use ‘nearest neighbour similarity search’ based
  on ‘Euclidean distance’.

• Packer detection based on entropy analysis.
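
Both techniques named on this slide can be sketched briefly. The search below is a brute-force scan that returns every database sample within Euclidean distance r of the query; a production service would more likely use an index, but the slides do not say. The entropy function is standard Shannon entropy over byte values; the `looks_packed` helper and its 7.0 bits-per-byte threshold are hypothetical and only show how such a heuristic might be wired up.

```python
import math
from collections import Counter


def euclidean(u, v):
    """Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0) - v.get(k, 0)) ** 2 for k in keys))


def range_search(query, database, r):
    """Return (name, distance) for samples within distance r, nearest first."""
    hits = [(name, euclidean(query, vec)) for name, vec in database.items()]
    return sorted((h for h in hits if h[1] <= r), key=lambda h: h[1])


def shannon_entropy(data):
    """Entropy in bits per byte: 0 for constant data, 8 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(c / total * math.log2(c / total)
                for c in Counter(data).values())


def looks_packed(data, threshold=7.0):
    """Hypothetical heuristic: very high entropy hints at compression or encryption."""
    return shannon_entropy(data) >= threshold


# Hypothetical database of known malware feature vectors.
database = {
    "known1": {"W|IE": 2, "|IEH": 1},
    "known2": {"W|IE": 2, "|IEH": 2},
    "other": {"EH}R": 5},
}
query = {"W|IE": 2, "|IEH": 1}
print(range_search(query, database, r=1.5))  # [('known1', 0.0), ('known2', 1.0)]
print(shannon_entropy(bytes(range(256))))    # 8.0
```

A query with no hits inside the radius corresponds to the ‘Query Benign’ case in the figure, and a query with at least one hit to ‘Query Malicious’.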
Supporting Infrastructure
Other Services
• Other services on the same infrastructure
 ▫ Clonewise
 ▫ Bugwise
Clonewise – Detecting embedded
libraries.
Bugwise on real Debian Linux binaries
Future Work
• Integrate Cuckoo sandbox
 ▫ Unpacking with Volatility.
 ▫ Non EXE formats (PDF, DOC, etc).
 ▫ API Call classification (non signature-based).
Conclusion
• Free services.

• Control flow better than traditional string
  signatures.

• Try it!

• http://www.simseer.com
