BATTLING UNKNOWN MALWARE
WITH MACHINE LEARNING
DR. SVEN KRASSER CHIEF SCIENTIST
@SVENKRASSER
FALCON ON
VIRUSTOTAL
2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
SUBMITTING TO VIRUSTOTAL
SCAN RESULTS
MACHINE LEARNING
PRIMER
More on this: watch http://tinyurl.com/MLcrowdcast
Some Data to Get Started:
1988 ANTHROPOMETRIC
SURVEY OF ARMY PERSONNEL
Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
• Over 4000 soldiers surveyed
• Over 100 measurements
• Reported by gender
Data
FIRST LOOK
[Figure: density plot over Height [mm]]
• Difference in distribution
• Significant overlap
SECOND
DIMENSION
[Figure: Height [mm] vs. Weight [10^-1 kg]]
• Correlation
• Overlap
FEATURE
SELECTION
[Figure: “Buttock Circumference” [mm] vs. Weight [10^-1 kg]]
• Correlation
• Reduced overlap
• Selection of features matters
LET’S
CLASSIFY
[Figure: “Buttock Circumference” [mm] vs. Weight [10^-1 kg]]
• Let’s assume we want to detect males (blue)
• I.e., “blue” is our positive class
• TP: classify blue as blue
• Note some misclassifications
• FP: classify red as blue
• FN: classify blue as red
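The TP/FP/FN bookkeeping above can be sketched in a few lines. This is a minimal illustration, assuming a single threshold on one feature stands in for the classifier; the function name `confusion_counts`, the threshold, and the sample values are hypothetical and not taken from the survey data.

```python
from typing import NamedTuple, Sequence


class Counts(NamedTuple):
    tp: int  # blue classified as blue
    fp: int  # red classified as blue
    fn: int  # blue classified as red
    tn: int  # red classified as red


def confusion_counts(values: Sequence[float], labels: Sequence[str],
                     threshold: float) -> Counts:
    """Predict "blue" when value >= threshold and tally the outcomes."""
    tp = fp = fn = tn = 0
    for value, label in zip(values, labels):
        predicted_blue = value >= threshold
        actual_blue = label == "blue"
        if predicted_blue and actual_blue:
            tp += 1
        elif predicted_blue:
            fp += 1
        elif actual_blue:
            fn += 1
        else:
            tn += 1
    return Counts(tp, fp, fn, tn)


# Made-up weights in units of 10^-1 kg (illustration only).
weights = [620, 700, 750, 810, 850, 900]
labels = ["red", "red", "blue", "red", "blue", "blue"]
counts = confusion_counts(weights, labels, threshold=780)
print(counts)
```

With "blue" as the positive class, lowering the threshold can only move samples from the FN and TN columns into TP and FP.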
LET’S
CLASSIFY
• Get more “blue” right (true positives)
• Get more “red” wrong (false positives)
RECEIVER OPERATING CHARACTERISTIC (ROC) CURVE
[Figure: ROC curve, True Positive Rate vs. False Positive Rate]
Detect more by accepting more false positives
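A ROC curve can be traced by sweeping the decision threshold and recording the two rates at each step. The sketch below assumes the classifier emits a numeric score where higher means "more likely positive"; `roc_points` and the sample scores are hypothetical.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               is_positive: Sequence[bool]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(is_positive)
    negatives = len(is_positive) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    # Sweep from the strictest threshold to the most permissive one.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, p in zip(scores, is_positive) if s >= threshold and p)
        fp = sum(1 for s, p in zip(scores, is_positive) if s >= threshold and not p)
        points.append((fp / negatives, tp / positives))
    return points


# Made-up scores and ground truth (illustration only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
is_positive = [True, True, False, True, False, False]
points = roc_points(scores, is_positive)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each step to a lower threshold can only hold or raise both rates, which is the trade-off the curve visualizes.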
MORE
DIMENSIONS
MISSION ACCOMPLISHED:
WE JUST ADD MORE DIMENSIONS…
RIGHT?
CURSE OF DIMENSIONALITY
• Reduced predictive performance
• Increased training time
• Slower classification
• Larger memory footprint
Source: https://commons.wikimedia.org/w/index.php?curid=2257082
DIMENSIONALITY AND SPARSENESS
[Figure: Height (mm) vs. Weight [10^-1 kg]]
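One way to see the sparseness effect is to fix the number of samples and count how much of the feature space they cover as dimensions are added. The sketch below is an illustration under stated assumptions: points are drawn uniformly at random (real measurements are correlated), each axis is split into 10 bins, and 4000 samples are used to echo the survey size; `occupied_fraction` is a hypothetical helper.

```python
import random


def occupied_fraction(num_points: int, dimensions: int,
                      bins_per_axis: int = 10, seed: int = 0) -> float:
    """Fraction of grid cells containing at least one uniformly drawn point."""
    rng = random.Random(seed)
    occupied = set()
    for _ in range(num_points):
        # Each point falls into one cell of a bins_per_axis^dimensions grid.
        cell = tuple(rng.randrange(bins_per_axis) for _ in range(dimensions))
        occupied.add(cell)
    return len(occupied) / bins_per_axis ** dimensions


fractions = {d: occupied_fraction(4000, d) for d in (1, 2, 3, 4)}
for d, fraction in fractions.items():
    print(f"{d} dimension(s): {fraction:.1%} of cells occupied")
```

With the sample count held constant, the number of cells grows tenfold per added dimension, so coverage has to drop once cells outnumber samples.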
LET’S APPLY THIS TO
SECURITY
FILE
ANALYSIS
AKA Static Analysis
• THE GOOD
– Relatively fast
– Scalable
– No need to detonate
– Platform independent, can be done at gateway
• THE BAD
– Limited insight due to narrow view
– Different file types require different techniques
– Different subtypes need special consideration
    – Packed files
    – .NET
    – Installers
    – EXEs vs. DLLs
– Obfuscations (yet good if detectable)
– Ineffective against exploitation and malware-less attacks
– Asymmetry: a fraction of a second to decide for the defender, months to craft for the attacker
FILE CONTENT
EXAMPLE FEATURES
• 32/64-bit executable
• GUI subsystem
• Command line subsystem
• File size
• Timestamp
• Debug information present
• Packer type
• File entropy
• Number of sections
• Number writable
• Number readable
• Number executable
• Distribution of section entropy
• Imported DLL names
• Imported function names
• Compiler artifacts
• Linker artifacts
• Resource data
• Embedded protocol strings
• Embedded IPs/domains
• Embedded paths
• Embedded product metadata
• Digital signature
• Icon content
• …
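As an example of how one of these features might be computed, the sketch below calculates file entropy as the Shannon entropy of the byte distribution, in bits per byte. This is an assumption about what "file entropy" means here, not the product's implementation; `byte_entropy` is a hypothetical name.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, from 0.0 to 8.0 bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


constant = byte_entropy(b"\x00" * 100)      # one repeated byte value
uniform = byte_entropy(bytes(range(256)))   # every byte value exactly once
print(constant, uniform)
```

Compressed or encrypted content tends toward the upper end of the range, which is why entropy is often paired with packer-related features.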
COMBINING FEATURES
[Figure: String-based feature vs. Executable section size-based feature]
[Figure: Subspace Projection A vs. Subspace Projection B]
ARMY DATA ROC CURVE
[Figure: ROC curve, True Positive Rate vs. False Positive Rate]
Detect more by accepting more false positives
ML MALWARE DETECTION ROC CURVE
[Figure: ROC curve, True Positive Rate vs. False Positive Rate]
APTS & 99% OF MALWARE DETECTED…
[Figure: Chance of at least one success for adversary vs. Number of attempts; annotated values: 1%, >99%, 500]
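The arithmetic behind this chart can be reproduced directly. The sketch assumes every attempt is detected independently with the same probability, which is a simplification; `chance_of_success` is a hypothetical name.

```python
def chance_of_success(detection_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries goes undetected."""
    # All attempts are caught with probability detection_rate ** attempts.
    return 1.0 - detection_rate ** attempts


single = chance_of_success(0.99, 1)
persistent = chance_of_success(0.99, 500)
print(f"1 attempt: {single:.1%}, 500 attempts: {persistent:.1%}")
```

Under this assumption a 99% detection rate leaves a 1% chance per attempt, which compounds to more than 99% over 500 attempts.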
STOPPING MALWARE IS NOT ENOUGH
[Figure: threat sophistication (low to high) vs. harder to prevent & detect (low to high); malware: 40%]
YOU NEED COMPLETE BREACH PREVENTION
[Figure: threat sophistication (low to high) vs. harder to prevent & detect (low to high); malware: 40%, non-malware attacks: 60%; actors shown: nation-states, organized criminal gangs, hacktivists/vigilantes, terrorists, cyber-criminals]
Next-Generation Endpoint Protection
Cloud Delivered. Enriched by Threat Intelligence
• Managed hunting
• Endpoint detection and response
• Next-gen antivirus
ML SETTINGS WITHIN FALCON HOST
ML PREVENTION IN ACTION
KEY
POINTS
• Machine learning is an effective tool against unknown malware
• Try it out on VirusTotal
• Detection means trading off true positives against false positives
• Detecting 99% of malware means an APT with enough attempts has a near-100% chance of getting malware into your environment
• The majority of intrusions are not malware-based
• Avoid silent failure
• Use a comprehensive array of techniques
www.crowdstrike.com