How Triton can help to reverse virtual machine based
software protections
How not to kill yourself when you reverse obfuscated code.
Jonathan Salwan and Romain Thomas
CSAW SOS in NYC, November 10, 2016
About us
Romain Thomas
• Security Research Engineer at Quarkslab
• Working on obfuscation and software protection
Jonathan Salwan
• Security Research Engineer at Quarkslab
• Working on program analysis and software verification
Roadmap of this talk
Part 1 Short introduction to the Triton framework
Part 2 Short introduction to virtual machine based software protections
Part 3 Demo - Triton vs VMs
The Triton framework [6]
Triton in a nutshell
• A Dynamic Binary Analysis Framework
• Handles the Intel x86 and x86_64 Instruction Set Architectures (ISA)
• Contains:
• Dynamic Symbolic Execution (DSE) engine [4, 7]
• Taint analysis engine
• Emulation engine
• Representation of the ISA behaviour as an Abstract Syntax Tree (AST)
• AST simplification engine
• Two syntax representations of the AST
• Python
• SMT2
Triton’s design
[Figure: architecture diagram of LibTriton.so]
• Triton internal components: Symbolic Execution Engine, Taint Engine, IR (SMT2-Lib semantics), SMT Solver Interface, SMT Optimization Passes, SMT Simplification Rules, Multi-Arch design
• API: C++ / Python
• Example of tracers: Pin, Valgrind, DynamoRio, Qemu, DB (e.g. mysql)
The API’s input - Opcode to semantics
[Figure: diagram with labels Instruction, API, Instruction semantics over AST]
The API’s input - Semantics with a context
[Figure: diagram with labels Instruction, Context, API, Instruction semantics over AST]
The API’s input - Taint Analysis
[Figure: diagram with labels API, Taint analysis, Instruction semantics over AST]
The API’s input - Symbolic Execution
[Figure: diagram with labels API, symbolic variable π, Symbolic Execution, Instruction semantics over AST]
The API’s input - Simplification / Transformation
[Figure: diagram with labels API, Simplification / Transformation, Instruction semantics over AST]
The API’s input - AST representations
Instruction semantics over AST, in the two syntax representations:
• SMT2: (bvadd (_ bv1 8) (_ bv2 8))
• Python: ((0x1 + 0x2) & 0xFF)
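To make the two syntaxes concrete, here is a minimal sketch of an AST node that can print itself both ways. The classes `BV` and `BVAdd` are hypothetical names invented for this illustration; they are not Triton's API, only a hand-rolled model of the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BV:
    """A constant bitvector of a given size (hypothetical helper, not Triton's API)."""
    value: int
    size: int

    def smt2(self) -> str:
        return f"(_ bv{self.value} {self.size})"

    def py(self) -> str:
        return hex(self.value)


@dataclass(frozen=True)
class BVAdd:
    """Bitvector addition of two nodes, truncated to `size` bits."""
    lhs: BV
    rhs: BV
    size: int

    def smt2(self) -> str:
        return f"(bvadd {self.lhs.smt2()} {self.rhs.smt2()})"

    def py(self) -> str:
        # The Python syntax makes the wrap-around explicit with a mask.
        mask = (1 << self.size) - 1
        return f"(({self.lhs.py()} + {self.rhs.py()}) & 0x{mask:X})"


node = BVAdd(BV(1, 8), BV(2, 8), 8)
print(node.smt2())
print(node.py())
```

The same tree is rendered twice; only the printer changes, which is why a framework can offer several syntaxes over one internal representation.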
The API’s input - Symbolic Emulation
[Figure: diagram with labels Instruction 1, Instruction 2, Instruction 3, Instruction 4, API, Symbolic Emulation]
Example - How to define an opcode and context
>>> inst = Instruction("\x48\x31\xD0") # xor rax, rdx
>>> inst.setAddress(0x400000)
>>> inst.updateContext(Register(REG.RAX, 0x1234))
>>> inst.updateContext(Register(REG.RDX, 0x5678))
>>> processing(inst)
Example - How to get semantics expressions
>>> processing(inst)
>>> print inst
400000: xor rax, rdx
>>> for expr in inst.getSymbolicExpressions():
... print expr
...
ref_0 = (0x1234 ^ 0x5678) # XOR operation
ref_1 = 0x0 # Clears carry flag
ref_2 = 0x0 # Clears overflow flag
ref_3 = ((0x1 ^ [... skipped ...] & 0x1)) # Parity flag
ref_4 = ((ref_0 >> 63) & 0x1) # Sign flag
ref_5 = (0x1 if (ref_0 == 0x0) else 0x0) # Zero flag
ref_6 = 0x400003 # Program Counter
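The listing above can be cross-checked by hand. The following sketch recomputes the same side effects for `xor rax, rdx` in plain Python. The function `xor_semantics` and its dictionary layout are assumptions made for this illustration; the parity flag formula (even number of set bits in the low byte) is the standard x86 definition, since the slide elides Triton's own expression.

```python
MASK64 = (1 << 64) - 1


def xor_semantics(dst: int, src: int, address: int, size: int = 3) -> dict:
    """Model every register written by a 64-bit `xor dst, src` (illustrative only)."""
    ref_0 = (dst ^ src) & MASK64
    return {
        "rax": ref_0,                                            # XOR result
        "cf": 0,                                                 # carry flag is cleared
        "of": 0,                                                 # overflow flag is cleared
        "pf": 1 if bin(ref_0 & 0xFF).count("1") % 2 == 0 else 0,  # parity of the low byte
        "sf": (ref_0 >> 63) & 0x1,                               # sign flag
        "zf": 1 if ref_0 == 0 else 0,                            # zero flag
        "rip": address + size,                                   # next program counter
    }


effects = xor_semantics(0x1234, 0x5678, 0x400000)
for name, value in effects.items():
    print(f"{name} = {value:#x}")
```

Each dictionary entry plays the role of one `ref_N` expression: a single instruction yields one assignment per register or flag it touches.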
Example - How to get implicit and explicit read registers
>>> for r in inst.getReadRegisters():
... print r
...
(rax:64 bv[63..0], 0x1234)
(rdx:64 bv[63..0], 0x5678)
Example - How to get implicit and explicit written registers
>>> for w in inst.getWrittenRegisters():
... print w
...
(rax:64 bv[63..0], (0x1234 ^ 0x5678))
(rip:64 bv[63..0], 0x400003)
(cf:1 bv[0..0], 0x0)
(of:1 bv[0..0], 0x0)
(pf:1 bv[0..0], ... skipped ...)
(sf:1 bv[0..0], ((ref_0 >> 63) & 0x1))
(zf:1 bv[0..0], (0x1 if (ref_0 == 0x0) else 0x0))
To summarize: what kind of information can I get from an instruction?
• All implicit and explicit semantics of an instruction
• GET, PUT, LOAD, STORE
• Semantics (side effects included) represented as an abstract syntax tree in Static Single Assignment (SSA) form
What about emulation?
>>> inst1 = Instruction("\x48\xc7\xc0\x05\x00\x00\x00") # mov rax, 5
>>> inst2 = Instruction("\x48\x83\xC0\x02") # add rax, 2
>>> processing(inst1)
>>> processing(inst2)
>>> getFullAstFromId(getSymbolicRegisterId(REG.RAX))
((0x5 + 0x2) & 0xFFFFFFFFFFFFFFFF)
>>> getAstFromId(getSymbolicRegisterId(REG.RAX)).evaluate()
7L
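The idea behind this session can be sketched without Triton: each register maps to an expression tree, every instruction rewrites that mapping, and the tree can be either printed or evaluated. The tuple encoding and the `emulate`, `render` and `evaluate` helpers below are hypothetical and only cover the two instructions of the example.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def emulate(program):
    """Symbolically run (mnemonic, register, immediate) triples; returns register -> AST."""
    regs = {}
    for mnemonic, dst, imm in program:
        if mnemonic == "mov":
            regs[dst] = imm                      # constant leaf
        elif mnemonic == "add":
            regs[dst] = ("+", regs[dst], imm)    # new node built on the previous AST
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
    return regs


def render(expr) -> str:
    """Print an AST in a Python-like syntax with an explicit 64-bit mask."""
    if isinstance(expr, int):
        return hex(expr)
    op, lhs, rhs = expr
    return f"(({render(lhs)} {op} {render(rhs)}) & 0x{MASK64:X})"


def evaluate(expr) -> int:
    """Concretely evaluate an AST made of constants and additions."""
    if isinstance(expr, int):
        return expr
    _, lhs, rhs = expr
    return (evaluate(lhs) + evaluate(rhs)) & MASK64


regs = emulate([("mov", "rax", 5), ("add", "rax", 2)])
print(render(regs["rax"]))
print(evaluate(regs["rax"]))
```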
Ok, but what can I do with all of this?
• Use taint analysis to help during reverse engineering
• Use symbolic execution to cover code
• Use symbolic execution to find which value(s) a register or memory cell can hold
• Simplify expressions for deobfuscation
• Transform expressions for obfuscation
• Match behaviour models for vulnerability research
• Be imaginative :)
Mmmmh, and where can instruction sequences come from?
• From dynamic tracers like Pin, Valgrind, Qemu, ...
• From a memory dump
• From static tools like IDA or whatever...
Cool, but how many instruction semantics are supported by Triton?
• Development:
• 256 Intel x86_64 instructions ¹
• Including 116 SSE/MMX/AVX instructions
• Testing:
• The test suite ² of the Qemu TCG ³
• Trace differential testing ⁴
¹ http://triton.quarkslab.com/documentation/doxygen/SMT_Semantics_Supported_page.html
² http://github.com/qemu/qemu/tree/master/tests/tcg
³ http://wiki.qemu.org/Documentation/TCG
⁴ http://triton.quarkslab.com/blog/What-kind-of-semantics-information-Triton-can-provide/#4
Virtual Machine Based Software
Protections
VM Based Software Protections
Definition:
It’s a kind of obfuscation which transforms an original instruction set (e.g. x86) into
another custom instruction set (VM implementation).
Example: Virtualization
Original code:
mov rax, 0x123456
and rax, rbx
call func
Virtualized code:
push 0x1 # rax_id
push 0x123456
call VM_MOVE
push rbx
push rax
mov rcx, [rsp]
mov rdx, [rsp - 0x4]
and rcx, rdx
mov rax, rcx
mov rbx, 0x1
call trampoline
Where are VMs?
• Languages: Python, Java...
• Obfuscators: VM Protect ⁵, Tigress ⁶ [1, 3], Denuvo ⁷
• Malware: Zeus ⁸
• CTF...
⁵ http://vmpsoft.com/
⁶ http://tigress.cs.arizona.edu/
⁷ http://www.denuvo.com/
⁸ http://www.miasm.re/blog/2016/09/03/zeusvm_analysis.html
VM abstract architecture
[Figure: VM loop: Fetch Instruction → Decode Instruction → Dispatch → Handler 1 / Handler 2 / Handler 3 → Terminator]
VM abstract architecture
Fetch Instruction:
Fetch the instruction which will be executed by the VM.
Decode Instruction:
Decode the instruction according to the VM instruction set.
Example:
decode(01 11 12):
• Opcode: 0x01
• Operand 1: 0x11
• Operand 2: 0x12
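A decoder for this toy encoding can be sketched as follows. The fixed three-byte layout (one opcode byte, two operand bytes) is taken from the example above; the `VMInstruction` type and the `decode` signature are assumptions made for this illustration, and real protections generally use less regular encodings.

```python
from typing import NamedTuple


class VMInstruction(NamedTuple):
    opcode: int
    operand1: int
    operand2: int


def decode(raw: bytes) -> VMInstruction:
    """Split a 3-byte virtual instruction into its opcode and two operands."""
    if len(raw) != 3:
        raise ValueError("this toy format expects exactly 3 bytes")
    return VMInstruction(*raw)


ins = decode(bytes.fromhex("01 11 12"))
print(f"Opcode: {ins.opcode:#04x}")
print(f"Operand 1: {ins.operand1:#04x}")
print(f"Operand 2: {ins.operand2:#04x}")
```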
VM abstract architecture
Dispatcher:
Jump to the right handler according to opcode and/or operands.
Handlers:
Handlers are the implementation of the VM instruction set.
For instance, the handler for the instruction
mov REG, IMM
could be:
xor REG, REG
or REG, IMM
Terminator:
Finishes the VM execution or continues its execution.
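Putting the five stages together, a complete toy VM fits in a few lines. Everything below (the opcode values, the four-register file, the `run_vm` name, the 3-byte instruction format) is invented for illustration; only the `mov REG, IMM` handler follows the slide, being implemented as `xor REG, REG` followed by `or REG, IMM`.

```python
HALT = 0xFF  # hypothetical opcode that makes the terminator stop the VM


def run_vm(bytecode: bytes) -> list:
    regs = [0, 0, 0, 0]
    pc = 0

    def handler_mov_imm(reg: int, imm: int) -> None:
        regs[reg] ^= regs[reg]   # xor REG, REG
        regs[reg] |= imm         # or  REG, IMM

    def handler_add(dst: int, src: int) -> None:
        regs[dst] = (regs[dst] + regs[src]) & 0xFF

    handlers = {0x01: handler_mov_imm, 0x02: handler_add}

    while True:
        opcode, op1, op2 = bytecode[pc:pc + 3]   # fetch and decode
        pc += 3
        if opcode == HALT:                       # terminator: finish execution
            return regs
        handlers[opcode](op1, op2)               # dispatch to the handler
        # terminator: otherwise continue with the next virtual instruction


program = bytes([
    0x01, 0x00, 0x05,   # mov r0, 5
    0x01, 0x01, 0x07,   # mov r1, 7
    0x02, 0x00, 0x01,   # add r0, r1
    HALT, 0x00, 0x00,
])
final_regs = run_vm(program)
print(final_regs)
```

An analyst who only sees the native trace of `run_vm` observes the loop and handlers over and over; the original program lives in `program`, which is data.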
Dispatcher
We can have two kinds of dispatcher:
• Switch-case-like
• Jump-table-based
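The two dispatcher shapes can be contrasted in a short sketch. The handler names and opcode numbering are hypothetical. In native code the first form shows up as a chain of compare-and-branch instructions, the second as a single indirect jump through a table of addresses.

```python
def handler_push() -> str:
    return "push"


def handler_add() -> str:
    return "add"


def handler_ret() -> str:
    return "ret"


def dispatch_switch(opcode: int) -> str:
    """Switch-case-like dispatcher: the opcode is compared against each case in turn."""
    if opcode == 0:
        return handler_push()
    elif opcode == 1:
        return handler_add()
    elif opcode == 2:
        return handler_ret()
    raise ValueError(f"unknown opcode {opcode}")


# Jump-table dispatcher: the opcode is used directly as an index.
JUMP_TABLE = [handler_push, handler_add, handler_ret]


def dispatch_table(opcode: int) -> str:
    if not 0 <= opcode < len(JUMP_TABLE):
        raise ValueError(f"unknown opcode {opcode}")
    return JUMP_TABLE[opcode]()


for opcode in range(3):
    print(opcode, dispatch_switch(opcode), dispatch_table(opcode))
```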
[Figure: a switch-case-like dispatcher]
[Figure: a jump-table-based dispatcher]
Using Triton to reverse a VM
[Figure: the VM loop (Fetch Instruction, Decode Instruction, Dispatch, Handler 1, Handler 2, Handler 3, Terminator) with two "Triton" annotations]
Demo: Tigress VM
Tigress challenges
$ ./tigress-challenge 1234
3920664950602727424
$ ./tigress-challenge 326423564
16724117216240346858
Problem: given a very secret algorithm obfuscated with a VM, how can we recover
the algorithm without fully reversing the VM?
Step 1: Symbolically emulate the binary
[Figure: Obfuscated binary → Symbolic Emulation → Trace semantics]
Step 2: Define the user input as a symbolic variable
[Figure: the user input is represented by the symbolic variable π]
Step 3: Concretize everything which is not related to user input
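[Figure: an expression tree over π before and after concretization; subtrees that do not depend on π are folded into constants, e.g. (3 + 4) becomes 7]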
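Concretization can be sketched as constant folding over an expression tree: any subtree whose leaves are all concrete is replaced by its value, and only the nodes that depend on the symbolic input survive. The tuple encoding, the `concretize` helper and the example tree are assumptions for this illustration, loosely modelled on the figure rather than copied from it.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def concretize(expr):
    """Fold every subtree that contains no symbolic variable into a constant."""
    if not isinstance(expr, tuple):
        return expr                      # an int constant or a variable name (str)
    op, lhs, rhs = expr
    lhs, rhs = concretize(lhs), concretize(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return OPS[op](lhs, rhs)         # both sides concrete: evaluate now
    return (op, lhs, rhs)                # depends on the input: keep the node


# (pi * (3 + 4)) + ((2 * 5) - 1), where "pi" stands for the symbolic user input
tree = ("+", ("*", "pi", ("+", 3, 4)), ("-", ("*", 2, 5), 1))
folded = concretize(tree)
print(folded)
```

On a VM trace, most of the work done by the fetch, decode and dispatch stages depends only on the bytecode, so it folds away and what remains is the computation on the input.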
Step 4: Use a better canonical representation of expressions
• Arybo [2] uses the Algebraic Normal Form (ANF) representation
[Figure: Triton AST over π → Arybo AST over π]
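For readers unfamiliar with ANF: it writes a boolean function as an XOR of AND-monomials, and that form is unique, which is what makes it canonical. The sketch below derives the ANF of a small function from its truth table with the Möbius transform. It does not use Arybo's API; `anf_coefficients` and `anf_string` are hypothetical helpers written only to show the representation.

```python
def anf_coefficients(truth_table):
    """Moebius transform: truth_table[i] is f evaluated on the bits of i.

    Returns a list where entry `mask` is 1 if the monomial made of the
    variables selected by `mask` appears in the ANF.
    """
    coeffs = list(truth_table)
    n = len(coeffs)
    if n == 0 or n & (n - 1):
        raise ValueError("truth table length must be a power of two")
    step = 1
    while step < n:
        for i in range(n):
            if i & step:
                coeffs[i] ^= coeffs[i ^ step]
        step <<= 1
    return coeffs


def anf_string(coeffs, names):
    """Render ANF coefficients as an XOR (^) of AND (*) monomials."""
    terms = []
    for mask, present in enumerate(coeffs):
        if present:
            monomial = [names[bit] for bit in range(len(names)) if (mask >> bit) & 1]
            terms.append("*".join(monomial) if monomial else "1")
    return " ^ ".join(terms) if terms else "0"


# Truth table of (a | b), indexed by a + 2*b
or_table = [0, 1, 1, 1]
or_anf = anf_string(anf_coefficients(or_table), ["a", "b"])
print(or_anf)
```

Two syntactically different obfuscated expressions that compute the same boolean function end up with the same ANF, so equivalence becomes a plain comparison.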
Step 5: Possible use of symbolic simplifications
[Figure: Arybo AST over π → Arybo AST "on steroids"]
⁸ https://pythonhosted.org/arybo/concepts.html
Step 6: From Arybo to LLVM-IR
[Figure: Arybo AST over π → LLVM-IR]
Step 7: Recompile with -O2 optimization and win!
[Figure: LLVM-IR → Deobfuscated binary]
Results with only one trace
Cover paths to reconstruct the CFG
[Figure: trace 1 ∪ trace 2 = reconstructed CFG]
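The union step can be sketched as merging the edges seen in each trace. A single trace only reveals one path, so each extra trace that takes a different branch adds edges to the recovered control flow graph. The block names and the `merge_traces` helper are hypothetical, and each trace is simplified to a list of basic block labels.

```python
def cfg_edges(trace):
    """Edges (src, dst) between consecutive basic blocks of one trace."""
    return set(zip(trace, trace[1:]))


def merge_traces(*traces):
    """Union the edges of several traces into a successor map."""
    edges = set()
    for trace in traces:
        edges |= cfg_edges(trace)
    successors = {}
    for src, dst in sorted(edges):
        successors.setdefault(src, []).append(dst)
    return successors


trace_a = ["entry", "then", "exit"]   # path taken for one input
trace_b = ["entry", "else", "exit"]   # path taken for another input
cfg = merge_traces(trace_a, trace_b)
print(cfg)
```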
Results with the union of two traces
Time of extraction per trace
Let me try it myself
Release: Everything related to this analysis is available on GitHub ⁹.
⁹ https://github.com/JonathanSalwan/Tigress_protection
Demo: Unknown VM
VM Architecture
[Figure: Fetch Instruction → Decode Instruction → Dispatch (switch case) → Handler 1 / Handler 2 / Handler 3 → Terminator, with labels op0, op1, op2, op3, op4]
Goal
[Figure: the same VM architecture diagram shown three times: unannotated, annotated with "Timeout", and annotated with "Step 1" and "Step 2"]
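[Figure: control flow fragment with blocks Decode, BB2, BB3, BB4 and Handler, and edge conditions c1→2, c2→4, c3→5, c4→5]
Path conditions shown in the figure:
• c1→2 and c2→4
• (BB4 and c4→5) or (BB3 and c3→5)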
Conclusion
• Symbolic execution is powerful against obfuscation
• To defend against such attacks, use mathematically complex expressions
• The goal is to force a timeout on the SMT solver side
Thanks
Any Questions?
Acknowledgements
• Thanks to Brendan Dolan-Gavitt for his invitation to the S.O.S workshop!
• Kudos to Adrien Guinet for his Arybo ¹⁰ framework!
¹⁰ https://github.com/quarkslab/arybo
Contact us
• Romain Thomas
• rthomas at quarkslab com
• @rh0main
• Jonathan Salwan
• jsalwan at quarkslab com
• @JonathanSalwan
• Triton team
• triton at quarkslab com
• @qb_triton
• irc: #qb_triton@freenode.org
References
[1] C. Collberg, S. Martin, J. Myers, and J. Nagra. Distributed application tamper detection via continuous software updates. In Proceedings of the 28th Annual Computer Security Applications Conference, ACSAC '12, pages 319–328, New York, NY, USA, 2012. ACM.
[2] N. Eyrolles, A. Guinet, and M. Videau. Arybo: Manipulation, canonicalization and identification of mixed boolean-arithmetic symbolic expressions. GreHack, France, Grenoble, 2016.
[3] Y. Kanzaki, A. Monden, and C. Collberg. Code artificiality: A metric for the code stealth based on an n-gram model. In SPRO 2015 International Workshop on Software Protection, 2015.
[4] J. C. King. Symbolic execution and program testing. Communications of the ACM, 19(7):385–394, 1976.
[5] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis and transformation. pages 75–88, San Jose, CA, USA, Mar 2004.
[6] F. Saudel and J. Salwan. Triton: A dynamic symbolic execution framework. In Symposium sur la sécurité des technologies de l'information et des communications, SSTIC, France, Rennes, June 3-5 2015, pages 31–54. SSTIC, 2015.
[7] K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In ACM SIGSOFT Software Engineering Notes, volume 30, pages 263–272. ACM, 2005.
