The Secret Recipe for Automating
Android Malware Analysis [a]
Lorenzo Cavallaro [b]
<lorenzo.cavallaro@rhul.ac.uk>, March 24-25, 2017
[a] UK EPSRC grants EP/K033344/1 and EP/L022710/1
[b] Joint work with S. Dash, G. Suarez-Tangil, S. Khan, K. Tam,
M. Ahmadi, J. Kinder, and K. Kok
$ WHOAMI
Antifork
Research
s0ftpj
BSc & MSc in CS
PhD in CS (Computer
Security)
2006-2008: Visiting PhD Scholar–Prof. R. Sekar
Systems security
2008-2010: PostDoc–Profs Vigna & Kruegel
Malware Analysis & Detection
2010-2012: PostDoc–Prof. A. S. Tanenbaum
OS Dependability (MINIX3) & Systems Security
Associate Professor of Information Security
Royal Holloway, University of London
Founded in 1879 by Thomas Holloway
→ Entrepreneur and Philanthropist
→ Holloway's pills and ointments
Egham–still commuting distance to London!
Featured in Avengers: Age of Ultron :-)
Academic Centre of Excellence in Cyber Security
Research
Centre for Doctoral Training in Cyber Security
Systems Security Research Lab
Founded in Sep 2014
Machine learning and program analysis to
protect systems from a broad range of
attacks
→ Botnet, automatic heap exploit
generation, situational awareness,
Android security
Practical and publicly-available tools
Funded by UK EPSRC, EU FP7, Intel
Security (McAfee), and Royal Holloway
The Secret Recipe for Automating Android Malware Analysis - Lorenzo Cavallaro - Codemotion Rome 2017
THE RISE IN ANDROID MALWARE
STATUS QUO
Year  Method         Venue        Det  Class  Features                         # Malware  DR/FP (%)  ACC (%)
2014  DroidAPIMiner  SecureComm   ✓    −      API, PKG, PAR                    3,987      99/2.2     −
2014  DroidMiner     ESORICS      ✓    ✓      CG, API                          2,466      95.3/0.4   92
2014  Drebin         NDSS         ✓    −      PER, STR, API, INT               5,560      94.0/1.0   −
2014  DroidSIFT      ACM CCS      ✓    ✓      API-F                            2,200      98.0/5.15  93
2014  DroidLegacy    ACM PPREW    ✓    ✓      API                              1,052      93.0/3.0   98
2015  AppAudit       IEEE S&P     ✓    −      API-F                            1,005      99.3/0.61  −
2015  MudFlow        ICSE         ✓    −      API-F                            10,552     90.1/18.7  −
2015  Marvin         ACM COMPSAC  ✓    −      PER, INT, ST, PN                 15,741     98.24/0.0  −
2015  RevealDroid    TR GMU       ✓    ✓      PER, API, API-F, INT, PKG        9,054      98.2/18.7  93
2017  MaMaDroid      NDSS         ✓    −      Abstract APIs Markov Chain       80,000     99/1       −
2017  DroidSieve     ACM CODASPY  ✓    ✓      Syntactic- and resource-centric  100,000    99.7/0     −
2016  Madam          IEEE TDSC    ✓    −      SYSC, API, PER, SMS, USR         2,800      96/0.2     −
(Det = detection, Class = family classification)
Most focus on statically-extracted features
→ Issues with obfuscation, dynamically loaded and native code
How far would dynamically-extracted features go?
RQ1 Automatic reconstruction of app behaviors
(As few features as possible, with rich semantics)
RQ2 Machine learning with high-accuracy results
(Challenging contexts: multi-class classification,
sparse behaviors)
RQ1—CopperDroid
Automatic Reconstruction of Apps Behaviors
DYNAMIC ANALYSIS FOR ANDROID
DroidScope/DECAF [1]
→ Dalvik VM method, asm insn, and system call tracing
→ 2-level VMI to get to Dalvik VM semantics
Droidbox [2] and TaintDroid [3]
Other approaches generally built on top of the tools above
[1] https://github.com/sycurelab/DECAF/tree/master/DroidScope/qemu
[2] https://github.com/pjlantz/droidbox
[3] http://www.appanalysis.org/
SYSTEM CALL-CENTRIC ANALYSIS
Established technique to characterize process behaviors [4]
Identifying state-modifying actions crucial to analysis
[4] https://www.cs.unm.edu/~forrest/publications/acsac08.pdf
Can it be applied to Android?
Android architecture is different from traditional devices
State-modifying actions manifest at multiple abstractions
→ OS interactions (e.g., filesystem, network, process)
→ Android-specific behaviors (e.g., SMS, phone calls)
Key Insight
System calls provide the right semantic abstraction given
the reconstruction of Inter-Component Communications
(ICC) behaviors
ICC (aka Binder transactions) are carried out as ioctl
system calls
→ CopperDroid automatically unmarshalls such calls and
reconstructs Android app behaviors [a]
→ No modification to the OS
→ It works automatically across the Android fragmented
ecosystem
[a] Kimberly Tam, Salahuddin J. Khan, Aristide Fattori, and Lorenzo Cavallaro.
CopperDroid: Automatic Reconstruction of Android Malware Behaviors. In
22nd Annual Network and Distributed System Security Symposium (NDSS),
2015
THE BINDER PROTOCOL
IPC/RPC
Binder protocols enable fast inter-process communication
Allows apps to invoke other app component functions
Binder objects handled by Binder Driver in kernel
→ Serialized/marshalled passing through kernel
→ Results in input output control (ioctl) system calls
Android Interface Definition Language (AIDL)
AIDL defines which/how services can be invoked remotely
Describes how to marshal method parameters
IPC BINDER: AN EXAMPLE
Application
PendingIntent sentIntent = PendingIntent.getBroadcast(SMS.this,
        0, new Intent(SENT), 0);
SmsManager sms = SmsManager.getDefault();
sms.sendTextMessage("7855551234", null, "Hi There", sentIntent, null);
android.telephony.SmsManager
public void sendTextMessage(...) {
    ...
    ISms iccISms = ISms.Stub.asInterface(ServiceManager.getService("isms"));
    if (iccISms != null)
        iccISms.sendText(destinationAddress, scAddress, text, sentIntent, deliveryIntent);
    ...
com.android.internal.telephony.ISms
public void sendText(...) {
    android.os.Parcel _data = android.os.Parcel.obtain();
    try {
        _data.writeInterfaceToken(DESCRIPTOR);
        _data.writeString(destAddr);
        ...
        mRemote.transact(Stub.TRANSACTION_sendText, _data, _reply, 0);
    }
Kernel (drivers/staging/android/binder.c)
ioctl
ioctl(4, 0xc0186201, ...
\x4b\x00\x00\x00\x49\x00\x20\x00\x74\x00\x61\x00
\x6b\x00\x65\x00\x20\x00\x70\x00\x6c\x00\x65\x00
\x61\x00\x73\x00\x75\x00\x72\x00\x65\x00\x20\x00
\x69\x00\x6e\x00\x20\x00\x68\x00\x75\x00\x72\x00
\x74\x00\x69\x00\x6e\x00\x67\x00\x20\x00\x73\x00 ...)
ioctl(/dev/binder, BINDER_WRITE_READ, ...
InterfaceToken = com.android.internal.telephony.ISms,
method: sendText,
destAddr = 7855551234,
scAddr = null,
text = Hi There,
sentIntent = Intent(SENT),
deliveryIntent = null)
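As a small illustration of the raw call shown in this example, the request number 0xc0186201 can be split into its bit fields. This is only a sketch and not CopperDroid code: `decode_ioctl_request` is a hypothetical helper, and it assumes the generic Linux ioctl encoding (8-bit command number, 8-bit type, 14-bit argument size, 2-bit direction).

```python
def decode_ioctl_request(request):
    """Split a Linux ioctl request number into its four bit fields.

    Assumes the generic encoding: bits 0-7 command number, bits 8-15 type,
    bits 16-29 argument size, bits 30-31 direction.
    """
    return {
        "nr": request & 0xFF,
        "type": chr((request >> 8) & 0xFF),
        "size": (request >> 16) & 0x3FFF,
        "dir": (request >> 30) & 0x3,  # 1 = write, 2 = read, 3 = both
    }


fields = decode_ioctl_request(0xC0186201)
print(fields)
```

Type 'b' with command number 1 and a read/write direction is how BINDER_WRITE_READ is encoded, and the 24-byte argument size is consistent with a structure of six 4-byte fields on a 32-bit system.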
TRACING SYSTEM CALLS ON ANDROID ARM THROUGH QEMU
A system call induces a User → Kernel transition
On ARM, it is invoked through the swi (SoftWare Interrupt)
instruction
r7: invoked system call number
r0-r5: parameters
lr: return address
CopperDroid's Approach
Instruments QEMU's emulation of the swi instruction
Instruments QEMU to intercept every cpsr_write (Kernel →
User)
Performs traditional VMI to associate system calls with threads
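The two interception points can be pictured as a pair of callbacks that match each system call entry with its return, per thread. The sketch below is a plain-Python simulation under stated assumptions, not QEMU or CopperDroid code: the class `SyscallTracker`, its hook names, and the register dictionaries are hypothetical, the system call table is a tiny subset of the ARM EABI numbers, and matching the return on `pc == lr` is an assumption made for illustration.

```python
# ARM EABI numbers for a handful of system calls (a small subset).
ARM_EABI_SYSCALLS = {3: "read", 4: "write", 5: "open", 6: "close", 54: "ioctl"}


class SyscallTracker:
    """Pairs system call entries with their returns, one slot per thread."""

    def __init__(self):
        self.pending = {}  # thread id -> call that has entered the kernel
        self.trace = []    # completed calls, in order of completion

    def on_swi(self, tid, regs):
        """Hypothetical hook for the emulated swi instruction (user -> kernel)."""
        number = regs["r7"]
        self.pending[tid] = {
            "tid": tid,
            "sysname": ARM_EABI_SYSCALLS.get(number, "sys_%d" % number),
            "args": [regs["r%d" % i] for i in range(6)],  # r0-r5
            "lr": regs["lr"],
        }

    def on_cpsr_write(self, tid, regs):
        """Hypothetical hook for a cpsr_write that switches kernel -> user."""
        call = self.pending.get(tid)
        # Assumption: the call has returned when execution resumes at the
        # saved return address; other transitions are ignored.
        if call is None or regs["pc"] != call["lr"]:
            return
        call["ret"] = regs["r0"]  # the return value comes back in r0
        self.trace.append(self.pending.pop(tid))


tracker = SyscallTracker()
entry = {"r7": 54, "r0": 4, "r1": 0xC0186201, "r2": 0xBEF0A000,
         "r3": 0, "r4": 0, "r5": 0, "lr": 0x40012344}
tracker.on_swi(tid=1234, regs=entry)
tracker.on_cpsr_write(tid=1234, regs={"pc": 0x40012344, "r0": 0})
print(tracker.trace[0]["sysname"], tracker.trace[0]["ret"])
```

In the real system the thread identifier would come from VMI on the guest kernel; here it is simply passed in by the caller.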
[Figure: CopperDroid architecture. QEMU emulates the Android system (apps and app IPC, Dalvik / ART, Android/Linux kernel) across user mode and kernel mode. A plug-in manager hosts the CopperDroid plug-in (system call tracking, Binder analysis), which reaches the Oracle over TCP. Instrumented transitions: swi, the SoftWare Interrupt instruction (user → kernel), and cpsr_write, which writes to the current program status register (kernel → user).]
BINDER STRUCTURE WITHIN IOCTL
CopperDroid inspects the Binder protocol in detail by
intercepting a subset of the ioctls issued by userspace apps.
ioctl(binder_fd, BINDER_WRITE_READ, binder_write_read);
binder_write_read fields: write_size, write_consumed, write_buffer, read_size, …
write_buffer contents: BC_* Params | BC_TR Params | BC_* Params
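To make the structure concrete, the sketch below unpacks a `binder_write_read` argument from raw bytes. It is an illustration under assumptions, not CopperDroid's parser: it assumes a 32-bit little-endian guest where the structure is six 4-byte fields (24 bytes), the last two field names (`read_consumed`, `read_buffer`) are taken from the kernel header rather than from the slide, and the example values are made up.

```python
import struct

# Assumed 32-bit layout: six 4-byte little-endian fields (24 bytes in total).
BWR_FORMAT = "<6I"
BWR_FIELDS = ("write_size", "write_consumed", "write_buffer",
              "read_size", "read_consumed", "read_buffer")


def parse_binder_write_read(raw):
    """Decode the ioctl argument into a field-name -> value dictionary."""
    size = struct.calcsize(BWR_FORMAT)
    values = struct.unpack(BWR_FORMAT, raw[:size])
    return dict(zip(BWR_FIELDS, values))


# Made-up example: 68 bytes of BC_* commands to send, 256 bytes of read space.
example = struct.pack(BWR_FORMAT, 68, 0, 0xBEF0A000, 256, 0, 0xBEF0B000)
bwr = parse_binder_write_read(example)
print(bwr["write_size"], hex(bwr["write_buffer"]))
```

An analysis would then read `write_size` bytes of guest memory at `write_buffer` and walk the BC_* commands stored there.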
AUTOMATIC BINDER UNMARSHALLING
CopperDroid analyzes BC_TRANSACTIONs and BC_REPLYs
write_buffer: BC_* Params | BC_TR Params | BC_* Params
BC_TR Params (struct binder_transaction_data): target, code, uid, …, data_size, buffer
buffer:
\x4b\x00\x00\x00\x49\x00\x20\x00
\x74\x00\x61\x00\x6b\x00\x65\x00
\x20\x00\x70\x00\x6c\x00\x65\x00
\x61\x00\x73\x00\x75\x00\x72\x00
\x65\x00\x20\x00\x69\x00\x6e\x00
\x20\x00\x68\x00\x75\x00\x72\x00
\x74\x00\x69\x00\x6e\x00\x67 ...
CopperDroid uses a modified AIDL parser to automatically
generate signatures of each method (code) for each
interface (InterfaceToken).
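One way to picture the generated signatures is a table keyed by (InterfaceToken, code). The sketch below builds such a table from a toy AIDL fragment with a regular expression. Everything here is an assumption for illustration: `signatures_from_aidl` is a hypothetical helper, the two-method `TOY_AIDL` text is not the real ISms.aidl, and codes are assigned in declaration order starting at `FIRST_CALL_TRANSACTION`, so the actual code of `sendText` depends on the Android version.

```python
import re

FIRST_CALL_TRANSACTION = 1  # value of android.os.IBinder.FIRST_CALL_TRANSACTION

# Matches "<return type> <name>(<params>);" declarations.
METHOD_RE = re.compile(r"(\w[\w.<>\[\]]*)\s+(\w+)\s*\(([^)]*)\)\s*;")


def signatures_from_aidl(interface_name, aidl_body):
    """Map (interface, transaction code) -> (method name, parameter types)."""
    table = {}
    for index, match in enumerate(METHOD_RE.finditer(aidl_body)):
        _return_type, method, params = match.groups()
        types = []
        for param in filter(None, (p.strip() for p in params.split(","))):
            words = param.split()
            # Drop the direction tag (in/out/inout); the type comes next.
            if words[0] in ("in", "out", "inout"):
                words = words[1:]
            types.append(words[0])
        table[(interface_name, FIRST_CALL_TRANSACTION + index)] = (method, types)
    return table


# Toy fragment for illustration only (not the real interface definition).
TOY_AIDL = """
    void sendData(in String destAddr, in String scAddr, int destPort, in byte[] data);
    void sendText(in String destAddr, in String scAddr, in String text,
                  in PendingIntent sentIntent, in PendingIntent deliveryIntent);
"""
table = signatures_from_aidl("com.android.internal.telephony.ISms", TOY_AIDL)
print(table[("com.android.internal.telephony.ISms", 2)])
```

With such a table, the InterfaceToken and code found in a transaction select the parameter types needed to decode the rest of the buffer.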
buffer: InterfaceToken | Param 1 | Param 2 | Param 3 | …
ISms.sendText(???, ???, ???, ... )
public void sendText(...) {
    android.os.Parcel _data =
            android.os.Parcel.obtain();
    try {
        ...
        _data.writeString(destAddr);
        _data.writeString(srcAddr);
        _data.writeString(text);
        ...
        mRemote.transact(
                Stub.TRANSACTION_sendText,
                _data, _reply, 0);
    }
ISms.sendText(7855551234, ... )
AUTOMATIC ANDROID OBJECTS UNMARSHALLING
Primitive types (e.g., String text)
→ A few manually-written procedures
Complex Android objects
→ 300+ Android objects: manual unmarshalling does not scale
and is not scientific
→ Find the object's CREATOR field
→ Use reflection (type introspection, then intercession)
IBinder object reference
→ A handle (pointer) sent instead of marshalled object
→ Look earlier in trace to map each handle to an object
CopperDroid's Oracle unmarshalls all three automatically
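The third case, mapping a handle back to an object seen earlier in the trace, amounts to keeping a small lookup table while the trace is processed. The sketch below is a hypothetical illustration: the class `HandleTable` and its methods are not CopperDroid names, and the handle value and object description are borrowed from the SMS example.

```python
class HandleTable:
    """Remembers which object each Binder handle referred to earlier in the trace."""

    def __init__(self):
        self._objects = {}

    def record(self, handle, obj):
        """Store the object that was marshalled when the handle first appeared."""
        self._objects[handle] = obj

    def resolve(self, handle):
        """Return the recorded object, or None if the handle was never seen."""
        return self._objects.get(handle)


handles = HandleTable()
handles.record(0xA, "Intent(SENT)")  # seen in an earlier Binder transaction
resolved = handles.resolve(0xA)      # later: a parameter carries handle 0xa
unknown = handles.resolve(0xB)       # a handle that was never observed
print(resolved, unknown)
```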
[Figure: the CopperDroid emulator runs the app; its plug-in manager hosts the CopperDroid plug-in (system call tracking, Binder analysis), which communicates over TCP with the Oracle running in a vanilla emulator.]
AUTOMATIC UNMARSHALLING ORACLE: SMS EXAMPLE
TYPE
string, string, string, PendingIntent, PendingIntent
DATA
\x0A \x00 \x00 \x00 \x37 \x00 \x38 \x00 \x35 \x00 \x35 \x00 \x35
\x00 \x35 \x00 \x31 \x00 \x32 \x00 \x33 \x00 \x34 \x00 \x00 \x00
\x00 \x00 \x08 \x00 \x00 \x00 \x48 \x00 \x69 \x00 \x20 \x00 \x74
\x00 \x68 \x00 \x65 \x00 \x72 \x00 \x65 \x00 \x85*hs \x7f \x00
\x00 \x00 \xa0 \x00 \x00 \x00 \x00 \x00 \x00 \x00 ...
OUTPUT
telephony.ISms.sendText( String destAddr = 7855551234, ... )
Type[0] = Primitive string
Use ReadString() (and increment data offset by length of string)
com.android.internal.telephony.ISms.sendText( String destAddr =
7855551234, String srcAddr = null, String text = Hi there,
... )
Type[1] and Type[2] are also Primitive string
Use ReadString() (and increment data offset by length of strings)
com.android.internal.telephony.ISms.sendText( String destAddr =
7855551234, String srcAddr = null, String text = Hi there,
Intent sentIntent { type = BINDER_TYPE_HANDLE, flags = 0x7F |
FLAT_BINDER_FLAG_ACCEPT_FDS, handle = 0xa, cookie = 0x0 }, ... )
Type[3] = IBinder PendingIntent
Unmarshal using com.Android.Intent (AIDL) and increment
buffer pointer
Handle points to data to be unmarshalled in a previous Binder
(ioctl) call
com.android.internal.telephony.ISms.sendText( String destAddr =
7855551234, String srcAddr = null, String text = Hi there,
Intent sentIntent { Intent(SENT) }, ... )
Each handle is paired with a parcelable object
CopperDroid sends each handle and parcelable object to the Oracle
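The string steps of the SMS example can be reproduced with a few lines of byte parsing. The sketch below is an illustration, not the Oracle's code: `read_string16` and `write_string16` are hypothetical helpers, and they assume the usual Parcel string encoding (a 32-bit length in UTF-16 code units, the UTF-16LE characters, a 2-byte terminator, padding to a 4-byte boundary, and length -1 for a null string). The parcel is built locally instead of being copied from the dump.

```python
import struct


def write_string16(value):
    """Encode one string the way the assumed Parcel layout stores it."""
    if value is None:
        return struct.pack("<i", -1)          # a null string is length -1
    data = value.encode("utf-16-le") + b"\x00\x00"
    data += b"\x00" * (-len(data) % 4)        # pad to a 4-byte boundary
    return struct.pack("<i", len(value)) + data


def read_string16(buf, offset):
    """Read one string; return (value, offset of the next parameter)."""
    (length,) = struct.unpack_from("<i", buf, offset)
    offset += 4
    if length < 0:
        return None, offset
    raw = buf[offset:offset + 2 * length]
    # Skip the characters, the 2-byte terminator, and the padding.
    offset += (2 * (length + 1) + 3) & ~3
    return raw.decode("utf-16-le"), offset


parcel = (write_string16("7855551234") + write_string16(None)
          + write_string16("Hi there"))
offset, decoded = 0, []
for _ in range(3):  # Type[0], Type[1], Type[2] are all strings
    value, offset = read_string16(parcel, offset)
    decoded.append(value)
print(decoded)
```

After the third string, `offset` points at the first PendingIntent, which is where the handle lookup takes over.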
Outputs Observed from
CopperDroid
FILESYSTEM TRANSACTIONS
class: FS ACCESS,
low: [
  {
    blob: {'flags': 131072, 'mode': 1, 'filename': u'/etc/media_codecs.xml'},
    id: 187369,
    sysname: open,
    ts: 1455718126.798,
  },
  {
    blob: {'size': 4096L, 'filename': u'/etc/media_codecs.xml'},
    id: 187371,
    sysname: read,
    ts: 1455718126.798,
    xref: 187369
  },
  {
    blob: {'filename': u'/etc/media_codecs.xml'},
    id: 187389,
    sysname: close,
    ts: 1455718126.799,
    xref: 187369
  }
],
procname: /system/bin/mediaserver
NETWORK TRANSACTIONS
class: NETWORK ACCESS,
low: [
  {
    blob: {'socket domain': 10, 'socket type': 1, 'socket protocol': 0},
    id: 62,
    sysname: socket,
    ts: 1445024980.686,
  },
  {
    blob: {'host': '::ffff:134.219.148.11', 'port': 80, 'returnValue': 0},
    id: 63,
    sysname: connect,
    ts: 1445024980.687,
  },
  {
    blob: =%22%27GET+%2Findex.html+HTTP%2F1.1%5C%5Cr%5C%5CnUser-Agent%3A+Dalvik%2F1.6.0+%28Linux%3B+U%3B+Android+4.4.4%3B+sdk+Build%2FKK%29%5C%5Cr%5C%5CnHost%3A+s2lab.isg.rhul.ac.uk%5C%5Cr%5C%5CnConnection%3A+Keep-Alive%5C%5Cr%5C%5CnAccept-Encoding%3A+gzip%5C%5Cr%5C%5Cn%5C%5Cr%5C%5Cn%27%22,
    id: 164,
    sysname: sendto,
    ts: 1445024980.720,
  },
],
procname: com.cd2.nettest.nettest,
subclass: HTTP
Composite behaviors (e.g., filesystem and network
transactions)
We perform a value-based data flow analysis by
building a system call-related DDG and def-use chains
→ Each observed system call is initially considered as an
unconnected node
→ Forward slicing inserts edges for every inferred
dependence between two calls
→ Nodes and edges are annotated with the system call
argument constraints
→ Annotations needed for the creation of def-use chains
→ Def-use chains relate the output value of specific
system calls to the input of (not necessarily adjacent)
others
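A much-simplified version of this value-based linking follows file descriptors: a call that returns a descriptor defines a value, and later calls that take the same descriptor use it. The sketch below is only an approximation of the idea under assumptions; `build_def_use_chains`, the trace layout, the descriptor values, and the two ids after the first three are hypothetical, and the real analysis tracks more argument constraints than a single descriptor.

```python
DEFINES_FD = {"open", "socket"}
USES_FD = {"read", "write", "close", "connect", "sendto"}


def build_def_use_chains(trace):
    """Link each descriptor-defining call to the calls that consume its result."""
    live = {}    # descriptor value -> id of the call that defined it
    chains = {}  # id of defining call -> ids of the calls that use its result
    for call in trace:
        name = call["sysname"]
        if name in DEFINES_FD and call["ret"] >= 0:
            live[call["ret"]] = call["id"]
            chains[call["id"]] = []
        elif name in USES_FD:
            fd = call["args"][0]
            if fd in live:
                chains[live[fd]].append(call["id"])
                if name == "close":
                    del live[fd]  # the descriptor value may be reused later
    return chains


trace = [
    {"id": 187369, "sysname": "open", "args": ["/etc/media_codecs.xml"], "ret": 12},
    {"id": 187371, "sysname": "read", "args": [12, 4096], "ret": 4096},
    {"id": 187389, "sysname": "close", "args": [12], "ret": 0},
    {"id": 187400, "sysname": "open", "args": ["/data/other"], "ret": 12},
    {"id": 187401, "sysname": "read", "args": [12, 64], "ret": 64},
]
chains = build_def_use_chains(trace)
print(chains)
```

The first chain corresponds to the `xref` fields in the filesystem output above; the second open shows why the link is made on the value that is live at that point rather than on the descriptor number alone.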
BINDER TRANSACTIONS
class: SMS SEND,
low: [
  {
    blob: {
      method: sendText,
      params: [
        callingPkg = com.load.wap,
        destAddr = 3170,
        scAddr = null,
        text = 999287346 418 Java (256) vip 2012-02-25 17:47:56 newoperastore.ru y
      ]
    },
    method_name: com.android.internal.telephony.ISms.sendText(),
    sysname: ioctl,
    ts: 1444337887.816,
    type: BINDER
  }
],
procname: com.load.wap
RQ2—DroidScribe
Classifying Malware with Runtime Behavior
RESEARCH OBJECTIVES
Runtime behaviors as discriminator of maliciousness
→ Independent of any syntactic artifact
→ Visible in managed and native code alike
Family Identification
→ Crucial for analysis of threats and mitigation planning
Goal: dynamic analysis for classification under challenging conditions
Our contributions [5]
RQ2.1: What is the best level of abstraction?
RQ2.2: Can we deal with sparse behaviors?
[5] Dash et al., "DroidScribe: Classifying Android Malware Based on Runtime
Behavior", in IEEE S&P Workshop MoST 2016
OVERVIEW OF THE CLASSIFICATION FRAMEWORK
[Figure: classification framework, with samples assigned to Family 1, Family 2, ..., Family N]
SYSTEM-CALLS VS. ABSTRACT BEHAVIORS
RQ2.1 What is the best level of abstraction?
Experiments on the Drebin dataset (5,246 malware samples).
Reconstructing Binder calls adds 141 meaningful features.
High-level behaviors add 3 explanatory features.
[Figure: (a) Accuracy (%) and (b) Runtime (sec) for the feature sets sys, rec_b, and rec_b+]
SET-BASED PREDICTIONS
Dynamic analysis is limited by code coverage
Classifier has only partial information about behaviors
Identify when malware cannot be reliably classified into only
one family
→ Based on a measure of the statistical confidence
Helps the human analyst by identifying the top matching
families, supported by statistical evidence
CLASSIFICATION WITH STATISTICAL CONFIDENCE
Conformal Predictor (CP)
A statistical learning algorithm for classification tasks
Provides statistical evidence on the results
Credibility
Indicates how well a sample fits into a class
Confidence
Indicates if there are other good choices
Robust Against Outliers
Aware of values from other members of the same class
COMPUTING P-VALUES
Nonconformity Measure (NCM) is a geometric measure of
how far a sample is from a class.
→ For SVM, the NCM N_D^z of a sample z w.r.t. class D is the sum of the
distances from all hyperplanes H_i bounding the class D.
N_D^z = \sum_i d(z, H_i)
P-value is a statistical measure of how well a sample fits in a
class.
→ P-value P_D^z represents the proportion of samples in D that
are at least as different as z w.r.t. D, where n is the number of samples in D.
P_D^z = \frac{|\{ j = 1, \dots, n : N_D^j \geq N_D^z \}|}{n}
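The two definitions translate almost directly into code. The sketch below is a minimal illustration with made-up numbers: `ncm` sums the distances of a sample to the hyperplanes bounding a class, and `p_value` returns the fraction of class members whose NCM is at least that of the sample. The function names and all values are hypothetical.

```python
def ncm(distances):
    """Nonconformity of a sample: the sum of its distances to the hyperplanes."""
    return sum(distances)


def p_value(class_ncms, sample_ncm):
    """Fraction of class members whose NCM is at least the sample's NCM."""
    at_least = sum(1 for score in class_ncms if score >= sample_ncm)
    return at_least / len(class_ncms)


class_ncms = [0.25, 0.5, 0.75, 1.5, 2.0]  # made-up NCMs of the samples in D
sample_ncm = ncm([0.5, 0.25])             # made-up distances for z, sum is 0.75
p = p_value(class_ncms, sample_ncm)
print(p)
```

Here three of the five class members are at least as nonconforming as z, so the p-value is 3/5.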
IN AN IDEAL WORLD
Given a new object s, the conformal predictor picks the class with the
highest p-value and returns a single prediction.
OBTAINING PREDICTION SETS
Given a new object s, we can set a significance level e for p-values
and obtain a prediction set Γ^e that includes the labels whose p-value is
greater than e for the sample.
[Figure: p-values of classes A, B, C, and D against the threshold e; Prediction Set = {A, C, D}]
significance-level (e) = 0.30
confidence = (1 - e) = 0.70
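Selecting the prediction set is a one-line filter over the per-class p-values. In the sketch below the individual p-values are assumptions chosen to be consistent with the slide's outcome (they are not read off the figure), and `prediction_set` is a hypothetical helper name.

```python
def prediction_set(p_values, significance):
    """Labels whose p-value is strictly greater than the significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Assumed p-values, picked so that the result matches the slide's example.
p_values = {"A": 0.50, "B": 0.20, "C": 0.60, "D": 0.40}
labels = prediction_set(p_values, significance=0.30)
print(sorted(labels))
```

Lowering the significance level raises the confidence and can only grow the set, which is the trade-off between certainty and set size explored in the evaluation.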
WHEN TO USE CONFORMAL PREDICTION?
CP is an expensive algorithm
→ For each sample, we need to derive a p-value for each class
→ Computational complexity of O(nc), where n is the number of
samples and c is the number of classes
Conformal Evaluation [1]
Provides statistical evaluation of the quality of an ML algorithm
→ Quality threshold to understand when SVM should be trusted
→ Statistical evidence for the choices of SVM
→ Selectively invoke CP to reduce the runtime cost
[1] Jordaney, R., Wang Z., Papini D., Nouretdinov I., Cavallaro L. "Misleading
Metrics: On Evaluating Machine Learning for Malware with Confidence." TR
2016-1, Royal Holloway, University of London, 2016.
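One possible reading of "selectively invoke CP" is sketched below: the p-value of the SVM's own choice is computed first, and the remaining classes are only evaluated when that value falls under a quality threshold. This is an assumption about how the pieces could fit together, not the method from the report; `selective_prediction`, its parameters, the thresholds, and the p-values are all hypothetical.

```python
def selective_prediction(svm_label, labels, p_value_for,
                         quality_threshold, significance):
    """Keep the SVM label if it is credible enough, else fall back to a set."""
    if p_value_for(svm_label) >= quality_threshold:
        return {svm_label}  # cheap path: a single p-value was computed
    # Expensive path: one p-value per class, as in full conformal prediction.
    return {label for label in labels if p_value_for(label) > significance}


# Assumed p-values standing in for the real per-class computation.
P_VALUES = {"A": 0.50, "B": 0.20, "C": 0.60, "D": 0.40}
all_labels = sorted(P_VALUES)

trusted = selective_prediction("C", all_labels, P_VALUES.get, 0.55, 0.30)
fallback = selective_prediction("D", all_labels, P_VALUES.get, 0.55, 0.30)
print(trusted, sorted(fallback))
```

The cheap path costs one p-value per sample instead of one per class, which is where the saving over always running CP would come from.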
CONFIDENCE OF CORRECT SVM DECISIONS
[Figure: confidence (0.0 to 1.0) of correct SVM decisions per malware family: SMSreg, Kmin, Imlog, FakeInstaller, Glodream, Yzhc, Jifake, DroidKungFu, SendPay, BaseBridge, Boxer, Adrd, LinuxLotoor, Iconosys, GinMaster, MobileTx, FakeDoc, Opfake, Plankton, Gappusin, Geinimi, DroidDream, FakeRun]
ACCURACY VS. PREDICTION SET SIZE
RQ2.2 Can we deal with sparse behaviors?
[Figure: precision, recall, and prediction set size (number of classes, 0 to 25) against p-value thresholds (1.0 - confidence) from 0.5 down to 0.0]
Accuracy improves with the prediction set size
CONCLUSION
CopperDroid: automatic reconstruction of app behaviors [6]
→ System calls to abstract OS- and Android-specific behaviors
→ Resilient to changes to the runtime and Android versions
Classification with such semantics: It... Could... Work! [7]
→ Selective set-based classification (CE/CP)
→ (WIP: binary classification and different feature engineering)
Statistical evaluation of ML seems promising [8]
→ Identifies concept drift and when to trust a prediction
→ Early eval: TPR from 37.5% to 92.7% in realistic settings
→ Identifies previously-unknown classes or malicious samples
RESTful API (done) and online deployment (soon), write me!
Shameless Plug: I Am Hiring!
Multiple PhD positions
Two 2-year PostDoc positions (Android security)
→ CS/CEng background
→ Expertise in program analysis and/or machine learning
→ Available from around Sep 2017 (negotiable)
Visit the Systems Security Research Lab at
http://s2lab.isg.rhul.ac.uk
Contact me at Lorenzo.Cavallaro@rhul.ac.uk
[6] http://s2lab.isg.rhul.ac.uk/papers/files/ndss2015.pdf
[7] http://s2lab.isg.rhul.ac.uk/papers/files/most2016.pdf
[8] http://s2lab.isg.rhul.ac.uk/papers/files/rhul2016.pdf and
http://s2lab.isg.rhul.ac.uk/papers/files/aisec2016.pdf and work in submission

Codemotion
 
PPTX
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Codemotion
 
PDF
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Codemotion
 
PDF
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Codemotion
 
PDF
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Codemotion
 
PDF
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Codemotion
 
PDF
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Codemotion
 
PDF
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Codemotion
 
PPTX
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Codemotion
 
PPTX
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
Codemotion
 
PDF
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Codemotion
 
PDF
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Codemotion
 
PDF
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Codemotion
 
PDF
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Codemotion
 
Fuzz-testing: A hacker's approach to making your code more secure | Pascal Ze...
Codemotion
 
Pompili - From hero to_zero: The FatalNoise neverending story
Codemotion
 
Pastore - Commodore 65 - La storia
Codemotion
 
Pennisi - Essere Richard Altwasser
Codemotion
 
Michel Schudel - Let's build a blockchain... in 40 minutes! - Codemotion Amst...
Codemotion
 
Richard Süselbeck - Building your own ride share app - Codemotion Amsterdam 2019
Codemotion
 
Eward Driehuis - What we learned from 20.000 attacks - Codemotion Amsterdam 2019
Codemotion
 
Francesco Baldassarri - Deliver Data at Scale - Codemotion Amsterdam 2019 -
Codemotion
 
Martin Förtsch, Thomas Endres - Stereoscopic Style Transfer AI - Codemotion A...
Codemotion
 
Melanie Rieback, Klaus Kursawe - Blockchain Security: Melting the "Silver Bul...
Codemotion
 
Angelo van der Sijpt - How well do you know your network stack? - Codemotion ...
Codemotion
 
Lars Wolff - Performance Testing for DevOps in the Cloud - Codemotion Amsterd...
Codemotion
 
Sascha Wolter - Conversational AI Demystified - Codemotion Amsterdam 2019
Codemotion
 
Michele Tonutti - Scaling is caring - Codemotion Amsterdam 2019
Codemotion
 
Pat Hermens - From 100 to 1,000+ deployments a day - Codemotion Amsterdam 2019
Codemotion
 
James Birnie - Using Many Worlds of Compute Power with Quantum - Codemotion A...
Codemotion
 
Don Goodman-Wilson - Chinese food, motor scooters, and open source developmen...
Codemotion
 
Pieter Omvlee - The story behind Sketch - Codemotion Amsterdam 2019
Codemotion
 
Dave Farley - Taking Back “Software Engineering” - Codemotion Amsterdam 2019
Codemotion
 
Joshua Hoffman - Should the CTO be Coding? - Codemotion Amsterdam 2019
Codemotion
 

Recently uploaded (20)

PPTX
Agile Chennai 18-19 July 2025 | Emerging patterns in Agentic AI by Bharani Su...
AgileNetwork
 
PPTX
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
PDF
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
PDF
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
PDF
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
PPTX
Simple and concise overview about Quantum computing..pptx
mughal641
 
PDF
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PPTX
Introduction to Flutter by Ayush Desai.pptx
ayushdesai204
 
PPTX
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
PDF
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
PDF
Peak of Data & AI Encore - Real-Time Insights & Scalable Editing with ArcGIS
Safe Software
 
PPTX
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
PDF
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
PDF
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
PDF
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
PPTX
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
PDF
Doc9.....................................
SofiaCollazos
 
PDF
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 
Agile Chennai 18-19 July 2025 | Emerging patterns in Agentic AI by Bharani Su...
AgileNetwork
 
cloud computing vai.pptx for the project
vaibhavdobariyal79
 
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
Simple and concise overview about Quantum computing..pptx
mughal641
 
GDG Cloud Munich - Intro - Luiz Carneiro - #BuildWithAI - July - Abdel.pdf
Luiz Carneiro
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
Introduction to Flutter by Ayush Desai.pptx
ayushdesai204
 
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
Tea4chat - another LLM Project by Kerem Atam
a0m0rajab1
 
Peak of Data & AI Encore - Real-Time Insights & Scalable Editing with ArcGIS
Safe Software
 
Dev Dives: Automate, test, and deploy in one place—with Unified Developer Exp...
AndreeaTom
 
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
CIFDAQ's Market Wrap : Bears Back in Control?
CIFDAQ
 
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
Doc9.....................................
SofiaCollazos
 
Unlocking the Future- AI Agents Meet Oracle Database 23ai - AIOUG Yatra 2025.pdf
Sandesh Rao
 

The Secret Recipe for Automating Android Malware Analysis - Lorenzo Cavallaro - Codemotion Rome 2017

  • 1. The Secret Recipe for Automating Android Malware Analysis [a]
      Lorenzo Cavallaro [b] <lorenzo.cavallaro@rhul.ac.uk>, March 24-25, 2017
      [a] UK EPSRC grants EP/K033344/1 and EP/L022710/1
      [b] Joint work with S. Dash, G. Suarez-Tangil, S. Khan, K. Tam, M. Ahmadi, J. Kinder, and K. Kok
  • 6. $ WHOAMI
      Antifork Research
      s0ftpj
      BSc & MSc in CS; PhD in CS (Computer Security)
      2006-2008: Visiting PhD Scholar, Prof. R. Sekar (Systems security)
      2008-2010: PostDoc, Profs Vigna & Kruegel (Malware Analysis & Detection)
      2010-2012: PostDoc, Prof. A. S. Tanenbaum (OS Dependability (MINIX3) & Systems Security)
      Associate Professor of Information Security
      Royal Holloway, University of London
      Founded in 1879 by Thomas Holloway
      → Entrepreneur and Philanthropist
      → Holloway's pills and ointments
      Egham: still commuting distance to London!
      Featured in Avengers: Age of Ultron :-)
      Academic Centre of Excellence in Cyber Security Research
      Centre for Doctoral Training in Cyber Security
  • 8. $ WHOAMI
      Antifork Research
      s0ftpj
      BSc & MSc in CS; PhD in CS (Computer Security)
      2006-2008: Visiting PhD Scholar, Prof. R. Sekar (Systems security)
      2008-2010: PostDoc, Profs Vigna & Kruegel (Malware Analysis & Detection)
      2010-2012: PostDoc, Prof. A. S. Tanenbaum (OS Dependability (MINIX3) & Systems Security)
      Systems Security Research Lab
      Founded in Sep 2014
      Machine learning and program analysis to protect systems from a broad range of attacks
      → Botnets, automatic heap exploit generation, situational awareness, Android security
      Practical and publicly available tools
      Funded by UK EPSRC, EU FP7, Intel Security (McAfee), and Royal Holloway
  • 11. THE RISE IN ANDROID MALWARE
  • 13. STATUS QUO
      Year  Method         Venue        Det  Class  Features                     # Malware  DR/FP (%)  ACC (%)
      2014  DroidAPIMiner  SecureComm   ✓    −      API, PKG, PAR                    3,987  99/2.2     −
      2014  DroidMiner     ESORICS      ✓    ✓      CG, API                          2,466  95.3/0.4   92
      2014  Drebin         NDSS         ✓    −      PER, STR, API, INT               5,560  94.0/1.0   −
      2014  DroidSIFT      ACM CCS      ✓    ✓      API-F                            2,200  98.0/5.15  93
      2014  DroidLegacy    ACM PPREW    ✓    ✓      API                              1,052  93.0/3.0   98
      2015  AppAudit       IEEE SP      ✓    −      API-F                            1,005  99.3/0.61  −
      2015  MudFlow        ICSE         ✓    −      API-F                           10,552  90.1/18.7  −
      2015  Marvin         ACM COMPSAC  ✓    −      PER, INT, ST, PN                15,741  98.24/0.0  −
      2015  RevealDroid    TR GMU       ✓    ✓      PER, API, API-F, INT, PKG        9,054  98.2/18.7  93
      2017  MaMaDroid      NDSS         ✓    −      Abstract APIs Markov Chain      80,000  99/1       −
      2017  DroidSieve     ACM CODASPY  ✓    ✓      Syntactic, Resource-centric    100,000  99.7/0     −
      2016  Madam          IEEE TDSC    ✓    −      SYSC, API, PER, SMS, USR         2,800  96/0.2     −
      Most approaches focus on statically-extracted features
      → Issues with obfuscation, dynamically loaded code, and native code
      How far would dynamically-extracted features go?
      RQ1: Automatic reconstruction of app behaviors (as few features as possible, with rich semantics)
      RQ2: Machine learning with high-accuracy results (challenging contexts: multi-class classification, sparse behaviors)
  • 15. DYNAMIC ANALYSIS FOR ANDROID
      DroidScope/DECAF [1]
      → Dalvik VM method, asm insn, and system call tracing
      → 2-level VMI to get to Dalvik VM semantics
      Droidbox [2] and TaintDroid [3]
      Other approaches are generally built on top of the tools above
      [1] https://github.com/sycurelab/DECAF/tree/master/DroidScope/qemu
      [2] https://github.com/pjlantz/droidbox
      [3] http://www.appanalysis.org/
  • 17. SYSTEM CALL-CENTRIC ANALYSIS
      Established technique to characterize process behaviors [4]
      Identifying state-modifying actions is crucial to analysis
      Can it be applied to Android?
      The Android architecture is different from traditional devices
      State-modifying actions manifest at multiple abstractions
      → OS interactions (e.g., filesystem, network, process)
      → Android-specific behaviors (e.g., SMS, phone calls)
      [4] https://www.cs.unm.edu/~forrest/publications/acsac08.pdf
  • 18. SYSTEM CALL-CENTRIC ANALYSIS
      Key Insight
      System calls provide the right semantic abstraction, given the reconstruction of Inter-Component Communication (ICC) behaviors
      ICC (aka Binder transactions) is carried out as ioctl system calls
      → CopperDroid automatically unmarshalls such calls and reconstructs Android app behaviors [a]
      → No modification to the OS
      → It works automatically across the fragmented Android ecosystem
      [a] Kimberly Tam, Salahuddin J. Khan, Aristide Fattori, and Lorenzo Cavallaro. CopperDroid: Automatic Reconstruction of Android Malware Behaviors. In 22nd Annual Network and Distributed System Security Symposium (NDSS), 2015
  • 19. THE BINDER PROTOCOL
      IPC/RPC
      Binder protocols enable fast inter-process communication
      They allow apps to invoke other apps' component functions
      Binder objects are handled by the Binder driver in the kernel
      → Serialized/marshalled when passing through the kernel
      → Results in input/output control (ioctl) system calls
      Android Interface Definition Language (AIDL)
      AIDL defines which services can be invoked remotely, and how
      It describes how to marshal method parameters
  • 20. IPC BINDER: AN EXAMPLE
      Application
        PendingIntent sentIntent = PendingIntent.getBroadcast(SMS.this, 0, new Intent(SENT), 0);
        SmsManager sms = SmsManager.getDefault();
        sms.sendTextMessage("7855551234", null, "Hi There", sentIntent, null);
  • 21. IPC BINDER: AN EXAMPLE
      Application → android.telephony.SmsManager
        public void sendTextMessage(...) {
            ...
            ISms iccISms = ISms.Stub.asInterface(ServiceManager.getService("isms"));
            if (iccISms != null)
                iccISms.sendText(destinationAddress, scAddress, text, sentIntent, deliveryIntent);
            ...
  • 22. IPC BINDER: AN EXAMPLE
      Application → android.telephony.SmsManager → com.android.internal.telephony.ISms
        public void sendText(...) {
            android.os.Parcel _data = android.os.Parcel.obtain();
            try {
                _data.writeInterfaceToken(DESCRIPTOR);
                _data.writeString(destAddr);
                ...
                mRemote.transact(Stub.TRANSACTION_sendText, _data, _reply, 0);
            }
  • 24. IPC BINDER: AN EXAMPLE
      Application → android.telephony.SmsManager → com.android.internal.telephony.ISms → Kernel (drivers/staging/android/binder.c) → ioctl
        ioctl(4, 0xc0186201, ...
          \x4b\x00\x00\x00\x49\x00\x20\x00\x74\x00\x61\x00
          \x6b\x00\x65\x00\x20\x00\x70\x00\x6c\x00\x65\x00
          \x61\x00\x73\x00\x75\x00\x72\x00\x65\x00\x20\x00
          \x69\x00\x6e\x00\x20\x00\x68\x00\x75\x00\x72\x00
          \x74\x00\x69\x00\x6e\x00\x67\x00\x20\x00\x73\x00
          ...)
  • 25. IPC BINDER: AN EXAMPLE
      Application → android.telephony.SmsManager → com.android.internal.telephony.ISms → Kernel (drivers/staging/android/binder.c) → ioctl
        ioctl(/dev/binder, BINDER_WRITE_READ, ...
          InterfaceToken = com.android.internal.telephony.ISms,
          method: sendText,
          destAddr = "7855551234",
          scAddr = null,
          text = "Hi There",
          sentIntent = Intent(SENT),
          deliverIntent = null)
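
The raw request value 0xc0186201 on slide 24 and the symbolic BINDER_WRITE_READ on slide 25 are the same number. The sketch below shows why, assuming the generic Linux ioctl encoding (8 bits of command number, 8 bits of driver type, 14 bits of argument size, 2 bits of direction). The function name `decode_ioctl_request` is made up for this illustration and is not part of CopperDroid.

```python
def decode_ioctl_request(request: int) -> dict:
    """Split an ioctl request number into its fields (generic Linux layout)."""
    return {
        "nr": request & 0xFF,                # command number within the driver
        "type": chr((request >> 8) & 0xFF),  # driver "magic" character
        "size": (request >> 16) & 0x3FFF,    # size in bytes of the argument struct
        "dir": (request >> 30) & 0x3,        # 1 = write, 2 = read, 3 = read and write
    }


decoded = decode_ioctl_request(0xC0186201)
print(decoded)
```

Under this layout the value decodes to command 1 of driver type 'b', carrying a 24-byte argument in both directions, which is consistent with a read/write request on a six-field, 32-bit `binder_write_read` structure.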
  • 26. TRACING SYSTEM CALLS ON ANDROID ARM THROUGH QEMU
      A system call induces a User → Kernel transition
      On ARM, it is invoked through the swi instruction (SoftWare Interrupt)
      → r7: invoked system call number
      → r0-r5: parameters
      → lr: return address
      CopperDroid's approach
      → Instruments QEMU's emulation of the swi instruction
      → Instruments QEMU to intercept every cpsr_write (Kernel → User)
      → Performs traditional VMI to associate system calls to threads
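
A minimal sketch of the register convention above: given a register snapshot taken when `swi` executes, recover the system call name and its arguments. The snapshot dictionary, the `SyscallEvent` type, and the `on_swi` hook are hypothetical names for this illustration; the real instrumentation lives inside QEMU. The table lists only a handful of ARM EABI system call numbers.

```python
from dataclasses import dataclass

# A small, illustrative subset of ARM EABI system call numbers.
ARM_EABI_SYSCALLS = {
    3: "read", 4: "write", 5: "open", 6: "close",
    54: "ioctl", 281: "socket", 283: "connect", 290: "sendto",
}


@dataclass
class SyscallEvent:
    name: str
    args: tuple
    return_address: int


def on_swi(regs: dict) -> SyscallEvent:
    """Build a system call record from the registers observed at `swi`."""
    number = regs["r7"]                             # r7 holds the syscall number
    name = ARM_EABI_SYSCALLS.get(number, f"sys_{number}")
    args = tuple(regs[f"r{i}"] for i in range(6))   # r0-r5 hold the parameters
    return SyscallEvent(name, args, regs["lr"])     # lr holds the return address


# Hypothetical snapshot: an ioctl on file descriptor 4 with the Binder request code.
snapshot = {"r0": 4, "r1": 0xC0186201, "r2": 0xBEFFF000, "r3": 0, "r4": 0,
            "r5": 0, "r7": 54, "lr": 0x40012345}
event = on_swi(snapshot)
print(event.name, hex(event.args[1]))
```

Pairing each such event with the matching kernel-to-user transition (the `cpsr_write` hook) would give the return value as well; that half is omitted here.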
  • 27. TRACING SYSTEM CALLS ON ANDROID ARM THROUGH QEMU
      [Figure: CopperDroid architecture on QEMU. Labels: Plug-in Manager; CopperDroid plug-in (System Call Tracking, Binder Analysis); Oracle (TCP); apps on Dalvik/ART in user mode; Android/Linux kernel in kernel mode; swi (user → kernel), the SoftWare Interrupt instruction; cpsr_write (kernel → user), writes to the current program status register]
  • 28. BINDER STRUCTURE WITHIN IOCTL
      CopperDroid inspects the Binder protocol in detail by intercepting a subset of the ioctls issued by userspace apps.
        ioctl(binder_fd, BINDER_WRITE_READ, binder_write_read);
      binder_write_read fields: write_size, write_consumed, write_buffer, read_size, …
      write_buffer contents: BC_* Params | BC_TR Params | BC_* Params
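
The structure passed to the ioctl can be read straight out of guest memory once its layout is known. The sketch below assumes the 32-bit layout of `struct binder_write_read` (six 4-byte fields, 24 bytes in total); the slide names the first four fields, and `read_consumed` and `read_buffer` are filled in from the kernel header as an assumption. The sample addresses are invented.

```python
import struct
from collections import namedtuple

BinderWriteRead = namedtuple(
    "BinderWriteRead",
    "write_size write_consumed write_buffer read_size read_consumed read_buffer",
)

# Little-endian, 32-bit ARM: signed long, signed long, unsigned long, repeated.
BWR_FORMAT = "<llLllL"
BWR_SIZE = struct.calcsize(BWR_FORMAT)


def parse_binder_write_read(raw: bytes) -> BinderWriteRead:
    """Decode the third ioctl argument of a BINDER_WRITE_READ request."""
    return BinderWriteRead(*struct.unpack(BWR_FORMAT, raw[:BWR_SIZE]))


# Hypothetical memory contents: 68 bytes of commands to write, 256 bytes of read space.
raw = struct.pack(BWR_FORMAT, 68, 0, 0xBEFFF100, 256, 0, 0xBEFFF200)
bwr = parse_binder_write_read(raw)
print(bwr)
```

A tracer would then read `write_size` bytes at `write_buffer` and walk the BC_* commands found there, keeping only BC_TRANSACTION and BC_REPLY.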
  • 29. AUTOMATIC BINDER UNMARSHALLING
      CopperDroid analyzes BC_TRANSACTIONs and BC_REPLYs
      write_buffer contents: BC_* Params | BC_TR Params | BC_* Params
      struct binder_transaction_data fields: target, code, uid, …, data_size, buffer
      buffer:
        \x4b\x00\x00\x00\x49\x00\x20\x00
        \x74\x00\x61\x00\x6b\x00\x65\x00
        \x20\x00\x70\x00\x6c\x00\x65\x00
        \x61\x00\x73\x00\x75\x00\x72\x00
        \x65\x00\x20\x00\x69\x00\x6e\x00
        \x20\x00\x68\x00\x75\x00\x72\x00
        \x74\x00\x69\x00\x6e\x00\x67 ...
      CopperDroid uses a modified AIDL parser to automatically generate signatures of each method (code) for each interface (InterfaceToken).
  • 30. AUTOMATIC BINDER UNMARSHALLING
      CopperDroid analyzes BC_TRANSACTIONs and BC_REPLYs
      struct binder_transaction_data fields: target, code, uid, …, data_size, buffer
      buffer layout: InterfaceToken, Param 1, Param 2, Param 3, …
        ISms.sendText(???, ???, ???, ... )
  • 31. AUTOMATIC BINDER UNMARSHALLING
      CopperDroid analyzes BC_TRANSACTIONs and BC_REPLYs
      struct binder_transaction_data fields: target, code, uid, …, data_size, buffer
        public void sendText(...) {
            android.os.Parcel _data = android.os.Parcel.obtain();
            try {
                ...
                _data.writeString(destAddr);
                _data.writeString(srcAddr);
                _data.writeString(text);
                ...
                mRemote.transact(Stub.TRANSACTION_sendText, _data, _reply, 0);
            }
  • 32. AUTOMATIC BINDER UNMARSHALLING
      CopperDroid analyzes BC_TRANSACTIONs and BC_REPLYs
      struct binder_transaction_data fields: target, code, uid, …, data_size, buffer
      buffer layout: InterfaceToken, Param 1, Param 2, Param 3, …
        ISms.sendText("7855551234", ... )
  • 33. AUTOMATIC ANDROID OBJECTS UNMARSHALLING
      Primitive types (e.g., String text)
      → A few manually-written procedures
      Complex Android objects
      → 300+ Android objects: manual unmarshalling does not scale and is not scientific
      → Find the object's CREATOR field
      → Use reflection (type introspection, then intercession)
      IBinder object reference
      → A handle (pointer) is sent instead of the marshalled object
      → Look earlier in the trace to map each handle to an object
      CopperDroid's Oracle unmarshalls all three automatically
  • 34. [Figure: the CopperDroid emulator (Plug-in Manager with the CopperDroid plug-in: System Call Tracking, Binder Analysis) runs the app and talks over TCP to the Oracle, which runs in a vanilla emulator]
  • 36. AUTOMATIC UNMARSHALLING ORACLE: SMS EXAMPLE
      TYPE: string, string, string, PendingIntent, PendingIntent
      DATA: \x0A \x00 \x00 \x00 \x37 \x00 \x38 \x00 \x35 \x00 \x35 \x00 \x35 \x00 \x35 \x00 \x31 \x00 \x32 \x00 \x33 \x00 \x34 \x00 \x00 \x00 \x00 \x00 \x08 \x00 \x00 \x00 \x48 \x00 \x69 \x00 \x20 \x00 \x74 \x00 \x68 \x00 \x65 \x00 \x72 \x00 \x65 \x00 \x85*hs \x7f \x00 \x00 \x00 \xa0 \x00 \x00 \x00 \x00 \x00 \x00 \x00 ...
      OUTPUT:
        telephony.ISms.sendText(
          String destAddr = "7855551234",
          ... )
      Type[0] = Primitive string
      Use ReadString() (and increment the data offset by the length of the string)
  • 37. AUTOMATIC UNMARSHALLING ORACLE: SMS EXAMPLE
      TYPE and DATA: as on slide 36
      OUTPUT:
        com.android.internal.telephony.ISms.sendText(
          String destAddr = "7855551234",
          String srcAddr = null,
          String text = "Hi there",
          ... )
      Type[1] and Type[2] are also Primitive string
      Use ReadString() (and increment the data offset by the length of the strings)
  • 38. AUTOMATIC UNMARSHALLING ORACLE: SMS EXAMPLE
      TYPE and DATA: as on slide 36
      OUTPUT:
        com.android.internal.telephony.ISms.sendText(
          String destAddr = "7855551234",
          String srcAddr = null,
          String text = "Hi there",
          Intent sentIntent {
            type = BINDER_TYPE_HANDLE,
            flags = 0x7F | FLAT_BINDER_FLAG_ACCEPT_FDS,
            handle = 0xa,
            cookie = 0x0 },
          ... )
      Type[3] = IBinder PendingIntent
      Unmarshal using com.Android.Intent (AIDL) and increment the buffer pointer
      The handle points to data to be unmarshalled in a previous Binder (ioctl) call
  • 39. AUTOMATIC UNMARSHALLING ORACLE: SMS EXAMPLE
      TYPE and DATA: as on slide 36
      OUTPUT:
        com.android.internal.telephony.ISms.sendText(
          String destAddr = "7855551234",
          String srcAddr = null,
          String text = "Hi there",
          Intent sentIntent { Intent(SENT) },
          ... )
      Each handle is paired with a parcelable object
      CopperDroid sends each handle and parcelable object to the Oracle
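
The primitive-string step of the walkthrough above can be sketched as follows. This is an illustration only, not the Oracle's code: it assumes the string encoding used by Android Parcels as commonly described (a 32-bit little-endian character count, UTF-16LE characters, a NUL terminator, padding to a 4-byte boundary, and a count of -1 for null). The byte dump on the slides is abbreviated, so the sample buffer here is produced by the helper `write_string16`. Both function names are invented for the example.

```python
import struct


def write_string16(value):
    """Marshal one string under the assumed Parcel layout."""
    if value is None:
        return struct.pack("<i", -1)               # null string: length -1, no body
    units = value.encode("utf-16-le")
    body = units + b"\x00\x00"                     # NUL terminator
    body += b"\x00" * (-len(body) % 4)             # pad to a 4-byte boundary
    return struct.pack("<i", len(units) // 2) + body


def read_string16(buf, offset):
    """Return (string or None, new offset): the ReadString() step of the example."""
    (length,) = struct.unpack_from("<i", buf, offset)
    offset += 4
    if length < 0:
        return None, offset
    nbytes = length * 2
    text = buf[offset:offset + nbytes].decode("utf-16-le")
    consumed = nbytes + 2                          # characters plus terminator
    consumed += -consumed % 4                      # skip the padding
    return text, offset + consumed


# The three string parameters of sendText from the example.
data = (write_string16("7855551234") + write_string16(None)
        + write_string16("Hi there"))
offset, params = 0, []
for _ in range(3):                                 # TYPE = string, string, string, ...
    value, offset = read_string16(data, offset)
    params.append(value)
print(params)
```

The type list drives the loop: each entry selects a reader, and each reader advances the offset so the next one starts at the right place. Object and handle types would plug in the same way with their own readers.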
  • 41. FILESYSTEM TRANSACTIONS
      class: FS ACCESS,
      low: [
        {
          blob: {'flags': 131072, 'mode': 1, 'filename': u'/etc/media_codecs.xml'},
          id: 187369,
          sysname: open,
          ts: 1455718126.798,
        },
        {
          blob: {'size': 4096L, 'filename': u'/etc/media_codecs.xml'},
          id: 187371,
          sysname: read,
          ts: 1455718126.798,
          xref: 187369
        },
        {
          blob: {'filename': u'/etc/media_codecs.xml'},
          id: 187389,
          sysname: close,
          ts: 1455718126.799,
          xref: 187369
        }
      ],
      procname: /system/bin/mediaserver
  • 43. NETWORK TRANSACTIONS
      class: NETWORK ACCESS,
      low: [
        {
          blob: {'socket domain': 10, 'socket type': 1, 'socket protocol': 0},
          id: 62,
          sysname: socket,
          ts: 1445024980.686,
        },
        {
          blob: {'host': '::ffff:134.219.148.11', 'port': 80, 'returnValue': 0},
          id: 63,
          sysname: connect,
          ts: 1445024980.687,
        },
        {
          blob: =%22%27GET+%2Findex.html+HTTP%2F1.1%5C%5Cr%5C%5CnUser-Agent%3A+Dalvik%2F1.6.0+%28Linux%3B+U%3B+Android+4.4.4%3B+sdk+Build%2FKK%29%5C%5Cr%5C%5CnHost%3A+s2lab.isg.rhul.ac.uk%5C%5Cr%5C%5CnConnection%3A+Keep-Alive%5C%5Cr%5C%5CnAccept-Encoding%3A+gzip%5C%5Cr%5C%5Cn%5C%5Cr%5C%5Cn%27%22,
          id: 164,
          sysname: sendto,
          ts: 1445024980.720,
        },
      ],
      procname: com.cd2.nettest.nettest,
      subclass: HTTP
      Composite behaviors (e.g., filesystem and network transactions)
      We perform a value-based data flow analysis by building a system call-related data dependence graph (DDG) and def-use chains
      → Each observed system call is initially considered an unconnected node
      → Forward slicing inserts edges for every inferred dependence between two calls
      → Nodes and edges are annotated with the system call argument constraints
      → The annotations are needed for the creation of def-use chains
      → Def-use chains relate the output value of specific system calls to the input of (not necessarily adjacent) others
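
The def-use idea above can be illustrated on a toy trace: a call that returns a file descriptor defines a value, and later calls that take that descriptor use it. This is a simplified sketch rather than CopperDroid's analysis; the record fields `ret` and `fd`, the descriptor value 7, and the function `build_def_use_edges` are assumptions made for the example, while the call ids are borrowed from the filesystem trace on slide 41.

```python
def build_def_use_edges(trace):
    """Link each call that uses a file descriptor to the call that produced it."""
    producers = {"open", "socket"}      # calls whose return value defines an fd
    live = {}                           # fd value -> id of the defining call
    edges = []
    for call in trace:
        if call["sysname"] in producers:
            live[call["ret"]] = call["id"]
        elif call.get("fd") in live:
            edges.append((live[call["fd"]], call["id"]))
            if call["sysname"] == "close":
                del live[call["fd"]]    # the descriptor number may be reused later
    return edges


trace = [
    {"id": 187369, "sysname": "open", "ret": 7},
    {"id": 187371, "sysname": "read", "fd": 7},
    {"id": 187389, "sysname": "close", "fd": 7},
    {"id": 187400, "sysname": "read", "fd": 7},   # after close: no dependence
]
edges = build_def_use_edges(trace)
print(edges)
```

The resulting edges play the role of the `xref` fields in the reconstructed transactions: the read and the close both point back to the open that defined their descriptor, even though other calls may sit between them in the trace.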
  • 44. BINDER TRANSACTIONS
      class: SMS SEND,
      low: [
        {
          blob: {
            method: sendText,
            params: [
              callingPkg = com.load.wap,
              destAddr = 3170,
              scAddr = null,
              text = 999287346 418 Java (256) vip 2012-02-25 17:47:56 newoperastore.ru y
            ]
          },
          method_name: com.android.internal.telephony.ISms.sendText(),
          sysname: ioctl,
          ts: 1444337887.816,
          type: BINDER
        }
      ],
      procname: com.load.wap
  • 46. RESEARCH OBJECTIVES
      Runtime behaviors as a discriminator of maliciousness
      → Independent of any syntactic artifact
      → Visible in managed and native code alike
      Family identification
      → Crucial for analysis of threats and mitigation planning
      Goal: dynamic analysis for classification under challenging conditions
      Our contributions [5]
      RQ2.1: What is the best level of abstraction?
      RQ2.2: Can we deal with sparse behaviors?
      [5] Dash et al., "DroidScribe: Classifying Android Malware Based on Runtime Behavior", in IEEE SP Workshop MoST 2016
  • 47. OVERVIEW OF THE CLASSIFICATION FRAMEWORK
      [Figure: classification pipeline with outputs Family 1, Family 2, …, Family N]
  • 48. SYSTEM-CALLS VS. ABSTRACT BEHAVIORS
      RQ2.1: What is the best level of abstraction?
      Experiments on the Drebin dataset (5,246 malware samples).
      Reconstructing Binder calls adds 141 meaningful features.
      High-level behaviors add 3 explanatory features.
      [Figure: (a) Accuracy (%) and (b) Runtime (sec) for the feature sets sys, rec_b, and rec_b+]
  • 49. SET-BASED PREDICTIONS
      Dynamic analysis is limited by code coverage
      The classifier has only partial information about behaviors
      Identify when malware cannot be reliably classified into only one family
      → Based on a measure of the statistical confidence
      Helps the human analyst by identifying the top matching families, supported by statistical evidence
  • 50. CLASSIFICATION WITH STATISTICAL CONFIDENCE
      Conformal Predictor (CP)
      → A statistical learning algorithm for classification tasks
      → Provides statistical evidence on the results
      Credibility: indicates how well a sample fits into a class
      Confidence: indicates whether there are other good choices
      Robust against outliers: aware of values from other members of the same class
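
One common way to turn per-class p-values into these two numbers, assumed here to match the talk's usage, is to take the largest p-value as the credibility of the predicted class and one minus the second-largest p-value as the confidence. The sketch below uses invented p-values and a made-up function name.

```python
def credibility_and_confidence(p_values):
    """Return (predicted label, credibility, confidence) from per-class p-values.

    Requires p-values for at least two classes.
    """
    ranked = sorted(p_values.items(), key=lambda item: item[1], reverse=True)
    (best_label, best_p), (_, second_p) = ranked[0], ranked[1]
    return best_label, best_p, 1.0 - second_p


# Hypothetical p-values for three families.
label, credibility, confidence = credibility_and_confidence(
    {"A": 0.05, "B": 0.80, "C": 0.10}
)
print(label, credibility, confidence)
```

High credibility with low confidence means the sample fits the chosen family well but fits another one almost as well, which is exactly the situation where a set of candidate families is more honest than a single label.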
  • 51. COMPUTING P-VALUES
      The Nonconformity Measure (NCM) is a geometric measure of how far a sample is from a class.
      → For SVM, the NCM $N_D^z$ of a sample $z$ w.r.t. class $D$ is the sum of its distances from all hyperplanes $H_i$ bounding the class $D$:
        $N_D^z = \sum_i d(z, H_i)$
      The p-value is a statistical measure of how well a sample fits in a class.
      → The p-value $P_D^z$ is the proportion of the $n$ samples in $D$ that are at least as different as $z$ w.r.t. $D$:
        $P_D^z = \frac{|\{ j = 1, \dots, n : N_D^j \geq N_D^z \}|}{n}$
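
The two formulas translate directly into a few lines of code. This is a sketch under stated assumptions: hyperplanes are given as (w, b) pairs, the distance is the unsigned point-to-hyperplane distance |w·z + b| / ||w||, and the nonconformity scores of the class members are invented numbers. The function names are illustrative.

```python
import math


def distance_to_hyperplane(z, w, b):
    """Unsigned distance of point z from the hyperplane w.x + b = 0."""
    dot = sum(wi * zi for wi, zi in zip(w, z))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))


def ncm(z, hyperplanes):
    """Nonconformity of z: sum of its distances from the bounding hyperplanes."""
    return sum(distance_to_hyperplane(z, w, b) for w, b in hyperplanes)


def p_value(ncm_new, ncm_class):
    """Fraction of class members at least as nonconforming as the new sample."""
    return sum(1 for score in ncm_class if score >= ncm_new) / len(ncm_class)


# Two axis-aligned hyperplanes through the origin: distances are |x| and |y|.
score = ncm((3.0, 4.0), [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0)])

# Hypothetical nonconformity scores of five samples already in class D.
scores_in_D = [0.2, 0.5, 0.9, 1.4, 2.0]
p = p_value(1.0, scores_in_D)
print(score, p)
```

Here two of the five class members score at least 1.0, so the new sample's p-value for D is 0.4: it is no stranger than 40% of the samples already known to belong to that class.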
  • 52. IN AN IDEAL WORLD
      Given a new object s, the conformal predictor picks the class with the highest p-value and returns a single prediction.
  • 53. OBTAINING PREDICTION SETS
      Given a new object s, we can set a significance level e for p-values and obtain a prediction set Γ^e, which includes the labels whose p-value for the sample is greater than e.
      [Figure: p-values of classes A, B, C, and D against the threshold e; bar values shown: 0.50, 0.20, 0.60, 0.40; significance level e = 0.30, confidence = (1 - e) = 0.70; prediction set = {A, C, D}]
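
The thresholding rule is small enough to show in full. The p-values below reuse the four numbers visible in the figure, and assigning them to A, B, C, and D in that order is an assumption (it is consistent with the stated prediction set). The function name is made up for the example.

```python
def prediction_set(p_values, significance):
    """Labels whose p-value is strictly greater than the significance level."""
    return {label for label, p in p_values.items() if p > significance}


# Assumed assignment of the figure's bar values to the four classes.
p_values = {"A": 0.50, "B": 0.20, "C": 0.60, "D": 0.40}

chosen = prediction_set(p_values, significance=0.30)
single = max(p_values, key=p_values.get)   # the "ideal world" single prediction
print(sorted(chosen), single)
```

Lowering the significance level (raising the confidence) can only grow the set, which is the trade-off the later accuracy-versus-set-size plot explores.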
  • 55. WHEN TO USE CONFORMAL PREDICTION?
      CP is an expensive algorithm
      → For each sample, we need to derive a p-value for each class
      → Computational complexity of O(nc), where n is the number of samples and c is the number of classes
      Conformal Evaluation [1]
      Provides a statistical evaluation of the quality of an ML algorithm
      → Quality threshold to understand when SVM should be trusted
      → Statistical evidence for the choices of SVM
      → Selectively invoke CP to reduce the runtime cost
      [1] Jordaney, R., Wang Z., Papini D., Nouretdinov I., Cavallaro L. "Misleading Metrics: On Evaluating Machine Learning for Malware with Confidence." TR 2016-1, Royal Holloway, University of London, 2016.
  • 56. CONFIDENCE OF CORRECT SVM DECISIONS
      [Figure: confidence (0.0 to 1.0) of correct SVM decisions per malware family: SMSreg, Kmin, Imlog, FakeInstaller, Glodream, Yzhc, Jifake, DroidKungFu, SendPay, BaseBridge, Boxer, Adrd, LinuxLotoor, Iconosys, GinMaster, MobileTx, FakeDoc, Opfake, Plankton, Gappusin, Geinimi, DroidDream, FakeRun]
  • 57. ACCURACY VS. PREDICTION SET SIZE
      RQ2.2: Can we deal with sparse behaviors?
      [Figure: precision, recall, and prediction set size (number of classes) against p-value thresholds (1.0 - confidence) from 0.5 down to 0.0]
      Accuracy improves with the prediction set size
  • 61. CONCLUSION
    CopperDroid: automatic reconstruction of app behaviors [6]
    → System calls to abstract OS- and Android-specific behaviors
    → Resilient to changes to the runtime and Android versions
    Classification with such semantics: It... Could... Work! [7]
    → Selective set-based classification (CE/CP)
    → (WIP: binary classification and different feature engineering)
    Statistical evaluation of ML seems promising [8]
    → Identify concept drift and when to trust a prediction
    → Early eval: TPR from 37.5% to 92.7% in realistic settings
    → Identifies previously-unknown classes or malicious samples
    RESTful API (done) and online deployment (soon), write me!
    Shameless Plug: I Am Hiring!
    Multiple PhD positions
    Two 2-year PostDoc positions (Android security)
    → CS/CEng background
    → Expertise in program analysis and/or machine learning
    → Available from around Sep 2017 (negotiable)
    Visit the Systems Security Research Lab at http://s2lab.isg.rhul.ac.uk
    Contact me at lorenzo.cavallaro@rhul.ac.uk
    [6] http://s2lab.isg.rhul.ac.uk/papers/files/ndss2015.pdf
    [7] http://s2lab.isg.rhul.ac.uk/papers/files/most2016.pdf
    [8] http://s2lab.isg.rhul.ac.uk/papers/files/rhul2016.pdf and http://s2lab.isg.rhul.ac.uk/papers/files/aisec2016.pdf and work in submission