Standard Dataset

Andro-Dumpsys: Android Malware Dataset from Memory-Acquired Odex Bytecode and Creator-Centric Features

Citation Author(s):
Jae-wook Jang (Korea University)
Hyunjae Kang (Deloitte Anjin LLC)
Jiyoung Woo (Korea University)
Aziz Mohaisen (State University of New York at Buffalo)
Huy Kang Kim (Korea University)
Submitted by:
Saehoon Oh
DOI:
10.21227/0as6-j314

Abstract

This dataset is derived from the Andro-Dumpsys system, which analyzes Android applications through volatile memory acquisition and similarity-based profiling. While an application runs in an emulator, the system extracts its odex bytecode from memory, which sidesteps anti-analysis techniques such as packing, dynamic loading, and dex encryption. Creator-centric artifacts, including certificate serial numbers, operation code patterns, metadata from AndroidManifest.xml, suspicious API sequences, permission usage, and system command traces, are parsed to construct behavioral profiles. The dataset includes malware and benign application samples along with extracted descriptive information.
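
As a rough illustration of what similarity-based profiling can look like, the sketch below compares two application profiles with Jaccard similarity, computed per feature category and then averaged. The Jaccard measure, the equal weighting, the category names, and the sample values are all assumptions made for illustration; the abstract does not state which similarity function Andro-Dumpsys actually uses.

```python
from typing import Dict, Set

# Hypothetical profile layout: feature category -> set of observed values.
Profile = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets score 0.0."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def profile_similarity(p: Profile, q: Profile) -> float:
    """Unweighted mean of per-category Jaccard scores (an assumed scheme)."""
    categories = sorted(set(p) | set(q))
    if not categories:
        return 0.0
    scores = [jaccard(p.get(c, set()), q.get(c, set())) for c in categories]
    return sum(scores) / len(scores)


# Illustrative values only; they are not taken from the dataset.
known_malware: Profile = {
    "permissions": {"SEND_SMS", "READ_CONTACTS", "INTERNET"},
    "system_commands": {"su", "chmod"},
}
candidate: Profile = {
    "permissions": {"SEND_SMS", "INTERNET"},
    "system_commands": {"su", "chmod"},
}

score = profile_similarity(known_malware, candidate)
```

Here the permission sets share two of three values and the command sets match exactly, so the averaged score is high. A real pipeline would likely weight categories differently, for example by giving a matching certificate serial number more influence than a shared permission.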

Instructions:

This dataset contains the malware and benign Android applications used to evaluate the Andro-Dumpsys system. Most samples are compressed with 7-Zip, and the archives require a password to extract. Malware and benign samples are provided separately, along with a CSV file summarizing the extracted profiles.
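
One possible way to script the extraction step is sketched below. It assumes the `7z` command-line tool is installed and on the PATH, and it uses a placeholder password, since the real one is not given on this page. The function names and the output directory are hypothetical. Because the archives contain live malware, extraction is best done inside an isolated analysis environment.

```python
import subprocess
from pathlib import Path
from typing import List


def build_extract_command(archive: Path, password: str, out_dir: Path) -> List[str]:
    """Build a `7z x` command line for a password-protected archive."""
    return [
        "7z",
        "x",              # extract, preserving the directory structure
        f"-p{password}",  # password switch (no space between -p and the value)
        f"-o{out_dir}",   # output directory switch (no space after -o)
        "-y",             # answer "yes" to prompts such as overwrite
        str(archive),
    ]


def extract(archive: Path, password: str, out_dir: Path) -> None:
    """Run the extraction; raises CalledProcessError if 7z reports failure."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_extract_command(archive, password, out_dir), check=True)


cmd = build_extract_command(
    Path("AndroDumpsys_Benign_1776_151015.7z"),
    "PASSWORD_PLACEHOLDER",  # placeholder, not the real password
    Path("benign_samples"),  # hypothetical output directory
)
```

Keeping command construction separate from execution makes it easy to print and check the command before touching any sample. Note that a password passed on the command line may be visible to other users of the same machine through the process list.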

Dataset components include:

  • result_malware_description_906_150126.csv: textual metadata describing 906 malware samples.
  • Benign sample archive: AndroDumpsys_Benign_1776_151015.7z
  • Malware archive: Malware.zip

All samples were executed in an emulator, with memory acquisition capturing odex bytecode for further profiling. Most malware samples are packed or dynamically loaded, and their bytecode is extracted via a volatile memory dump during runtime.

Analysis features include:

  • certificate serial numbers
  • suspicious API sequences
  • critical permission usage (requested + API-related)
  • system commands
  • presence of forged files in assets/lib/res paths
  • intents from AndroidManifest.xml
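
Before working with the description CSV, it can help to check its header and row count rather than assume a layout. The sketch below does this with Python's standard `csv` module. The column names in the stand-in data are made up, and the UTF-8 encoding shown in the commented usage is an assumption; the real file's schema is not documented on this page.

```python
import csv
import io
from typing import List, TextIO, Tuple


def inspect_csv(stream: TextIO) -> Tuple[List[str], int]:
    """Return (header columns, number of data rows) for a CSV stream."""
    reader = csv.reader(stream)
    header = next(reader, [])           # first row, or [] for an empty file
    row_count = sum(1 for _ in reader)  # remaining rows are data rows
    return header, row_count


# Stand-in data with made-up columns; the real file's header may differ.
sample = io.StringIO("sample_id,description\nA1,example one\nA2,example two\n")
header, rows = inspect_csv(sample)

# For the real file (the encoding is an assumption):
# with open("result_malware_description_906_150126.csv",
#           newline="", encoding="utf-8") as f:
#     header, rows = inspect_csv(f)
```

The file name suggests 906 malware samples, so comparing the reported row count against that number is a quick sanity check after download.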

Acknowledgement
This dataset was produced as part of research supported by the ICT R&D Program of MSIP/IITP (14-912-06-002, The Development of Script-based Cyber Attack Protection Technology) and additionally supported by a Korea University Grant.