Artificial Intelligence | IEEE DataPort

Pre_Trained LLM

This dataset was created to provide comparative statistics on various general-purpose and financially pre-trained large language models (LLMs), based on their financial capabilities. It also highlights which specific characteristics contributed to each model's financial capacity and records whether each model is open- or closed-source. Our intent was to generate a categorical heatmap from the CSV data.
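
The description does not specify the CSV schema. The following is a minimal sketch under an assumed layout: one row per model and one categorical column per characteristic. The column names `model`, `source`, and `domain_pretraining`, and the sample rows, are hypothetical and do not come from the dataset. Each categorical value is encoded as an integer code, so a plotting library can color each heatmap cell by its code:

```python
import csv
import io

# Hypothetical CSV layout: one row per model, one categorical column per
# characteristic. Column names and values are illustrative only.
SAMPLE_CSV = """model,source,domain_pretraining
ModelA,open,yes
ModelB,closed,no
ModelC,open,no
"""


def to_category_matrix(csv_text, label_col="model"):
    """Encode each categorical cell as an integer code, column by column.

    Returns (row_labels, columns, matrix, codebooks) where matrix[i][j] is
    the code of row i's value in column j.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = [c for c in reader.fieldnames if c != label_col]
    rows = list(reader)
    codebooks = {}
    for col in columns:
        # Sorting the distinct values makes the codes deterministic.
        values = sorted({row[col] for row in rows})
        codebooks[col] = {value: code for code, value in enumerate(values)}
    matrix = [[codebooks[col][row[col]] for col in columns] for row in rows]
    row_labels = [row[label_col] for row in rows]
    return row_labels, columns, matrix, codebooks


labels, columns, matrix, codebooks = to_category_matrix(SAMPLE_CSV)
```

The resulting `matrix` could then be passed to a plotting function such as Matplotlib's `imshow` with a discrete colormap. That plotting step is omitted here.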

Mal Netminer Dataset

The Mal-Netminer dataset consists of malware samples and their corresponding system call graphs used to analyze behavioral patterns through social network analysis (SNA). The dataset was developed as part of the Mal-Netminer framework, which classifies malware by extracting topological properties from system call graphs rather than relying on traditional signature or frequency-based approaches.
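
The description does not say how Mal-Netminer constructs its graphs or which SNA metrics it uses. The following is a rough sketch under a hypothetical construction rule: a directed edge links each pair of consecutive system calls in a trace. It computes two simple topological properties with the standard library: normalized in- and out-degree centrality, and graph density. The trace is made up for illustration and uses Windows native API names.

```python
from collections import defaultdict


def build_call_graph(calls):
    """Build a directed graph with an edge a -> b for each consecutive pair
    (a, b) in a system call trace. This edge rule is an assumption."""
    nodes = set(calls)
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for a, b in zip(calls, calls[1:]):
        if a != b:  # skip self-loops from repeated calls
            out_edges[a].add(b)
            in_edges[b].add(a)
    return nodes, out_edges, in_edges


def degree_centrality(nodes, out_edges, in_edges):
    """In- and out-degree centrality, each normalized by (n - 1)."""
    n = len(nodes)
    scale = 1.0 / (n - 1) if n > 1 else 0.0
    return {
        v: {"in": len(in_edges[v]) * scale, "out": len(out_edges[v]) * scale}
        for v in nodes
    }


def density(nodes, out_edges):
    """Fraction of possible directed edges (no self-loops) that are present."""
    n = len(nodes)
    m = sum(len(targets) for targets in out_edges.values())
    return m / (n * (n - 1)) if n > 1 else 0.0


# Made-up trace for illustration only.
trace = ["NtOpenFile", "NtReadFile", "NtClose",
         "NtOpenFile", "NtWriteFile", "NtClose"]
nodes, out_edges, in_edges = build_call_graph(trace)
centrality = degree_centrality(nodes, out_edges, in_edges)
```

Because these features describe the graph's shape rather than how often each call occurs, they illustrate the contrast the description draws with frequency-based approaches.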

CAFUC

The CAFUC Flight Maneuver Dataset is a high-quality, real-world flight training dataset collected in 2023 from fixed-wing aircraft (C172S and SR20) at the Civil Aviation Flight University of China (CAFUC). It is organized into four category folders based on the geometric trajectory shapes of flight maneuvers. The dataset contains 150,000 frames sampled at 1 Hz (41.6 total flight hours) with 64-dimensional features (e.g., latitude, altitude, engine parameters) and 4 manual labels, covering 4 flight subjects, 14,356 basic maneuvers, and 168 trainee pilots.

Categories:

Artificial Intelligence

CAFUC2

This dataset is dedicated to anomaly detection in flight training scenarios. It is constructed from 120 original CSV files of normal flight training data, containing a total of 993,655 valid normal records. To support the training, validation, and performance evaluation of anomaly detection models, four types of typical flight anomalies (throttle surge, course deviation, engine failure, and excessive pitch) have been injected into the normal data.
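
The description does not state how the anomalies were parameterized or which columns they affect. The following is a minimal sketch of the general idea of labeled anomaly injection. It assumes hypothetical field names (`throttle`, `heading`, `rpm`, `pitch`) and arbitrary magnitudes. It copies the normal records, overwrites a window with one of the four anomaly types, and returns per-row labels:

```python
import copy

# Hypothetical field names and magnitudes; the description does not state
# how the dataset's anomalies were parameterized.
ANOMALY_TYPES = ("throttle_surge", "course_deviation",
                 "engine_failure", "excessive_pitch")


def inject_anomaly(records, kind, start, length):
    """Return (modified_copy, labels) with `kind` injected into
    records[start:start + length]; labels mark injected rows with 1."""
    if kind not in ANOMALY_TYPES:
        raise ValueError(f"unknown anomaly type: {kind}")
    out = copy.deepcopy(records)  # leave the normal data untouched
    labels = [0] * len(out)
    for i in range(start, min(start + length, len(out))):
        row = out[i]
        if kind == "throttle_surge":
            row["throttle"] = min(1.0, row["throttle"] + 0.5)
        elif kind == "course_deviation":
            row["heading"] = (row["heading"] + 30.0) % 360.0
        elif kind == "engine_failure":
            row["rpm"] = 0.0
        else:  # excessive_pitch
            row["pitch"] = 25.0
        labels[i] = 1
    return out, labels


normal = [{"throttle": 0.6, "heading": 350.0, "rpm": 2400.0, "pitch": 2.0}
          for _ in range(5)]
anomalous, labels = inject_anomaly(normal, "course_deviation", 1, 2)
```

Working on a deep copy keeps the original normal records available as clean training data alongside the labeled anomalous variants.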

Categories:

Artificial Intelligence

Bengali Real and Deepfake Audio Dataset for Deepfake Detection

Synthetic speech and audio deepfakes increasingly threaten information veracity and public trust. Although deepfake detection has attracted considerable research interest, the scarcity of open-source Bengali speech datasets hampers progress in this area. To fill this gap, we present the Bengali Real and Deepfake Audio Dataset, a curated repository of real and synthetic Bengali speech that can support deepfake detection research and speaker forensics.

Categories:

Artificial Intelligence

MAVLink Message ID Sequence Dataset

This dataset consists of MAVLink message ID sequences collected from a UAV system operating in a Hardware-in-the-Loop (HITL) simulation environment. Communication between the UAV and the Ground Control Station (GCS) was monitored, and the captured traffic was filtered to retain only MAVLink protocol packets. The message ID was extracted from each MAVLink packet, and the resulting sequences were stored in NumPy array format.
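
The description does not name the capture or parsing tools. The following is a minimal sketch of the ID-extraction step. It assumes each packet has already been isolated as a single raw MAVLink frame, and reads the message ID from the frame header:

- MAVLink v1 (start byte `0xFE`): one byte at offset 5.
- MAVLink v2 (start byte `0xFD`): three little-endian bytes at offsets 7 to 9.

The example frames use zero-filled payload and checksum bytes.

```python
MAVLINK_V1_STX = 0xFE
MAVLINK_V2_STX = 0xFD


def message_id(frame: bytes) -> int:
    """Return the message ID from a single raw MAVLink v1 or v2 frame."""
    if not frame:
        raise ValueError("empty frame")
    if frame[0] == MAVLINK_V1_STX:
        # v1 header: STX, LEN, SEQ, SYSID, COMPID, MSGID (1 byte)
        if len(frame) < 6:
            raise ValueError("truncated MAVLink v1 header")
        return frame[5]
    if frame[0] == MAVLINK_V2_STX:
        # v2 header: STX, LEN, INCOMPAT_FLAGS, COMPAT_FLAGS, SEQ, SYSID,
        # COMPID, MSGID (3 bytes, little-endian)
        if len(frame) < 10:
            raise ValueError("truncated MAVLink v2 header")
        return int.from_bytes(frame[7:10], "little")
    raise ValueError(f"not a MAVLink frame: start byte 0x{frame[0]:02X}")


def id_sequence(frames):
    """Map a list of frames to the sequence of their message IDs."""
    return [message_id(f) for f in frames]


# Example frames with zero-filled payload and checksum bytes.
frames = [
    bytes([0xFE, 9, 0, 1, 1, 0]) + bytes(11),                      # v1, ID 0
    bytes([0xFD, 28, 0, 0, 5, 1, 1, 30, 0, 0]) + bytes(30),        # v2, ID 30
    bytes([0xFD, 0, 0, 0, 6, 1, 1, 0x2C, 0x01, 0x00]) + bytes(2),  # v2, ID 300
]
ids = id_sequence(frames)
```

The resulting list can be converted to a NumPy array with `numpy.array(ids)`. Checksum and signature validation are omitted from this sketch.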

Review with Security Concern Dataset

This dataset contains user reviews collected from 8,999 popular Android game applications on the Google Play Store, each with more than 10 million downloads. A total of 56,439,878 reviews were gathered to investigate whether user feedback reveals meaningful security concerns.

EEG Data for ADHD/Control Children

This dataset contains EEG recordings collected from 103 Iranian children aged 6–10 years. According to DSM-5 diagnostic criteria, 49 participants were diagnosed with ADHD (22 females, 27 males), while 54 were healthy controls (24 females, 30 males). ADHD participants were recruited from clinical centers, and controls were selected from summer leisure centers in Mashhad, Iran. Only participants with IQ > 75 and without epilepsy or comorbid psychiatric disorders were included. No participant was taking medication at the time of EEG recording.

SAE J1939 ATTACK DATASET

This dataset was used in the paper, "Expanding the Attack Scenarios of SAE J1939: A Comprehensive Analysis of Established and Novel Vulnerabilities in Transport Protocol," presented at ESCAR USA in June 2024.
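
In SAE J1939, the transport protocol messages examined in the paper are identified by the Parameter Group Number (PGN) carried inside the 29-bit CAN identifier. The sketch below is not taken from the paper or the dataset's tooling. It decodes an identifier into priority, PGN, destination, and source address, and flags the Transport Protocol Connection Management (TP.CM, PGN 0xEC00) and Data Transfer (TP.DT, PGN 0xEB00) messages:

```python
TP_CM_PGN = 0xEC00  # Transport Protocol, Connection Management (60416)
TP_DT_PGN = 0xEB00  # Transport Protocol, Data Transfer (60160)


def decode_j1939_id(can_id: int) -> dict:
    """Split a 29-bit J1939 CAN identifier into its fields."""
    can_id &= 0x1FFFFFFF
    priority = (can_id >> 26) & 0x7
    edp = (can_id >> 25) & 0x1  # extended data page
    dp = (can_id >> 24) & 0x1   # data page
    pf = (can_id >> 16) & 0xFF  # PDU format
    ps = (can_id >> 8) & 0xFF   # PDU specific
    src = can_id & 0xFF
    if pf < 240:
        # PDU1: PS holds the destination address and is not part of the PGN.
        pgn = (edp << 17) | (dp << 16) | (pf << 8)
        dest = ps
    else:
        # PDU2: PS is a group extension; the message is broadcast.
        pgn = (edp << 17) | (dp << 16) | (pf << 8) | ps
        dest = None
    return {"priority": priority, "pgn": pgn, "dest": dest, "src": src,
            "transport": pgn in (TP_CM_PGN, TP_DT_PGN)}
```

For example, `0x1CECFF00` decodes to a priority-7 TP.CM message from source address 0x00 to the global address 0xFF.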

For more information about this dataset, please refer to our description paper below.

TESTBED

SAPIMMDS: Function-Oriented Mobile Malware Analysis Dataset Based on Suspicious API Call Patterns

This dataset accompanies the research presented in “Function-Oriented Mobile Malware Analysis as First Aid” and provides behavior-oriented metadata for 906 Android malware samples collected from real-world smishing and spyware incidents in South Korea.
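
The description does not list which APIs the dataset treats as suspicious or how they map to malware functions. As a hypothetical sketch, a function-oriented profile could be derived by mapping observed Android API calls to functional categories and counting the matches. The mapping table and category names below are illustrative only, although the listed methods are real Android SDK APIs:

```python
from collections import Counter

# Hypothetical mapping from suspicious Android API calls to malware
# functions; the dataset's actual patterns are not listed in its description.
SUSPICIOUS_APIS = {
    "android.telephony.SmsManager.sendTextMessage": "sms_sending",
    "android.telephony.TelephonyManager.getLine1Number": "device_info_collection",
    "android.location.LocationManager.getLastKnownLocation": "location_tracking",
    "android.media.AudioRecord.startRecording": "audio_recording",
}


def function_profile(api_calls):
    """Count how often each functional category appears in a call list;
    calls not in the mapping are ignored."""
    return dict(Counter(SUSPICIOUS_APIS[c] for c in api_calls
                        if c in SUSPICIOUS_APIS))


calls = [
    "android.telephony.SmsManager.sendTextMessage",
    "java.lang.String.length",
    "android.location.LocationManager.getLastKnownLocation",
    "android.telephony.SmsManager.sendTextMessage",
]
profile = function_profile(calls)
```

Grouping calls by function rather than by individual API keeps the profile readable for first-response triage, which matches the "first aid" framing of the accompanying paper.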

Categories: