
Artificial Intelligence

This dataset provides comparative statistics on the financial capabilities of various general-purpose and pre-trained financial large language models (LLMs). It also highlights which specific characteristics contributed to each model's financial capability and records whether each model is open- or closed-source. Our intent was to generate a categorical heatmap from the CSV data.


The Mal-Netminer dataset consists of malware samples and their corresponding system call graphs used to analyze behavioral patterns through social network analysis (SNA). The dataset was developed as part of the Mal-Netminer framework, which classifies malware by extracting topological properties from system call graphs rather than relying on traditional signature or frequency-based approaches.
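As a rough illustration of what "topological properties of a system call graph" can mean, the sketch below builds a directed graph from a call trace and computes a few SNA-style measures. It is not the Mal-Netminer implementation: the graph construction rule (an edge from each call to the call that follows it), the function names, and the sample trace are all assumptions made for this example.

```python
from collections import defaultdict


def build_call_graph(calls):
    """Assumed construction: edge a -> b when call b directly follows call a.

    Self-loops (a call followed by itself) are dropped.
    """
    return {(a, b) for a, b in zip(calls, calls[1:]) if a != b}


def topology_features(edges):
    """Return node count, edge count, density, and degree centrality."""
    nodes = {n for edge in edges for n in edge}
    n = len(nodes)
    degree = defaultdict(int)
    for a, b in edges:
        degree[a] += 1  # outgoing edge
        degree[b] += 1  # incoming edge
    # Density of a directed graph without self-loops: |E| / (n * (n - 1)).
    density = len(edges) / (n * (n - 1)) if n > 1 else 0.0
    # Degree centrality: total degree normalized by (n - 1).
    centrality = {v: degree[v] / (n - 1) for v in nodes} if n > 1 else {}
    return {
        "nodes": n,
        "edges": len(edges),
        "density": density,
        "degree_centrality": centrality,
    }


# Hypothetical trace, for illustration only.
trace = ["open", "read", "read", "write", "close"]
features = topology_features(build_call_graph(trace))
```

Features of this kind describe the shape of the graph rather than how often each call occurs, which is the distinction the description draws against frequency-based approaches.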


The CAFUC Flight Maneuver Dataset is a high-quality, real-world flight training dataset collected in 2023 from fixed-wing aircraft (C172S and SR20) at the Civil Aviation Flight University of China (CAFUC). It is organized into four category folders based on the geometric trajectory shapes of flight maneuvers. The dataset contains 150,000 frames sampled at 1 Hz (41.6 total flight hours) with 64-dimensional features (e.g., latitude, altitude, engine parameters) and 4 manual labels, covering 4 flight subjects, 14,356 basic maneuvers, and 168 trainee pilots.


This dataset is dedicated to anomaly detection in flight training scenarios. It is built from 120 original CSV files of normal flight training, totaling 993,655 valid normal data records. To support the training, validation, and performance evaluation of anomaly detection models, four typical flight anomalies (throttle surge, course deviation, engine failure, and excessive pitch) have been injected into the normal data.
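Anomaly injection of this kind can be sketched as follows for a single throttle channel. This is a minimal illustration and not the procedure used to build the dataset: the function name, the additive offset, and the sample values are assumptions.

```python
def inject_surge(values, start, length, magnitude):
    """Add a constant offset to a window of samples and label that window.

    Returns (injected_values, labels), where labels[i] is 1 inside the
    injected window and 0 elsewhere. The input list is left unchanged.
    """
    end = min(start + length, len(values))
    injected = list(values)
    labels = [0] * len(values)
    for i in range(start, end):
        injected[i] = values[i] + magnitude
        labels[i] = 1
    return injected, labels


# Hypothetical normal throttle readings (fraction of full throttle).
throttle = [0.5] * 10
surged, labels = inject_surge(throttle, start=3, length=4, magnitude=0.3)
```

Keeping the labels alongside the modified signal lets the same records serve for both training and evaluation of a detector.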


Synthetic speech, or audio deepfakes, poses a growing threat to information veracity and public trust. Although deepfake detection has attracted considerable interest, the scarcity of open-source datasets for Bengali speech hampers progress in this field. To fill this gap, we present the Bengali Real and Deepfake Audio Dataset, a curated repository of real and deepfake speech in the Bengali language that can facilitate research on deepfake detection and speaker forensics.


This dataset consists of MAVLink message ID sequences collected from a UAV system operating in a Hardware-in-the-Loop (HITL) simulation environment. Communication between the UAV and the Ground Control Station (GCS) was monitored, and only MAVLink protocol packets were retained from the captured traffic. From each MAVLink packet, the message ID was extracted and stored as a sequence in NumPy array format.
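One common way to prepare a message ID sequence for a sequence model is to cut it into fixed-length windows. The sketch below assumes this use; the function name, window size, and sample IDs are illustrative and do not come from the dataset. It works on a plain Python list, which an array loaded with NumPy's `np.load` could supply through `.tolist()`.

```python
def sliding_windows(ids, size, step=1):
    """Split a message ID sequence into overlapping fixed-length windows."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [ids[i:i + size] for i in range(0, len(ids) - size + 1, step)]


# Illustrative sequence of MAVLink message IDs
# (0 = HEARTBEAT, 30 = ATTITUDE, 33 = GLOBAL_POSITION_INT, 1 = SYS_STATUS).
message_ids = [0, 30, 33, 0, 30, 33, 1]
windows = sliding_windows(message_ids, size=3)
```

Each window can then be treated as one sample, for example to learn which ID patterns are typical of normal UAV-to-GCS traffic.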


This dataset contains EEG recordings collected from 103 Iranian children aged 6–10 years. According to DSM-5 diagnostic criteria, 49 participants were diagnosed with ADHD (22 females, 27 males), while 54 were healthy controls (24 females, 30 males). ADHD participants were recruited from clinical centers, and controls were selected from summer leisure centers in Mashhad, Iran. Only participants with IQ > 75 and without epilepsy or comorbid psychiatric disorders were included. No participant was taking medication at the time of EEG recording.
