Energy Consumption and Runtime Traces of Lightweight LLM Inference
- Submitted by: Berlin V
- DOI: 10.21227/dbxv-dz97
Abstract
This dataset contains runtime and energy consumption measurements for multiple lightweight large language models (LLMs) under varying inference configurations. Experiments were conducted on a hybrid CPU–GPU system equipped with an NVIDIA GeForce RTX 3060 GPU (8 GB VRAM) and an Intel Core i7 CPU. The evaluated models include GPT-2, Falcon-rw-1B, LLaMA-3.2-1B, and DeepSeek-R1-Distill-Qwen-1.5B, along with their quantized variants.
Energy consumption was profiled using pyJoules, which integrates with pynvml (for the GPU) and Intel RAPL (for the CPU) to collect fine-grained, hardware-level energy readings during inference.
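For reference, a minimal sketch of this kind of measurement, not the authors' exact harness: the model choice, prompt, generation length, and output CSV path are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from pyJoules.energy_meter import EnergyContext
from pyJoules.device.rapl_device import RaplPackageDomain
from pyJoules.device.nvidia_device import NvidiaGPUDomain
from pyJoules.handler.csv_handler import CSVHandler

# Illustrative model; the dataset also covers Falcon-rw-1B, LLaMA-3.2-1B,
# and DeepSeek-R1-Distill-Qwen-1.5B plus quantized variants.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

handler = CSVHandler("trace.csv")  # hypothetical output path; one row per tagged measurement
input_ids = tokenizer("Hello", return_tensors="pt").input_ids.to("cuda")

# Measure CPU package 0 (via RAPL, usually needs read access to RAPL counters)
# and GPU 0 (via pynvml) for the duration of the generation call.
with EnergyContext(handler=handler,
                   domains=[RaplPackageDomain(0), NvidiaGPUDomain(0)],
                   start_tag="inference"):
    model.generate(input_ids, max_new_tokens=32)

handler.save_data()  # flush the collected measurements to trace.csv
```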
Each inference trace corresponds to a unique combination of:
- Input token length T_in,
- Output token length T_out, and
- Quantization threshold Q_n ∈ {1, 2, 4, 6}.
Token lengths vary in powers of two: T_in, T_out ∈ {8, 16, 32, 64, 128, 256, 512}.
Each configuration was executed 25 times in randomized order to improve measurement robustness, yielding 24,500 total traces across all models (the grid is enumerated in the sketch below).
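To make the grid concrete, here is a short enumeration of the configurations described above; the variable names and structure are illustrative and not part of the dataset itself.

```python
import itertools
import random

token_lengths = [8, 16, 32, 64, 128, 256, 512]  # T_in and T_out
quant_levels = [1, 2, 4, 6]                     # Q_n

# Every (T_in, T_out, Q_n) combination forms one configuration.
configs = list(itertools.product(token_lengths, token_lengths, quant_levels))
runs = configs * 25      # 25 repetitions per configuration
random.shuffle(runs)     # randomized execution order, as in the dataset

print(len(configs))      # 196 configurations per model variant
print(len(runs))         # 4,900 traces per model variant
```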
The dataset provides a reproducible benchmark for analyzing LLM inference efficiency, scalability, and energy-performance trade-offs under hardware constraints.
Instructions:
Users can load the dataset with standard data analysis tools (e.g., Python/pandas). Regression or machine-learning models can then be fitted to predict runtime and energy consumption across LLM configurations. The dataset also supports workload-scheduling research, for example selecting the model or quantization level that minimizes total energy use while maintaining performance; a loading-and-regression sketch follows below. Do not forget to cite us :)
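A minimal sketch of the loading-and-regression workflow. The file name and column names (t_in, t_out, q_n, model, energy_j) are hypothetical placeholders; adjust them to the schema actually shipped with the dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical file name and columns; substitute the dataset's real schema.
df = pd.read_csv("llm_energy_traces.csv")

# One-hot encode the model identifier; keep the numeric grid dimensions as-is.
X = pd.get_dummies(df[["t_in", "t_out", "q_n", "model"]], columns=["model"])
y = df["energy_j"]  # target: per-trace energy consumption

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)
print(f"R^2 on held-out traces: {reg.score(X_test, y_test):.3f}")
```

A plain linear fit is only a starting point; since token lengths grow in powers of two, log-transforming t_in and t_out, or using a tree-based regressor, may capture the scaling behavior better.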