Energy Consumption and Runtime Traces of Lightweight LLM Inference
- Submitted by: Berlin V
- DOI: 10.21227/dbxv-dz97
Abstract
This dataset contains runtime and energy consumption measurements for multiple lightweight large language models (LLMs) under varying inference configurations. Experiments were conducted on a hybrid CPU–GPU system equipped with an NVIDIA GeForce RTX 3060 GPU (8 GB VRAM) and an Intel Core i7 CPU. The evaluated models include GPT-2, Falcon-rw-1B, LLaMA-3.2-1B, and DeepSeek-R1-Distill-Qwen-1.5B, along with their quantized variants.
Energy consumption was profiled using pyJoules, which integrates with pynvml (for the GPU) and Intel RAPL (for the CPU) to collect fine-grained, hardware-level energy readings during inference.
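For reference, a minimal sketch of this kind of measurement, not the authors' exact harness: the model choice, prompt, generation length, and output CSV path are illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from pyJoules.energy_meter import EnergyContext
from pyJoules.device.rapl_device import RaplPackageDomain
from pyJoules.device.nvidia_device import NvidiaGPUDomain
from pyJoules.handler.csv_handler import CSVHandler

# Illustrative model; the dataset also covers Falcon-rw-1B, LLaMA-3.2-1B,
# and DeepSeek-R1-Distill-Qwen-1.5B plus quantized variants.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to("cuda")

handler = CSVHandler("trace.csv")  # hypothetical output path; one row per tagged measurement
input_ids = tokenizer("Hello", return_tensors="pt").input_ids.to("cuda")

# Measure CPU package 0 (via RAPL, usually needs read access to RAPL counters)
# and GPU 0 (via pynvml) for the duration of the generation call.
with EnergyContext(handler=handler,
                   domains=[RaplPackageDomain(0), NvidiaGPUDomain(0)],
                   start_tag="inference"):
    model.generate(input_ids, max_new_tokens=32)

handler.save_data()  # flush the collected measurements to trace.csv
```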
Each inference trace corresponds to a unique combination of:
- Input token length T_in,
- Output token length T_out, and
- Quantization threshold Q_n ∈ {1, 2, 4, 6}.
Token lengths vary in powers of two: T_in, T_out ∈ {8, 16, 32, 64, 128, 256, 512}.
Each configuration was executed 25 times in randomized order to improve measurement robustness, yielding 24,500 total traces across all models (the grid is enumerated in the sketch below).
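To make the grid concrete, here is a short enumeration of the configurations described above; the variable names and structure are illustrative and not part of the dataset itself.

```python
import itertools
import random

token_lengths = [8, 16, 32, 64, 128, 256, 512]  # T_in and T_out
quant_levels = [1, 2, 4, 6]                     # Q_n

# Every (T_in, T_out, Q_n) combination forms one configuration.
configs = list(itertools.product(token_lengths, token_lengths, quant_levels))
runs = configs * 25      # 25 repetitions per configuration
random.shuffle(runs)     # randomized execution order, as in the dataset

print(len(configs))      # 196 configurations per model variant
print(len(runs))         # 4,900 traces per model variant
```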
The dataset provides a reproducible benchmark for analyzing LLM inference efficiency, scalability, and energy-performance trade-offs under hardware constraints.
Instructions:
Users can load the dataset with standard data analysis tools (e.g., Python/pandas). Regression or machine-learning models can then be fitted to predict runtime and energy consumption across LLM configurations. The dataset also supports workload-scheduling research, for example selecting the model or quantization level that minimizes total energy use while maintaining performance; a loading-and-regression sketch follows below. Do not forget to cite us :)
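A minimal sketch of the loading-and-regression workflow. The file name and column names (t_in, t_out, q_n, model, energy_j) are hypothetical placeholders; adjust them to the schema actually shipped with the dataset.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Hypothetical file name and columns; substitute the dataset's real schema.
df = pd.read_csv("llm_energy_traces.csv")

# One-hot encode the model identifier; keep the numeric grid dimensions as-is.
X = pd.get_dummies(df[["t_in", "t_out", "q_n", "model"]], columns=["model"])
y = df["energy_j"]  # target: per-trace energy consumption

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
reg = LinearRegression().fit(X_train, y_train)
print(f"R^2 on held-out traces: {reg.score(X_test, y_test):.3f}")
```

A plain linear fit is only a starting point; since token lengths grow in powers of two, log-transforming t_in and t_out, or using a tree-based regressor, may capture the scaling behavior better.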