Accuracy evaluation of vLLM/SGLang with lm_eval (MMLU and other benchmarks)

Luchang-Li

Last modified 2025-06-30 19:35:07

Licensed under CC 4.0 BY-SA

First published 2025-05-29 08:18:10

Article link: https://blog.csdn.net/u013701860/article/details/148295675

Ref

https://github.com/EleutherAI/lm-evaluation-harness

How to evaluate large language models with lm-evaluation-harness without writing code

[GPT] A survey and evaluation of Chinese large language models (C-Eval, AGIEval, MMLU, SuperCLUE), CSDN blog

Evaluating with lm_eval

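# pip install lm-eval
# The [api] extra adds the dependencies for API-based backends; quote it so the shell does not expand the brackets
pip install "lm-eval[api]"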

Install from source

git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness
pip install -e .

List the supported evaluation tasks

lm-eval --tasks list

Evaluating through the vLLM/SGLang serving API

lm_eval \
--model local-completions \
--tasks mmlu \
--batch_size=8 \
--model_args '{"model": "Qwen/Qwen3-8B-FP8", "base_url": "http://localhost:8000/v1/completions", "num_concurrent": 8}'

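This command evaluates the API endpoint exposed by a running vLLM or SGLang serving process, so the same benchmark can be used to compare the accuracy of different vLLM/SGLang deployment configurations.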
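To compare several deployments, the same lm_eval command can be repeated against each endpoint with a separate results directory. The following is only a minimal sketch under stated assumptions: the deployment names `vllm` and `sglang`, the port 30000 for the second server, the `results/` directory, and the helper functions `build_model_args` and `eval_endpoint` are all illustrative and not from the original post. By default the script only prints the commands; set `DRY_RUN=0` to actually run them against servers that are already up.

```shell
#!/usr/bin/env bash
# Sketch: run the same MMLU evaluation against several serving endpoints.
# Deployment names, ports, and the results/ directory are hypothetical.

MODEL="Qwen/Qwen3-8B-FP8"

# Build the JSON string passed to --model_args for one endpoint.
build_model_args() {
  local base_url="$1" concurrency="$2"
  printf '{"model": "%s", "base_url": "%s", "num_concurrent": %s}' \
    "$MODEL" "$base_url" "$concurrency"
}

# Print the lm_eval command for one deployment; run it only when DRY_RUN=0.
eval_endpoint() {
  local name="$1" base_url="$2"
  local args
  args="$(build_model_args "$base_url" 8)"
  echo "lm_eval --model local-completions --tasks mmlu --batch_size 8 --model_args '$args' --output_path results/$name"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    lm_eval --model local-completions --tasks mmlu --batch_size 8 \
      --model_args "$args" --output_path "results/$name"
  fi
}

# One call per deployment under test (assumed to be running already).
eval_endpoint vllm   "http://localhost:8000/v1/completions"
eval_endpoint sglang "http://localhost:30000/v1/completions"
```

Writing each run to its own `--output_path` keeps the per-deployment scores side by side, so a regression introduced by a particular quantization or parallelism setting shows up as a difference between result files rather than between terminal logs.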