MiniCPM-V 4.0 Open-Sourced: Stronger Multimodal Capabilities, Runs on Phones, and Ships with a Comprehensive CookBook

2025-08-08

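Summary: Today, MiniCPM-V 4.0, the new-generation multimodal model in ModelBest's MiniCPM ("little steel cannon") series, was officially open-sourced. With 4B parameters, it achieves state-of-the-art (SOTA) results for its size class on OpenCompass, OCRBench, MathVista, and other benchmarks, and it runs stably and smoothly on phones. The team also open-sourced MiniCPM-V CookBook, an inference and deployment toolkit that gives developers lightweight, out-of-the-box deployment across different needs, scenarios, and devices.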

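First, on-device performance. At 4B parameters, a size well suited to phones, MiniCPM-V 4.0 runs stably and responds quickly, with no overheating or stuttering during extended continuous use on phones, tablets, and similar devices.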

An iOS app that supports local deployment of MiniCPM-V 4.0 has already been open-sourced; developers can download it from the CookBook.

➤ Open-source links

GitHub: 🔗 https://github.com/OpenBMB/MiniCPM-o

Hugging Face: 🔗 https://huggingface.co/openbmb/MiniCPM-V-4

ModelScope: 🔗 https://modelscope.cn/models/OpenBMB/MiniCPM-V-4

CookBook: 🔗 https://github.com/OpenSQZ/MiniCPM-V-CookBook

4B parameters, SOTA overall performance in its class

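As the new leader among on-device multimodal models, MiniCPM-V 4.0 reaches same-class SOTA at the 4B scale in single-image, multi-image, and video understanding. On OpenCompass, OCRBench, MathVista, MMVet, MMBench V1.1, MMStar, AI2D, HallusionBench, and other benchmarks, its overall performance is the highest in its class.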

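On OpenCompass, MiniCPM-V 4.0 outperforms Qwen2.5-VL 3B and InternVL2.5 4B overall, and is comparable to GPT-4.1-mini and Claude 3.5 Sonnet. Compared with the previous-generation 8B model, MiniCPM-V 2.6, it halves the parameter count while significantly improving multimodal capability.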

Overall, MiniCPM-V 4.0 once again supports the Densing Law of large-model "knowledge density" and raises the capability ceiling for on-device multimodal models.

Low memory + fast response: a benchmark for smooth on-device inference

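MiniCPM-V 4.0 handles real-time video understanding, image understanding, and similar tasks smoothly on phones, PCs, and other edge devices. This comes not only from its strong accuracy but also from its distinctive model architecture, which delivers the fastest time to first response and lower GPU memory usage among models of the same size.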

In tests on Apple M4 Metal, MiniCPM-V 4.0 uses only 3.33 GB of GPU memory in normal operation, less than Qwen2.5-VL 3B and Gemma 3-4B.

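Likewise, in image-understanding tests on Apple M4 Metal, MiniCPM-V 4.0 uses ANE + Metal acceleration to cut time to first response substantially, the best among same-size models. The advantage becomes more pronounced as the input image resolution increases.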

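The research team also measured concurrency and throughput on two 4090 GPUs. The results show that, within what the compute resources can support, the total-throughput advantage of MiniCPM-V 4.0 grows as concurrency increases. For example, with 256 concurrent users, MiniCPM-V 4.0 reaches 13,856 tokens/s, far above Qwen2.5-VL at 7,153 tokens/s and Gemma 3 at 7,607 tokens/s.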
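
To put the quoted throughput figures in perspective, the short sketch below turns them into speedup ratios. The three totals and the 256-user concurrency come from the text above; the per-user figure is only a rough average that assumes throughput is shared evenly across users, which the source does not state.

```python
# Total throughput (tokens/s) at 256 concurrent users, as quoted in the article.
throughput = {
    "MiniCPM-V 4.0": 13856,
    "Qwen2.5-VL": 7153,
    "Gemma 3": 7607,
}
concurrency = 256

baseline = throughput["MiniCPM-V 4.0"]

# Ratio of MiniCPM-V 4.0 total throughput to each other model.
speedup = {
    name: baseline / tps
    for name, tps in throughput.items()
    if name != "MiniCPM-V 4.0"
}

# Rough per-user average, ASSUMING an even split across users.
per_user = {name: tps / concurrency for name, tps in throughput.items()}

for name, ratio in speedup.items():
    print(f"MiniCPM-V 4.0 vs {name}: {ratio:.2f}x total throughput")
for name, tps in per_user.items():
    print(f"{name}: about {tps:.1f} tokens/s per user")
```

On these numbers the advantage works out to roughly 1.9x over Qwen2.5-VL and 1.8x over Gemma 3 in total throughput.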

CookBook released: easy deployment across scenarios

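To make MiniCPM-V 4.0 easy for developers to deploy and use, the team and Shanghai Qi Zhi Institute have, for the first time, systematically open-sourced the inference and deployment toolkit MiniCPM-V CookBook. It offers lightweight, out-of-the-box deployment for a wide range of scenarios, with detailed documentation to lower the deployment barrier and speed up adoption.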

CookBook: 🔗 https://github.com/OpenSQZ/MiniCPM-V-CookBook

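In terms of framework compatibility, MiniCPM-V CookBook serves three audiences at once, further widening the MiniCPM-V user base. Individual developers can deploy on phones, tablets, PCs, and other edge devices through llama.cpp, Ollama, and similar frameworks, and run image Q&A or simple multimodal experiments. For high-concurrency enterprise scenarios, MiniCPM-V is deeply integrated with the vLLM and SGLang serving frameworks to provide stable, high-throughput, low-latency service. Academic and algorithm researchers can build on Hugging Face Transformers for further development, prompt injection, and quantization comparison experiments, so they can validate new ideas quickly and share reproducible experiments.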
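
For the serving path, vLLM exposes an OpenAI-compatible chat-completions API, so a client mostly needs to build the right request body. The sketch below is a minimal illustration using only the standard library. The helper name `build_chat_payload`, the served model name, and the placeholder image bytes are assumptions made for this example, not something taken from the CookBook.

```python
import base64
import json


def build_chat_payload(image_bytes: bytes, question: str,
                       model: str = "openbmb/MiniCPM-V-4",  # assumed served model name
                       mime: str = "image/png") -> dict:
    """Build an OpenAI-style chat-completions body with one inline image."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                # The image is sent inline as a base64 data URL.
                {"type": "image_url",
                 "image_url": {"url": f"data:{mime};base64,{b64}"}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 256,
    }


# Placeholder bytes for illustration only; read a real image file in practice.
payload = build_chat_payload(b"placeholder image bytes", "What is in this picture?")
body = json.dumps(payload)
print(body[:80])
```

The resulting `body` would typically be sent as a POST request with `Content-Type: application/json` to the server's `/v1/chat/completions` endpoint; the host and port depend on how the server was launched.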

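MiniCPM-V CookBook also provides a one-command FastAPI private web demo for quickly standing up a RAG knowledge base or an internal service. It includes built-in GGUF, BNB, and AutoAWQ quantization pipelines for efficient low-resource deployment with quantized models, and a complete iOS example so that real-time multimodal interaction stays smooth on devices such as iPhone and iPad.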

Hands-on with the model

Model inference

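from PIL import Image
import torch
from IPython.display import display  # preview helper, intended for Jupyter/IPython
from modelscope import AutoModel, AutoTokenizer

# Load the model and tokenizer from ModelScope; the model ships custom code,
# so trust_remote_code=True is required.
model_path = 'OpenBMB/MiniCPM-V-4'
model = AutoModel.from_pretrained(model_path, trust_remote_code=True,
                                  # sdpa or flash_attention_2, no eager
                                  attn_implementation='sdpa', torch_dtype=torch.bfloat16)
model = model.eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(
    model_path, trust_remote_code=True)

# Load the input image and show a small preview.
image = Image.open('./assets/single.png').convert('RGB')
display(image.resize((400, 400)))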
# First round chat 
question = "What is the landform in the picture?"
msgs = [{'role': 'user', 'content': [image, question]}]
answer = model.chat(
    msgs=msgs,
    image=image,
    tokenizer=tokenizer
)
print(answer)
# Second round chat, pass history context of multi-turn conversation
msgs.append({"role": "assistant", "content": [answer]})
msgs.append({"role": "user", "content": [
            "What should I pay attention to when traveling here?"]})
answer = model.chat(
    msgs=msgs,
    image=None,
    tokenizer=tokenizer
)
print(answer)
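
The multi-turn pattern above (append the assistant's answer, then the next user question) can be wrapped in a small helper so the history always keeps the same shape. This is only a sketch: `add_turn` is a hypothetical helper written for this example, and a plain `object()` stands in for the PIL image so the snippet runs without the model.

```python
def add_turn(msgs, answer, follow_up):
    """Return a new history with the assistant answer and next user question appended."""
    return msgs + [
        {"role": "assistant", "content": [answer]},
        {"role": "user", "content": [follow_up]},
    ]


image_placeholder = object()  # hypothetical stand-in for the PIL image
history = [{"role": "user",
            "content": [image_placeholder, "What is the landform in the picture?"]}]

# "<first answer>" stands in for the string returned by the first model.chat call.
history = add_turn(history, "<first answer>",
                   "What should I pay attention to when traveling here?")
print([m["role"] for m in history])
```

Because the helper returns a new list instead of mutating its input, earlier snapshots of the conversation remain available for retries or branching.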

Model fine-tuning

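This section shows how to fine-tune MiniCPM-V 4.0 with ms-swift, using a LaTeX OCR dataset as the example. ms-swift is the official ModelScope community framework for training and deploying large language models and multimodal large models.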

ms-swift repository: https://github.com/modelscope/ms-swift

Below is a runnable fine-tuning demo, along with the format for custom datasets.

Before you start fine-tuning, make sure your environment is set up.

# pip install git+https://github.com/modelscope/ms-swift.git
git clone https://github.com/modelscope/ms-swift.git
cd ms-swift
pip install -e .

To fine-tune the model on a custom dataset, prepare your data in the following format.

{"messages": [{"role": "user", "content": "<image><image>What is the difference between the two images?"}, {"role": "assistant", "content": "The first one is a kitten, and the second one is a puppy."}], "images": ["/xxx/x.jpg", "/xxx/x.png"]}
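
Before launching a training run, it can help to sanity-check each line of a custom dataset file against this format. The sketch below is one possible check, not part of ms-swift: `check_sample` is a hypothetical helper, and the rule that the number of `<image>` tags should equal the number of image paths is an assumption inferred from the example above (two tags, two paths).

```python
import json

IMAGE_TAG = "<image>"


def check_sample(line: str) -> dict:
    """Parse one JSONL line and run basic consistency checks on it."""
    sample = json.loads(line)
    messages = sample["messages"]
    images = sample.get("images", [])

    for message in messages:
        if message.get("role") not in {"system", "user", "assistant"}:
            raise ValueError(f"unexpected role: {message.get('role')!r}")

    # ASSUMPTION: every <image> tag corresponds to exactly one image path.
    tag_count = sum(m["content"].count(IMAGE_TAG) for m in messages)
    if tag_count != len(images):
        raise ValueError(f"{tag_count} {IMAGE_TAG} tags but {len(images)} image paths")
    return sample


line = json.dumps({
    "messages": [
        {"role": "user",
         "content": "<image><image>What is the difference between the two images?"},
        {"role": "assistant",
         "content": "The first one is a kitten, and the second one is a puppy."},
    ],
    "images": ["/xxx/x.jpg", "/xxx/x.png"],
})
sample = check_sample(line)
print(len(sample["messages"]), len(sample["images"]))
```

Running the check over every line of a `.jsonl` file surfaces malformed samples early instead of partway through training.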

Training script:

# 10.5GiB
CUDA_VISIBLE_DEVICES=0 \
MAX_PIXELS=1003520 \
swift sft \
    --model OpenBMB/MiniCPM-V-4 \
    --dataset 'AI-ModelScope/LaTeX_OCR:human_handwrite#20000' \
    --split_dataset_ratio 0.01 \
    --train_type lora \
    --torch_dtype bfloat16 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-4 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --target_modules all-linear \
    --freeze_vit true \
    --gradient_accumulation_steps 16 \
    --eval_steps 50 \
    --save_steps 50 \
    --save_total_limit 2 \
    --logging_steps 5 \
    --max_length 2048 \
    --output_dir output \
    --warmup_ratio 0.05 \
    --dataloader_num_workers 4
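
As a rough guide to what these settings imply, the sketch below works through the batch-size arithmetic. It assumes that the `#20000` suffix samples 20,000 examples and that `--split_dataset_ratio 0.01` holds out 1% of them for validation; the step counts are approximations, since the exact number depends on how the trainer rounds the last partial batch and on any samples dropped by `--max_length`.

```python
# Values copied from the training script above.
num_samples = 20000                # '#20000' suffix on the dataset (assumed sample count)
split_dataset_ratio = 0.01
per_device_train_batch_size = 1
gradient_accumulation_steps = 16
num_gpus = 1                       # CUDA_VISIBLE_DEVICES=0
warmup_ratio = 0.05

val_samples = round(num_samples * split_dataset_ratio)
train_samples = num_samples - val_samples

# Samples contributing to each optimizer update.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus

# Approximate optimizer steps for one epoch, and the warmup portion of them.
approx_steps = train_samples / effective_batch
approx_warmup_steps = approx_steps * warmup_ratio

print(f"train/val samples: {train_samples}/{val_samples}")
print(f"effective batch size: {effective_batch}")
print(f"about {approx_steps:.0f} optimizer steps, about {approx_warmup_steps:.0f} warmup steps")
```

With `--eval_steps 50` and `--save_steps 50`, that is on the order of a couple dozen evaluations and checkpoints per epoch, of which `--save_total_limit 2` keeps only two on disk.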

After training completes, run inference with the following command:

CUDA_VISIBLE_DEVICES=0 \
swift infer \
    --model output/vx-xxx/checkpoint-xxx \
    --stream true \
    --load_data_args true \
    --max_new_tokens 2048

Push the model to ModelScope:

swift export \
    --model output/vx-xxx/checkpoint-xxx \
    --push_to_hub true \
    --hub_model_id '<your-model-id>' \
    --hub_token '<your-sdk-token>'