ONNX Runtime Inference Example: Quantizing MobileNet V2 and Running It on the CPU

Originally published 2025-06-30 09:16:09

License: CC 4.0 BY-SA


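Introduction

When deploying deep learning models, quantization is an important optimization: it can substantially reduce model size and speed up inference while retaining most of the original accuracy. This article walks through quantizing a MobileNet V2 model with ONNX Runtime and running the quantized model on the CPU.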
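
To make the idea concrete, here is a minimal NumPy sketch of 8-bit affine (asymmetric) quantization, the general scheme behind uint8 quantization. The function names and the per-tensor min/max range are illustrative assumptions; this is not ONNX Runtime's internal implementation.

```python
import numpy as np

def compute_qparams(x, qmin=0, qmax=255):
    """Derive a scale and zero point from the observed value range (per tensor)."""
    # The range is widened to include 0 so that zero maps to an exact integer.
    x_min = min(float(x.min()), 0.0)
    x_max = max(float(x.max()), 0.0)
    scale = (x_max - x_min) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # all-zero tensor: any positive scale works
    zero_point = int(round(qmin - x_min / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Map float values to uint8 codes."""
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    """Map uint8 codes back to approximate float values."""
    return (q.astype(np.float32) - zero_point) * scale

# Made-up "weights" for the round trip
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.5, size=1000).astype(np.float32)

scale, zero_point = compute_qparams(weights)
restored = dequantize(quantize(weights, scale, zero_point), scale, zero_point)
max_error = float(np.abs(weights - restored).max())
print(f"scale={scale:.6f}, zero_point={zero_point}, max error={max_error:.6f}")
```

The round-trip error stays within roughly half a quantization step (`scale / 2`), which is why a well-chosen value range matters for accuracy.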

Environment Setup

Before starting, prepare the following environment:

Python 3.8
Jupyter Notebook
Required Python packages:
- PyTorch 1.8.0
- torchvision 0.9.0
- ONNX Runtime 1.8.0
- ONNX
- Pillow

Install the dependencies with:

pip install torch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0
pip install onnxruntime==1.8.0
pip install onnx
pip install pillow

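1. Load the pretrained model and export it to ONNX

1.1 Load the pretrained MobileNet V2 model

MobileNet V2 is a lightweight convolutional neural network that is well suited to mobile and embedded devices. The pretrained model can be loaded directly from torchvision: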

from torchvision import models
mobilenet_v2 = models.mobilenet_v2(pretrained=True)

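1.2 Export the model to ONNX

ONNX (Open Neural Network Exchange) is an open model representation format that supports converting and deploying models across frameworks. Export the PyTorch model to ONNX as follows: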

import torch

# Input dimensions
image_height = 224
image_width = 224

# Random input tensor used to trace the model
x = torch.randn(1, 3, image_height, image_width, requires_grad=True)

# Export the model
torch.onnx.export(mobilenet_v2,              # model to export
                 x,                         # example model input
                 "mobilenet_v2_float.onnx",  # output file name
                 export_params=True,        # store the trained weights in the file
                 opset_version=12,          # ONNX opset version
                 do_constant_folding=True,  # apply constant folding
                 input_names=['input'],     # input node name
                 output_names=['output'])   # output node name

2. Run the ONNX model

2.1 Image preprocessing

Before running the model, the input image must be preprocessed:

from PIL import Image
import numpy as np

def preprocess_image(image_path, height, width, channels=3):
    # Open the image, force 3-channel RGB, and resize it.
    # Image.LANCZOS is the same filter as Image.ANTIALIAS, which was removed in Pillow 10.
    image = Image.open(image_path).convert("RGB")
    image = image.resize((width, height), Image.LANCZOS)

    # Convert to a float32 NumPy array in CHW layout
    image_data = np.asarray(image).astype(np.float32)
    image_data = image_data.transpose([2, 0, 1])

    # Scale to [0, 1], then normalize with the ImageNet per-channel mean and std
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    for channel in range(image_data.shape[0]):
        image_data[channel, :, :] = (image_data[channel, :, :] / 255 - mean[channel]) / std[channel]

    # Add the batch dimension
    image_data = np.expand_dims(image_data, 0)
    return image_data
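
The per-channel loop above can also be written as a single broadcast operation. The sketch below is an illustrative alternative, not the article's code: `normalize_hwc` is a hypothetical helper, and it runs on a synthetic random image so that no file is needed.

```python
import numpy as np

# ImageNet per-channel statistics (RGB order), the same values used in preprocess_image
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize_hwc(pixels):
    """Turn an HxWx3 uint8 array into a 1x3xHxW float32 tensor."""
    scaled = pixels.astype(np.float32) / 255.0
    # The (3,) mean/std arrays broadcast over the last (channel) axis
    normalized = (scaled - IMAGENET_MEAN) / IMAGENET_STD
    chw = normalized.transpose(2, 0, 1)
    # Add the batch dimension and make the memory layout contiguous
    return np.ascontiguousarray(chw[np.newaxis, ...])

# Synthetic stand-in for a decoded 224x224 RGB image
rng = np.random.default_rng(0)
fake_image = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

tensor = normalize_hwc(fake_image)
print(tensor.shape, tensor.dtype)
```

Normalizing before the transpose keeps the channel axis last, which is what lets the three-element mean and std arrays broadcast without reshaping.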

2.2 Download the ImageNet class labels

# Download the ImageNet class labels
!curl -o imagenet_classes.txt https://raw.githubusercontent.com/pytorch/hub/master/imagenet_classes.txt

# Read the categories
with open("imagenet_classes.txt", "r") as f:
    categories = [s.strip() for s in f.readlines()]

2.3 Run inference

import onnxruntime

# Create the ONNX Runtime session
session_fp32 = onnxruntime.InferenceSession("mobilenet_v2_float.onnx")

def softmax(x):
    """Compute softmax values."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def run_sample(session, image_file, categories):
    # Run inference; passing None as the output list returns all model outputs
    output = session.run(None, {'input': preprocess_image(image_file, image_height, image_width)})[0]
    output = output.flatten()
    output = softmax(output)  # optional: convert logits to probabilities

    # Print the top-5 predictions
    top5_catid = np.argsort(-output)[:5]
    for catid in top5_catid:
        print(categories[catid], output[catid])

# Run the example
run_sample(session_fp32, 'cat.jpg', categories)
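
To see what the softmax and top-k steps do without a model, the following sketch applies them to a tiny hand-made logits vector. The labels and values are made up for illustration, and `top_k` is a hypothetical helper that mirrors the `argsort` logic in `run_sample`.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax: subtracting the max avoids overflow in exp."""
    e_x = np.exp(x - np.max(x))
    return e_x / e_x.sum()

def top_k(probabilities, labels, k=5):
    """Return the k most probable (label, probability) pairs, best first."""
    order = np.argsort(-probabilities)[:k]
    return [(labels[i], float(probabilities[i])) for i in order]

labels = ["cat", "dog", "fox", "owl"]  # made-up labels
logits = np.array([2.0, 1.0, 0.1, -1.0], dtype=np.float32)

probs = softmax(logits)
for name, p in top_k(probs, labels, k=2):
    print(f"{name}: {p:.3f}")

# Large logits stay finite because of the max subtraction
print(softmax(np.array([1000.0, 1001.0])))
```

Softmax does not change the ranking of the classes, so it can be skipped when only the predicted labels are needed.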

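3. Quantize the model with ONNX Runtime

3.1 Implement the calibration data reader

Static quantization needs calibration data to determine the quantization parameters. To supply it, implement a CalibrationDataReader: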

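import os
from onnxruntime.quantization import CalibrationDataReader

# NOTE: the original text calls preprocess_func without defining it.
# This is an assumed minimal version built on preprocess_image from section 2.1.
def preprocess_func(images_folder, height, width, size_limit=0):
    image_names = sorted(os.listdir(images_folder))
    if size_limit > 0:
        image_names = image_names[:size_limit]
    # Each entry is one preprocessed image of shape (1, 3, height, width)
    return [preprocess_image(os.path.join(images_folder, name), height, width)
            for name in image_names]

class MobilenetDataReader(CalibrationDataReader):
    def __init__(self, calibration_image_folder):
        self.image_folder = calibration_image_folder
        self.preprocess_flag = True
        self.enum_data_dicts = []
        self.datasize = 0

    def get_next(self):
        if self.preprocess_flag:
            self.preprocess_flag = False
            # Preprocess the calibration images once, on the first call
            nchw_data_list = preprocess_func(self.image_folder, image_height, image_width, size_limit=0)
            self.datasize = len(nchw_data_list)
            self.enum_data_dicts = iter([{'input': nchw_data} for nchw_data in nchw_data_list])
        # Returning None signals that the calibration data is exhausted
        return next(self.enum_data_dicts, None)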
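
The contract behind `get_next` is simple: return one input dictionary per call, then `None` when the data runs out. The sketch below shows how a consumer might drive such a reader and track the observed value range. `ToyDataReader` and `collect_range` are hypothetical stand-ins, not ONNX Runtime classes, and a real calibrator observes intermediate tensors in the graph rather than only the model input.

```python
import numpy as np

class ToyDataReader:
    """Stand-in with the same get_next() contract; not an ONNX Runtime class."""
    def __init__(self, batches):
        self._iterator = iter([{"input": batch} for batch in batches])

    def get_next(self):
        # One feed dict per call, then None once the data is exhausted
        return next(self._iterator, None)

def collect_range(reader, input_name="input"):
    """Consume the reader until it returns None, tracking the global min and max."""
    lo, hi = np.inf, -np.inf
    count = 0
    while True:
        feed = reader.get_next()
        if feed is None:
            break
        lo = min(lo, float(feed[input_name].min()))
        hi = max(hi, float(feed[input_name].max()))
        count += 1
    return lo, hi, count

# Synthetic calibration batches in place of preprocessed images
rng = np.random.default_rng(0)
batches = [rng.normal(size=(1, 3, 8, 8)).astype(np.float32) for _ in range(4)]

lo, hi, count = collect_range(ToyDataReader(batches))
print(f"saw {count} batches, range [{lo:.3f}, {hi:.3f}]")
```

A range collected this way is the kind of statistic from which a scale and zero point can be derived, which is why the calibration images should resemble the data the model will see in production.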

3.2 Run static quantization

from onnxruntime.quantization import quantize_static, QuantType

# Directory containing the calibration images
calibration_data_folder = "calibration_imagenet"
dr = MobilenetDataReader(calibration_data_folder)

# Run quantization
quantize_static('mobilenet_v2_float.onnx',
                'mobilenet_v2_uint8.onnx',
                dr)

# Compare model sizes
print('ONNX full precision model size (MB):', os.path.getsize("mobilenet_v2_float.onnx")/(1024*1024))
print('ONNX quantized model size (MB):', os.path.getsize("mobilenet_v2_uint8.onnx")/(1024*1024))
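
As a rough sanity check on the sizes printed above, the expected reduction can be estimated from the bytes per weight alone. The parameter count below is an approximation for MobileNet V2 and is an assumption of this sketch; the real files also store graph structure, scales, and zero points, so measured sizes will differ somewhat.

```python
PARAM_COUNT = 3_500_000  # approximate MobileNet V2 parameter count (assumption)

def size_mb(num_params, bytes_per_param):
    """Size of the weights alone, in MB."""
    return num_params * bytes_per_param / (1024 * 1024)

fp32_mb = size_mb(PARAM_COUNT, 4)  # float32: 4 bytes per weight
int8_mb = size_mb(PARAM_COUNT, 1)  # 8-bit integer: 1 byte per weight
reduction = 1 - int8_mb / fp32_mb

print(f"fp32 ~{fp32_mb:.1f} MB, 8-bit ~{int8_mb:.1f} MB, reduction {reduction:.0%}")
```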

3.3 Run the quantized model

# Create a session for the quantized model
session_quant = onnxruntime.InferenceSession("mobilenet_v2_uint8.onnx")

# Run the example
run_sample(session_quant, 'cat.jpg', categories)
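
Printing the top-5 list for each model makes it easy to eyeball differences, but a small numeric comparison is often more useful. The sketch below assumes you already have the raw output arrays from both sessions; `compare_outputs` is a hypothetical helper, and the "quantized" logits here are simulated by adding noise.

```python
import numpy as np

def compare_outputs(reference, candidate, k=5):
    """Summarize how closely two output vectors agree."""
    reference = np.asarray(reference).ravel()
    candidate = np.asarray(candidate).ravel()
    ref_topk = np.argsort(-reference)[:k]
    cand_topk = np.argsort(-candidate)[:k]
    return {
        "top1_match": bool(ref_topk[0] == cand_topk[0]),
        "topk_overlap": len(set(ref_topk.tolist()) & set(cand_topk.tolist())),
        "max_abs_diff": float(np.abs(reference - candidate).max()),
    }

# Simulated data: real arrays would come from session.run(...) on each model
rng = np.random.default_rng(0)
fp32_logits = rng.normal(size=1000).astype(np.float32)
quant_logits = fp32_logits + rng.normal(scale=0.01, size=1000).astype(np.float32)

print(compare_outputs(fp32_logits, quant_logits))
```

Running this over a held-out set of images, rather than a single picture, gives a more reliable picture of how much accuracy quantization costs.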

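Result Analysis

A quantized model typically offers the following advantages:

Smaller model size: going from 32-bit floating point to 8-bit integers usually shrinks the model by about 75%.
Faster inference: quantized models usually run faster on the CPU, although the gain depends on the hardware and its support for 8-bit integer instructions.
Lower memory usage: the quantized model needs less memory at run time.

However, quantization can cause a slight loss of accuracy, which should be minimized by choosing representative calibration data and suitable quantization parameters.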
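
The speed advantage listed above is best checked on your own hardware. The following sketch is a generic timing helper using only the standard library; `benchmark` is a hypothetical name, and the workload here is a stand-in for a call such as `session.run(...)`.

```python
import statistics
import time

def benchmark(fn, warmup=3, runs=20):
    """Return the median latency of fn() in milliseconds."""
    # Warm-up calls absorb one-time costs such as lazy initialization
    for _ in range(warmup):
        fn()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        timings.append((time.perf_counter() - start) * 1000.0)
    # The median is less sensitive to outliers than the mean
    return statistics.median(timings)

# Stand-in workload; replace with a lambda that runs the model session
latency_ms = benchmark(lambda: sum(i * i for i in range(10_000)))
print(f"median latency: {latency_ms:.3f} ms")
```

Timing the float and quantized sessions with the same input and the same helper gives a like-for-like comparison.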

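Conclusion

This article covered the complete workflow for quantizing a MobileNet V2 model with ONNX Runtime. Quantization substantially reduces model size and can speed up inference, which matters most on resource-constrained devices. In practice, use a representative calibration dataset to get the best quantization results.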

Disclosure: parts of this article were generated with AI assistance (AIGC); for reference only.