PiscCode + Mediapipe: Wrapping Efficient Object Detection, from Single Frame to Visualization

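Object detection is one of the most common tasks in computer vision projects. Video surveillance, autonomous driving, and smart-home applications all depend on a detector that is both fast and efficient. In the Python ecosystem, Mediapipe offers a lightweight, high-performance detection solution. This post shows how to wrap the Mediapipe Python API in a compact FrameObject class that performs single-frame object detection and visualization.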

1. Project Background

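In real-world development, we often run into requirements like these:

Run fast object detection on videos or images;
Draw a bounding box on each target, labeled with its category and confidence score;
Give each category a fixed color so objects are easy to follow across video frames;
Keep the code reusable and easy to extend.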

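If you call the Mediapipe API directly, you have to rewrite the detection, image conversion, and box-drawing logic every time, which quickly becomes tedious. Wrapping that logic in a class keeps the code tidy and makes it easy to drop into other projects.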

2. Installing Dependencies

This example uses the Mediapipe Tasks Python API, which requires the following library:

pip install mediapipe

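mediapipe provides the object detection API. The example code also imports cv2 (OpenCV) for color conversion and drawing.

In addition, we need to prepare a TFLite detection model, for example EfficientDet Lite0.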
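
Because the model file is supplied separately, it can help to check the path before constructing the detector. The following is a minimal sketch of such a check; `require_model` is a hypothetical helper name, not part of Mediapipe, and the `.tflite` suffix check is an assumption about how the file is named.

```python
from pathlib import Path


def require_model(model_path: str) -> str:
    """Return the model path as a string, or raise if it is unusable.

    Hypothetical helper: it only inspects the file system and does not
    load or validate the model contents.
    """
    path = Path(model_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"TFLite model not found: {path}")
    if path.suffix != ".tflite":
        raise ValueError(f"Expected a .tflite file, got: {path.name}")
    return str(path)
```

The returned string could then be passed as `model_path` when creating the FrameObject instance, so that a missing file fails early with a readable message.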

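3. `FrameObject` Class Design

The wrapper is organized as follows:

Initialize the object detector
Create the detector with vision.ObjectDetector and allow a score_threshold to be passed in.
Detect and draw the results
- Convert the OpenCV BGR image into an mp.Image, the format Mediapipe accepts;
- Call detector.detect to get the detection results;
- Draw each detection, including the bounding box and its label.
Improve the visualization
- Assign each category a fixed random color;
- Use a larger label font with a background bar, placed in the top-left corner inside the box;
- Keep the text from running past the image boundary.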
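
The boundary handling in the last bullet can be sketched as a small pure function. This is an illustrative variant, not the article's exact logic: `label_origin` and its `pad` parameter are hypothetical, and it clamps the label horizontally and at the top as well, whereas the class in this post only clamps against the bottom edge.

```python
def label_origin(box_x, box_y, text_w, text_h, baseline,
                 frame_w, frame_h, pad=2):
    """Compute the text baseline origin (x, y) for a label inside a box.

    text_w, text_h and baseline are the values cv2.getTextSize reports
    for the label; frame_w and frame_h are the image size in pixels.
    """
    # Keep the label inside the left and right image edges.
    x = max(0, min(box_x, frame_w - text_w))
    # Start just below the top edge of the bounding box.
    y = box_y + text_h + pad
    # Keep the text and its descenders above the bottom image edge.
    y = min(y, frame_h - baseline - pad)
    # Keep the background bar below the top image edge.
    y = max(y, text_h + baseline)
    return x, y
```

For a 640x480 frame and a 100x20 label with baseline 5, a box at (10, 10) gives (10, 32), while a box at (600, 470) is pulled back to (540, 473).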

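4. Core Code Walkthrough

# Category color management (the dict is created in __init__; _get_color is a method of FrameObject)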
self.category_colors = {}
def _get_color(self, category_name: str):
    if category_name not in self.category_colors:
        self.category_colors[category_name] = (
            random.randint(50, 255),
            random.randint(50, 255),
            random.randint(50, 255),
        )
    return self.category_colors[category_name]

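Here each category is assigned a color the first time it appears, and that color is cached. The same category therefore keeps the same color for the whole video, which makes objects easier to follow.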
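
Colors drawn with random.randint stay fixed within one run but change from run to run. If reproducible colors are wanted, one option is to derive the color from the category name. The sketch below is an alternative to the article's approach, not part of it; `stable_color` is a hypothetical name, and it keeps the same 50 to 255 channel range.

```python
import hashlib


def stable_color(category_name: str, low: int = 50, high: int = 255) -> tuple:
    """Map a category name to a color that is identical on every run.

    The first three bytes of the MD5 digest are folded into the
    [low, high] range, one byte per channel.
    """
    digest = hashlib.md5(category_name.encode("utf-8")).digest()
    span = high - low + 1
    return tuple(low + byte % span for byte in digest[:3])
```

Swapping this into _get_color would remove the need for the random module while keeping the per-category cache unchanged.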

# Draw bounding boxes and labels
def _draw_detections(self, frame, detections):
    for det in detections:
        bbox = det.bounding_box
        start = (bbox.origin_x, bbox.origin_y)
        end = (bbox.origin_x + bbox.width, bbox.origin_y + bbox.height)
        color = self._get_color(det.categories[0].category_name)
        cv2.rectangle(frame, start, end, color, 2)
        # Draw the label
        label = f"{det.categories[0].category_name} {det.categories[0].score:.2f}"
        cv2.putText(frame, label, (bbox.origin_x, bbox.origin_y + 20),
                    cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255,255,255), 2)
    # Return the annotated frame so that do() can pass it on
    return frame

_draw_detections is the core function. It draws the detection results on each frame, including the bounding box and the label text.

# Process a single frame
def do(self, frame, device=None):
    if frame is None:
        return None
    mp_image = mp.Image(image_format=mp.ImageFormat.SRGB,
                        data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    detection_result = self.detector.detect(mp_image)
    if not detection_result.detections:
        return frame
    return self._draw_detections(frame, detection_result.detections)

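The do method takes a single frame and returns the frame with the detection results drawn on it, so it can be used directly for frame-by-frame video processing. If nothing is detected, the input frame is returned unchanged.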
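
A frame-by-frame loop around do might look like the sketch below. `process_stream` is a hypothetical helper; it assumes only that `capture.read()` returns an `(ok, frame)` pair, as cv2.VideoCapture does, and that `processor` exposes the do method described above.

```python
def process_stream(capture, processor, max_frames=None):
    """Yield processed frames until the capture is exhausted.

    capture:    any object whose read() returns (ok, frame)
    processor:  any object with a do(frame) method, e.g. FrameObject
    max_frames: optional upper bound on the number of frames to process
    """
    count = 0
    while max_frames is None or count < max_frames:
        ok, frame = capture.read()
        if not ok:
            # End of stream or a read failure: stop iterating.
            break
        yield processor.do(frame)
        count += 1
```

With OpenCV, this could be driven by `capture = cv2.VideoCapture(0)` and a loop such as `for annotated in process_stream(capture, FrameObject()):` that shows each frame with `cv2.imshow`, followed by `capture.release()` when done.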

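5. Optimizations and Extensions

Fixed color per category: makes it easier to follow objects across consecutive video frames.
Text background bar: improves readability, so labels stay legible against cluttered backgrounds.

The complete code is as follows: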

import cv2
import mediapipe as mp
import random
from mediapipe.tasks import python
from mediapipe.tasks.python import vision


class FrameObject:
    def __init__(self, model_path="path/to/model/efficientdet_lite0.tflite",
                 score_threshold=0.5):
        """Initialize the Mediapipe ObjectDetector."""
        base_options = python.BaseOptions(model_asset_path=model_path)
        options = vision.ObjectDetectorOptions(
            base_options=base_options,
            score_threshold=score_threshold
        )
        self.detector = vision.ObjectDetector.create_from_options(options)

        # Category color table (each category keeps a fixed color)
        self.category_colors = {}

    def _get_color(self, category_name: str):
        """Assign each category a fixed random color."""
        if category_name not in self.category_colors:
            self.category_colors[category_name] = (
                random.randint(50, 255),
                random.randint(50, 255),
                random.randint(50, 255),
            )
        return self.category_colors[category_name]

    def _draw_detections(self, frame, detections):
        """Draw the detection results on a copy of the frame."""
        annotated = frame.copy()
        for det in detections:
            bbox = det.bounding_box
            start = (bbox.origin_x, bbox.origin_y)
            end = (bbox.origin_x + bbox.width, bbox.origin_y + bbox.height)

            if not det.categories:
                continue

            category = det.categories[0]
            label = f"{category.category_name} {category.score:.2f}"
            color = self._get_color(category.category_name)

            # Draw the bounding box
            cv2.rectangle(annotated, start, end, color, 2)

            # Measure the label text
            (tw, th), baseline = cv2.getTextSize(label,
                                                 cv2.FONT_HERSHEY_SIMPLEX,
                                                 1.0, 2)

            # Text position (top-left corner inside the box)
            text_x = bbox.origin_x
            text_y = bbox.origin_y + th + 2

            # Keep the text from running past the bottom image edge
            text_y = min(text_y, frame.shape[0] - baseline - 2)

            # Draw the text background bar
            cv2.rectangle(annotated,
                          (text_x, text_y - th - baseline),
                          (text_x + tw, text_y + baseline),
                          color, -1)

            # Draw the text (white)
            cv2.putText(annotated, label, (text_x, text_y),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
        return annotated

    def do(self, frame, device=None):
        """Process a single frame and return it with detection boxes drawn."""
        if frame is None:
            return None

        # Convert to a Mediapipe Image
        mp_image = mp.Image(image_format=mp.ImageFormat.SRGB,
                            data=cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))

        # Run detection
        detection_result = self.detector.detect(mp_image)

        if not detection_result.detections:
            return frame

        # Draw bounding boxes and labels
        return self._draw_detections(frame, detection_result.detections)