Instance Segmentation: A Detailed Record of Training a YOLOv5 Instance Segmentation Model and Deploying It on RK3588

This post is a detailed record of training a YOLOv5 v7.0 instance segmentation model.
The official YOLOv5 v7.0 code can be downloaded from GitHub:

https://github.com/ultralytics/yolov5/tree/v7.0

Since my application needs to be deployed on an RK3588 development board, I use the code from Rockchip's official GitHub:

# Clone the master branch
https://github.com/airockchip/yolov5?tab=readme-ov-file

Note: Rockchip's YOLOv5 code is modified from the official YOLOv5 (v7.0) code. If you want to deploy on RK3588-series boards, the Rockchip code can be trained and exported directly without further changes, whereas the official YOLO code needs some modifications before export. The training workflow of the two projects is identical.

# Rockchip's official model zoo
https://github.com/airockchip/rknn_model_zoo/tree/main/examples/yolov5_seg

Dataset annotation

Creating the segmentation dataset requires the labelme annotation tool; I use version 3.16.7.
Run the following commands in a virtual environment:

# Ubuntu
pip install pycocotools
# Windows
pip install pycocotools-windows

# Install labelme 3.16.7
pip install labelme==3.16.7

Run labelme:

labelme

The resulting JSON files have the following structure:
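Here is a trimmed sketch of the fields that the conversion script below relies on (the coordinate values are made-up placeholders; a real labelme file typically also carries an imageData field):

{
  "version": "3.16.7",
  "shapes": [
    {
      "label": "bolt",
      "points": [[123.0, 456.0], [130.0, 470.0], [118.0, 482.0]],
      "shape_type": "polygon"
    },
    {
      "label": "nut",
      "points": [[300.0, 200.0], [320.0, 210.0], [310.0, 230.0]],
      "shape_type": "polygon"
    }
  ],
  "imagePath": "1.jpg",
  "imageHeight": 480,
  "imageWidth": 640
}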
Convert the annotation files from .json format to .txt format.


import json
import glob
import os

import cv2
import numpy as np

json_path = r"/home/data/project/customer_AAA/rk3588/yolov5-airockchip/Data_save/data_nut_bolt"  # directory containing the labelme .json files

txt_dir = r"/home/data/project/customer_AAA/rk3588/yolov5-airockchip/Data_save/txts"  # directory where the generated .txt files are written
os.makedirs(txt_dir, exist_ok=True)

labels = ['bolt', 'nut']  # your annotation label names; if you change them, also update the class branches in the loop below
json_files = glob.glob(json_path + "/*.json")

for json_file in json_files:
    print(json_file)
    with open(json_file) as f:
        json_info = json.load(f)
    # print(json_info.keys())
    img = cv2.imread(os.path.join(json_path, json_info["imagePath"]))
    height, width, _ = img.shape

    np_w_h = np.array([[width, height]], np.int32)

    file_name = os.path.basename(json_file)
    txt_name = file_name.replace(".json", ".txt")
    txt_path = os.path.join(txt_dir, txt_name)

    # build one line per annotated polygon: class index followed by normalized x y pairs
    txt_content = ""
    for point_json in json_info["shapes"]:
        np_points = np.array(point_json["points"], np.int32)
        norm_points = np_points / np_w_h
        norm_points_list = norm_points.tolist()
        if point_json['label'] == labels[0]:
            txt_content += "0 " + " ".join([" ".join([str(cell[0]), str(cell[1])]) for cell in norm_points_list]) + "\n"
        elif point_json['label'] == labels[1]:
            txt_content += "1 " + " ".join([" ".join([str(cell[0]), str(cell[1])]) for cell in norm_points_list]) + "\n"

    with open(txt_path, "w") as f:
        f.write(txt_content)


After conversion, you get a corresponding .txt file for each image.
Each line has the following meaning:

class label, followed by the normalized polygon point coordinates
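For example, a single label line might look like this (the numbers are made-up placeholders):

0 0.4531 0.2083 0.4688 0.2292 0.4375 0.2500 0.4219 0.2396

Here 0 is the class index (bolt, per the labels list in the conversion script above), followed by the polygon vertices as x y pairs normalized by the image width and height; this is the polygon label format expected by YOLOv5 segmentation training.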

Dataset split:
Run the following code:

import os
import random
import shutil

rootpath = r'/home/data/project/customer_AAA/rk3588/yolov5-airockchip/Data_save/data_nut_bolt_result/'  # location of the img and txt folders; the path must end with /

set1 = ['images','labels']
set2 = ['train','val']
for s1 in set1:
    if not os.path.exists(rootpath+s1):
        os.mkdir(rootpath+s1)
    for s2 in set2:
        if not os.path.exists(rootpath+s1+'/'+s2):
            os.mkdir(rootpath+s1+'/'+s2)


# path to the original images
img_path = rootpath+'img'
# path to the generated txt files
txt_path = rootpath+'txt'
file_names = os.listdir(img_path)
l = 0.8
n = len(file_names)
train_files = random.sample(file_names, int(n*l))
for file in file_names:
    print(file)
    if not os.path.exists(txt_path+'/'+file[:-3]+'txt'):
        os.remove(img_path+'/'+file)
        print(file[:-3]+'txt does not exist')
        continue
    if file in train_files:
        shutil.copy(img_path+'/'+file,rootpath+'images/train/'+file)
        shutil.copy(txt_path+'/'+file[:-3]+'txt',rootpath+'labels/train/'+file[:-3]+'txt')
    else:
        shutil.copy(img_path+'/'+file,rootpath+'images/val/'+file)
        shutil.copy(txt_path+'/'+file[:-3]+'txt',rootpath+'labels/val/'+file[:-3]+'txt')
print('ok!!')
print(len(train_files))

Then create a data_nut_bolt_result directory at the same level, and place the directory containing only the image data (img) and the directory of generated .txt files (txt) inside it.

In the code above, pay attention to how rootpath is written: the paths are built by plain string concatenation, so it must end with a /.
After the script finishes, the corresponding directory structure is generated at the same level.

Setting up the dataset configuration file

In the yolov5/data directory, make a copy of the coco128-seg.yaml configuration file and name it coco128-seg_nut_bolt.yaml:

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
# COCO128-seg dataset https://www.kaggle.com/ultralytics/coco128 (first 128 images from COCO train2017) by Ultralytics
# Example usage: python train.py --data coco128.yaml
# parent
# ├── yolov5
# └── datasets
#     └── coco128-seg  ← downloads here (7 MB)


# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: /home/data/project/customer_AAA/rk3588/yolov5-airockchip/Data_seg/data_nut_bolt  # dataset root dir
train: images/train  # train images (relative to 'path') 128 images
val: images/val  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes
names:
  0: bolt
  1: nut
  

Since my project segments bolts and nuts, two recognition classes are configured.

Setting up the model configuration file

In the yolov5/models/segment directory, copy the yolov5n-seg.yaml file and rename it yolov5n-seg_nut_bolt.yaml:

(python3.8-tk2-2.0) root@f95e42ca3a90:/home/data/project/customer_AAA/rk3588/yolov5-airockchip/models# tree
.
|-- __init__.py
|-- __pycache__
|   |-- __init__.cpython-38.pyc
|   |-- common.cpython-38.pyc
|   |-- common_rk_plug_in.cpython-38.pyc
|   |-- experimental.cpython-38.pyc
|   `-- yolo.cpython-38.pyc
|-- common.py
|-- common_rk_plug_in.py
|-- experimental.py
|-- hub
|   |-- anchors.yaml
|   |-- yolov3-spp.yaml
|   |-- yolov3-tiny.yaml
|   |-- yolov3.yaml
|   |-- yolov5-bifpn.yaml
|   |-- yolov5-fpn.yaml
|   |-- yolov5-p2.yaml
|   |-- yolov5-p34.yaml
|   |-- yolov5-p6.yaml
|   |-- yolov5-p7.yaml
|   |-- yolov5-panet.yaml
|   |-- yolov5l6.yaml
|   |-- yolov5m6.yaml
|   |-- yolov5n6.yaml
|   |-- yolov5s-LeakyReLU.yaml
|   |-- yolov5s-ghost.yaml
|   |-- yolov5s-transformer.yaml
|   |-- yolov5s6.yaml
|   `-- yolov5x6.yaml
|-- segment
|   |-- yolov5l-seg.yaml
|   |-- yolov5m-seg.yaml
|   |-- yolov5n-seg.yaml
|   |-- yolov5n-seg_nut_bolt.yaml
|   |-- yolov5s-seg.yaml
|   `-- yolov5x-seg.yaml
|-- tf.py
|-- yolo.py
|-- yolov5l.yaml
|-- yolov5m.yaml
|-- yolov5n.yaml
|-- yolov5s.yaml
|-- yolov5s_2007_2012.yaml
`-- yolov5x.yaml

The configuration file contents are as follows:


# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 2  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.25  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Segment, [nc, anchors, 32, 256]],  # Detect(P3, P4, P5)
  ]


Starting model training

Run the following command in the project root directory to start training:

torchrun --nproc-per-node 2  ./segment/train.py  --workers 0 --data data/coco128-seg_nut_bolt.yaml  --cfg models/segment/yolov5n-seg_nut_bolt.yaml  --img 640  --weights yolov5n-seg.pt --batch-size 128 --epochs 100
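
If you only have a single GPU, a plain non-distributed run of the same script should also work; a sketch with the batch size reduced to fit one card (adjust to your hardware):

python segment/train.py --workers 0 --data data/coco128-seg_nut_bolt.yaml --cfg models/segment/yolov5n-seg_nut_bolt.yaml --img 640 --weights yolov5n-seg.pt --batch-size 16 --epochs 100 --device 0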

Training starts; the images plotted during training are shown below:

The training curves are shown below:

The validation-set statistics of the model are as follows:

Validating runs/train-seg/exp/weights/best.pt...
Fusing layers... 
YOLOv5n-seg_nut_bolt summary: 224 layers, 1881103 parameters, 0 gradients, 6.7 GFLOPs
                 Class     Images  Instances      Box(P          R      mAP50  mAP50-95)     Mask(P          R      mAP50  mAP50-95): 100%|██████████| 1/1 00:03
                   all         85        604      0.995      0.998      0.994      0.841      0.972      0.975      0.962      0.736
                  bolt         85        300      0.994      0.997      0.995      0.856      0.947       0.95       0.93      0.641
                   nut         85        304      0.997          1      0.993      0.826      0.997          1      0.993      0.831
Results saved to runs/train-seg/exp
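
These metrics can be recomputed later with the segmentation validation script; something along these lines, assuming the same paths as above:

python segment/val.py --weights runs/train-seg/exp/weights/best.pt --data data/coco128-seg_nut_bolt.yaml --img 640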

Model testing

Run the following detection command:

python segment/predict.py --weights runs/train-seg/exp/weights/best.pt   --source Data_seg/data_nut_bolt/images/val/6.jpg

The detection results are shown below:

Deploying the YOLOv5 segmentation model on the RK3588 board

Convert the trained model into a .rknn model file.
**Note:** the model above was trained with Rockchip's official YOLOv5 (v7.0) code, so it can be converted to ONNX directly; if the model was trained with the official YOLOv5 code, the relevant structures need to be modified before conversion.

First convert it to ONNX format by running:

# run in the project root; this produces the onnx file
python export.py --rknpu --weight runs/train-seg/exp/weights/yolov5n-seg.pt
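
To sanity-check the exported file before moving on, you can list its outputs with onnxruntime (assuming onnxruntime is installed; adjust the path if export.py wrote the .onnx somewhere else). The RKNPU-style export should expose the three detection branches, the three mask-coefficient branches and the proto tensor that the post-processing code further below expects:

import onnxruntime as ort

sess = ort.InferenceSession("runs/train-seg/exp/weights/yolov5n-seg.onnx")
for out in sess.get_outputs():
    print(out.name, out.shape)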


Then convert it to a .rknn file.
This relies on Rockchip's official GitHub project:

https://github.com/airockchip/rknn_model_zoo/tree/v2.0.0

Use the officially provided yolov5_seg example.
The directory structure of the yolov5_seg example is as follows:

(python3.8-tk2-2.0) root@f95e42ca3a90:/home/data/project/customer_AAA/rk3588/rknn_model_zoo-2.0.0/examples/yolov5_seg# tree
.
|-- README.md
|-- cpp
|   |-- CMakeLists.txt
|   |-- easy_timer.h
|   |-- main.cc
|   |-- postprocess.h
|   |-- rknpu1
|   |   |-- postprocess.cc
|   |   `-- yolov5_seg.cc
|   |-- rknpu2
|   |   |-- postprocess.cc
|   |   `-- yolov5_seg.cc
|   `-- yolov5_seg.h
|-- model
|   |-- anchors_yolov5.txt
|   |-- bus.jpg
|   |-- coco_80_labels_list.txt
|   `-- download_model.sh
|-- model_comparison
|   |-- yolov5_seg_graph_comparison.jpg
|   `-- yolov5_seg_output_comparison.jpg
|-- python
|   |-- convert.py
|   `-- yolov5_seg.py
`-- reference_results
    |-- yolov5s_seg_c_demo_result.png
    `-- yolov5s_seg_python_demo_result.png

7 directories, 20 files

Copy the yolov5n-seg.onnx file obtained above into the rknn_model_zoo-2.0.0/examples/yolov5_seg/python directory. The conversion script is invoked like this:

python convert.py ../model/yolov5s-seg.onnx rk3588

Modify the convert.py script; the final version is shown below:

import sys
from rknn.api import RKNN

DATASET_PATH = './data_list.txt'    # list of calibration image paths used for quantization
DEFAULT_RKNN_PATH = './yolov5_seg.rknn'
DEFAULT_QUANT = True

def parse_arg():
    if len(sys.argv) < 3:
        print("Usage: python3 {} onnx_model_path [platform] [dtype(optional)] [output_rknn_path(optional)]".format(sys.argv[0]));
        print("       platform choose from [rk3562, rk3566, rk3568, rk3588, rk1808, rv1109, rv1126]")
        print("       dtype choose from    [i8, fp] for [rk3562,rk3566,rk3568,rk3588]")
        print("       dtype choose from    [u8, fp] for [rk1808,rv1109,rv1126]")
        exit(1)

    model_path = sys.argv[1]
    platform = sys.argv[2]

    do_quant = DEFAULT_QUANT
    if len(sys.argv) > 3:
        model_type = sys.argv[3]
        if model_type not in ['i8', 'u8', 'fp']:
            print("ERROR: Invalid model type: {}".format(model_type))
            exit(1)
        elif model_type in ['i8', 'u8']:
            do_quant = True
        else:
            do_quant = False

    if len(sys.argv) > 4:
        output_path = sys.argv[4]
    else:
        output_path = DEFAULT_RKNN_PATH

    return model_path, platform, do_quant, output_path

if __name__ == '__main__':
    model_path, platform, do_quant, output_path = parse_arg()

    # Create RKNN object
    rknn = RKNN(verbose=False)

    # Pre-process config
    print('--> Config model')
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]], target_platform=platform)
    print('done')

    # Load model
    print('--> Loading model')
    ret = rknn.load_onnx(model=model_path)
    if ret != 0:
        print('Load model failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=do_quant, dataset=DATASET_PATH)
    if ret != 0:
        print('Build model failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export rknn model')
    ret = rknn.export_rknn(output_path)
    if ret != 0:
        print('Export rknn model failed!')
        exit(ret)
    print('done')

    # Release
    rknn.release()
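
DATASET_PATH above points to a plain-text file listing the calibration images used for INT8 quantization, one image path per line. A small helper such as the following can generate it (the ./data_set folder name is only an assumption based on the test command further below; point it at a handful of representative training images):

import glob

# write one image path per line for RKNN quantization calibration
paths = sorted(glob.glob("./data_set/*.jpg"))[:20]
with open("data_list.txt", "w") as f:
    f.write("\n".join(paths) + "\n")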

Run the following command:

python convert.py ./yolov5n-seg.onnx   rk3588


Test script:

# test by loading the onnx file
python yolov5_seg.py --model_path  yolov5n-seg.onnx  --img_folder data_set     --img_save

The segmentation quality is mediocre; I suspect 100 epochs of training is still not enough, haha.

If the server is connected to the RK3588, you can run the following commands from the server terminal and use the board's NPU for model inference.

My approach is to upload the rknn_model_zoo-2.0.0 project to the board and work in the rknn_model_zoo-2.0.0/examples/yolov5_seg/python directory; the model file can also be uploaded to the rknn_model_zoo-2.0.0/examples/yolov5_seg/model/ directory. Then run the Python script.

Detect the images in a local directory:

python yolov5_seg_gqr.py --model_path   yolov5s_seg.rknn --img_folder ./test/  --img_save



import os
import cv2
import sys
import json
import argparse
import numpy as np
from pathlib import Path
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import torch.nn.functional as F
import torch
import torchvision

from rknnlite.api import RKNNLite
from copy import copy



class Letter_Box_Info():
    def __init__(self, shape, new_shape, w_ratio, h_ratio, dw, dh, pad_color) -> None:
        self.origin_shape = shape
        self.new_shape = new_shape
        self.w_ratio = w_ratio
        self.h_ratio = h_ratio
        self.dw = dw 
        self.dh = dh
        self.pad_color = pad_color

class COCO_test_helper():
    def __init__(self, enable_letter_box = False) -> None:
        self.record_list = []
        self.enable_ltter_box = enable_letter_box
        if self.enable_ltter_box is True:
            self.letter_box_info_list = []
        else:
            self.letter_box_info_list = None

    def letter_box(self, im, new_shape, pad_color=(0,0,0), info_need=False):
        # Resize and pad image while meeting stride-multiple constraints
        shape = im.shape[:2]  # current shape [height, width]
        if isinstance(new_shape, int):
            new_shape = (new_shape, new_shape)

        # Scale ratio
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

        # Compute padding
        ratio = r  # width, height ratios
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

        dw /= 2  # divide padding into 2 sides
        dh /= 2

        if shape[::-1] != new_unpad:  # resize
            im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
        im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=pad_color)  # add border
        
        if self.enable_ltter_box is True:
            self.letter_box_info_list.append(Letter_Box_Info(shape, new_shape, ratio, ratio, dw, dh, pad_color))
        if info_need is True:
            return im, ratio, (dw, dh)
        else:
            return im

    def direct_resize(self, im, new_shape, info_need=False):
        shape = im.shape[:2]
        h_ratio = new_shape[0]/ shape[0]
        w_ratio = new_shape[1]/ shape[1]
        if self.enable_ltter_box is True:
            self.letter_box_info_list.append(Letter_Box_Info(shape, new_shape, w_ratio, h_ratio, 0, 0, (0,0,0)))
        im = cv2.resize(im, (new_shape[1], new_shape[0]))
        return im

    def get_real_box(self, box, in_format='xyxy'):
        bbox = copy(box)
        if self.enable_ltter_box == True:
        # unletter_box result
            if in_format=='xyxy':
                bbox[:,0] -= self.letter_box_info_list[-1].dw
                bbox[:,0] /= self.letter_box_info_list[-1].w_ratio
                bbox[:,0] = np.clip(bbox[:,0], 0, self.letter_box_info_list[-1].origin_shape[1])

                bbox[:,1] -= self.letter_box_info_list[-1].dh
                bbox[:,1] /= self.letter_box_info_list[-1].h_ratio
                bbox[:,1] = np.clip(bbox[:,1], 0, self.letter_box_info_list[-1].origin_shape[0])

                bbox[:,2] -= self.letter_box_info_list[-1].dw
                bbox[:,2] /= self.letter_box_info_list[-1].w_ratio
                bbox[:,2] = np.clip(bbox[:,2], 0, self.letter_box_info_list[-1].origin_shape[1])

                bbox[:,3] -= self.letter_box_info_list[-1].dh
                bbox[:,3] /= self.letter_box_info_list[-1].h_ratio
                bbox[:,3] = np.clip(bbox[:,3], 0, self.letter_box_info_list[-1].origin_shape[0])
        return bbox

    def get_real_seg(self, seg):
        #! fix side effect
        dh = int(self.letter_box_info_list[-1].dh)
        dw = int(self.letter_box_info_list[-1].dw)
        origin_shape = self.letter_box_info_list[-1].origin_shape
        new_shape = self.letter_box_info_list[-1].new_shape
        if (dh == 0) and (dw == 0) and origin_shape == new_shape:
            return seg
        elif dh == 0 and dw != 0:
            seg = seg[:, :, dw:-dw] # a[0:-0] = []
        elif dw == 0 and dh != 0 : 
            seg = seg[:, dh:-dh, :]
        seg = np.where(seg, 1, 0).astype(np.uint8).transpose(1,2,0)
        seg = cv2.resize(seg, (origin_shape[1], origin_shape[0]), interpolation=cv2.INTER_LINEAR)
        if len(seg.shape) < 3:
            return seg[None,:,:]
        else:
            return seg.transpose(2,0,1)

    def add_single_record(self, image_id, category_id, bbox, score, in_format='xyxy', pred_masks = None):
        if self.enable_ltter_box == True:
        # unletter_box result
            if in_format=='xyxy':
                bbox[0] -= self.letter_box_info_list[-1].dw
                bbox[0] /= self.letter_box_info_list[-1].w_ratio

                bbox[1] -= self.letter_box_info_list[-1].dh
                bbox[1] /= self.letter_box_info_list[-1].h_ratio

                bbox[2] -= self.letter_box_info_list[-1].dw
                bbox[2] /= self.letter_box_info_list[-1].w_ratio

                bbox[3] -= self.letter_box_info_list[-1].dh
                bbox[3] /= self.letter_box_info_list[-1].h_ratio
                # bbox = [value/self.letter_box_info_list[-1].ratio for value in bbox]

        if in_format=='xyxy':
        # change xyxy to xywh
            bbox[2] = bbox[2] - bbox[0]
            bbox[3] = bbox[3] - bbox[1]
        else:
            assert False, "now only support xyxy format, please add code to support others format"
        
        def single_encode(x):
            from pycocotools.mask import encode
            rle = encode(np.asarray(x[:, :, None], order="F", dtype="uint8"))[0]
            rle["counts"] = rle["counts"].decode("utf-8")
            return rle

        if pred_masks is None:
            self.record_list.append({"image_id": image_id,
                                    "category_id": category_id,
                                    "bbox":[round(x, 3) for x in bbox],
                                    'score': round(score, 5),
                                    })
        else:
            rles = single_encode(pred_masks)
            self.record_list.append({"image_id": image_id,
                                    "category_id": category_id,
                                    "bbox":[round(x, 3) for x in bbox],
                                    'score': round(score, 5),
                                    'segmentation': rles,
                                    })
    
    def export_to_json(self, path):
        with open(path, 'w') as f:
            json.dump(self.record_list, f)



OBJ_THRESH = 0.25
NMS_THRESH = 0.45
MAX_DETECT = 300

# The following two params are for mAP testing
# OBJ_THRESH = 0.001
# NMS_THRESH = 0.65

IMG_SIZE = (640, 640)  # (width, height), such as (1280, 736)

CLASSES = ("bolt","nut")

coco_id_list = [1, 2]

class Colors:
    # Ultralytics color palette https://ultralytics.com/
    def __init__(self):
        # hex = matplotlib.colors.TABLEAU_COLORS.values()
        hexs = ('FF3838', 'FF9D97', 'FF701F', 'FFB21D', 'CFD231', '48F90A', '92CC17', '3DDB86', '1A9334', '00D4BB',
                '2C99A8', '00C2FF', '344593', '6473FF', '0018EC', '8438FF', '520085', 'CB38FF', 'FF95C8', 'FF37C7')
        self.palette = [self.hex2rgb(f'#{c}') for c in hexs]
        self.n = len(self.palette)

    def __call__(self, i, bgr=False):
        c = self.palette[int(i) % self.n]
        return (c[2], c[1], c[0]) if bgr else c

    @staticmethod
    def hex2rgb(h):  # rgb order (PIL)
        return tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def filter_boxes(boxes, box_confidences, box_class_probs, seg_part):
    """Filter boxes with object threshold.
    """
    box_confidences = box_confidences.reshape(-1)
    candidate, class_num = box_class_probs.shape

    class_max_score = np.max(box_class_probs, axis=-1)
    classes = np.argmax(box_class_probs, axis=-1)

    _class_pos = np.where(class_max_score * box_confidences >= OBJ_THRESH)
    scores = (class_max_score * box_confidences)[_class_pos]

    boxes = boxes[_class_pos]
    classes = classes[_class_pos]
    seg_part = (seg_part * box_confidences.reshape(-1, 1))[_class_pos]

    return boxes, classes, scores, seg_part

def box_process(position, anchors):
    grid_h, grid_w = position.shape[2:4]
    col, row = np.meshgrid(np.arange(0, grid_w), np.arange(0, grid_h))
    col = col.reshape(1, 1, grid_h, grid_w)
    row = row.reshape(1, 1, grid_h, grid_w)
    grid = np.concatenate((col, row), axis=1)
    stride = np.array([IMG_SIZE[1]//grid_h, IMG_SIZE[0]//grid_w]).reshape(1,2,1,1)

    col = col.repeat(len(anchors), axis=0)
    row = row.repeat(len(anchors), axis=0)
    anchors = np.array(anchors)
    anchors = anchors.reshape(*anchors.shape, 1, 1)

    box_xy = position[:,:2,:,:]*2 - 0.5
    box_wh = pow(position[:,2:4,:,:]*2, 2) * anchors

    box_xy += grid
    box_xy *= stride
    box = np.concatenate((box_xy, box_wh), axis=1)

    # Convert [c_x, c_y, w, h] to [x1, y1, x2, y2]
    xyxy = np.copy(box)
    xyxy[:, 0, :, :] = box[:, 0, :, :] - box[:, 2, :, :]/ 2  # top left x
    xyxy[:, 1, :, :] = box[:, 1, :, :] - box[:, 3, :, :]/ 2  # top left y
    xyxy[:, 2, :, :] = box[:, 0, :, :] + box[:, 2, :, :]/ 2  # bottom right x
    xyxy[:, 3, :, :] = box[:, 1, :, :] + box[:, 3, :, :]/ 2  # bottom right y

    return xyxy

def post_process(input_data, anchors):
    # input_data[0], input_data[2], and input_data[4] are detection box information
    # input_data[1], input_data[3], and input_data[5] are segmentation information
    # input_data[6] is the proto information
    boxes, scores, classes_conf = [], [], []
    # 1*255*h*w -> 3*85*h*w
    detect_part = [input_data[i*2].reshape([len(anchors[0]), -1]+list(input_data[i*2].shape[-2:])) for i in range(len(anchors))]
    seg_part = [input_data[i*2+1].reshape([len(anchors[0]), -1]+list(input_data[i*2+1].shape[-2:])) for i in range(len(anchors))]
    proto = input_data[-1]
    for i in range(len(detect_part)):
        boxes.append(box_process(detect_part[i][:, :4, :, :], anchors[i]))
        scores.append(detect_part[i][:, 4:5, :, :])
        classes_conf.append(detect_part[i][:, 5:, :, :])

    def sp_flatten(_in):
        ch = _in.shape[1]
        _in = _in.transpose(0, 2, 3, 1)
        return _in.reshape(-1, ch)

    boxes = [sp_flatten(_v) for _v in boxes]
    classes_conf = [sp_flatten(_v) for _v in classes_conf]
    scores = [sp_flatten(_v) for _v in scores]
    seg_part = [sp_flatten(_v) for _v in seg_part]

    boxes = np.concatenate(boxes)
    classes_conf = np.concatenate(classes_conf)
    scores = np.concatenate(scores)
    seg_part = np.concatenate(seg_part)

    # filter according to threshold
    boxes, classes, scores, seg_part = filter_boxes(boxes, scores, classes_conf, seg_part)

    zipped = zip(boxes, classes, scores, seg_part)
    sort_zipped = sorted(zipped, key=lambda x: (x[2]), reverse=True)
    result = zip(*sort_zipped)

    max_nms = 30000
    n = boxes.shape[0]  # number of boxes
    if not n:
        return None, None, None, None
    elif n > max_nms:  # excess boxes
        boxes, classes, scores, seg_part = [np.array(x[:max_nms]) for x in result]
    else:
        boxes, classes, scores, seg_part = [np.array(x) for x in result]

    # nms
    nboxes, nclasses, nscores, nseg_part = [], [], [], []
    agnostic = 0
    max_wh = 7680
    c = classes * (0 if agnostic else max_wh)
    ids = torchvision.ops.nms(torch.tensor(boxes, dtype=torch.float32) + torch.tensor(c, dtype=torch.float32).unsqueeze(-1),
                              torch.tensor(scores, dtype=torch.float32), NMS_THRESH)
    real_keeps = ids.tolist()[:MAX_DETECT]
    nboxes.append(boxes[real_keeps])
    nclasses.append(classes[real_keeps])
    nscores.append(scores[real_keeps])
    nseg_part.append(seg_part[real_keeps])

    if not nclasses and not nscores:
        return None, None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)
    seg_part = np.concatenate(nseg_part)

    ph, pw = proto.shape[-2:]
    proto = proto.reshape(seg_part.shape[-1], -1)
    seg_img = np.matmul(seg_part, proto)
    seg_img = sigmoid(seg_img)
    seg_img = seg_img.reshape(-1, ph, pw)

    seg_threadhold = 0.5

    # crop seg outside box
    seg_img = F.interpolate(torch.tensor(seg_img)[None], torch.Size([640, 640]), mode='bilinear', align_corners=False)[0]
    seg_img_t = _crop_mask(seg_img,torch.tensor(boxes) )

    seg_img = seg_img_t.numpy()
    seg_img = seg_img > seg_threadhold
    return boxes, classes, scores, seg_img

def draw(image, boxes, scores, classes):
    for box, score, cl in zip(boxes, scores, classes):
        top, left, right, bottom = [int(_b) for _b in box]
        print("%s @ (%d %d %d %d) %.3f" % (CLASSES[cl], top, left, right, bottom, score))
        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (top, left - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

def _crop_mask(masks, boxes):
    """
    "Crop" predicted masks by zeroing out everything not in the predicted bbox.
    Vectorized by Chong (thanks Chong).

    Args:
        - masks should be a size [h, w, n] tensor of masks
        - boxes should be a size [n, 4] tensor of bbox coords in relative point form
    """

    n, h, w = masks.shape
    x1, y1, x2, y2 = torch.chunk(boxes[:, :, None], 4, 1)  # x1 shape(1,1,n)
    r = torch.arange(w, device=masks.device, dtype=x1.dtype)[None, None, :]  # rows shape(1,w,1)
    c = torch.arange(h, device=masks.device, dtype=x1.dtype)[None, :, None]  # cols shape(h,1,1)
    
    return masks * ((r >= x1) * (r < x2) * (c >= y1) * (c < y2))

def merge_seg(image, seg_img, classes):
    color = Colors()
    for i in range(len(seg_img)):
        seg = seg_img[i]
        seg = seg.astype(np.uint8)
        seg = cv2.cvtColor(seg, cv2.COLOR_GRAY2BGR)
        seg = seg * color(classes[i])
        seg = seg.astype(np.uint8)
        image = cv2.add(image, seg)
    return image


	
def setup_model(args):
    model_path = args.model_path
    # create the RKNNLite object
    rknn = RKNNLite()  # <-- make sure the indentation is correct (4 spaces or 1 tab)
    ret = rknn.load_rknn(model_path)  # <-- use args.model_path instead of RKNN_MODEL
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)
    print('done')

    ret = rknn.init_runtime()
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')
    return rknn, args.target  # <-- return rknn and args.target

def img_check(path):
    img_type = ['.jpg', '.jpeg', '.png', '.bmp']
    for _type in img_type:
        if path.endswith(_type) or path.endswith(_type.upper()):
            return True
    return False
	
	
def letterbox(im, new_shape=(640, 640), color=(0, 0, 0)):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process some integers.')
    # basic params
    parser.add_argument('--model_path', type=str, required= True, help='model path, could be .onnx or .rknn file')
    parser.add_argument('--target', type=str, default='rk3566', help='target RKNPU platform')
    parser.add_argument('--device_id', type=str, default=None, help='device id')
    
    parser.add_argument('--img_show', action='store_true', default=False, help='draw the result and show')
    parser.add_argument('--img_save', action='store_true', default=False, help='save the result')

    # data params
    parser.add_argument('--anno_json', type=str, default='../../../datasets/COCO/annotations/instances_val2017.json', help='coco annotation path')
    # coco val folder: '../../../datasets/COCO//val2017'
    parser.add_argument('--img_folder', type=str, default='../model', help='img folder path')
    parser.add_argument('--coco_map_test', action='store_true', help='enable coco mAP test')
    # This anchors_yolov5.txt is defined in the configuration file in the yolov5 official project. 
    # For example, one of the configuration file paths is <yolov5_root_path>/models/yolov5n.yaml. 
    # If you modify the anchors configuration when training the model, you need modify it to the corresponding value in this file.
    parser.add_argument('--anchors', type=str, default='../model/anchors_yolov5.txt', help='target to anchor file')

    args = parser.parse_args()

    # load anchor
    with open(args.anchors, 'r') as f:
        values = [float(_v) for _v in f.readlines()]
        anchors = np.array(values).reshape(3,-1,2).tolist()
    print("use anchors from '{}', which is {}".format(args.anchors, anchors))
    
    # init model
    model_path = args.model_path
    # create the RKNNLite object
    rknn = RKNNLite()  # <-- make sure the indentation is correct (4 spaces or 1 tab)
    ret = rknn.load_rknn(model_path)  # <-- use args.model_path instead of RKNN_MODEL
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)
    print('done')

    ret = rknn.init_runtime()
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')
    

    file_list = sorted(os.listdir(args.img_folder))
    img_list = []
    for path in file_list:
        if img_check(path):
            img_list.append(path)
    co_helper = COCO_test_helper(enable_letter_box=True)
	
    # run test
    for i in range(len(img_list)):
        print('infer {}/{}'.format(i+1, len(img_list)), end='\r')

        img_name = img_list[i]
        img_path = os.path.join(args.img_folder, img_name)
        if not os.path.exists(img_path):
            print("{} is not found", img_name)
            continue

        img_src = cv2.imread(img_path)
        if img_src is None:
            continue
        
        
        
        img = co_helper.letter_box(img_src.copy(), new_shape=(640, 640), info_need=False)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img2 = np.expand_dims(img, 0)
        outputs = rknn.inference(inputs=[img2], data_format=['nhwc'])

        boxes, classes, scores, seg_img = post_process(outputs, anchors)

        # fall back to the original image when nothing is detected
        img_p = img_src
        if boxes is not None:
            real_boxs = co_helper.get_real_box(boxes)
            real_segs = co_helper.get_real_seg(seg_img)
            img_p = merge_seg(img_src, real_segs, classes)

        if args.img_show or args.img_save:
            print('\n\nIMG: {}'.format(img_name))

            if boxes is not None:
                draw(img_p, real_boxs, scores, classes)
            
            if args.img_save:
                if not os.path.exists('./result'):
                    os.mkdir('./result')
                result_path = os.path.join('./result', img_name)
                cv2.imwrite(result_path, img_p)
                print('Detection result save to {}'.format(result_path))
               
            if args.img_show:
                cv2.imshow("full post process result", img_p)
                cv2.waitKeyEx(0)

To save time, I only trained for 100 epochs:

The code for real-time detection using the local camera is shown below:

Run command:

python yolov5_seg_gqr_cam.py  --model_path ./yolov5_seg.rknn


import os
import cv2
import sys
import json
import argparse
import numpy as np
from pathlib import Path
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import torch.nn.functional as F
import torch
import torchvision

from rknnlite.api import RKNNLite
from copy import copy



class Letter_Box_Info():
    def __init__(self, shape, new_shape, w_ratio, h_ratio, dw, dh, pad_color) -> None:
        self.origin_shape = shape
        self.new_shape = new_shape
        self.w_ratio = w_ratio
        self.h_ratio = h_ratio
        self.dw = dw 
        self.dh = dh
        self.pad_color = pad_color

class COCO_test_helper():
    def __init__(self, enable_letter_box = False) -> None:
        self.record_list = []
        self.enable_ltter_box = enable_letter_box
        if self.enable_ltter_box is True:
            self.letter_box_info_list = []
        else:
            self.letter_box_info_list = None

    def letter_box(self, im, new_shape, pad_color=(0,0,0), info_need=False):
        # Resize and pad image while meeting stride-multiple constraints
        shape = im.shape[:2]  # current shape [height, width]
        if isinstance(new_shape, int):
            new_shape = (new_shape, new_shape)

        # Scale ratio
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

        # Compute padding
        ratio = r  # width, height ratios
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

        dw /= 2  # divide padding into 2 sides
        dh /= 2

        if shape[::-1] != new_unpad:  # resize
            im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
        im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=pad_color)  # add border
        
        if self.enable_ltter_box is True:
            self.letter_box_info_list.append(Letter_Box_Info(shape, new_shape, ratio, ratio, dw, dh, pad_color))
        if info_need is True:
            return im, ratio, (dw, dh)
        else:
            return im

    def direct_resize(self, im, new_shape, info_need=False):
        shape = im.shape[:2]
        h_ratio = new_shape[0]/ shape[0]
        w_ratio = new_shape[1]/ shape[1]
        if self.enable_ltter_box is True:
            self.letter_box_info_list.append(Letter_Box_Info(shape, new_shape, w_ratio, h_ratio, 0, 0, (0,0,0)))
        im = cv2.resize(im, (new_shape[1], new_shape[0]))
        return im

    def get_real_box(self, box, in_format='xyxy'):
        bbox = copy(box)
        if self.enable_ltter_box == True:
        # unletter_box result
            if in_format=='xyxy':
                bbox[:,0] -= self.letter_box_info_list[-1].dw
                bbox[:,0] /= self.letter_box_info_list[-1].w_ratio
                bbox[:,0] = np.clip(bbox[:,0], 0, self.letter_box_info_list[-1].origin_shape[1])

                bbox[:,1] -= self.letter_box_info_list[-1].dh
                bbox[:,1] /= self.letter_box_info_list[-1].h_ratio
                bbox[:,1] = np.clip(bbox[:,1], 0, self.letter_box_info_list[-1].origin_shape[0])

                bbox[:,2] -= self.letter_box_info_list[-1].dw
                bbox[:,2] /= self.letter_box_info_list[-1].w_ratio
                bbox[:,2] = np.clip(bbox[:,2], 0, self.letter_box_info_list[-1].origin_shape[1])

                bbox[:,3] -= self.letter_box_info_list[-1].dh
                bbox[:,3] /= self.letter_box_info_list[-1].h_ratio
                bbox[:,3] = np.clip(bbox[:,3], 0, self.letter_box_info_list[-1].origin_shape[0])
        return bbox

    def get_real_seg(self, seg):
        #! fix side effect
        dh = int(self.letter_box_info_list[-1].dh)
        dw = int(self.letter_box_info_list[-1].dw)
        origin_shape = self.letter_box_info_list[-1].origin_shape
        new_shape = self.letter_box_info_list[-1].new_shape
        if (dh == 0) and (dw == 0) and origin_shape == new_shape:
            return seg
        elif dh == 0 and dw != 0:
            seg = seg[:, :, dw:-dw] # a[0:-0] = []
        elif dw == 0 and dh != 0 : 
            seg = seg[:, dh:-dh, :]
        seg = np.where(seg, 1, 0).astype(np.uint8).transpose(1,2,0)
        seg = cv2.resize(seg, (origin_shape[1], origin_shape[0]), interpolation=cv2.INTER_LINEAR)
        if len(seg.shape) < 3:
            return seg[None,:,:]
        else:
            return seg.transpose(2,0,1)

    def add_single_record(self, image_id, category_id, bbox, score, in_format='xyxy', pred_masks = None):
        if self.enable_ltter_box == True:
        # unletter_box result
            if in_format=='xyxy':
                bbox[0] -= self.letter_box_info_list[-1].dw
                bbox[0] /= self.letter_box_info_list[-1].w_ratio

                bbox[1] -= self.letter_box_info_list[-1].dh
                bbox[1] /= self.letter_box_info_list[-1].h_ratio

                bbox[2] -= self.letter_box_info_list[-1].dw
                bbox[2] /= self.letter_box_info_list[-1].w_ratio

                bbox[3] -= self.letter_box_info_list[-1].dh
                bbox[3] /= self.letter_box_info_list[-1].h_ratio
                # bbox = [value/self.letter_box_info_list[-1].ratio for value in bbox]

        if in_format=='xyxy':
        # change xyxy to xywh
            bbox[2] = bbox[2] - bbox[0]
            bbox[3] = bbox[3] - bbox[1]
        else:
            assert False, "now only support xyxy format, please add code to support others format"
        
        def single_encode(x):
            from pycocotools.mask import encode
            rle = encode(np.asarray(x[:, :, None], order="F", dtype="uint8"))[0]
            rle["counts"] = rle["counts"].decode("utf-8")
            return rle

        if pred_masks is None:
            self.record_list.append({"image_id": image_id,
                                    "category_id": category_id,
                                    "bbox":[round(x, 3) for x in bbox],
                                    'score': round(score, 5),
                                    })
        else:
            rles = single_encode(pred_masks)
            self.record_list.append({"image_id": image_id,
                                    "category_id": category_id,
                                    "bbox":[round(x, 3) for x in bbox],
                                    'score': round(score, 5),
                                    'segmentation': rles,
                                    })
    
    def export_to_json(self, path):
        with open(path, 'w') as f:
            json.dump(self.record_list, f)



OBJ_THRESH = 0.60
NMS_THRESH = 0.45
MAX_DETECT = 300

# The following two params are for mAP testing
# OBJ_THRESH = 0.001
# NMS_THRESH = 0.65

IMG_SIZE = (640, 640)  # (width, height), such as (1280, 736)

CLASSES = ("bolt","nut")

coco_id_list = [1, 2]

class Colors:
    # Ultralytics color palette https://ultralytics.com/
    def __init__(self):
        # hex = matplotlib.colors.TABLEAU_COLORS.values()
        hexs = ('FF3838', 'FF9D97', 'FF701F', 'FFB21D', 'CFD231', '48F90A', '92CC17', '3DDB86', '1A9334', '00D4BB',
                '2C99A8', '00C2FF', '344593', '6473FF', '0018EC', '8438FF', '520085', 'CB38FF', 'FF95C8', 'FF37C7')
        self.palette = [self.hex2rgb(f'#{c}') for c in hexs]
        self.n = len(self.palette)

    def __call__(self, i, bgr=False):
        c = self.palette[int(i) % self.n]
        return (c[2], c[1], c[0]) if bgr else c

    @staticmethod
    def hex2rgb(h):  # rgb order (PIL)
        return tuple(int(h[1 + i:1 + i + 2], 16) for i in (0, 2, 4))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def filter_boxes(boxes, box_confidences, box_class_probs, seg_part):
    """Filter boxes with object threshold.
    """
    box_confidences = box_confidences.reshape(-1)
    candidate, class_num = box_class_probs.shape

    class_max_score = np.max(box_class_probs, axis=-1)
    classes = np.argmax(box_class_probs, axis=-1)

    _class_pos = np.where(class_max_score * box_confidences >= OBJ_THRESH)
    scores = (class_max_score * box_confidences)[_class_pos]

    boxes = boxes[_class_pos]
    classes = classes[_class_pos]
    seg_part = (seg_part * box_confidences.reshape(-1, 1))[_class_pos]

    return boxes, classes, scores, seg_part

def box_process(position, anchors):
    grid_h, grid_w = position.shape[2:4]
    col, row = np.meshgrid(np.arange(0, grid_w), np.arange(0, grid_h))
    col = col.reshape(1, 1, grid_h, grid_w)
    row = row.reshape(1, 1, grid_h, grid_w)
    grid = np.concatenate((col, row), axis=1)
    stride = np.array([IMG_SIZE[1]//grid_h, IMG_SIZE[0]//grid_w]).reshape(1,2,1,1)

    col = col.repeat(len(anchors), axis=0)
    row = row.repeat(len(anchors), axis=0)
    anchors = np.array(anchors)
    anchors = anchors.reshape(*anchors.shape, 1, 1)

    box_xy = position[:,:2,:,:]*2 - 0.5
    box_wh = pow(position[:,2:4,:,:]*2, 2) * anchors

    box_xy += grid
    box_xy *= stride
    box = np.concatenate((box_xy, box_wh), axis=1)

    # Convert [c_x, c_y, w, h] to [x1, y1, x2, y2]
    xyxy = np.copy(box)
    xyxy[:, 0, :, :] = box[:, 0, :, :] - box[:, 2, :, :]/ 2  # top left x
    xyxy[:, 1, :, :] = box[:, 1, :, :] - box[:, 3, :, :]/ 2  # top left y
    xyxy[:, 2, :, :] = box[:, 0, :, :] + box[:, 2, :, :]/ 2  # bottom right x
    xyxy[:, 3, :, :] = box[:, 1, :, :] + box[:, 3, :, :]/ 2  # bottom right y

    return xyxy

def post_process(input_data, anchors):
    # input_data[0], input_data[2], and input_data[4] are detection box information
    # input_data[1], input_data[3], and input_data[5] are segmentation information
    # input_data[6] is the proto information
    boxes, scores, classes_conf = [], [], []
    # 1*255*h*w -> 3*85*h*w
    detect_part = [input_data[i*2].reshape([len(anchors[0]), -1]+list(input_data[i*2].shape[-2:])) for i in range(len(anchors))]
    seg_part = [input_data[i*2+1].reshape([len(anchors[0]), -1]+list(input_data[i*2+1].shape[-2:])) for i in range(len(anchors))]
    proto = input_data[-1]
    for i in range(len(detect_part)):
        boxes.append(box_process(detect_part[i][:, :4, :, :], anchors[i]))
        scores.append(detect_part[i][:, 4:5, :, :])
        classes_conf.append(detect_part[i][:, 5:, :, :])

    def sp_flatten(_in):
        ch = _in.shape[1]
        _in = _in.transpose(0, 2, 3, 1)
        return _in.reshape(-1, ch)

    boxes = [sp_flatten(_v) for _v in boxes]
    classes_conf = [sp_flatten(_v) for _v in classes_conf]
    scores = [sp_flatten(_v) for _v in scores]
    seg_part = [sp_flatten(_v) for _v in seg_part]

    boxes = np.concatenate(boxes)
    classes_conf = np.concatenate(classes_conf)
    scores = np.concatenate(scores)
    seg_part = np.concatenate(seg_part)

    # filter according to threshold
    boxes, classes, scores, seg_part = filter_boxes(boxes, scores, classes_conf, seg_part)

    zipped = zip(boxes, classes, scores, seg_part)
    sort_zipped = sorted(zipped, key=lambda x: (x[2]), reverse=True)
    result = zip(*sort_zipped)

    max_nms = 30000
    n = boxes.shape[0]  # number of boxes
    if not n:
        return None, None, None, None
    elif n > max_nms:  # excess boxes
        boxes, classes, scores, seg_part = [np.array(x[:max_nms]) for x in result]
    else:
        boxes, classes, scores, seg_part = [np.array(x) for x in result]

    # nms
    nboxes, nclasses, nscores, nseg_part = [], [], [], []
    agnostic = 0
    max_wh = 7680
    c = classes * (0 if agnostic else max_wh)
    ids = torchvision.ops.nms(torch.tensor(boxes, dtype=torch.float32) + torch.tensor(c, dtype=torch.float32).unsqueeze(-1),
                              torch.tensor(scores, dtype=torch.float32), NMS_THRESH)
    real_keeps = ids.tolist()[:MAX_DETECT]
    nboxes.append(boxes[real_keeps])
    nclasses.append(classes[real_keeps])
    nscores.append(scores[real_keeps])
    nseg_part.append(seg_part[real_keeps])

    if not nclasses and not nscores:
        return None, None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)
    seg_part = np.concatenate(nseg_part)

    ph, pw = proto.shape[-2:]
    proto = proto.reshape(seg_part.shape[-1], -1)
    seg_img = np.matmul(seg_part, proto)
    seg_img = sigmoid(seg_img)
    seg_img = seg_img.reshape(-1, ph, pw)

    seg_threadhold = 0.5

    # crop seg outside box
    seg_img = F.interpolate(torch.tensor(seg_img)[None], torch.Size([640, 640]), mode='bilinear', align_corners=False)[0]
    seg_img_t = _crop_mask(seg_img,torch.tensor(boxes) )

    seg_img = seg_img_t.numpy()
    seg_img = seg_img > seg_threadhold
    return boxes, classes, scores, seg_img

def draw(image, boxes, scores, classes):
    for box, score, cl in zip(boxes, scores, classes):
        top, left, right, bottom = [int(_b) for _b in box]
        print("%s @ (%d %d %d %d) %.3f" % (CLASSES[cl], top, left, right, bottom, score))
        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (top, left - 6), cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

def _crop_mask(masks, boxes):
    """
    "Crop" predicted masks by zeroing out everything not in the predicted bbox.
    Vectorized by Chong (thanks Chong).

    Args:
        - masks should be a size [h, w, n] tensor of masks
        - boxes should be a size [n, 4] tensor of bbox coords in relative point form
    """

    n, h, w = masks.shape
    x1, y1, x2, y2 = torch.chunk(boxes[:, :, None], 4, 1)  # x1 shape(1,1,n)
    r = torch.arange(w, device=masks.device, dtype=x1.dtype)[None, None, :]  # rows shape(1,w,1)
    c = torch.arange(h, device=masks.device, dtype=x1.dtype)[None, :, None]  # cols shape(h,1,1)
    
    return masks * ((r >= x1) * (r < x2) * (c >= y1) * (c < y2))

def merge_seg(image, seg_img, classes):
    color = Colors()
    for i in range(len(seg_img)):
        seg = seg_img[i]
        seg = seg.astype(np.uint8)
        seg = cv2.cvtColor(seg, cv2.COLOR_GRAY2BGR)
        seg = seg * color(classes[i])
        seg = seg.astype(np.uint8)
        image = cv2.add(image, seg)
    return image



if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Process some integers.')
    # basic params
    parser.add_argument('--model_path', type=str, required= True, help='model path, could be .onnx or .rknn file')
    parser.add_argument('--target', type=str, default='rk3566', help='target RKNPU platform')
    
    # This anchors_yolov5.txt is defined in the configuration file in the yolov5 official project. 
    # For example, one of the configuration file paths is <yolov5_root_path>/models/yolov5n.yaml. 
    # If you modify the anchors configuration when training the model, you need modify it to the corresponding value in this file.
    parser.add_argument('--anchors', type=str, default='../model/anchors_yolov5.txt', help='target to anchor file')

    args = parser.parse_args()

    # load anchor
    with open(args.anchors, 'r') as f:
        values = [float(_v) for _v in f.readlines()]
        anchors = np.array(values).reshape(3,-1,2).tolist()
    print("use anchors from '{}', which is {}".format(args.anchors, anchors))
    
    # init model
    model_path = args.model_path
    # create the RKNNLite object
    rknn = RKNNLite()  # <-- make sure the indentation is correct (4 spaces or 1 tab)
    ret = rknn.load_rknn(model_path)  # <-- use args.model_path instead of RKNN_MODEL
    if ret != 0:
        print('Load RKNN model failed')
        exit(ret)
    print('done')

    ret = rknn.init_runtime()
    if ret != 0:
        print('Init runtime environment failed!')
        exit(ret)
    print('done')
    

    co_helper = COCO_test_helper(enable_letter_box=True)

    cap = cv2.VideoCapture('/dev/video20', cv2.CAP_V4L2)
    # manually set the resolution
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 640)

    if not cap.isOpened():
        print("打开摄像头失败")
        exit()

    print("按 q 退出")
    import time
    # FPS counters
    frame_count=0
    fps=0
    start_time=time.time()


    while True:
        ret, img_src = cap.read()
        if not ret:
            print("读取帧失败")
            break
        
        # update FPS
        frame_count+=1
        if frame_count >=10:
            end_time=time.time()
            fps=frame_count / (end_time-start_time)
            frame_count=0
            start_time=end_time
	
    
        
        img = co_helper.letter_box(img_src.copy(), new_shape=(640, 640), info_need=False)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img2 = np.expand_dims(img, 0)
        outputs = rknn.inference(inputs=[img2], data_format=['nhwc'])

        boxes, classes, scores, seg_img = post_process(outputs, anchors)
        
        
        img_p=img_src
        if boxes is not None:
            real_boxs = co_helper.get_real_box(boxes)
            real_segs = co_helper.get_real_seg(seg_img)
            img_p = merge_seg(img_src, real_segs, classes)
            
            draw(img_p, real_boxs, scores, classes)

      
            
            # draw the FPS value
            cv2.putText(img_p, f"FPS: {fps:.2f}", 
                       (img_p.shape[1] - 150, 30), 
                       cv2.FONT_HERSHEY_SIMPLEX, 
                       0.7, (0, 255, 0), 2)
        #img_1=cv2.resize(img_1,(800,600))
        cv2.imshow("My Detection Window", img_p)


        if cv2.waitKey(1) & 0xFF == ord('q'):
            break


    cap.release()
    cv2.destroyAllWindows()

    rknn.release()