Using YOLO

学习小卡车

License: CC 4.0 BY-SA

Category: Image Processing. Tags: deep learning

Article link: https://blog.csdn.net/m0_37769093/article/details/120209730

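This article shows how to download the pretrained yolov3 model and use it for object detection: editing the configuration file, detecting objects in a single image and in multiple images, testing with a webcam and with video, and training from scratch. It pays particular attention to using OpenCV's GPU module to improve performance.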

Step 1

Download the pretrained model.

cd darknet
wget https://pjreddie.com/media/files/yolov3.weights

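Step 2

Edit the network configuration file darknet/cfg/yolov3.cfg as shown below: comment out the Training parameters and uncomment the Testing parameters.
In the terminal, run: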

cd cfg
gedit yolov3.cfg

In the opened file, make the following changes:

#Testing
 batch=1
 subdivisions=1
#Training
#batch=64
#subdivisions=16
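
The same edit can be scripted instead of done by hand in gedit. The following is a minimal Python sketch, assuming the [net] header marks its two groups with `# Testing` and `# Training` comment lines, as in the listing above. The function name `switch_to_testing` is hypothetical and not part of darknet.

```python
def switch_to_testing(cfg_text):
    """Uncomment batch/subdivisions under '# Testing', comment them under '# Training'."""
    out = []
    section = None  # "testing", "training", or None when outside both groups
    for line in cfg_text.splitlines():
        stripped = line.strip()
        body = stripped.lstrip("#").strip()
        # A comment line that is exactly "Testing" or "Training" starts a group.
        if stripped.startswith("#") and body.lower() in ("testing", "training"):
            section = body.lower()
            out.append(line)
            continue
        key = body.split("=", 1)[0].strip()
        is_setting = "=" in body and key in ("batch", "subdivisions")
        if section and is_setting:
            out.append(body if section == "testing" else "# " + body)
        else:
            if stripped:
                section = None  # any other non-empty line ends the group
            out.append(line)
    return "\n".join(out)


example = "[net]\n# Testing\n# batch=1\n# subdivisions=1\n# Training\nbatch=64\nsubdivisions=16\nwidth=608"
print(switch_to_testing(example))
```

Only the `batch` and `subdivisions` keys are touched, so later keys such as `batch_normalize` in the convolutional sections are left as they are.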

Step 3: Test a single image with the yolov3.weights model

Single-image detection. Run the detector with the following command:

./darknet detect cfg/yolov3.cfg yolov3.weights data/dog.jpg

Note: run the command above from a terminal in the darknet directory. If darknet has not been built yet, build it first:

cd darknet
make

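After a successful build, a darknet executable is generated; run it with ./darknet. You can change the parameters in the Makefile, but remember to run make again after every change.
Explanation of the command:
./darknet runs the compiled darknet executable in the current directory;
detect is the subcommand;
the three items after it are its arguments;
cfg/yolov3.cfg is the network definition;
yolov3.weights is the network weights file;
data/dog.jpg is the image to run detection on.
The command above is equivalent to the one below; the shorter form is usually preferred. For training, replace test with train in the command below, i.e. detector train (the training arguments are covered in Step 11).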

./darknet detector test cfg/coco.data cfg/yolov3.cfg yolov3.weights data/dog.jpg
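
When detection is driven from a script, it can help to assemble the command as an argument list. The sketch below builds the long form of the command shown above; `build_detect_command` is a hypothetical helper name, and the optional `-thresh` flag is the one described later in this article.

```python
import shlex


def build_detect_command(cfg, weights, image=None, data="cfg/coco.data", thresh=None):
    """Return the argument list for `./darknet detector test ...`."""
    cmd = ["./darknet", "detector", "test", data, cfg, weights]
    if image is not None:
        cmd.append(image)  # without an image, darknet prompts for paths interactively
    if thresh is not None:
        cmd += ["-thresh", str(thresh)]
    return cmd


cmd = build_detect_command("cfg/yolov3.cfg", "yolov3.weights", "data/dog.jpg")
print(shlex.join(cmd))

# To actually run it, execute from the darknet directory:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems when image paths contain spaces.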

After running it, you will see output like this:

layer     filters    size              input                output
    0 conv     32  3 x 3 / 1   608 x 608 x   3   ->   608 x 608 x  32  0.639 BFLOPs
    1 conv     64  3 x 3 / 2   608 x 608 x  32   ->   304 x 304 x  64  3.407 BFLOPs
    2 conv     32  1 x 1 / 1   304 x 304 x  64   ->   304 x 304 x  32  0.379 BFLOPs
    3 conv     64  3 x 3 / 1   304 x 304 x  32   ->   304 x 304 x  64  3.407 BFLOPs
    4 res    1                 304 x 304 x  64   ->   304 x 304 x  64
    5 conv    128  3 x 3 / 2   304 x 304 x  64   ->   152 x 152 x 128  3.407 BFLOPs
    6 conv     64  1 x 1 / 1   152 x 152 x 128   ->   152 x 152 x  64  0.379 BFLOPs
    7 conv    128  3 x 3 / 1   152 x 152 x  64   ->   152 x 152 x 128  3.407 BFLOPs
    8 res    5                 152 x 152 x 128   ->   152 x 152 x 128
    9 conv     64  1 x 1 / 1   152 x 152 x 128   ->   152 x 152 x  64  0.379 BFLOPs
   10 conv    128  3 x 3 / 1   152 x 152 x  64   ->   152 x 152 x 128  3.407 BFLOPs
   11 res    8                 152 x 152 x 128   ->   152 x 152 x 128
   12 conv    256  3 x 3 / 2   152 x 152 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   13 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   14 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   15 res   12                  76 x  76 x 256   ->    76 x  76 x 256
   16 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   17 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   18 res   15                  76 x  76 x 256   ->    76 x  76 x 256
   19 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   20 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   21 res   18                  76 x  76 x 256   ->    76 x  76 x 256
   22 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   23 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   24 res   21                  76 x  76 x 256   ->    76 x  76 x 256
   25 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   26 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   27 res   24                  76 x  76 x 256   ->    76 x  76 x 256
   28 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   29 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   30 res   27                  76 x  76 x 256   ->    76 x  76 x 256
   31 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   32 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   33 res   30                  76 x  76 x 256   ->    76 x  76 x 256
   34 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
   35 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
   36 res   33                  76 x  76 x 256   ->    76 x  76 x 256
   37 conv    512  3 x 3 / 2    76 x  76 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   38 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   39 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   40 res   37                  38 x  38 x 512   ->    38 x  38 x 512
   41 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   42 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   43 res   40                  38 x  38 x 512   ->    38 x  38 x 512
   44 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   45 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   46 res   43                  38 x  38 x 512   ->    38 x  38 x 512
   47 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   48 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   49 res   46                  38 x  38 x 512   ->    38 x  38 x 512
   50 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   51 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   52 res   49                  38 x  38 x 512   ->    38 x  38 x 512
   53 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   54 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   55 res   52                  38 x  38 x 512   ->    38 x  38 x 512
   56 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   57 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   58 res   55                  38 x  38 x 512   ->    38 x  38 x 512
   59 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   60 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   61 res   58                  38 x  38 x 512   ->    38 x  38 x 512
   62 conv   1024  3 x 3 / 2    38 x  38 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   63 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   64 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   65 res   62                  19 x  19 x1024   ->    19 x  19 x1024
   66 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   67 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   68 res   65                  19 x  19 x1024   ->    19 x  19 x1024
   69 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   70 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   71 res   68                  19 x  19 x1024   ->    19 x  19 x1024
   72 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   73 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   74 res   71                  19 x  19 x1024   ->    19 x  19 x1024
   75 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   76 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   77 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   78 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   79 conv    512  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 512  0.379 BFLOPs
   80 conv   1024  3 x 3 / 1    19 x  19 x 512   ->    19 x  19 x1024  3.407 BFLOPs
   81 conv    255  1 x 1 / 1    19 x  19 x1024   ->    19 x  19 x 255  0.189 BFLOPs
   82 yolo
   83 route  79
   84 conv    256  1 x 1 / 1    19 x  19 x 512   ->    19 x  19 x 256  0.095 BFLOPs
   85 upsample            2x    19 x  19 x 256   ->    38 x  38 x 256
   86 route  85 61
   87 conv    256  1 x 1 / 1    38 x  38 x 768   ->    38 x  38 x 256  0.568 BFLOPs
   88 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   89 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   90 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   91 conv    256  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 256  0.379 BFLOPs
   92 conv    512  3 x 3 / 1    38 x  38 x 256   ->    38 x  38 x 512  3.407 BFLOPs
   93 conv    255  1 x 1 / 1    38 x  38 x 512   ->    38 x  38 x 255  0.377 BFLOPs
   94 yolo
   95 route  91
   96 conv    128  1 x 1 / 1    38 x  38 x 256   ->    38 x  38 x 128  0.095 BFLOPs
   97 upsample            2x    38 x  38 x 128   ->    76 x  76 x 128
   98 route  97 36
   99 conv    128  1 x 1 / 1    76 x  76 x 384   ->    76 x  76 x 128  0.568 BFLOPs
  100 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
  101 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
  102 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
  103 conv    128  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 128  0.379 BFLOPs
  104 conv    256  3 x 3 / 1    76 x  76 x 128   ->    76 x  76 x 256  3.407 BFLOPs
  105 conv    255  1 x 1 / 1    76 x  76 x 256   ->    76 x  76 x 255  0.754 BFLOPs
  106 yolo
Loading weights from yolov3.weights...Done!
data/dog.jpg: Predicted in 0.063622 seconds.
dog: 100%
truck: 92%
bicycle: 99%
Gtk-Message: 20:09:45.990: Failed to load module "canberra-gtk-module"
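
The BFLOPs column in the layer table can be reproduced from the other columns. As a rough sketch, assuming darknet counts two operations per multiply-accumulate, the cost of a convolutional layer is 2 × filters × kernel area × input channels × output width × output height, divided by 10^9. The function name `conv_bflops` is hypothetical.

```python
def conv_bflops(filters, size, in_channels, out_w, out_h):
    """Billions of floating-point operations for one convolutional layer."""
    return 2.0 * filters * size * size * in_channels * out_w * out_h / 1e9


# Layer 0: 32 filters, 3 x 3 kernel, 3 input channels, 608 x 608 output
print(round(conv_bflops(32, 3, 3, 608, 608), 3))
# Layer 1: 64 filters, 3 x 3 kernel, 32 input channels, 304 x 304 output
print(round(conv_bflops(64, 3, 32, 304, 304), 3))
# Layer 81: 255 filters, 1 x 1 kernel, 1024 input channels, 19 x 19 output
print(round(conv_bflops(255, 1, 1024, 19, 19), 3))
```

The three values match the 0.639, 3.407 and 0.189 BFLOPs entries printed for those layers in the table.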

g++ -Iinclude/ -Isrc/ -DOPENCV `pkg-config --cflags opencv`  -DGPU -I/usr/local/cuda/include/ -DCUDNN  -Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -fPIC -Ofast -DOPENCV -DGPU -DCUDNN -c ./src/image_opencv.cpp -o obj/image_opencv.o
./src/image_opencv.cpp: In function ‘image mat_to_image(cv::Mat)’:
./src/image_opencv.cpp:63:20: error: conversion from ‘cv::Mat’ to non-scalar type ‘IplImage {aka _IplImage}’ requested
     IplImage ipl = m;
                    ^
compilation terminated due to -Wfatal-errors.
Makefile:86: recipe for target 'obj/image_opencv.o' failed
make: *** [obj/image_opencv.o] Error 1

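Step 7: Test multiple images with the yolov3.weights model

Instead of passing an image path on the command line, run the command below. The program then prompts you for an image path again and again until you exit with Ctrl+C.

./darknet detect cfg/yolov3.cfg yolov3.weights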

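Step 8

With the yolov3.weights model, use the -thresh parameter to control how many bounding boxes are displayed. By default darknet only displays detections whose confidence is at least 0.25; change this with -thresh. For example, -thresh 0 displays all bounding boxes.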
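
To illustrate what the threshold does, here is a small Python sketch that filters a list of (label, confidence) pairs. The first three entries come from the sample output in Step 3 (rounded percentages); the last two are hypothetical low-confidence boxes added only for illustration. `filter_detections` is a hypothetical name, not a darknet function.

```python
def filter_detections(detections, thresh=0.25):
    """Keep only detections whose confidence is at least `thresh`."""
    return [(label, conf) for label, conf in detections if conf >= thresh]


detections = [
    ("dog", 1.00), ("truck", 0.92), ("bicycle", 0.99),  # from the sample output
    ("person", 0.10), ("car", 0.03),                    # hypothetical weak boxes
]

print(filter_detections(detections))        # default threshold of 0.25
print(filter_detections(detections, 0.0))   # like -thresh 0: everything is kept
```

With the default threshold only the three confident boxes remain, while a threshold of 0 keeps every candidate.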

Step 9: Test with a webcam using the yolov3.weights model

Webcam detection requires darknet to be compiled with CUDA and OpenCV.

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights

Step 10: Test on a video using the yolov3.weights model

Video detection uses OpenCV to read the video.
In the terminal, run:

./darknet detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights <video file>

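Step 11: Train YOLO on VOC
If you want to use a different training scheme, you can train YOLO from scratch. Here we first train on one of the official datasets, VOC or COCO.
Follow the steps on the official site.

Get the data
From the target folder darknet/VOCdevkit, run: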
wget https://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar
wget https://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar
wget https://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar
tar xf VOCtrainval_11-May-2012.tar
tar xf VOCtrainval_06-Nov-2007.tar
tar xf VOCtest_06-Nov-2007.tar
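Explanation of the commands:
(1) wget downloads a file from the given URL. It copes well with narrow bandwidth and unstable networks: if a download fails because of a network problem, wget retries automatically, and if the connection is interrupted it reconnects and, where the server supports it, continues from where it stopped. This is useful when downloading large files from servers that limit connection time.
(2) tar extracts the archives. -x: extract. -f: use the given archive file name. This must be the last option in the group and must be followed directly by the archive name.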

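The commands above download the data from the author's repository, which can be slow. You can also download it from the official Pascal VOC site: the 2007 and 2012 datasets (under the development kit downloads).

Place the three extracted folders in the VOCdevkit folder.

Create the .txt label files for the images. In the terminal, run: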
wget https://pjreddie.com/media/files/voc_label.py
python3 voc_label.py
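List the files in the current directory:

ls
You should see the following files:

2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt
Of the .txt files created above, only 2007_test.txt is used for testing; all the others are used for training. Run the command below to put the 2007 trainval and 2012 trainval sets into one big list.
cat 2007_train.txt 2007_val.txt 2012_*.txt > train.txt
Explanation: cat concatenates files and writes them to standard output. This command writes the contents of 2007_train.txt, 2007_val.txt and the 2012_*.txt files into train.txt.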
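
On systems without cat, the same list can be built in Python. This is a minimal sketch, assuming the list files produced by voc_label.py sit in one directory; `build_train_list` is a hypothetical helper name. Unlike cat, it skips blank lines.

```python
import glob
import os


def build_train_list(directory, out_name="train.txt"):
    """Concatenate the 2007 and 2012 train/val lists into one training list."""
    patterns = ["2007_train.txt", "2007_val.txt", "2012_*.txt"]
    sources = []
    for pattern in patterns:
        # sorted() mimics the alphabetical order of shell glob expansion
        sources += sorted(glob.glob(os.path.join(directory, pattern)))
    lines = []
    for path in sources:
        with open(path) as f:
            lines += [line.rstrip("\n") for line in f if line.strip()]
    with open(os.path.join(directory, out_name), "w") as f:
        f.write("\n".join(lines) + "\n")
    return len(lines)  # number of image paths written
```

Calling `build_train_list(".")` from the folder that holds the list files writes train.txt there and returns the number of entries; 2007_test.txt is left out because it matches none of the patterns.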

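Edit the configuration file cfg/voc.data.
Adjust its contents to match your own data, and change the paths to your own paths. The # notes below are annotations for this article; leave them out of the real file, since darknet may read trailing text as part of the value.
classes= 3 # change to your own number of classes
train = /home/learner/darknet/data/voc/train.txt # change to your own path
valid = /home/learner/darknet/data/voc/2007_test.txt # change to your own path
names = /home/learner/darknet/data/voc.names # edit voc.names as described below
backup = /home/learner/darknet/backup # change to your own path; the output weights are saved in this folder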
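
Before starting a long training run it can be worth checking that the .data file says what you expect. The following Python sketch parses the key = value format shown above; `parse_data_cfg` is a hypothetical helper, and stripping trailing # comments is a convenience of this sketch, not necessarily what darknet itself does.

```python
def parse_data_cfg(text):
    """Parse darknet-style `key = value` lines into a dict of strings."""
    options = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments (sketch-only behavior)
        if not line or line.startswith(";") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        options[key.strip()] = value.strip()
    return options


sample = """classes= 3 # number of classes
train = /home/learner/darknet/data/voc/train.txt
valid = /home/learner/darknet/data/voc/2007_test.txt
names = /home/learner/darknet/data/voc.names
backup = /home/learner/darknet/backup
"""
cfg = parse_data_cfg(sample)
print(cfg["classes"], cfg["train"])
```

A natural follow-up is to check that each path in the parsed dict exists with `os.path.exists` before launching training.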
Edit the voc.names file so that it lists the classes you want to detect, one per line:
head
eye
nose
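Download the pretrained convolutional weights. A weights file was also downloaded earlier for single-image testing, but the two files are different. Instead of using the command, you can also download the file from the link on the official site.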
wget https://pjreddie.com/media/files/darknet53.conv.74
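Edit the cfg/yolov3-voc.cfg file. A similar change was made earlier when testing a single image, so the batch settings may not need to be changed again.
[net]
# Testing
batch=64
subdivisions=32 # images per mini-batch = batch/subdivisions; adjust for your GPU, and use a larger value if GPU memory is insufficient
# Training
# batch=64
# subdivisions=16
……
learning_rate=0.001
burn_in=1000
max_batches = 50200 # number of training iterations
policy=steps
steps=40000,45000 # iterations at which the learning rate starts to decay
scales=.1,.1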

[convolutional]
…

[convolutional]
……

[yolo]
……
classes=3 # change to your own number of classes
……

[route]
layers = -4

[convolutional]
……

[upsample]
stride=2

[route]
layers = -1, 61

[convolutional]
……

[yolo]
……
classes=3 # change to your own number of classes
……

[route]
layers = -4

[convolutional]
……

[upsample]
stride=2

[route]
layers = -1, 36

[convolutional]
……

[yolo]
……
classes=3 # change to your own number of classes
……
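
The numbers in this configuration are related by a few simple rules, sketched below in Python. The sketch rests on my understanding of darknet rather than on anything stated in this article: the learning rate ramps up during burn_in with an assumed exponent of 4, then is multiplied by each scale once its step is reached. The filter rule reflects that in YOLOv3 the convolutional layer before each [yolo] section normally needs filters = 3 × (classes + 5), so it usually has to change together with classes. All function names are hypothetical.

```python
def mini_batch_size(batch, subdivisions):
    """Images processed on the GPU at once: batch / subdivisions."""
    return batch // subdivisions


def yolo_filters(classes, anchors_per_scale=3):
    """Filters for the convolutional layer just before a [yolo] section."""
    return anchors_per_scale * (classes + 5)


def learning_rate(iteration, base_lr=0.001, burn_in=1000,
                  steps=(40000, 45000), scales=(0.1, 0.1), power=4):
    """Learning rate under the `steps` policy with burn-in (power=4 is assumed)."""
    if iteration < burn_in:
        return base_lr * (iteration / burn_in) ** power
    rate = base_lr
    for step, scale in zip(steps, scales):
        if step > iteration:
            return rate
        rate *= scale
    return rate


print(mini_batch_size(64, 32))   # mini-batch for batch=64, subdivisions=32
print(yolo_filters(3))           # filters for the 3 classes used in this article
for it in (500, 1000, 40000, 45000):
    print(it, learning_rate(it))
```

With the values above, the rate stays at 0.001 between iterations 1000 and 40000, drops by a factor of ten at 40000, and drops by another factor of ten at 45000.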
Train the model
# Single-GPU training command; multi-GPU training is also possible
./darknet detector train cfg/voc.data cfg/yolov3-voc.cfg darknet53.conv.74
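Step 12: Using OpenCV's GPU module
OpenCV's built-in face-detection cascade classifier runs on the CPU by default, which makes real-time webcam testing stutter, so switch to OpenCV's GPU module.
Reference: reference material
Before writing any code, the first thing to do is link the GPU module into the project, including the module's header file. All GPU functions and data structures live in the gpu namespace nested inside the cv namespace. For example, the following include brings in the GPU structures and methods.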
#include <opencv2/gpu/gpu.hpp>
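The libraries shipped in the prebuilt OpenCV package do not have the gpu and ocl modules enabled. Although the ***gpu.lib/***gpu.dll files are present, they cannot be used: calling gpu::getCudaEnabledDeviceCount() returns 0. To enable this functionality you need to rebuild the OpenCV libraries. Before building, install CMake (to generate the Visual Studio project), TBB, Qt (GUI), the CUDA Toolkit, Python and so on.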
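
A quick way to see whether a given OpenCV build has CUDA support is to ask it for the device count. The sketch below uses the Python bindings, which is an assumption on my part since this article works in C++. It also assumes OpenCV 3 or later, where the module described above as cv::gpu is exposed as cv::cuda (cv2.cuda in Python). The helper name `cuda_device_count` is hypothetical.

```python
def cuda_device_count():
    """Return how many CUDA devices OpenCV can use, or 0 if support is missing."""
    try:
        import cv2
    except ImportError:
        return 0  # OpenCV's Python bindings are not installed
    cuda = getattr(cv2, "cuda", None)
    get_count = getattr(cuda, "getCudaEnabledDeviceCount", None)
    if get_count is None:
        return 0  # this build does not expose the CUDA module
    try:
        return int(get_count())
    except Exception:
        return 0  # treat driver or runtime problems as "no usable device"


print("CUDA devices visible to OpenCV:", cuda_device_count())
```

A result of 0 is consistent with the situation described above, where the installed libraries were built without GPU support and OpenCV has to be rebuilt.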
Step 13: Use OpenCV and a webcam to convert captured video into images