EfficientDet训练自己的数据集实现抽烟检测

最新推荐文章于 2025-05-22 08:51:29 发布

置顶 mind_programmonkey

最新推荐文章于 2025-05-22 08:51:29 发布

阅读量7.3k

点赞数 7

CC 4.0 BY-SA版权

分类专栏：人工智能机器学习深度学习项目篇文章标签：深度学习目标检测抽烟检测抽烟检测打电话检测 Detection Smoking 图像检测 efficientdet 深度学习

本文链接：https://blog.csdn.net/Mind_programmonkey/article/details/107286647

人工智能机器学习深度学习同时被 2 个专栏收录

76 篇文章

订阅专栏

项目篇

71 篇文章

订阅专栏

本文详细介绍使用EfficientDet对抽烟行为进行检测的过程，包括环境配置、数据集处理、网络结构解析及训练测试步骤。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

哈哈，我又来了！！！再次立下flag，开学之后还是要保持更新频率！！！

本次用efficientdet来对抽烟检测，检测出是否抽烟。那么，老规矩，先上结果图！！！

那么，接下来，还是原先一套流程。走起！！！

一、环境配置

python==3.7.4
tensorflow-gpu==1.14.0
keras==2.2.4
numpy==1.17.4

本次，在租的gpu的机器上的。没办法，efficientnet这个网络占据显存太大了。本次机器带不动呀。

二、抽烟数据集

本次数据集是用labelme标注的，提供的json格式的数据集，但本次我们的voc格式的xml数据集，所以需要对json格式的数据进行转换。

图片:

标注的json格式数据：

转换后的xml格式

本次json转xml的源码如下：

# -*- coding: utf-8 -*-
"""
Created on Sun May 31 10:19:23 2020
@author: ywx
"""
import os
from typing import List, Any
import numpy as np
import codecs
import json
from glob import glob
import cv2
import shutil
from sklearn.model_selection import train_test_split
 
# 1.标签路径
labelme_path = "annotations/"
#原始labelme标注数据路径
saved_path = "VOC2007/"
# 保存路径
isUseTest=True#是否创建test集
# 2.创建要求文件夹
if not os.path.exists(saved_path + "Annotations"):
    os.makedirs(saved_path + "Annotations")
if not os.path.exists(saved_path + "JPEGImages/"):
    os.makedirs(saved_path + "JPEGImages/")
if not os.path.exists(saved_path + "ImageSets/Main/"):
    os.makedirs(saved_path + "ImageSets/Main/")
# 3.获取待处理文件
files = glob(labelme_path + "*.json")
files = [i.replace("\\","/").split("/")[-1].split(".json")[0] for i in files]
print(files)
# 4.读取标注信息并写入 xml
for json_file_ in files:
    json_filename = labelme_path + json_file_ + ".json"
    json_file = json.load(open(json_filename, "r", encoding="utf-8"))
    height, width, channels = cv2.imread('jpeg/' + json_file_ + ".jpg").shape
    with codecs.open(saved_path + "Annotations/" + json_file_ + ".xml", "w", "utf-8") as xml:
 
        xml.write('<annotation>\n')
        xml.write('\t<folder>' + 'WH_data' + '</folder>\n')
        xml.write('\t<filename>' + json_file_ + ".jpg" + '</filename>\n')
        xml.write('\t<source>\n')
        xml.write('\t\t<database>WH Data</database>\n')
        xml.write('\t\t<annotation>WH</annotation>\n')
        xml.write('\t\t<image>flickr</image>\n')
        xml.write('\t\t<flickrid>NULL</flickrid>\n')
        xml.write('\t</source>\n')
        xml.write('\t<owner>\n')
        xml.write('\t\t<flickrid>NULL</flickrid>\n')
        xml.write('\t\t<name>WH</name>\n')
        xml.write('\t</owner>\n')
        xml.write('\t<size>\n')
        xml.write('\t\t<width>' + str(width) + '</width>\n')
        xml.write('\t\t<height>' + str(height) + '</height>\n')
        xml.write('\t\t<depth>' + str(channels) + '</depth>\n')
        xml.write('\t</size>\n')
        xml.write('\t\t<segmented>0</segmented>\n')
        for multi in json_file["shapes"]:
            points = np.array(multi["points"])
            labelName=multi["label"]
            xmin = min(points[:, 0])
            xmax = max(points[:, 0])
            ymin = min(points[:, 1])
            ymax = max(points[:, 1])
            label = multi["label"]
            if xmax <= xmin:
                pass
            elif ymax <= ymin:
                pass
            else:
                xml.write('\t<object>\n')
                xml.write('\t\t<name>' + labelName+ '</name>\n')
                xml.write('\t\t<pose>Unspecified</pose>\n')
                xml.write('\t\t<truncated>1</truncated>\n')
                xml.write('\t\t<difficult>0</difficult>\n')
                xml.write('\t\t<bndbox>\n')
                xml.write('\t\t\t<xmin>' + str(int(xmin)) + '</xmin>\n')
                xml.write('\t\t\t<ymin>' + str(int(ymin)) + '</ymin>\n')
                xml.write('\t\t\t<xmax>' + str(int(xmax)) + '</xmax>\n')
                xml.write('\t\t\t<ymax>' + str(int(ymax)) + '</ymax>\n')
                xml.write('\t\t</bndbox>\n')
                xml.write('\t</object>\n')
                print(json_filename, xmin, ymin, xmax, ymax, label)
        xml.write('</annotation>')
# 5.复制图片到 VOC2007/JPEGImages/下
image_files = glob("jpeg/" + "*.jpg")
print("copy image files to VOC007/JPEGImages/")
for image in image_files:
    shutil.copy(image, saved_path + "JPEGImages/")
# 6.split files for txt
txtsavepath = saved_path + "ImageSets/Main/"
ftrainval = open(txtsavepath + '/trainval.txt', 'w')
ftest = open(txtsavepath + '/test.txt', 'w')
ftrain = open(txtsavepath + '/train.txt', 'w')
fval = open(txtsavepath + '/val.txt', 'w')
total_files = glob("./VOC2007/Annotations/*.xml")
total_files = [i.replace("\\","/").split("/")[-1].split(".xml")[0] for i in total_files]
trainval_files=[]
test_files=[] 
if isUseTest:
    trainval_files, test_files = train_test_split(total_files, test_size=0.15, random_state=55) 
else: 
    trainval_files=total_files 
for file in trainval_files: 
    ftrainval.write(file + "\n") 
# split 
train_files, val_files = train_test_split(trainval_files, test_size=0.15, random_state=55) 
# train
for file in train_files: 
    ftrain.write(file + "\n") 
# val 
for file in val_files: 
    fval.write(file + "\n")
for file in test_files:
    print(file)
    ftest.write(file + "\n")
ftrainval.close()
ftrain.close()
fval.close()
ftest.close()

三、EfficientDet理论介绍

EfficientDet是基于Efficientnet的目标检测网络，所以需要先读懂Efficientnet，这里可以先去看我之前写的卷积神经网络发展史中有关于Efficientnet的介绍。

简短来说，EfficientNet是将图片的分辨率，网络的宽度，网络的深度这三者结合起来，通过α实现缩放模型，不同的α有不同的模型精度。

总的来说，efficientdet目标检测网络，是以efficientnet为主干网络，之后经过bifpn特征特征网络，之后再输出检测结果。

1.EfficientNet

EfficientNet主要由Efficient Blocks构成，在其中小残差边以及大残差边构成，并在其中添加了注意力模块。

def mb_conv_block(inputs, block_args, activation, drop_rate=None, prefix='', ):
    """Mobile Inverted Residual Bottleneck."""

    has_se = (block_args.se_ratio is not None) and (0 < block_args.se_ratio <= 1)
    bn_axis = 3 if backend.image_data_format() == 'channels_last' else 1

    # workaround over non working dropout with None in noise_shape in tf.keras
    Dropout = get_dropout(
        backend=backend,
        layers=layers,
        models=models,
        utils=keras_utils
    )

    # Expansion phase
    filters = block_args.input_filters * block_args.expand_ratio
    if block_args.expand_ratio != 1:
        x = layers.Conv2D(filters, 1,
                          padding='same',
                          use_bias=False,
                          kernel_initializer=CONV_KERNEL_INITIALIZER,
                          name=prefix + 'expand_conv')(inputs)
        x = layers.BatchNormalization(axis=bn_axis, name=prefix + 'expand_bn')(x)
        x = layers.Activation(activation, name=prefix + 'expand_activation')(x)
    else:
        x = inputs

    # Depthwise Convolution
    x = layers.DepthwiseConv2D(block_args.kernel_size,
                               strides=block_args.strides,
                               padding='same',
                               use_bias=False,
                               depthwise_initializer=CONV_KERNEL_INITIALIZER,
                               name=prefix + 'dwconv')(x)
    x = layers.BatchNormalization(axis=bn_axis, name=prefix + 'bn')(x)
    x = layers.Activation(activation, name=prefix + 'activation')(x)

    # Squeeze and Excitation phase
    if has_se:
        num_reduced_filters = max(1, int(
            block_args.input_filters * block_args.se_ratio
        ))
        se_tensor = layers.GlobalAveragePooling2D(name=prefix + 'se_squeeze')(x)

        target_shape = (1, 1, filters) if backend.image_data_format() == 'channels_last' else (filters, 1, 1)
        se_tensor = layers.Reshape(target_shape, name=prefix + 'se_reshape')(se_tensor)
        se_tensor = layers.Conv2D(num_reduced_filters, 1,
                                  activation=activation,
                                  padding='same',
                                  use_bias=True,
                                  kernel_initializer=CONV_KERNEL_INITIALIZER,
                                  name=prefix + 'se_reduce')(se_tensor)
        se_tensor = layers.Conv2D(filters, 1,
                                  activation='sigmoid',
                                  padding='same',
                                  use_bias=True,
                                  kernel_initializer=CONV_KERNEL_INITIALIZER,
                                  name=prefix + 'se_expand')(se_tensor)
        if backend.backend() == 'theano':
            # For the Theano backend, we have to explicitly make
            # the excitation weights broadcastable.
            pattern = ([True, True, True, False] if backend.image_data_format() == 'channels_last'
                       else [True, False, True, True])
            se_tensor = layers.Lambda(
                lambda x: backend.pattern_broadcast(x, pattern),
                name=prefix + 'se_broadcast')(se_tensor)
        x = layers.multiply([x, se_tensor], name=prefix + 'se_excite')

    # Output phase
    x = layers.Conv2D(block_args.output_filters, 1,
                      padding='same',
                      use_bias=False,
                      kernel_initializer=CONV_KERNEL_INITIALIZER,
                      name=prefix + 'project_conv')(x)
    x = layers.BatchNormalization(axis=bn_axis, name=prefix + 'project_bn')(x)
    if block_args.id_skip and all(
            s == 1 for s in block_args.strides
    ) and block_args.input_filters == block_args.output_filters:
        if drop_rate and (drop_rate > 0):
            x = Dropout(drop_rate,
                        noise_shape=(None, 1, 1, 1),
                        name=prefix + 'drop')(x)
        x = layers.add([x, inputs], name=prefix + 'add')

    return x

2.BiFPN

改进了FPN中的多尺度特征融合方式，提出了加权双向特征金字塔网络BiFPN。BiFPN 引入了一种自顶向下的路径，融合P3～P7的多尺度特征

BiFPN模块类似于FPN网络（特征金字塔网络），不过比FPN更复杂些。其主要是为了增强特征，提取更有代表性的特征。

下图展示一下FPN网络：

而这是BiFPN的网络图：

其中的一个BiFPN模块为：

四、训练过程

1.准备数据集

准备抽烟数据，使用VOC格式的数据进行训练

训练前将标签文件放在VOCdevkit文件夹下的VOC2007文件夹下的Annotation中。
训练前将图片文件放在VOCdevkit文件夹下的VOC2007文件夹下的JPEGImages中。
在训练前利用voc2efficientdet.py文件生成对应的txt。

VOCdevkit
   -VOC2007
   	├─ImageSets    # 存放数据集列表文件，由voc2yolo3.py文件生成
   	├─Annotations  # 存放数据集中图片文件
   	├─JPEGImages   # 存放图片标签，xml 格式
   	└─voc2yolo4.py # 用来生成数据集列表文件

2.运行生成EfficientDet所需的数据

再运行根目录voc_annotation.py，运行前需要将voc_annotation文件中classes改成你自己的classes。
每一行对应其图片位置及其真实框的位置