PSC-Net: Learning Part Spatial Co-occurrence for Occluded Pedestrian Detection (Translation)

一个小呆苗

License: CC 4.0 BY-SA

Article link: https://blog.csdn.net/weixin_55775980/article/details/119380534

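PSC-Net is a new method for occluded pedestrian detection that uses a graph convolutional network to capture intra-part and inter-part spatial co-occurrence information of pedestrian body parts, without requiring additional visible bounding-box annotations. Experiments on the CityPersons and Caltech datasets show that PSC-Net performs well on pedestrians under varying degrees of occlusion, with a notable improvement over existing methods on heavily occluded pedestrians.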

PSC-Net: Learning Part Spatial Co-occurrence for Occluded Pedestrian Detection, Explained

Original text
Abstract: Detecting pedestrians, especially under heavy occlusions, is a challenging computer vision problem with numerous real-world applications. This paper introduces a novel approach, termed PSC-Net, for occluded pedestrian detection. The proposed PSC-Net contains a dedicated module that is designed to explicitly capture both inter and intra-part co-occurrence information of different pedestrian body parts through a Graph Convolutional Network (GCN). Both inter and intra-part co-occurrence information contribute towards improving the feature representation for handling varying levels of occlusion, ranging from partial to severe occlusions. Our PSC-Net exploits the topological structure of pedestrian and does not require part-based annotations or additional visible bounding-box (VBB) information to learn part spatial co-occurrence. Comprehensive experiments are performed on two challenging datasets: CityPersons and Caltech datasets. The proposed PSC-Net achieves state-of-the-art detection performance on both. On the heavy occluded (HO) set of CityPersons test set, our PSC-Net obtains an absolute gain of 4.0% in terms of log-average miss rate over the state-of-the-art [34] with same backbone, input scale and without using additional VBB supervision. Further, PSC-Net improves the state-of-the-art [54] from 37.9 to 34.8 in terms of log-average miss rate on Caltech (HO) test set.

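Pedestrian detection, especially under heavy occlusion, is a challenging computer vision problem with many real-world applications. This paper introduces a new method for occluded pedestrian detection, called PSC-Net. PSC-Net contains a dedicated module that uses a Graph Convolutional Network (GCN) to explicitly capture inter-part and intra-part co-occurrence information across pedestrian body parts. Both kinds of co-occurrence information help improve the feature representation so that it can handle occlusion ranging from partial to severe. PSC-Net exploits the topological structure of the pedestrian and needs neither part-based annotations nor additional visible bounding-box (VBB) information to learn part spatial co-occurrence. Comprehensive experiments are carried out on two challenging datasets, CityPersons and Caltech, and PSC-Net achieves state-of-the-art detection performance on both. On the heavily occluded (HO) subset of the CityPersons test set, PSC-Net obtains an absolute gain of 4.0% in log-average miss rate over the state of the art [34], using the same backbone and input scale and no additional VBB supervision. On the Caltech (HO) test set, PSC-Net improves the state of the art [54] from 37.9 to 34.8 in log-average miss rate.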
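To make the idea of propagating information between body parts through a GCN more concrete, here is a minimal sketch of a single generic graph-convolution step, ReLU(Â H W) with a symmetrically normalized adjacency matrix. This is the widely used textbook formulation, not the PSC-Net module itself, whose exact design is not described in this section. The three-node head/torso/legs graph, the feature values, and the function names are all hypothetical and chosen only for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a square adjacency matrix (nested lists)."""
    n = len(adj)
    # Add self-loops so every node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj, features, weight):
    """One generic graph-convolution step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical 3-part chain graph: head - torso - legs (an assumption, not from the paper).
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],   # head feature
            [0.0, 1.0],   # torso feature
            [1.0, 1.0]]   # legs feature
weight = [[1.0, 0.0],
          [0.0, 1.0]]     # identity projection, for readability only

out = gcn_layer(adj, features, weight)
```

After one step, each part's feature is a weighted mix of its own feature and those of its neighbours, which is the basic mechanism by which a visible part could inform the representation of an occluded one.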

I. INTRODUCTION

PEDESTRIAN detection is a challenging problem in computer vision with various real-world applications, e.g., robotics, autonomous driving and visual surveillance. Recent years have witnessed significant progress in the field of pedestrian detection, mainly due to the advances in deep convolutional neural networks (CNNs). Modern pedestrian detection methods can be generally divided into single-stage [22], [30], [3] and two-stage [20], [57], [54], [34], [52], [53], [4], [46], [24]. Single-stage pedestrian detectors typically work by directly regressing the default anchors into pedestrian detection boxes. Unlike single-stage pedestrian detectors, two-stage methods first produce a set of candidate pedestrian proposals, which is followed by classification and regression of these pedestrian proposals. Most existing two-stage pedestrian detectors [53], [34], [20], [52], [46], [24] are based on the popular Faster R-CNN detection framework [37] that is adapted from generic object detection. Existing pedestrian detectors typically assume entirely visible pedestrians when trained using full body pedestrian annotations.

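Pedestrian detection is a challenging problem in computer vision with many practical applications, such as robotics, autonomous driving, and visual surveillance. The field has made major progress in recent years, mainly thanks to advances in deep convolutional neural networks (CNNs). Modern pedestrian detection methods can broadly be divided into single-stage [22], [30], [3] and two-stage [20], [57], [54], [34], [52], [53], [4], [46], [24] methods. Single-stage detectors usually work by regressing default anchors directly into pedestrian detection boxes. Two-stage methods, by contrast, first generate a set of candidate pedestrian proposals and then classify and regress those proposals. Most existing two-stage pedestrian detectors [53], [34], [20], [52], [46], [24] are built on the popular Faster R-CNN detection framework [37], which is adapted from generic object detection. When trained with full-body pedestrian annotations, existing detectors typically assume that pedestrians are entirely visible.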

While promising results have been achieved by existing pedestrian detectors on standard non-occluded pedestrians, their performance on heavily occluded pedestrians is far from satisfactory. This is evident from the fact that the best reported performance [34] on the reasonable (R) set (where visibility ratio is larger than 65%) of CityPersons test set [51] is 9.3 (log-average miss rate) whereas it is 41.0 on the heavy occluded (HO) set (where visibility ratio ranges from 20% to 65%) of the same dataset. Handling pedestrian occlusion is an open problem in computer vision and presents a great challenge for detecting pedestrians in real-world applications due to its frequent occurrence. Therefore, a pedestrian detector is desired to be accurate with respect to varying levels of occlusion, ranging from reasonably occluded to severely occluded pedestrians.

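Although existing pedestrian detectors achieve promising results on standard non-occluded pedestrians, their performance on heavily occluded pedestrians is far from satisfactory. This is clear from the reported numbers: the best reported performance [34] on the reasonable (R) set of the CityPersons test set [51] (visibility ratio above 65%) is a log-average miss rate of 9.3, whereas on the heavily occluded (HO) set of the same dataset (visibility ratio between 20% and 65%) it is 41.0. Handling pedestrian occlusion is an open problem in computer vision, and because occlusion occurs so frequently, it poses a major challenge for pedestrian detection in real-world applications. A pedestrian detector therefore needs to be accurate across varying levels of occlusion, from reasonably occluded to severely occluded pedestrians.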
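The two quantities used above, the visibility-based subsets and the log-average miss rate, can be sketched in a few lines. The subset thresholds follow the text (R: visibility above 65%; HO: 20% to 65%). The boundary handling at exactly 65% and the omission of any other filters are assumptions. The metric follows the convention commonly used in the Caltech pedestrian benchmark, a geometric mean of the miss rate sampled at nine FPPI (false positives per image) points evenly spaced in log space over [10^-2, 10^0]. That convention is not spelled out in this section, so treat it as an assumption too. The function names are hypothetical.

```python
import math


def subset_for(visibility):
    """Assign a pedestrian to a subset by visibility ratio in [0, 1].

    Thresholds follow the text; treating exactly 0.65 as HO is an assumption.
    """
    if visibility > 0.65:
        return "R"
    if 0.20 <= visibility <= 0.65:
        return "HO"
    return None  # below 20% visibility: outside both subsets discussed here


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rate at FPPI reference points in [1e-2, 1e0].

    Assumes `fppi` is sorted in ascending order and aligned with `miss_rate`.
    """
    refs = [10 ** (-2.0 + 2.0 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        # Use the miss rate at the largest FPPI not exceeding the reference
        # point; if the curve starts later, fall back to a miss rate of 1.0.
        value = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                value = m
            else:
                break
        sampled.append(max(value, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(v) for v in sampled) / len(sampled))


# Toy curve with a constant miss rate (illustrative values, not real results).
flat = log_average_miss_rate([0.001, 0.1, 1.0], [0.5, 0.5, 0.5])
```

Benchmark tables usually report this value multiplied by 100, which is how figures such as 9.3 and 41.0 should be read. Lower is better.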

II. RELATED WORK

Convolutional neural networks (CNNs) have significantly advanced the state-of-the-art in numerous computer vision applications, such as image classification [12], [41], [39], [43], object detection [37], [33], [45], [29], [21], object counting [48], [9], [8], [17], image retrieval [35], [36], [23], [49], action recognition [40], [38], [19], [28], and pedestrian detection [52], [34], [53], [46]. State-of-the-art pedestrian detection methods can generally be divided into single-stage and two-stage methods. Next, we present a brief overview of two-stage pedestrian detection methods.

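Convolutional neural networks (CNNs) have substantially advanced the state of the art in many computer vision applications, including image classification [12], [41], [39], [43], object detection [37], [33], [45], [29], [21], object counting [48], [9], [8], [17], image retrieval [35], [36], [23], [49], action recognition [40], [38], [19], [28], and pedestrian detection [52], [34], [53], [46]. State-of-the-art pedestrian detection methods can generally be divided into single-stage and two-stage methods. A brief overview of two-stage pedestrian detection methods follows.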

Two-stage Deep Pedestrian Detection: In recent years, two-stage pedestrian detection approaches [51], [34], [57], [54], [52], [20], [53], [46] have shown superior performance on standard pedestrian benchmarks. Generally, in two-stage pedestrian detectors, a set of candidate pedestrian proposals is first generated. Then, these candidate object proposals are classified and regressed. Zhang et al. [51] propose key adaptations in the popular Faster R-CNN [37] for pedestrian detection. The work of [46] proposes an approach based on a bounding-box regression loss designed for crowded scenes. The work of [52] investigates several attention strategies, e.g., channel, part and visible bounding-box, for pedestrian detection. The work of [5] introduces a multi-scale pedestrian detection approach with layers having receptive fields similar to object scales. Zhang et al. [53] propose a loss formulation that enforces candidate proposals to be close to the corresponding objects and integrates structural information with visibility predictions. The work of [3] proposes a multi-phase autoregressive