PSC-Net: Learning Part Spatial Co-occurrence for Occluded Pedestrian Detection, Explained
Original text
Abstract: Detecting pedestrians, especially under heavy occlusion, is a challenging computer vision problem with numerous real-world applications. This paper introduces a novel approach, termed PSC-Net, for occluded pedestrian detection. The proposed PSC-Net contains a dedicated module designed to explicitly capture both inter- and intra-part co-occurrence information of different pedestrian body parts through a Graph Convolutional Network (GCN). Both inter- and intra-part co-occurrence information contribute towards improving the feature representation for handling varying levels of occlusion, ranging from partial to severe. PSC-Net exploits the topological structure of the pedestrian and does not require part-based annotations or additional visible bounding-box (VBB) information to learn part spatial co-occurrence. Comprehensive experiments are performed on two challenging datasets: CityPersons and Caltech. The proposed PSC-Net achieves state-of-the-art detection performance on both. On the heavily occluded (HO) set of the CityPersons test set, PSC-Net obtains an absolute gain of 4.0% in log-average miss rate over the state-of-the-art [34] with the same backbone and input scale, and without using additional VBB supervision. Further, PSC-Net improves the state-of-the-art [54] from 37.9 to 34.8 in log-average miss rate on the Caltech (HO) test set.
I. INTRODUCTION
Pedestrian detection is a challenging problem in computer vision with various real-world applications, e.g., robotics, autonomous driving and visual surveillance. Recent years have witnessed significant progress in the field, mainly due to advances in deep convolutional neural networks (CNNs). Modern pedestrian detection methods can generally be divided into single-stage [22], [30], [3] and two-stage [20], [57], [54], [34], [52], [53], [4], [46], [24] methods. Single-stage pedestrian detectors typically work by directly regressing the default anchors into pedestrian detection boxes. In contrast, two-stage methods first produce a set of candidate pedestrian proposals, which are then classified and regressed. Most existing two-stage pedestrian detectors [53], [34], [20], [52], [46], [24] are based on the popular Faster R-CNN detection framework [37], which is adapted from generic object detection. Existing pedestrian detectors typically assume entirely visible pedestrians when trained using full-body pedestrian annotations.
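To make "regressing the default anchors into pedestrian detection boxes" concrete, here is a minimal sketch of the box parameterization commonly used in Faster R-CNN-style detectors. The function name `decode_box` and the center-size box layout are illustrative assumptions; the text above does not say which parameterization the cited detectors use.

```python
import math


def decode_box(anchor, deltas):
    """Turn an anchor plus predicted offsets into a detection box.

    Assumption: boxes are (center_x, center_y, width, height) and the
    offsets follow the usual R-CNN convention (shifts scaled by anchor
    size, log-scale width/height changes). Hypothetical helper, not
    taken from the paper.
    """
    xa, ya, wa, ha = anchor
    dx, dy, dw, dh = deltas
    x = xa + dx * wa          # shift center by a fraction of anchor width
    y = ya + dy * ha          # shift center by a fraction of anchor height
    w = wa * math.exp(dw)     # rescale width
    h = ha * math.exp(dh)     # rescale height
    return (x, y, w, h)


# Zero offsets leave the anchor unchanged.
unchanged = decode_box((10.0, 20.0, 4.0, 8.0), (0.0, 0.0, 0.0, 0.0))
# A positive dx moves the box right by dx * anchor_width.
shifted = decode_box((10.0, 20.0, 4.0, 8.0), (0.5, -0.25, 0.0, 0.0))
```

In a single-stage detector this decoding would be applied once to each default anchor; in a two-stage detector the same kind of refinement is typically applied again to each proposal in the second stage.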
While promising results have been achieved by existing pedestrian detectors on standard non-occluded pedestrians, their performance on heavily occluded pedestrians is far from satisfactory. This is evident from the fact that the best reported performance [34] on the reasonable (R) set (where the visibility ratio is larger than 65%) of the CityPersons test set [51] is 9.3 (log-average miss rate), whereas it is 41.0 on the heavily occluded (HO) set (where the visibility ratio ranges from 20% to 65%) of the same dataset. Handling pedestrian occlusion is an open problem in computer vision and presents a great challenge for detecting pedestrians in real-world applications because occlusion occurs so frequently. Therefore, a pedestrian detector should be accurate across varying levels of occlusion, ranging from reasonably occluded to severely occluded pedestrians.
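The two quantities used in the comparison above can be sketched in a few lines. The subset thresholds come from the text (R: visibility above 65%; HO: 20% to 65%). How the interval endpoints are treated, the names `subset_of` and `log_average_miss_rate`, and the use of the usual Caltech-benchmark convention (geometric mean of the miss rate at nine false-positives-per-image values spaced evenly in log space between 0.01 and 1) are assumptions; the text does not define the metric.

```python
import math


def subset_of(visibility):
    """Assign a pedestrian to an evaluation subset by visibility ratio.

    Assumption: endpoints 0.20 and 0.65 belong to HO; the text only
    gives the ranges, not the boundary handling.
    """
    if visibility > 0.65:
        return "R"
    if 0.20 <= visibility <= 0.65:
        return "HO"
    return None  # below 20% visible: outside both subsets


def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Geometric mean of miss rate sampled at log-spaced FPPI values.

    fppi must be sorted in ascending order, with miss_rate[i] the miss
    rate reached at fppi[i]. If the curve never reaches a reference
    FPPI, the miss rate there is taken as 1.0 (an assumption).
    """
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    sampled = []
    for ref in refs:
        mr = 1.0
        for f, m in zip(fppi, miss_rate):
            if f <= ref:
                mr = m        # keep the last point at or below ref
            else:
                break
        sampled.append(max(mr, 1e-10))  # guard against log(0)
    return math.exp(sum(math.log(m) for m in sampled) / len(sampled))


# A flat curve at 25% miss rate averages to 25%.
flat = log_average_miss_rate([0.001, 1.0], [0.25, 0.25])
```

Benchmarks usually report this value multiplied by 100, which is how figures such as 9.3 and 41.0 should be read; lower is better.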
II. RELATED WORK
Convolutional neural networks (CNNs) have significantly advanced the state-of-the-art in numerous computer vision applications, such as image classification [12], [41], [39], [43], object detection [37], [33], [45], [29], [21], object counting [48], [9], [8], [17], image retrieval [35], [36], [23], [49], action recognition [40], [38], [19], [28], and pedestrian detection [52], [34], [53], [46]. State-of-the-art pedestrian detection methods can generally be divided into single-stage and two-stage methods. Next, we present a brief overview of two-stage pedestrian detection methods.
Two-stage Deep Pedestrian Detection: In recent years, two-stage pedestrian detection approaches [51], [34], [57], [54], [52], [20], [53], [46] have shown superior performance on standard pedestrian benchmarks. Generally, in two-stage pedestrian detectors, a set of candidate pedestrian proposals is first generated. Then, these candidate proposals are classified and regressed. Zhang et al. [51] propose key adaptations to the popular Faster R-CNN [37] for pedestrian detection. The work of [46] proposes an approach based on a bounding-box regression loss designed for crowded scenes. The work of [52] investigates several attention strategies, e.g., channel, part and visible bounding-box, for pedestrian detection. The work of [5] introduces a multi-scale pedestrian detection approach with layers having receptive fields similar to object scales. Zhang et al. [53] propose a loss formulation that enforces candidate proposals to be close to the corresponding objects and integrates structural information with visibility predictions. The work of [3] proposes a multi-phase autoregressive