CVPR 2021 papers on pedestrian detection, re-identification, and related topics (7 in total)
Crowd Counting
1. Cross-Modal Collaborative Representation Learning and a Large-Scale RGBT
Benchmark for Crowd Counting
Abstract: Crowd counting is a fundamental yet challenging task that requires
rich information to generate pixel-wise crowd density maps. However, most
previous methods used only the limited information of RGB images and cannot
reliably discover potential pedestrians in unconstrained scenarios. In this work, we
find that incorporating optical and thermal information can greatly help to
recognize pedestrians. To promote future research in this field, we introduce a
large-scale RGBT Crowd Counting (RGBT-CC) benchmark, which contains 2,030
pairs of RGB-thermal images with 138,389 annotated people. Furthermore, to
facilitate multimodal crowd counting, we propose a cross-modal collaborative
representation learning framework, which consists of multiple modality-specific
branches, a modality-shared branch, and an Information Aggregation-Distribution
Module (IADM) to fully capture the complementary information of different
modalities. Specifically, our IADM incorporates two collaborative information transfers
to dynamically enhance the modality-shared and modality-specific representations
with a dual information propagation mechanism. Extensive experiments
conducted on the RGBT-CC benchmark demonstrate the effectiveness of our
framework for RGBT crowd counting. Moreover, the proposed approach is
universal for multimodal crowd counting and can also achieve superior
performance on the ShanghaiTechRGBD [22] dataset. Finally, our source code
and benchmark have been released at
http://lingboliu.com/RGBT_Crowd_Counting.html.
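The abstract's dual information propagation idea (aggregate modality-specific cues into a shared branch, then distribute the enhanced shared representation back to each branch) can be illustrated with a minimal sketch. This is not the paper's actual IADM: the residual-style update rule, the mixing weight `alpha`, and the toy 1-D "feature maps" are all simplifying assumptions made here for illustration.

```python
def iadm_step(rgb_feat, thermal_feat, shared_feat, alpha=0.5):
    """One hypothetical aggregation-distribution step, loosely following
    the dual-propagation idea described in the abstract. The update rule
    and alpha are illustrative assumptions, not the paper's method."""
    # Aggregation: the shared branch absorbs complementary cues
    # averaged from both modality-specific branches.
    shared_new = [s + alpha * (r + t) / 2.0
                  for s, r, t in zip(shared_feat, rgb_feat, thermal_feat)]
    # Distribution: each modality branch is pulled toward the
    # enhanced shared representation (residual-style update).
    rgb_new = [r + alpha * (s - r) for r, s in zip(rgb_feat, shared_new)]
    thermal_new = [t + alpha * (s - t) for t, s in zip(thermal_feat, shared_new)]
    return rgb_new, thermal_new, shared_new

# Toy 1-D "feature maps": RGB sees structure the thermal map misses.
rgb = [1.0, 1.0]
thermal = [0.0, 0.0]
shared = [0.5, 0.5]
rgb2, th2, sh2 = iadm_step(rgb, thermal, shared)
```

After one step, both modality branches have moved toward the shared representation while the shared branch has absorbed information from both, which is the qualitative behavior the abstract describes.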
2. Cross-View Cross-Scene Multi-View Crowd Counting
Abstract: Multi-view crowd counting has been previously proposed to utilize
multiple cameras to extend the field-of-view of a single camera, capturing more
people in the scene and improving counting performance for occluded people or
those at low resolution. However, the current multi-view paradigm trains and tests
on the same single scene and camera-views, which limits its practical application.
In this paper, we propose a cross-view cross-scene (CVCS) multi-view crowd
counting paradigm, where the training and testing occur on different scenes with
arbitrary camera layouts. To dynamically handle the challenges of optimal view
fusion under scene and camera-layout changes, and of non-correspondence noise due
to camera calibration errors or erroneous features, we propose a CVCS model that
attentively selects and fuses multiple views together using camera layout
geometry, and a noise view regularization method to train the model to handle
non-correspondence errors. We also generate a large synthetic multi-camera crowd
counting dataset with a large number of scenes and camera views to capture many
possible variations, which avoids the difficulty of collecting and annotating such a
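The attentive view selection and fusion that the second abstract describes can be sketched as a softmax-weighted combination of per-view feature maps, where each view's weight comes from a geometry-derived score. This is only an illustrative sketch: the scoring scheme, the flat list representation of feature maps, and the function names are assumptions, not the paper's CVCS architecture.

```python
import math

def fuse_views(view_maps, geom_scores):
    """Hypothetical attentive view fusion: softmax the geometry-derived
    scores into attention weights, then take a weighted element-wise sum
    of the per-view maps. Scores and shapes are illustrative assumptions."""
    # Numerically stable softmax over the per-view scores.
    m = max(geom_scores)
    exps = [math.exp(s - m) for s in geom_scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted element-wise sum of the per-view maps.
    fused = [sum(w * v[i] for w, v in zip(weights, view_maps))
             for i in range(len(view_maps[0]))]
    return fused, weights

# Two toy 2-element "maps" with equal geometry scores,
# so each view contributes equally to the fused map.
views = [[1.0, 1.0], [3.0, 3.0]]
fused, w = fuse_views(views, geom_scores=[0.0, 0.0])
```

A noisy or poorly calibrated view would, under this scheme, simply receive a lower score and thus a smaller softmax weight, which mirrors the abstract's motivation for attentively selecting views rather than averaging them uniformly.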