A Tracking System Based on the Continuous Convolution Operator Tracker (C-COT) Algorithm
In the field of machine vision, object tracking is a fundamental problem that plays an important role in numerous applications, such as surveillance and security systems, unmanned aerial vehicles, autonomous driving, intelligent traffic control, human-computer interaction, and online visual tracking systems. The basic principle of object tracking is: given the initial position of a target, estimate its trajectory from a sequence of consecutive image frames. The "online" nature of online visual tracking dictates that, even under the complex constraints real-time computation places on machine vision, an ideal tracking strategy should guarantee both the accuracy and the robustness of the system.
Beyond Correlation Filters: Learning Continuous
Convolution Operators for Visual Tracking
Martin Danelljan, Andreas Robinson, Fahad Shahbaz Khan, Michael Felsberg
CVL, Department of Electrical Engineering, Linköping University, Sweden
{martin.danelljan, andreas.robinson, fahad.khan, michael.felsberg}@liu.se
Abstract. Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, significantly limiting its potential. In this paper, we go beyond the conventional DCF framework and introduce a novel formulation for training continuous convolution filters. We employ an implicit interpolation model to pose the learning problem in the continuous spatial domain. Our proposed formulation enables efficient integration of multi-resolution deep feature maps, leading to superior results on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color (+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate). Additionally, our approach is capable of sub-pixel localization, crucial for the task of accurate feature point tracking. We also demonstrate the effectiveness of our learning formulation in extensive feature point tracking experiments. Code and supplementary material are available at http://www.cvl.isy.liu.se/research/objrec/visualtracking/conttrack/index.html.
1 Introduction
Visual tracking is the task of estimating the trajectory of a target in a video.
It is one of the fundamental problems in computer vision. Tracking of objects
or feature points has numerous applications in robotics, structure-from-motion,
and visual surveillance. In recent years, Discriminative Correlation Filter (DCF)
based approaches have shown outstanding results on object tracking benchmarks
[30,46]. DCF methods train a correlation filter for the task of predicting the
target classification scores. Unlike other methods, the DCF efficiently utilizes all
spatial shifts of the training samples by exploiting the discrete Fourier transform.
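As a concrete illustration of why the Fourier transform makes this training efficient, here is a minimal single-channel, MOSSE-style ridge-regression sketch (not the multi-channel formulation used by modern DCF trackers; the function names and regularization weight are illustrative):

```python
import numpy as np

def train_dcf(x, y, lam=1e-2):
    """Train a single-channel correlation filter in the Fourier domain.

    x:   (H, W) training patch (one feature channel)
    y:   (H, W) desired response, e.g. a Gaussian centred on the target
    lam: ridge-regression regularization weight
    Returns the filter in the Fourier domain.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # Closed-form per-frequency solution; this element-wise division is
    # what makes training over all circular shifts O(HW log HW).
    return np.conj(X) * Y / (np.abs(X) ** 2 + lam)

def detect(f_hat, z):
    """Evaluate the filter on a new patch z. The argmax of the returned
    response map gives the predicted target translation."""
    return np.real(np.fft.ifft2(f_hat * np.fft.fft2(z)))
```

Training on a patch and detecting on that same patch approximately reproduces the desired response, with its peak at the labelled target position.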
Deep convolutional neural networks (CNNs) have shown impressive performance for many tasks, and are therefore of interest for DCF-based tracking. A CNN consists of several layers of convolution, normalization and pooling operations. Recently, activations from the last convolutional layers have been successfully employed for image classification. Features from these deep convolutional layers are discriminative while preserving spatial and structural information.
Surprisingly, in the context of tracking, recent DCF-based methods [10,35] have demonstrated the importance of shallow convolutional layers. These layers provide higher spatial resolution, which is crucial for accurate target localization. However, fusing multiple layers in a DCF framework is still an open problem.

Fig. 1. Visualization of our continuous convolution operator, applied to a multi-resolution deep feature map. The feature map (left) consists of the input RGB patch along with the first and last convolutional layer of a pre-trained deep network. The second column visualizes the continuous convolution filters learned by our framework. The resulting continuous convolution outputs for each layer (third column) are combined into the final continuous confidence function (right) of the target (green box).
The conventional DCF formulation is limited to a single-resolution feature map. Therefore, all feature channels must have the same spatial resolution, as in e.g. the HOG descriptor. This limitation prohibits joint fusion of multiple convolutional layers with different spatial resolutions. A straightforward strategy to counter this restriction is to explicitly resample all feature channels to a common resolution. However, such a resampling strategy is cumbersome, adds redundant data, and introduces artifacts. Instead, a principled approach for integrating multi-resolution feature maps in the learning formulation is preferred.
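The explicit resampling workaround can be sketched as follows. This is a naive nearest-neighbour version with hypothetical names, shown only to make concrete the strategy being argued against: it duplicates samples of the coarser channels without adding information.

```python
import numpy as np

def resample_to_common(channels, out_shape):
    """Naively resample feature channels of differing spatial
    resolutions onto one common grid via nearest-neighbour lookup,
    then stack them into a single multi-channel map."""
    H, W = out_shape
    stacked = []
    for c in channels:
        h, w = c.shape
        rows = np.arange(H) * h // H   # nearest source row per output row
        cols = np.arange(W) * w // W   # nearest source col per output col
        stacked.append(c[np.ix_(rows, cols)])
    return np.stack(stacked, axis=-1)
```

A coarse 4x4 channel stacked with an 8x8 channel yields an (8, 8, 2) map, but the upsampled channel is purely redundant copies of its original samples.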
In this work, we propose a novel formulation for learning a convolution operator in the continuous spatial domain. The proposed learning formulation employs an implicit interpolation model of the training samples. Our approach learns a set of convolution filters to produce a continuous-domain confidence map of the target. This enables an elegant fusion of multi-resolution feature maps in a joint learning formulation. Figure 1 shows a visualization of our continuous convolution operator when integrating multi-resolution deep feature maps. We validate the effectiveness of our approach on three object tracking benchmarks: OTB-2015 [46], Temple-Color [32] and VOT2015 [29]. On the challenging OTB-2015 with 100 videos, our object tracking framework improves the state-of-the-art from 77.3% to 82.4% in mean overlap precision.
In addition to multi-resolution fusion, our continuous domain learning formulation enables accurate sub-pixel localization. This is achieved by labeling the training samples with sub-pixel precise continuous confidence maps. Our formulation is therefore also suitable for accurate feature point tracking. Further, our learning-based approach is discriminative and does not require explicit interpolation of the image to achieve sub-pixel accuracy. We demonstrate the accuracy and robustness of our approach by performing extensive feature point tracking experiments on the popular MPI Sintel dataset [7].
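The paper obtains sub-pixel positions by maximizing its continuous confidence function directly. For intuition only, a common lightweight approximation of sub-pixel localization fits a parabola through the discrete maximum of a sampled response and its two neighbours:

```python
import numpy as np

def subpixel_peak_1d(r):
    """Refine the peak of a sampled 1-D response r to sub-pixel
    accuracy via a parabola through the maximum and its neighbours.
    (C-COT instead optimizes its continuous confidence function
    directly; this quadratic fit is only a cheap approximation.)"""
    i = int(np.argmax(r))
    if i == 0 or i == len(r) - 1:
        return float(i)            # no interior neighbours to fit
    num = r[i - 1] - r[i + 1]
    den = 2.0 * (r[i - 1] - 2.0 * r[i] + r[i + 1])
    return i + num / den           # vertex of the fitted parabola
```

On a response that is exactly quadratic near its peak, this recovers the true continuous maximum.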
2 Related Work
Discriminative Correlation Filters (DCF) [5,11,24] have shown promising results
for object tracking. These methods exploit the properties of circular correlation
for training a regressor in a sliding-window fashion. Initially, the DCF approaches
[5,23] were restricted to a single feature channel. The DCF framework was later
extended to multi-channel feature maps [4,13,17]. The multi-channel DCF allows
high-dimensional features, such as HOG and Color Names, to be incorporated for
improved tracking. In addition to the incorporation of multi-channel features,
the DCF framework has been significantly improved lately by, e.g., including
scale estimation [9,31], non-linear kernels [23,24], a long-term memory [36], and
by alleviating the periodic effects of circular convolution [11,15,18].
With the advent of deep CNNs, fully connected layers of the network have been commonly employed for image representation [38,43]. Recently, the last (deep) convolutional layers were shown to be more beneficial for image classification [8,33]. On the other hand, the first (shallow) convolutional layer was shown to be more suitable for visual tracking, compared to the deeper layers [10]. The deep convolutional layers are discriminative and possess high-level visual information. In contrast, the shallow layers contain low-level features at high spatial resolution, beneficial for localization. Ma et al. [35] employed multiple convolutional layers in a hierarchical ensemble of independent DCF trackers. Instead, we propose a novel continuous formulation to fuse multiple convolutional layers with different spatial resolutions in a joint learning framework.
Unlike object tracking, feature point tracking is the task of accurately estimating the motion of distinctive key-points. It is a core component in many vision systems [1,27,39,48]. Most feature point tracking methods are derived from the classic Kanade-Lucas-Tomasi (KLT) tracker [34,44]. The KLT tracker is a generative method based on minimizing the sum of squared differences between two image patches. In the last decades, significant effort has been spent on improving the KLT tracker [2,16]. In contrast, we propose a discriminative learning based approach for feature point tracking.
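The squared-difference minimization underlying KLT can be illustrated by a single Gauss-Newton update for a 1-D, translation-only alignment. This is a didactic sketch with illustrative names, not the original KLT implementation (which operates on 2-D patches with full gradient structure tensors):

```python
import numpy as np

def klt_step_1d(template, signal, d):
    """One Gauss-Newton iteration of a 1-D translation-only KLT-style
    update: refine d so that signal(x + d) matches template(x) in the
    least-squares sense. The signal is sampled on an integer grid and
    evaluated at x + d by linear interpolation."""
    x = np.arange(len(template), dtype=float)
    warped = np.interp(x + d, np.arange(len(signal)), signal)
    grad = np.gradient(warped)          # d/dx of the warped signal
    err = template - warped
    # Normal equation of the linearized least-squares problem
    return d + np.sum(grad * err) / np.sum(grad * grad)
```

Iterating this update on two Gaussian bumps offset by a sub-pixel shift converges to that shift, which is exactly the generative SSD principle the text describes.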
Our approach: Our main contribution is a theoretical framework for learning discriminative convolution operators in the continuous spatial domain. Our formulation has two major advantages compared to the conventional DCF framework. Firstly, it allows a natural integration of multi-resolution feature maps, e.g. combinations of convolutional layers or multi-resolution HOG and color features. This property is especially desirable for object tracking, detection and action recognition applications. Secondly, our continuous formulation enables accurate sub-pixel localization, crucial in many feature point tracking problems.