A Tracking System Based on the Continuous Convolution Operator Tracker (C-COT) Algorithm
In the field of machine vision, object tracking is a fundamental problem that plays an important role in numerous applications, such as surveillance and security systems, unmanned aerial vehicles, autonomous driving, intelligent traffic control, human-computer interaction, and online visual tracking systems. The basic principle of object tracking is: given the initial position of a target, estimate its trajectory from a sequence of consecutive image frames. The "online" nature of online visual tracking dictates that, even under the complex constraints real-time computation places on machine vision, an ideal tracking strategy should guarantee both the accuracy and the robustness of the system.
Beyond Correlation Filters: Learning Continuous
Convolution Operators for Visual Tracking
Martin Danelljan, Andreas Robinson, Fahad Shahbaz Khan, Michael Felsberg
CVL, Department of Electrical Engineering, Linköping University, Sweden
{martin.danelljan, andreas.robinson, fahad.khan, michael.felsberg}@liu.se
Abstract. Discriminative Correlation Filters (DCF) have demonstrated excellent performance for visual object tracking. The key to their success is the ability to efficiently exploit available negative data by including all shifted versions of a training sample. However, the underlying DCF formulation is restricted to single-resolution feature maps, significantly limiting its potential. In this paper, we go beyond the conventional DCF framework and introduce a novel formulation for training continuous convolution filters. We employ an implicit interpolation model to pose the learning problem in the continuous spatial domain. Our proposed formulation enables efficient integration of multi-resolution deep feature maps, leading to superior results on three object tracking benchmarks: OTB-2015 (+5.1% in mean OP), Temple-Color (+4.6% in mean OP), and VOT2015 (20% relative reduction in failure rate). Additionally, our approach is capable of sub-pixel localization, crucial for the task of accurate feature point tracking. We also demonstrate the effectiveness of our learning formulation in extensive feature point tracking experiments. Code and supplementary material are available at http://www.cvl.isy.liu.se/research/objrec/visualtracking/conttrack/index.html.
1 Introduction
Visual tracking is the task of estimating the trajectory of a target in a video.
It is one of the fundamental problems in computer vision. Tracking of objects
or feature points has numerous applications in robotics, structure-from-motion,
and visual surveillance. In recent years, Discriminative Correlation Filter (DCF)
based approaches have shown outstanding results on object tracking benchmarks
[30,46]. DCF methods train a correlation filter for the task of predicting the
target classification scores. Unlike other methods, the DCF efficiently utilizes all
spatial shifts of the training samples by exploiting the discrete Fourier transform.
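As a concrete illustration of why the Fourier transform makes this training efficient, here is a minimal single-channel, MOSSE-style ridge-regression sketch (not the multi-channel formulation used by modern DCF trackers; the function names and regularization weight are illustrative):

```python
import numpy as np

def train_dcf(x, y, lam=1e-2):
    """Train a single-channel correlation filter in the Fourier domain.

    x:   (H, W) training patch (one feature channel)
    y:   (H, W) desired response, e.g. a Gaussian centred on the target
    lam: ridge-regression regularization weight
    Returns the filter in the Fourier domain.
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    # Closed-form per-frequency solution; this element-wise division is
    # what makes training over all circular shifts O(HW log HW).
    return np.conj(X) * Y / (np.abs(X) ** 2 + lam)

def detect(f_hat, z):
    """Evaluate the filter on a new patch z. The argmax of the returned
    response map gives the predicted target translation."""
    return np.real(np.fft.ifft2(f_hat * np.fft.fft2(z)))
```

Training on a patch and detecting on that same patch approximately reproduces the desired response, with its peak at the labelled target position.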
Deep convolutional neural networks (CNNs) have shown impressive performance for many tasks, and are therefore of interest for DCF-based tracking. A CNN consists of several layers of convolution, normalization and pooling operations. Recently, activations from the last convolutional layers have been successfully employed for image classification. Features from these deep convolutional layers are discriminative while preserving spatial and structural information.
Surprisingly, in the context of tracking, recent DCF-based methods [10,35] have demonstrated the importance of shallow convolutional layers. These layers provide higher spatial resolution, which is crucial for accurate target localization. However, fusing multiple layers in a DCF framework is still an open problem.

Fig. 1. Visualization of our continuous convolution operator, applied to a multi-resolution deep feature map. The feature map (left) consists of the input RGB patch along with the first and last convolutional layer of a pre-trained deep network. The second column visualizes the continuous convolution filters learned by our framework. The resulting continuous convolution outputs for each layer (third column) are combined into the final continuous confidence function (right) of the target (green box).
The conventional DCF formulation is limited to a single-resolution feature map. Therefore, all feature channels must have the same spatial resolution, as in e.g. the HOG descriptor. This limitation prohibits joint fusion of multiple convolutional layers with different spatial resolutions. A straightforward strategy to counter this restriction is to explicitly resample all feature channels to a common resolution. However, such a resampling strategy is cumbersome, adds redundant data, and introduces artifacts. Instead, a principled approach for integrating multi-resolution feature maps in the learning formulation is preferred.
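The explicit resampling workaround can be sketched as follows. This is a naive nearest-neighbour version with hypothetical names, shown only to make concrete the strategy being argued against: it duplicates samples of the coarser channels without adding information.

```python
import numpy as np

def resample_to_common(channels, out_shape):
    """Naively resample feature channels of differing spatial
    resolutions onto one common grid via nearest-neighbour lookup,
    then stack them into a single multi-channel map."""
    H, W = out_shape
    stacked = []
    for c in channels:
        h, w = c.shape
        rows = np.arange(H) * h // H   # nearest source row per output row
        cols = np.arange(W) * w // W   # nearest source col per output col
        stacked.append(c[np.ix_(rows, cols)])
    return np.stack(stacked, axis=-1)
```

A coarse 4x4 channel stacked with an 8x8 channel yields an (8, 8, 2) map, but the upsampled channel is purely redundant copies of its original samples.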
In this work, we propose a novel formulation for learning a convolution operator in the continuous spatial domain. The proposed learning formulation employs an implicit interpolation model of the training samples. Our approach learns a set of convolution filters to produce a continuous-domain confidence map of the target. This enables an elegant fusion of multi-resolution feature maps in a joint learning formulation. Figure 1 shows a visualization of our continuous convolution operator when integrating multi-resolution deep feature maps. We validate the effectiveness of our approach on three object tracking benchmarks: OTB-2015 [46], Temple-Color [32] and VOT2015 [29]. On the challenging OTB-2015 with 100 videos, our object tracking framework improves the state-of-the-art from 77.3% to 82.4% in mean overlap precision.
In addition to multi-resolution fusion, our continuous domain learning formulation enables accurate sub-pixel localization. This is achieved by labeling the training samples with sub-pixel precise continuous confidence maps. Our formulation is therefore also suitable for accurate feature point tracking. Further, our learning-based approach is discriminative and does not require explicit interpolation of the image to achieve sub-pixel accuracy. We demonstrate the accuracy and robustness of our approach by performing extensive feature point tracking experiments on the popular MPI Sintel dataset [7].
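The paper obtains sub-pixel positions by maximizing its continuous confidence function directly. For intuition only, a common lightweight approximation of sub-pixel localization fits a parabola through the discrete maximum of a sampled response and its two neighbours:

```python
import numpy as np

def subpixel_peak_1d(r):
    """Refine the peak of a sampled 1-D response r to sub-pixel
    accuracy via a parabola through the maximum and its neighbours.
    (C-COT instead optimizes its continuous confidence function
    directly; this quadratic fit is only a cheap approximation.)"""
    i = int(np.argmax(r))
    if i == 0 or i == len(r) - 1:
        return float(i)            # no interior neighbours to fit
    num = r[i - 1] - r[i + 1]
    den = 2.0 * (r[i - 1] - 2.0 * r[i] + r[i + 1])
    return i + num / den           # vertex of the fitted parabola
```

On a response that is exactly quadratic near its peak, this recovers the true continuous maximum.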
2 Related Work
Discriminative Correlation Filters (DCF) [5,11,24] have shown promising results
for object tracking. These methods exploit the properties of circular correlation
for training a regressor in a sliding-window fashion. Initially, the DCF approaches
[5,23] were restricted to a single feature channel. The DCF framework was later
extended to multi-channel feature maps [4,13,17]. The multi-channel DCF allows
high-dimensional features, such as HOG and Color Names, to be incorporated for
improved tracking. In addition to the incorporation of multi-channel features,
the DCF framework has been significantly improved lately by, e.g., including
scale estimation [9,31], non-linear kernels [23,24], a long-term memory [36], and
by alleviating the periodic effects of circular convolution [11,15,18].
With the advent of deep CNNs, fully connected layers of the network have been commonly employed for image representation [38,43]. Recently, the last (deep) convolutional layers were shown to be more beneficial for image classification [8,33]. On the other hand, the first (shallow) convolutional layer was shown to be more suitable for visual tracking, compared to the deeper layers [10]. The deep convolutional layers are discriminative and possess high-level visual information. In contrast, the shallow layers contain low-level features at high spatial resolution, beneficial for localization. Ma et al. [35] employed multiple convolutional layers in a hierarchical ensemble of independent DCF trackers. Instead, we propose a novel continuous formulation to fuse multiple convolutional layers with different spatial resolutions in a joint learning framework.
Unlike object tracking, feature point tracking is the task of accurately estimating the motion of distinctive key-points. It is a core component in many vision systems [1,27,39,48]. Most feature point tracking methods are derived from the classic Kanade-Lucas-Tomasi (KLT) tracker [34,44]. The KLT tracker is a generative method based on minimizing the sum of squared differences between two image patches. In the last decades, significant effort has been spent on improving the KLT tracker [2,16]. In contrast, we propose a discriminative learning based approach for feature point tracking.
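The squared-difference minimization underlying KLT can be illustrated by a single Gauss-Newton update for a 1-D, translation-only alignment. This is a didactic sketch with illustrative names, not the original KLT implementation (which operates on 2-D patches with full gradient structure tensors):

```python
import numpy as np

def klt_step_1d(template, signal, d):
    """One Gauss-Newton iteration of a 1-D translation-only KLT-style
    update: refine d so that signal(x + d) matches template(x) in the
    least-squares sense. The signal is sampled on an integer grid and
    evaluated at x + d by linear interpolation."""
    x = np.arange(len(template), dtype=float)
    warped = np.interp(x + d, np.arange(len(signal)), signal)
    grad = np.gradient(warped)          # d/dx of the warped signal
    err = template - warped
    # Normal equation of the linearized least-squares problem
    return d + np.sum(grad * err) / np.sum(grad * grad)
```

Iterating this update on two Gaussian bumps offset by a sub-pixel shift converges to that shift, which is exactly the generative SSD principle the text describes.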
Our approach: Our main contribution is a theoretical framework for learning discriminative convolution operators in the continuous spatial domain. Our formulation has two major advantages compared to the conventional DCF framework. Firstly, it allows a natural integration of multi-resolution feature maps, e.g. combinations of convolutional layers or multi-resolution HOG and color features. This property is especially desirable for object tracking, detection and action recognition applications. Secondly, our continuous formulation enables accurate sub-pixel localization, crucial in many feature point tracking problems.