Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM

Jiang, Lai; Xu, Mai; Wang, Zulin

doi:10.1007/978-3-030-01264-9_37

arXiv:1709.06316 (cs)

[Submitted on 19 Sep 2017 (v1), last revised 14 Jan 2019 (this version, v3)]

Title: Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM

Authors: Lai Jiang, Mai Xu, Zulin Wang

Abstract: Over the past few years, deep neural networks (DNNs) have exhibited great success in predicting the saliency of images. However, there are few works that apply DNNs to predict the saliency of generic videos. In this paper, we propose a novel DNN-based video saliency prediction method. Specifically, we establish a large-scale eye-tracking database of videos (LEDOV), which provides sufficient data to train the DNN models for predicting video saliency. Through the statistical analysis of our LEDOV database, we find that human attention is normally attracted by objects, particularly moving objects or the moving parts of objects. Accordingly, we propose an object-to-motion convolutional neural network (OM-CNN) to learn spatio-temporal features for predicting the intra-frame saliency via exploring the information of both objectness and object motion. We further find from our database that there exists a temporal correlation of human attention with a smooth saliency transition across video frames. Therefore, we develop a two-layer convolutional long short-term memory (2C-LSTM) network in our DNN-based method, using the extracted features of OM-CNN as the input. Consequently, the inter-frame saliency maps of videos can be generated, which consider the transition of attention across video frames. Finally, the experimental results show that our method advances the state-of-the-art in video saliency prediction.
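The recurrent stage described in the abstract can be illustrated with a minimal NumPy sketch of a two-layer convolutional LSTM that turns a sequence of per-frame feature maps into one saliency map per frame. This is not the authors' implementation: the OM-CNN features are replaced by random arrays, and the kernel size, channel counts, random weights, 1x1 readout, and the names `ConvLSTMCell` and `predict_saliency` are all assumptions chosen for illustration.

```python
import numpy as np

def conv2d_same(x, w):
    """Cross-correlate x (C_in, H, W) with w (C_out, C_in, k, k); zero padding, stride 1."""
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    h, wd = x.shape[1:]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Accumulate the contribution of kernel tap (i, j) over all input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class ConvLSTMCell:
    """Hypothetical ConvLSTM cell: the LSTM gates are computed by a convolution."""

    def __init__(self, in_ch, hid_ch, k=3, rng=None):
        if rng is None:
            rng = np.random.default_rng(0)
        # One kernel bank produces all four gates at once (assumed layout: i, f, o, g).
        self.w = rng.normal(0.0, 0.1, size=(4 * hid_ch, in_ch + hid_ch, k, k))
        self.b = np.zeros((4 * hid_ch, 1, 1))

    def step(self, x, h, c):
        gates = conv2d_same(np.concatenate([x, h], axis=0), self.w) + self.b
        i, f, o, g = np.split(gates, 4, axis=0)
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # update the cell memory
        h_new = sigmoid(o) * np.tanh(c_new)               # expose the hidden state
        return h_new, c_new

def predict_saliency(features, hid_ch=8, seed=0):
    """Map features (T, C, H, W) to saliency maps (T, H, W) with values in (0, 1)."""
    rng = np.random.default_rng(seed)
    _, c, h, w = features.shape
    cells = [ConvLSTMCell(c, hid_ch, rng=rng), ConvLSTMCell(hid_ch, hid_ch, rng=rng)]
    states = [(np.zeros((hid_ch, h, w)), np.zeros((hid_ch, h, w))) for _ in cells]
    readout = rng.normal(0.0, 0.1, size=(1, hid_ch, 1, 1))  # assumed 1x1 readout
    maps = []
    for frame in features:
        x = frame
        for idx, cell in enumerate(cells):
            states[idx] = cell.step(x, *states[idx])
            x = states[idx][0]  # hidden state of this layer feeds the next layer
        maps.append(sigmoid(conv2d_same(x, readout))[0])
    return np.stack(maps)

# Stand-in for OM-CNN features: 4 frames, 3 channels, 6x5 spatial grid.
features = np.random.default_rng(1).normal(size=(4, 3, 6, 5))
saliency = predict_saliency(features)
print(saliency.shape)
```

Because the hidden and cell states persist from frame to frame, each output map depends on earlier frames as well as the current one, which is the mechanism the abstract relies on to model the transition of attention across frames.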

Comments:	Jiang, Lai and Xu, Mai and Liu, Tie and Qiao, Minglang and Wang, Zulin; DeepVS: A Deep Learning Based Video Saliency Prediction Approach; The European Conference on Computer Vision (ECCV); September 2018
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1709.06316 [cs.CV]
	(or arXiv:1709.06316v3 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1709.06316
Related DOI:	https://doi.org/10.1007/978-3-030-01264-9_37

Submission history

From: Lai Jiang
[v1] Tue, 19 Sep 2017 09:45:03 UTC (13,777 KB)
[v2] Mon, 25 Sep 2017 04:16:54 UTC (13,777 KB)
[v3] Mon, 14 Jan 2019 18:11:58 UTC (14,240 KB)
