Day 4 Lecture 4
Video Analytics
Xavier Giró-i-Nieto
xavier.giro@upc.edu
[course site]
2
Motivation
Slide credit: Alberto Montes
5
Outline
1. Scene Classification
2. Object Detection & Tracking
6
Scene Classification
(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L.
(2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
7
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D
convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Scene Classification
8
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional
networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Previous lectures
Scene Classification
9
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D
convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Scene Classification
10
Scene Classification: DeepVideo: Architectures
(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L.
(2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
11
Unsupervised learning [Le et al’11] Supervised learning [Karpathy et al’14]
Scene Classification: DeepVideo: Features
(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L.
(2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
12
Scene Classification: DeepVideo: Multires
(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L.
(2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
13
Scene Classification: DeepVideo: Results
(Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L.
(2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
14
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D
convolutional networks." ICCV 2015
Scene Classification
15
Scene Classification: C3D
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
ICCV 2015
16
K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition”, ICLR 2015.
Scene Classification: C3D: Spatial Dimensions
17
3D ConvNets are better suited to spatiotemporal feature learning than 2D ConvNets.
Temporal depth
2D ConvNets
Scene Classification: C3D: Temporal dimension
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
18
A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is
among the best performing architectures for 3D ConvNets
Scene Classification: C3D: Temporal dimension
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
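To make the 3 × 3 × 3 kernel concrete, here is a minimal sketch of a single 3D convolution in plain Python (no padding, stride 1, one channel). The function name `conv3d_valid` and the toy sizes are illustrative assumptions, not the C3D implementation; the point is only that the kernel slides over time as well as over height and width.

```python
def conv3d_valid(clip, kernel):
    """Naive single-channel 3D convolution (cross-correlation, no padding, stride 1).

    clip   : nested list indexed [t][y][x] (frames, rows, columns)
    kernel : nested list indexed [dt][dy][dx]
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):          # slide along time
        plane = []
        for y in range(H - kh + 1):      # slide along height
            row = []
            for x in range(W - kw + 1):  # slide along width
                acc = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy input (assumed sizes): 4 frames of 5x5 pixels, all ones.
clip = [[[1.0] * 5 for _ in range(5)] for _ in range(4)]
# A 3x3x3 averaging kernel, the kernel size used in every C3D layer.
kernel = [[[1.0 / 27] * 3 for _ in range(3)] for _ in range(3)]

features = conv3d_valid(clip, kernel)
out_shape = (len(features), len(features[0]), len(features[0][0]))
```

Without padding, each dimension shrinks by kernel size minus one, so the toy 4 × 5 × 5 clip yields a 2 × 3 × 3 output. Real networks typically pad the input so that the temporal and spatial sizes are preserved between pooling layers.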
19
No gain when varying the temporal depth across layers.
Scene Classification: C3D: Temporal dimension
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
20
Feature vector
Scene Classification: C3D: Network Architecture
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
21
The video sequence is split into 16-frame clips,
with an 8-frame overlap between consecutive clips.
Scene Classification: C3D: Feature Vector
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
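The clip extraction described on this slide can be sketched as follows. The helper name `split_into_clips` is hypothetical, and dropping a trailing partial clip is an assumption; the slide does not say how leftover frames are handled.

```python
def split_into_clips(num_frames, clip_len=16, overlap=8):
    """Return (start, end) frame indices of overlapping clips, end exclusive.

    Trailing frames that do not fill a whole clip are dropped (an assumption).
    """
    stride = clip_len - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than clip_len")
    return [(start, start + clip_len)
            for start in range(0, num_frames - clip_len + 1, stride)]


# A 40-frame video yields four 16-frame clips, each sharing 8 frames with the next.
clips = split_into_clips(40)
# clips == [(0, 16), (8, 24), (16, 32), (24, 40)]
```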
22
[Figure: 16-frame clips → Average → 4096-dim video descriptor → L2 norm → 4096-dim video descriptor]
Scene Classification: C3D: Feature Vector
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
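A minimal sketch of the pooling shown on this slide: per-clip feature vectors are averaged and the result is L2-normalized. The function name `video_descriptor` is hypothetical, and the toy vectors have 3 dimensions instead of 4096 to keep the example readable.

```python
import math


def video_descriptor(clip_features):
    """Average per-clip feature vectors, then L2-normalize the mean."""
    n = len(clip_features)
    dim = len(clip_features[0])
    mean = [sum(f[i] for f in clip_features) / n for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    # Guard against an all-zero mean to avoid dividing by zero.
    return [v / norm for v in mean] if norm > 0 else mean


# Two toy clip features (3-dim here; 4096-dim in C3D).
desc = video_descriptor([[2.0, 0.0, 4.0], [4.0, 0.0, 4.0]])
# mean is [3, 0, 4], whose L2 norm is 5, so desc is approximately [0.6, 0.0, 0.8]
```

The resulting fixed-length vector can be fed to a simple linear classifier regardless of how many clips the video contained.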
23
Based on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for
more details.
Scene Classification: C3D: Visualization
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
24
C3D features with a simple linear classifier outperformed state-of-the-art
methods on 4 different benchmarks, and were comparable with
state-of-the-art methods on 2 other benchmarks.
Scene Classification: C3D: Visualization
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
25
Implementation by Michael Gygli (GitHub)
Scene Classification: C3D: Software
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal
features with 3D convolutional networks." ICCV 2015
26
Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and
George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015
Classification: Image & Optical Flow CNN + LSTM
27
(Scene Classification: Image &) Optical Flow
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D.
and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015
28
(Scene Classification: Image &) Optical Flow
Since existing ground truth datasets are not sufficiently large to train a Convnet, a
synthetic dataset is generated… and augmented (translation, rotation, scaling
transformations; additive Gaussian noise; changes in brightness, contrast, gamma
and color).
Data augmentation
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D.
and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015
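The photometric part of the augmentation listed above (additive Gaussian noise, brightness, contrast, gamma) can be sketched in plain Python. The function name `augment_pair` and all parameter ranges are assumptions chosen for illustration, not the values used by the authors. Geometric transformations (translation, rotation, scaling) are left out because they would also require transforming the ground-truth flow field.

```python
import random


def augment_pair(frame_a, frame_b, rng, noise_sigma=0.02):
    """Photometric augmentation of an image pair (grayscale, values in [0, 1]).

    Gamma, contrast and brightness are drawn once and shared by both frames;
    Gaussian noise is drawn independently per pixel. Ranges are illustrative.
    """
    gamma = rng.uniform(0.7, 1.5)
    contrast = rng.uniform(0.8, 1.2)
    brightness = rng.gauss(0.0, 0.1)

    def transform(image):
        out = []
        for row in image:
            new_row = []
            for p in row:
                v = p ** gamma                               # gamma change
                v = (v - 0.5) * contrast + 0.5 + brightness  # contrast and brightness
                v += rng.gauss(0.0, noise_sigma)             # additive Gaussian noise
                new_row.append(min(1.0, max(0.0, v)))        # clip back to [0, 1]
            out.append(new_row)
        return out

    return transform(frame_a), transform(frame_b)


rng = random.Random(0)  # seeded for reproducibility
frame_a = [[0.2, 0.4], [0.6, 0.8]]
frame_b = [[0.3, 0.5], [0.7, 0.9]]
aug_a, aug_b = augment_pair(frame_a, frame_b, rng)
```

Sharing the photometric parameters across the two frames keeps the pair consistent, so the brightness change itself does not look like motion.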
29
Scene Classification & Detection
“Biking”
CNN + RNN
Slide credit: Alberto Montes
30
Classification & Detection: Proposals + C3D
(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal
Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
31
Classification & Detection: Proposals + C3D
(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal
Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
(1) Binary classification: Action or No Action
32
Classification & Detection: Proposals + C3D
(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal
Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
(2) One-vs-all Action classification
33
Classification & Detection: Proposals + C3D
(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal
Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
(3) Refinement with temporal-aware loss function
34
Classification & Detection: Proposals + C3D
(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal
Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
Post-processing
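The slide names the post-processing step without detail. One widely used post-processing step for temporal detections is non-maximum suppression (NMS) over overlapping segments; the sketch below illustrates that idea. Whether it matches the authors' exact procedure, and the 0.5 overlap threshold, are assumptions; `temporal_iou` and `temporal_nms` are hypothetical helper names.

```python
def temporal_iou(a, b):
    """Intersection over union of two segments given as (start, end, ...)."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments, iou_threshold=0.5):
    """Keep the highest-scoring segments, dropping heavily overlapping ones.

    segments: list of (start, end, score) tuples.
    """
    kept = []
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        if all(temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


# Two near-duplicate detections of the same action plus one separate detection.
detections = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
kept = temporal_nms(detections)
# kept == [(0.0, 10.0, 0.9), (20.0, 30.0, 0.7)]
```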
35
Classification & Detection: Proposals + C3D
(Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal
Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
36
Classification & Detection: Image + RNN + Reinforce
Yeung, Serena, Olga Russakovsky, Greg Mori, and Li Fei-Fei. "End-to-end Learning of Action Detection
from Frame Glimpses in Videos." CVPR 2016
37
Scene Classification & Detection: C3D + LSTM
Montes A. “Temporal Activity Detection in Untrimmed Videos with Recurrent Neural
Networks”. BSc thesis submitted to ETSETB (2016) [code available in Keras]
38
Outline
1. Scene Classification
2. Object Detection & Tracking
39
[ILSVRC 2015 Slides and videos]
Objects: ImageNet Video
40
[ILSVRC 2015 Slides and videos]
Objects: ImageNet Video
41
(Slides by Andrea Ferri): Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang,
Ruohui Wang, Xiaogang Wang, and Wanli Ouyang, “Object Detection From Video Tubelets With Convolutional Neural
Networks”, CVPR 2016 [code]
Objects: ImageNet Video: T-CNN
Object Detection
Object Tracking
42
Domain-specific layers are used during training for each sequence, but are
replaced by a single one at test time.
Objects: Tracking: MDNet
Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual
tracking." ICCV VOT Workshop (2015)
43
Objects: Tracking: MDNet
Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual
tracking." ICCV VOT Workshop (2015)
44
Objects: Tracking: FCNT
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." ICCV 2015 [code]
Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image
classification.
conv4-3 conv5-3
45
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
Despite being trained for image classification, feature maps in conv5-3 enable object
localization... but are not discriminative enough to distinguish different instances of the same
class.
Objects: Tracking: FCNT: Localization
46
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
On the other hand, feature maps from conv4-3 are more sensitive to intra-class
appearance variation…
Objects: Tracking: FCNT: Localization
conv4-3 conv5-3
47
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
SNet=Specific Network (online update)
GNet=General Network (fixed)
Objects: Tracking: FCNT: Localization
48
Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015.
Other works have also highlighted how feature maps in convolutional layers allow
object localization.
Objects: Tracking: FCNT: Localization
49
Objects: Tracking: DeepTracking
P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” AAAI 2016. [code]
50
P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” AAAI 2016. [code]
Objects: Tracking: DeepTracking
51
Objects: Tracking: DeepTracking
P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” AAAI 2016. [code]
52
Summary
● Work on video usually extends principles
previously tested on still images.
● RNNs can naturally handle videos of different lengths
and capture their temporal dependencies.
● Trick: initialize your networks by training them to predict the next frame.
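The second point of the summary can be illustrated with a minimal recurrent encoder in plain Python: the same weights are applied at every frame, so videos of any length map to a hidden state of the same size. The function name `rnn_encode`, the plain tanh recurrence and the toy weights are assumptions for illustration; the works covered in this lecture use LSTMs on top of CNN features.

```python
import math


def rnn_encode(frames, w_in, w_rec, bias):
    """Run a simple tanh recurrence over per-frame feature vectors.

    frames : list of feature vectors, one per frame (any number of frames)
    w_in   : hidden x input weight matrix
    w_rec  : hidden x hidden weight matrix
    bias   : hidden-sized bias vector
    """
    hidden = [0.0] * len(bias)
    for x in frames:
        # The comprehension reads the previous hidden state before reassigning it.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


# Toy weights (assumed values): 3-dim frame features, 2-dim hidden state.
w_in = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
w_rec = [[0.2, 0.1], [-0.3, 0.4]]
bias = [0.0, 0.1]

short_video = [[1.0, 0.0, 0.5]] * 3  # 3 frames
long_video = [[1.0, 0.0, 0.5]] * 7   # 7 frames
state_short = rnn_encode(short_video, w_in, w_rec, bias)
state_long = rnn_encode(long_video, w_in, w_rec, bias)
```

Both calls return a 2-dimensional state, which a classifier can consume without padding or cropping the videos to a common length.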
53
Thanks! Q&A?
Follow me at
https://imatge.upc.edu/web/people/xavier-giro
@DocXavi
/ProfessorXavi

Deep Learning for Computer Vision: Video Analytics (UPC 2016)

  • 1. Day 4 Lecture 4 Video Analytics Xavier Giró-i-Nieto xavier.giro@upc.edu [course site]
  • 5. 5 Outline 1. Scene Classification 2. Object Detection & Tracking
  • 6. 6 Scene Classification (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
  • 7. 7 Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Scene Classification
  • 8. 8 Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Previous lectures Scene Classification
  • 9. 9 Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Scene Classification
  • 10. 10 Scene Classification: DeepVideo: Architectures (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
  • 11. 11 Unsupervised learning [Le et al.’11] Supervised learning [Karpathy et al.’14] Scene Classification: DeepVideo: Features (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
  • 12. 12 Scene Classification: DeepVideo: Multires (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
  • 13. 13 Scene Classification: DeepVideo: Results (Slides by Victor Campos) Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. CVPR 2014
  • 14. 14 Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015 Scene Classification
  • 15. 15 Scene Classification: C3D Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 16. 16 K. Simonyan, A. Zisserman, “Very Deep Convolutional Networks for Large-Scale Image Recognition” ICLR 2015. Scene Classification: C3D: Spatial Dimensions
  • 17. 17 3D ConvNets are more suitable for spatiotemporal feature learning than 2D ConvNets. Scene Classification: C3D: Temporal dimension Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 18. 18 A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets. Scene Classification: C3D: Temporal dimension Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 19. 19 No gain when varying the temporal depth across layers. Scene Classification: C3D: Temporal dimension Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 20. 20 Feature vector Scene Classification: C3D: Network Architecture Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 21. 21 The video sequence is split into 16-frame clips with an 8-frame overlap between consecutive clips. Scene Classification: C3D: Feature Vector Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 22. 22 The features of all 16-frame clips are averaged into a 4096-dim video descriptor, which is then L2-normalized. Scene Classification: C3D: Feature Vector Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 23. 23 Based on Deconvnets by Zeiler and Fergus [ECCV 2014]. See [ReadCV Slides] for more details. Scene Classification: C3D: Visualization Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 24. 24 C3D features with a simple linear classifier outperformed state-of-the-art methods on 4 different benchmarks and were comparable with state-of-the-art methods on the other 2 benchmarks. Scene Classification: C3D: Visualization Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 25. 25 Implementation by Michael Gygli (GitHub) Scene Classification: C3D: Software Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." ICCV 2015
  • 26. 26 Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." CVPR 2015 Classification: Image & Optical Flow CNN + LSTM
  • 27. 27 (Scene Classification: Image &) Optical Flow Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015
  • 28. 28 (Scene Classification: Image &) Optical Flow Since existing ground truth datasets are not sufficiently large to train a ConvNet, a synthetic dataset is generated and then augmented (translation, rotation and scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color). Data augmentation Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015
  • 29. 29 Scene Classification & Detection “Biking” CNN + RNN Slide credit: Alberto Montes
  • 30. 30 Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
  • 31. 31 Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code] (1) Binary classification: Action or No Action
  • 32. 32 Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code] (2) One-vs-all Action classification
  • 33. 33 Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code] (3) Refinement with temporal-aware loss function
  • 34. 34 Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code] Post-processing
  • 35. 35 Classification & Detection: Proposals + C3D (Slidecast and Slides by Alberto Montes) Shou, Zheng, Dongang Wang, and Shih-Fu Chang. "Temporal Action Localization in Untrimmed Videos via Multi-stage CNNs." CVPR 2016 [code]
  • 36. 36 Classification & Detection: Image + RNN + Reinforce Yeung, Serena, Olga Russakovsky, Greg Mori, and Li Fei-Fei. "End-to-end Learning of Action Detection from Frame Glimpses in Videos." CVPR 2016
  • 37. 37 Scene Classification & Detection: C3D + LSTM Montes A. “Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks”. BSc thesis submitted to ETSETB (2016) [code available in Keras]
  • 38. 38 Outline 1. Scene Classification 2. Object Detection & Tracking
  • 39. 39 [ILSVRC 2015 Slides and videos] Objects: ImageNet Video
  • 40. 40 [ILSVRC 2015 Slides and videos] Objects: ImageNet Video
  • 41. 41 (Slides by Andrea Ferri): Kai Kang, Hongsheng Li, Junjie Yan, Xingyu Zeng, Bin Yang, Tong Xiao, Cong Zhang, Zhe Wang, Ruohui Wang, Xiaogang Wang, and Wanli Ouyang, “Object Detection From Video Tubelets With Convolutional Neural Networks”, CVPR 2016 [code] Objects: ImageNet Video: T-CNN Object Detection Object Tracking
  • 42. 42 Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time. Objects: Tracking: MDNet Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
  • 43. 43 Objects: Tracking: MDNet Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
  • 44. 44 Objects: Tracking: FCNT Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." ICCV 2015 [code] Focus on the conv4-3 and conv5-3 layers of a VGG-16 network pre-trained for ImageNet image classification. conv4-3 conv5-3
  • 45. 45 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code] Despite being trained for image classification, feature maps in conv5-3 enable object localization... but are not discriminative enough to distinguish different instances of the same class. Objects: Tracking: FCNT: Localization
  • 46. 46 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code] On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation… Objects: Tracking: FCNT: Localization conv4-3 conv5-3
  • 47. 47 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code] SNet = Specific Network (online update), GNet = General Network (fixed) Objects: Tracking: FCNT: Localization
  • 48. 48 Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene CNNs." ICLR 2015. Other works have also highlighted how feature maps in convolutional layers allow object localization. Objects: Tracking: FCNT: Localization
  • 49. 49 Objects: Tracking: DeepTracking P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” AAAI 2016. [code]
  • 50. 50 P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” AAAI 2016. [code] Objects: Tracking: DeepTracking
  • 51. 51 Objects: Tracking: DeepTracking P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” AAAI 2016. [code]
  • 52. 52 Summary ● Work on video normally extends principles previously tested on still images. ● RNNs can naturally handle the diversity in video lengths and capture their temporal dependencies. ● Trick: initialize your networks by pre-training them to predict the next frame.
  • 53. 53 Thanks! Q&A? Follow me at https://imatge.upc.edu/web/people/xavier-giro @DocXavi /ProfessorXavi
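
The C3D video descriptor outlined on slides 21 and 22 (16-frame clips with an 8-frame overlap, averaged clip features, L2 normalization) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the helper names are hypothetical, and toy_clip_feature is a made-up 3-dimensional stand-in for the real network, whose clip features are 4096-dimensional.

```python
import math
from typing import List, Sequence

CLIP_LEN = 16  # frames per clip (slide 21)
STRIDE = 8     # an 8-frame overlap means a new clip starts every 8 frames


def split_into_clips(num_frames: int, clip_len: int = CLIP_LEN,
                     stride: int = STRIDE) -> List[range]:
    """Return the frame-index range of every full clip in the video."""
    last_start = num_frames - clip_len
    return [range(s, s + clip_len) for s in range(0, last_start + 1, stride)]


def video_descriptor(clip_features: Sequence[Sequence[float]]) -> List[float]:
    """Average the per-clip feature vectors, then L2-normalize the mean."""
    if not clip_features:
        raise ValueError("need at least one clip feature vector")
    dim = len(clip_features[0])
    count = len(clip_features)
    mean = [sum(f[i] for f in clip_features) / count for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in mean))
    return [v / norm for v in mean] if norm > 0 else mean


def toy_clip_feature(frames: range) -> List[float]:
    """Hypothetical stand-in for the C3D network (assumption, not C3D)."""
    return [float(frames[0]), float(frames[-1]), 1.0]


clips = split_into_clips(num_frames=40)
descriptor = video_descriptor([toy_clip_feature(c) for c in clips])
```

In a real pipeline the toy function would be replaced by a forward pass of a trained 3D ConvNet over the frames of each clip; the splitting, averaging and normalization steps stay the same.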