SlideShare a Scribd company logo
@DocXavi
Deep Learning for Computer Vision
Video Analytics
5 May 2016
Xavier Giró-i-Nieto
Master en
Creació Multimedia
Outline
1. Recognition
2. Optical Flow
3. Object Tracking
2
Recognition
Demo: Clarifai
MIT Technology Review : “A start-up’s Neural Network Can Understand Video” (3/2/2015)
3
Figure: Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with
convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE.
4
Recognition
5
Recognition
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D
convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
6
Recognition
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D
convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Previous lectures
7
Recognition
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D
convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video
classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014
IEEE Conference on (pp. 1725-1732). IEEE.
Slides extracted from ReadCV seminar by Victor Campos 8
Recognition: DeepVideo
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional
neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 9
Recognition: DeepVideo: Demo
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional
neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 10
Recognition: DeepVideo: Architectures
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional
neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 11
Unsupervised learning [Le at al’11] Supervised learning [Karpathy et al’14]
Recognition: DeepVideo: Features
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional
neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 12
Recognition: DeepVideo: Multiscale
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional
neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 13
Recognition: DeepVideo: Results
14
Recognition
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D
convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
15
Recognition: C3D
Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning
spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International
Conference on Computer Vision, pp. 4489-4497. 2015
16
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Recognition: C3D: Demo
17
K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015.
Recognition: C3D: Spatial dimension
Spatial dimensions (XY) of the used kernels are fixed to 3x3, following Symonian & Zisserman (ICLR 2015).
18
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Recognition: C3D: Temporal dimension
3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets
Temporal depth
2D ConvNets
19
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best
performing architectures for 3D ConvNets
Recognition: C3D: Temporal dimension
20
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
No gain when varying the temporal depth across layers.
Recognition: C3D: Temporal dimension
21
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Recognition: C3D: Architecture
Feature
vector
22
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Recognition: C3D: Feature vector
Video sequence
16 frames-long clips
8 frames-long overlap
23
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Recognition: C3D: Feature vector
16-frame clip
16-frame clip
16-frame clip
16-frame clip
...
Average
4096-dimvideodescriptor
4096-dimvideodescriptor
L2 norm
24
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Recognition: C3D: Visualization
Based on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details.
25
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Recognition: C3D: Compactness
26
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4
different benchmarks and are comparable with state of the art methods on other 2 benchmarks
Recognition: C3D: Performance
27
Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks."
In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
Recognition: C3D: Software
Implementation by Michael Gygli (GitHub)
28
Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets:
Deep networks for video classification." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4694-4702. 2015.
Recognition:
29
Recognition: ImageNet Video
[ILSVRC 2015 Slides and videos]
30
Recognition: ImageNet Video
[ILSVRC 2015 Slides and videos]
31
Recognition: ImageNet Video
[ILSVRC 2015 Slides and videos]
32
Recognition: ImageNet Video
Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
33
Recognition: ImageNet Video
Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
34
Recognition: ImageNet Video
Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
35
Recognition: ImageNet Video
Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
Optical Flow
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 36
Optical Flow: FlowNet
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 37
Optical Flow: FlowNet
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 38
End to end supervised learning of optical flow.
Optical Flow: FlowNet (contracting)
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 39
Option A: Stack both input images together and feed them through a generic network.
Optical Flow: FlowNet (contracting)
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 40
Option B: Create two separate, yet identical processing streams for the two images and combine them at a
later stage.
Optical Flow: FlowNet (contracting)
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 41
Option B: Create two separate, yet identical processing streams for the two images and combine them at a
later stage.
Correlation layer:
Convolution of data patches from the layers to combine.
Optical Flow: FlowNet (expanding)
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 42
Upconvolutional layers: Unpooling features maps + convolution.
Upconvolutioned feature maps are concatenated with the corresponding map from the contractive part.
Optical Flow: FlowNet
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. ICCV 2015 43
Since existing ground truth datasets are not sufficiently large to train a Convnet, a synthetic Flying Dataset
is generated… and augmented (translation, rotation, scaling transformations; additive Gaussian noise;
changes in brightness, contrast, gamma and color).
Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI.
Data
augmentation
Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning
Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 44
Optical Flow: FlowNet
Object tracking: MDNet
45
Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
Object tracking: MDNet
46
Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
Object tracking: MDNet: Architecture
47
Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
Domain-specific layers are used during training for each sequence, but are replaced by a single one at test
time.
Object tracking: MDNet: Online update
48
Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
MDNet is updated online at test
time with hard negative mining,
that is, selecting negative
samples with the highest positive
score.
Object tracking: FCNT
49
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." ICCV 2015 [code]
Object tracking: FCNT
50
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification.
conv4-3 conv5-3
Object tracking: FCNT: Specialization
51
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking
sequence.
Object tracking: FCNT: Localization
52
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
Although trained for image classification, feature maps in conv5-3 enable object localization…
...but is not discriminative enough to different objects of the same category.
Object tracking: Localization
53
Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene cnns." ICLR 2015.
Other works have shown how features maps in convolutional layers allow object localization.
Object tracking: FCNT: Localization
54
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation…
conv4-3 conv5-3
Object tracking: FCNT: Architecture
55
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
SNet=Specific Network (online update)
GNet=General Network (fixed)
Object tracking: FCNT: Results
56
Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE
International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
Object tracking: DeepTracking
57
P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth
AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]
Object tracking: DeepTracking
58
P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth
AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]
Object tracking: DeepTracking
59
P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth
AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]
Object tracking: DeepTracking
60
P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth
AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]
Object tracking: Compact features
61
Wang, Naiyan, and Dit-Yan Yeung. "Learning a deep compact image representation for visual tracking." NIPS 2013 [Project page with code]
62
Thank you !
https://imatge.upc.edu/web/people/xavier-giro
https://twitter.com/DocXavi
https://www.facebook.com/ProfessorXavi
xavier.giro@upc.edu
Xavier Giró-i-Nieto

More Related Content

PDF
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Universitat Politècnica de Catalunya
 
PDF
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Universitat Politècnica de Catalunya
 
PDF
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Universitat Politècnica de Catalunya
 
PDF
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016
Universitat Politècnica de Catalunya
 
PDF
Deep Learning for Video: Action Recognition (UPC 2018)
Universitat Politècnica de Catalunya
 
PDF
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Universitat Politècnica de Catalunya
 
PDF
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
PDF
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision: Video Analytics (UPC 2016)
Universitat Politècnica de Catalunya
 
Deep Convnets for Video Processing (Master in Computer Vision Barcelona, 2016)
Universitat Politècnica de Catalunya
 
Video Analysis with Convolutional Neural Networks (Master Computer Vision Bar...
Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision (2/4): Object Analytics @ laSalle 2016
Universitat Politècnica de Catalunya
 
Deep Learning for Video: Action Recognition (UPC 2018)
Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision (1/4): Image Analytics @ laSalle 2016
Universitat Politècnica de Catalunya
 
Multimodal Deep Learning (D4L4 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
Video Analysis (D4L2 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 

What's hot (20)

PDF
Deep Learning for Computer Vision: ImageNet Challenge (UPC 2016)
Universitat Politècnica de Catalunya
 
PDF
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Universitat Politècnica de Catalunya
 
PDF
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Universitat Politècnica de Catalunya
 
PDF
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
PDF
Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)
Universitat Politècnica de Catalunya
 
PDF
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
PDF
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Universitat Politècnica de Catalunya
 
PDF
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Universitat Politècnica de Catalunya
 
PDF
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Universitat Politècnica de Catalunya
 
PDF
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
PDF
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
PDF
Neural Architectures for Video Encoding
Universitat Politècnica de Catalunya
 
PDF
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Universitat Politècnica de Catalunya
 
PDF
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Universitat Politècnica de Catalunya
 
PDF
Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
PDF
Deep Learning for Video: Object Tracking (UPC 2018)
Universitat Politècnica de Catalunya
 
PDF
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
PDF
Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...
Universitat Politècnica de Catalunya
 
PDF
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
PDF
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
Deep Learning for Computer Vision: ImageNet Challenge (UPC 2016)
Universitat Politècnica de Catalunya
 
Advanced Deep Architectures (D2L6 Deep Learning for Speech and Language UPC 2...
Universitat Politècnica de Catalunya
 
Temporal Activity Detection in Untrimmed Videos with Recurrent Neural Networks
Universitat Politècnica de Catalunya
 
Speaker ID II (D4L1 Deep Learning for Speech and Language UPC 2017)
Universitat Politècnica de Catalunya
 
Deep convnets for global recognition (Master in Computer Vision Barcelona 2016)
Universitat Politècnica de Catalunya
 
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Universitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Universitat Politècnica de Catalunya
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Universitat Politècnica de Catalunya
 
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
Neural Architectures for Video Encoding
Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Universitat Politècnica de Catalunya
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Universitat Politècnica de Catalunya
 
Welcome (D1L1 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Deep Learning for Video: Object Tracking (UPC 2018)
Universitat Politècnica de Catalunya
 
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vi...
Universitat Politècnica de Catalunya
 
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
Ad

Similar to Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016 (20)

PDF
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
IRJET Journal
 
PDF
Multi modal retrieval and generation with deep distributed models
Roelof Pieters
 
PDF
Understanding user interactivity for immersive communications and its impact ...
Alpen-Adria-Universität
 
PDF
Understanding user interactivity for immersive communications and its impact ...
lauratoni4
 
PDF
Real Time Object Detection with Audio Feedback using Yolo v3
ijtsrd
 
PPTX
Unsupervised object-level video summarization with online motion auto-encoder
NEERAJ BAGHEL
 
PDF
Top Cited Articles in Computer Graphics and Animation
ijcga
 
PPTX
Mtech Fourth progress presentation
NEERAJ BAGHEL
 
PDF
Object Detection using SURF features
IRJET Journal
 
PDF
Elliptical curve cryptography image encryption scheme with aid of optimizatio...
IJEECSIAES
 
PDF
Elliptical curve cryptography image encryption scheme with aid of optimizatio...
nooriasukmaningtyas
 
PDF
Image Classification on ImageNet (D1L3 Insight@DCU Machine Learning Workshop ...
Universitat Politècnica de Catalunya
 
PDF
October 2021: Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
PPTX
Sharbani bhattacharya gyanodya 2014
Sharbani Bhattacharya
 
PPTX
presentation.pptxpresentation.pptxpresentation.pptx
sadman190214
 
PPTX
Hierarchical structure adaptive
NEERAJ BAGHEL
 
PPTX
Data Science for IoT
Olivera Kotevska, Ph.D.
 
PDF
December 2024 - Top 10 Read Articles in Network Security & Its Applications.pdf
IJNSA Journal
 
PDF
April 2024 - Top 10 Read Articles in Network Security & Its Applications
IJNSA Journal
 
PDF
July 2024 - Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
IRJET- Identification of Missing Person in the Crowd using Pretrained Neu...
IRJET Journal
 
Multi modal retrieval and generation with deep distributed models
Roelof Pieters
 
Understanding user interactivity for immersive communications and its impact ...
Alpen-Adria-Universität
 
Understanding user interactivity for immersive communications and its impact ...
lauratoni4
 
Real Time Object Detection with Audio Feedback using Yolo v3
ijtsrd
 
Unsupervised object-level video summarization with online motion auto-encoder
NEERAJ BAGHEL
 
Top Cited Articles in Computer Graphics and Animation
ijcga
 
Mtech Fourth progress presentation
NEERAJ BAGHEL
 
Object Detection using SURF features
IRJET Journal
 
Elliptical curve cryptography image encryption scheme with aid of optimizatio...
IJEECSIAES
 
Elliptical curve cryptography image encryption scheme with aid of optimizatio...
nooriasukmaningtyas
 
Image Classification on ImageNet (D1L3 Insight@DCU Machine Learning Workshop ...
Universitat Politècnica de Catalunya
 
October 2021: Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
Sharbani bhattacharya gyanodya 2014
Sharbani Bhattacharya
 
presentation.pptxpresentation.pptxpresentation.pptx
sadman190214
 
Hierarchical structure adaptive
NEERAJ BAGHEL
 
Data Science for IoT
Olivera Kotevska, Ph.D.
 
December 2024 - Top 10 Read Articles in Network Security & Its Applications.pdf
IJNSA Journal
 
April 2024 - Top 10 Read Articles in Network Security & Its Applications
IJNSA Journal
 
July 2024 - Top 10 Read Articles in Network Security and Its Applications
IJNSA Journal
 
Ad

More from Universitat Politècnica de Catalunya (20)

PDF
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Universitat Politècnica de Catalunya
 
PDF
Deep Generative Learning for All
Universitat Politècnica de Catalunya
 
PDF
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
Universitat Politècnica de Catalunya
 
PDF
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Universitat Politècnica de Catalunya
 
PDF
The Transformer - Xavier Giró - UPC Barcelona 2021
Universitat Politècnica de Catalunya
 
PDF
Open challenges in sign language translation and production
Universitat Politècnica de Catalunya
 
PPTX
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 
PPTX
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Universitat Politècnica de Catalunya
 
PDF
Learn2Sign : Sign language recognition and translation using human keypoint e...
Universitat Politècnica de Catalunya
 
PDF
Intepretability / Explainable AI for Deep Neural Networks
Universitat Politècnica de Catalunya
 
PDF
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
PDF
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
PDF
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Universitat Politècnica de Catalunya
 
PDF
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Universitat Politècnica de Catalunya
 
PDF
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Universitat Politècnica de Catalunya
 
PDF
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Universitat Politècnica de Catalunya
 
PDF
Curriculum Learning for Recurrent Video Object Segmentation
Universitat Politècnica de Catalunya
 
PDF
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Universitat Politècnica de Catalunya
 
PDF
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Universitat Politècnica de Catalunya
 
PDF
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Universitat Politècnica de Catalunya
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Universitat Politècnica de Catalunya
 
Deep Generative Learning for All
Universitat Politècnica de Catalunya
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Universitat Politècnica de Catalunya
 
The Transformer - Xavier Giró - UPC Barcelona 2021
Universitat Politècnica de Catalunya
 
Open challenges in sign language translation and production
Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Universitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Universitat Politècnica de Catalunya
 
Intepretability / Explainable AI for Deep Neural Networks
Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Universitat Politècnica de Catalunya
 
Curriculum Learning for Recurrent Video Object Segmentation
Universitat Politècnica de Catalunya
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Universitat Politècnica de Catalunya
 

Recently uploaded (20)

PDF
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
PPTX
Agile Chennai 18-19 July 2025 Ideathon | AI Powered Microfinance Literacy Gui...
AgileNetwork
 
PDF
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
PDF
Software Development Methodologies in 2025
KodekX
 
PPTX
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
PDF
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
PDF
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
PDF
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
PDF
The Future of Mobile Is Context-Aware—Are You Ready?
iProgrammer Solutions Private Limited
 
PDF
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
PDF
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
PDF
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
PDF
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
PDF
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
PDF
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
PDF
Brief History of Internet - Early Days of Internet
sutharharshit158
 
PPTX
Introduction to Flutter by Ayush Desai.pptx
ayushdesai204
 
PDF
The Future of Artificial Intelligence (AI)
Mukul
 
PPTX
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
PDF
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 
Economic Impact of Data Centres to the Malaysian Economy
flintglobalapac
 
Agile Chennai 18-19 July 2025 Ideathon | AI Powered Microfinance Literacy Gui...
AgileNetwork
 
NewMind AI Weekly Chronicles - July'25 - Week IV
NewMind AI
 
Software Development Methodologies in 2025
KodekX
 
IT Runs Better with ThousandEyes AI-driven Assurance
ThousandEyes
 
OFFOFFBOX™ – A New Era for African Film | Startup Presentation
ambaicciwalkerbrian
 
Data_Analytics_vs_Data_Science_vs_BI_by_CA_Suvidha_Chaplot.pdf
CA Suvidha Chaplot
 
Responsible AI and AI Ethics - By Sylvester Ebhonu
Sylvester Ebhonu
 
The Future of Mobile Is Context-Aware—Are You Ready?
iProgrammer Solutions Private Limited
 
A Strategic Analysis of the MVNO Wave in Emerging Markets.pdf
IPLOOK Networks
 
Security features in Dell, HP, and Lenovo PC systems: A research-based compar...
Principled Technologies
 
AI-Cloud-Business-Management-Platforms-The-Key-to-Efficiency-Growth.pdf
Artjoker Software Development Company
 
Research-Fundamentals-and-Topic-Development.pdf
ayesha butalia
 
Make GenAI investments go further with the Dell AI Factory
Principled Technologies
 
Using Anchore and DefectDojo to Stand Up Your DevSecOps Function
Anchore
 
Brief History of Internet - Early Days of Internet
sutharharshit158
 
Introduction to Flutter by Ayush Desai.pptx
ayushdesai204
 
The Future of Artificial Intelligence (AI)
Mukul
 
AI and Robotics for Human Well-being.pptx
JAYMIN SUTHAR
 
Get More from Fiori Automation - What’s New, What Works, and What’s Next.pdf
Precisely
 

Deep Learning for Computer Vision (3/4): Video Analytics @ laSalle 2016

  • 1. @DocXavi Deep Learning for Computer Vision Video Analytics 5 May 2016 Xavier Giró-i-Nieto Master en Creació Multimedia
  • 2. Outline 1. Recognition 2. Optical Flow 3. Object Tracking 2
  • 3. Recognition Demo: Clarifai MIT Technology Review : “A start-up’s Neural Network Can Understand Video” (3/2/2015) 3
  • 4. Figure: Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 4 Recognition
  • 5. 5 Recognition Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  • 6. 6 Recognition Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Previous lectures
  • 7. 7 Recognition Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  • 8. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. Slides extracted from ReadCV seminar by Victor Campos 8 Recognition: DeepVideo
  • 9. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 9 Recognition: DeepVideo: Demo
  • 10. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 10 Recognition: DeepVideo: Architectures
  • 11. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 11 Unsupervised learning [Le at al’11] Supervised learning [Karpathy et al’14] Recognition: DeepVideo: Features
  • 12. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 12 Recognition: DeepVideo: Multiscale
  • 13. Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., & Fei-Fei, L. (2014, June). Large-scale video classification with convolutional neural networks. In Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on (pp. 1725-1732). IEEE. 13 Recognition: DeepVideo: Results
  • 14. 14 Recognition Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  • 15. 15 Recognition: C3D Figure: Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015
  • 16. 16 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Demo
  • 17. 17 K. Simonyan, A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition ICLR 2015. Recognition: C3D: Spatial dimension Spatial dimensions (XY) of the used kernels are fixed to 3x3, following Symonian & Zisserman (ICLR 2015).
  • 18. 18 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Temporal dimension 3D ConvNets are more suitable for spatiotemporal feature learning compared to 2D ConvNets Temporal depth 2D ConvNets
  • 19. 19 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 A homogeneous architecture with small 3 × 3 × 3 convolution kernels in all layers is among the best performing architectures for 3D ConvNets Recognition: C3D: Temporal dimension
  • 20. 20 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 No gain when varying the temporal depth across layers. Recognition: C3D: Temporal dimension
  • 21. 21 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Architecture Feature vector
  • 22. 22 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Feature vector Video sequence 16 frames-long clips 8 frames-long overlap
  • 23. 23 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Feature vector 16-frame clip 16-frame clip 16-frame clip 16-frame clip ... Average 4096-dimvideodescriptor 4096-dimvideodescriptor L2 norm
  • 24. 24 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Visualization Based on Deconvnets by Zeiler and Fergus [ECCV 2014] - See [ReadCV Slides] for more details.
  • 25. 25 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Compactness
  • 26. 26 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Convolutional 3D(C3D) combined with a simple linear classifier outperforms state-of-the-art methods on 4 different benchmarks and are comparable with state of the art methods on other 2 benchmarks Recognition: C3D: Performance
  • 27. 27 Tran, Du, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. "Learning spatiotemporal features with 3D convolutional networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 4489-4497. 2015 Recognition: C3D: Software Implementation by Michael Gygli (GitHub)
  • 28. 28 Yue-Hei Ng, Joe, Matthew Hausknecht, Sudheendra Vijayanarasimhan, Oriol Vinyals, Rajat Monga, and George Toderici. "Beyond short snippets: Deep networks for video classification." In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4694-4702. 2015. Recognition:
  • 29. 29 Recognition: ImageNet Video [ILSVRC 2015 Slides and videos]
  • 30. 30 Recognition: ImageNet Video [ILSVRC 2015 Slides and videos]
  • 31. 31 Recognition: ImageNet Video [ILSVRC 2015 Slides and videos]
  • 32. 32 Recognition: ImageNet Video Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
  • 33. 33 Recognition: ImageNet Video Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
  • 34. 34 Recognition: ImageNet Video Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
  • 35. 35 Recognition: ImageNet Video Kai Kang et al, Object Detection in Videos with TubeLets and Multi-Context Cues (ILSVRC 2015) [video] [poster]
  • 36. Optical Flow Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 36
  • 37. Optical Flow: FlowNet Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 37
  • 38. Optical Flow: FlowNet Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 38 End to end supervised learning of optical flow.
  • 39. Optical Flow: FlowNet (contracting) Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 39 Option A: Stack both input images together and feed them through a generic network.
  • 40. Optical Flow: FlowNet (contracting) Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 40 Option B: Create two separate, yet identical processing streams for the two images and combine them at a later stage.
  • 41. Optical Flow: FlowNet (contracting) Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 41 Option B: Create two separate, yet identical processing streams for the two images and combine them at a later stage. Correlation layer: Convolution of data patches from the layers to combine.
  • 42. Optical Flow: FlowNet (expanding) Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 42 Upconvolutional layers: Unpooling features maps + convolution. Upconvolutioned feature maps are concatenated with the corresponding map from the contractive part.
  • 43. Optical Flow: FlowNet Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. ICCV 2015 43 Since existing ground truth datasets are not sufficiently large to train a Convnet, a synthetic Flying Dataset is generated… and augmented (translation, rotation, scaling transformations; additive Gaussian noise; changes in brightness, contrast, gamma and color). Convnets trained on these unrealistic data generalize well to existing datasets such as Sintel and KITTI. Data augmentation
  • 44. Dosovitskiy, A., Fischer, P., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D. and Brox, T., 2015. FlowNet: Learning Optical Flow With Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2758-2766). 44 Optical Flow: FlowNet
  • 45. Object tracking: MDNet 45 Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
  • 46. Object tracking: MDNet 46 Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015)
  • 47. Object tracking: MDNet: Architecture 47 Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015) Domain-specific layers are used during training for each sequence, but are replaced by a single one at test time.
  • 48. Object tracking: MDNet: Online update 48 Nam, Hyeonseob, and Bohyung Han. "Learning multi-domain convolutional neural networks for visual tracking." ICCV VOT Workshop (2015) MDNet is updated online at test time with hard negative mining, that is, selecting negative samples with the highest positive score.
  • 49. Object tracking: FCNT 49 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." ICCV 2015 [code]
  • 50. Object tracking: FCNT 50 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code] Focus on conv4-3 and conv5-3 of VGG-16 network pre-trained for ImageNet image classification. conv4-3 conv5-3
  • 51. Object tracking: FCNT: Specialization 51 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code] Most feature maps in VGG-16 conv4-3 and conv5-3 are not related to the foreground regions in a tracking sequence.
  • 52. Object tracking: FCNT: Localization 52 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code] Although trained for image classification, feature maps in conv5-3 enable object localization… ...but is not discriminative enough to different objects of the same category.
  • 53. Object tracking: Localization 53 Zhou, Bolei, Aditya Khosla, Agata Lapedriza, Aude Oliva, and Antonio Torralba. "Object detectors emerge in deep scene cnns." ICLR 2015. Other works have shown how features maps in convolutional layers allow object localization.
  • 54. Object tracking: FCNT: Localization 54 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code] On the other hand, feature maps from conv4-3 are more sensitive to intra-class appearance variation… conv4-3 conv5-3
  • 55. Object tracking: FCNT: Architecture 55 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code] SNet=Specific Network (online update) GNet=General Network (fixed)
  • 56. Object tracking: FCNT: Results 56 Wang, Lijun, Wanli Ouyang, Xiaogang Wang, and Huchuan Lu. "Visual Tracking with Fully Convolutional Networks." In Proceedings of the IEEE International Conference on Computer Vision, pp. 3119-3127. 2015 [code]
  • 57. Object tracking: DeepTracking 57 P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]
  • 58. Object tracking: DeepTracking 58 P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]
  • 59. Object tracking: DeepTracking 59 P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]
  • 60. Object tracking: DeepTracking 60 P. Ondruska and I. Posner, “Deep Tracking: Seeing Beyond Seeing Using Recurrent Neural Networks,” in The Thirtieth AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona USA, 2016. [code]
  • 61. Object tracking: Compact features 61 Wang, Naiyan, and Dit-Yan Yeung. "Learning a deep compact image representation for visual tracking." NIPS 2013 [Project page with code]