@DocXavi
Module 6 - Day 8 - Lecture 1
Self-supervised Learning
from Video Sequences
28th March 2019
[http://pagines.uab.cat/mcv/]
Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Universitat Politècnica de Catalunya
Outline
1. Unsupervised Learning
2. Self-supervised Learning
a. Autoencoder
b. Temporal regularisations
c. Temporal verifications
d. Predictive Learning
e. Miscellaneous: optical flow, color & multiview
Types of machine learning
Yann LeCun’s Black Forest cake
Slide credit: Yann LeCun
Supervised learning
Slide credit: Kevin McGuinness
[Figure: supervised learning, with prediction ŷ]
Unsupervised learning
Slide credit: Kevin McGuinness
[Figure: unsupervised learning, with prediction ŷ]
Types of machine learning
We can categorize three types of learning procedures:
1. Supervised Learning: 𝐲 = ƒ(𝐱)
Predict label y corresponding to observation x.
2. Unsupervised Learning: ƒ(𝐱)
Estimate the distribution of observation x.
3. Reinforcement Learning (RL): 𝐲 = ƒ(𝐱), with reward 𝐳
Predict action y based on observation x, to maximize a future reward z.
Why Unsupervised Learning?
● It is how intelligent beings naturally perceive the world.
● It can save a great deal of effort in building a human-like intelligent agent, compared to a fully supervised approach.
● Vast amounts of unlabelled data are available.
Assumptions for Unsupervised Learning
Slide: Kevin McGuinness (DLCV UPC 2017)
To model P(X) given data, it is necessary to make some assumptions
“You can’t do inference without making assumptions”
-- David MacKay, Information Theory, Inference, and Learning Algorithms
Typical assumptions:
● Smoothness assumption
○ Points which are close to each other are more likely to share a label.
● Cluster assumption
○ The data form discrete clusters; points in the same cluster are likely to share a label
● Manifold assumption
○ The data lie approximately on a manifold of much lower dimension than the input space.
The manifold hypothesis
Slide: Kevin McGuinness (DLCV UPC 2017)
[Figure: two plots over axes x1 and x2, one showing a linear manifold (wᵀx + b) and one showing a non-linear manifold]
The data distribution lies close to a low-dimensional manifold.
Example: consider image data
● Very high dimensional (1,000,000D)
● A randomly generated image will almost certainly not
look like any real world scene
○ The space of images that occur in nature is
almost completely empty
● Hypothesis: real world images lie on a smooth,
low-dimensional manifold
○ Manifold distance is a good measure of
similarity
Similar for audio and text
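The manifold hypothesis can be illustrated with a small numerical sketch. The data below are synthetic and purely hypothetical (a noisy 1-D line embedded in 3-D, with an arbitrarily chosen `direction`); the point is only that most of the variance concentrates in far fewer directions than the input space has.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 500 points on a 1-D line embedded in 3-D, plus small noise.
t = rng.uniform(-1.0, 1.0, size=(500, 1))        # coordinate along the manifold
direction = np.array([[2.0, -1.0, 0.5]])         # assumed embedding direction
X = t @ direction + 0.01 * rng.normal(size=(500, 3))

# Share of variance captured by each principal direction.
Xc = X - X.mean(axis=0)
singular_values = np.linalg.svd(Xc, compute_uv=False)
explained = singular_values**2 / np.sum(singular_values**2)
print(explained)  # the first entry dominates: the data are close to 1-D
```

Real images would need a non-linear method to expose their manifold; the linear case is used here only because it can be checked with a plain SVD.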
Video lectures on Unsupervised Learning
Kevin McGuinness, UPC DLCV 2016; Xavier Giró, UPC DLAI 2017
Acknowledgements
Víctor Campos, Junting Pan, Xunyu Lin, Sebastian Palacio, Carlos Arenas
Self-supervised learning
Reference: Andrew Zisserman (PAISS 2018)
Self-supervised learning is a form of unsupervised learning where the data
provides the supervision.
● A surrogate task must be invented by withholding a part of the unlabeled
data and training the NN to predict it.
Unlabeled data
(X)
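The idea of withholding part of the unlabeled data can be sketched as follows. This is a minimal illustration under stated assumptions: the data are synthetic, the helper `split_visible_hidden` is a hypothetical name, and a least-squares linear model stands in for the NN.

```python
import numpy as np

def split_visible_hidden(X, n_hidden):
    """Withhold the last n_hidden features of each sample as the prediction target."""
    return X[:, :-n_hidden], X[:, -n_hidden:]

rng = np.random.default_rng(0)

# Hypothetical unlabeled data: 6 correlated features driven by 2 latent factors.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing

# Surrogate task: predict the withheld features from the visible ones.
visible, hidden = split_visible_hidden(X, n_hidden=2)
W, *_ = np.linalg.lstsq(visible, hidden, rcond=None)   # linear stand-in for the NN
surrogate_loss = np.mean((visible @ W - hidden) ** 2)
print(surrogate_loss)
```

No manual label is used anywhere: the supervision signal (`hidden`) is carved out of the data itself.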
● By defining a proxy loss, the NN learns representations that should be valuable for the actual target task.
[Figure labels: prediction ŷ, loss, representations learned without labels]
Autoencoder (AE)
Fig: “Deep Learning Tutorial” Stanford
Autoencoders:
● Predict the input data itself at the output.
● Do not need labels.
What is the use of an autoencoder?
Dimensionality reduction: use the hidden layer as a feature extractor of any desired size.
Slide: Kevin McGuinness (DLCV UPC 2017)
Pretraining:
1. Initialize a NN by solving an autoencoding problem.
[Figure: data → Encoder (W1) → h → Decoder (W2) → reconstruction → Loss (reconstruction error). h: latent variables (representation/features).]
2. Train for the final task with “few” labels.
[Figure: data → Encoder (W1) → h → Classifier (WC) → prediction y → Loss (cross entropy). h: latent variables (representation/features).]
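The two pretraining steps can be sketched in a few lines of NumPy. Everything below is an illustrative assumption rather than the setup used in the lecture: the data are two synthetic Gaussian blobs, the autoencoder is linear (weights `W1`, `W2`), and the classifier is a logistic regression trained on only 20 labeled samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: two Gaussian blobs in 10-D.
n, d, k = 400, 10, 2
labels = rng.integers(0, 2, size=n)
centers = np.stack([np.full(d, -1.0), np.full(d, 1.0)])
X = centers[labels] + rng.normal(size=(n, d))

# Step 1: pretrain a linear autoencoder x -> h = x W1 -> x_hat = h W2 (no labels).
W1 = 0.1 * rng.normal(size=(d, k))
W2 = 0.1 * rng.normal(size=(k, d))
losses = []
for _ in range(500):
    H = X @ W1
    err = H @ W2 - X                       # reconstruction error
    losses.append(np.mean(err ** 2))
    grad_W2 = H.T @ err / n
    grad_W1 = X.T @ (err @ W2.T) / n
    W1 -= 0.01 * grad_W1
    W2 -= 0.01 * grad_W2

# Step 2: keep the encoder, train a logistic-regression classifier with few labels.
few = 20
H_few, y_few = X[:few] @ W1, labels[:few]
w, b = np.zeros(k), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(H_few @ w + b)))
    w -= 0.1 * (H_few.T @ (p - y_few)) / few
    b -= 0.1 * np.mean(p - y_few)

# Evaluate on the samples whose labels were never used for training.
pred = (X[few:] @ W1 @ w + b) > 0
accuracy = np.mean(pred == labels[few:])
print(losses[0], losses[-1], accuracy)
```

In this sketch the encoder is frozen in step 2; fine-tuning it jointly with the classifier would be an equally valid design choice.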
Temporal regularization: ISA
Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011
Features are learned with Independent Subspace Analysis (ISA), using convolution and pooling operations.
Features are learned without supervision by considering 3D (space+time) video blocks.
Feature visualizations.
Temporal regularization: Slowness
Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015.
Assumption: adjacent video frames contain semantically similar information.
Autoencoder trained with slowness and sparsity regularizations.
Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]
Slow feature analysis
● Temporal coherence assumption: features should change slowly over time in video
Steady feature analysis
● Second-order changes are also small: changes in the past should resemble changes in the future
Train on triplets of frames from video.
The loss encourages nearby frames to have slow and steady features, and far frames to have different features.
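A loss in the spirit of the slow and steady description might look like the sketch below. It is a simplified illustration, not the exact objective of the cited paper: the function name, the squared-distance terms, the hinge on a far frame and the `margin` value are all assumptions.

```python
import numpy as np

def slow_steady_loss(z_a, z_b, z_c, z_far, margin=1.0):
    """z_a, z_b, z_c: features of three consecutive frames; z_far: a distant frame."""
    slow = np.sum((z_b - z_a) ** 2)                      # first-order change is small
    steady = np.sum(((z_c - z_b) - (z_b - z_a)) ** 2)    # second-order change is small
    far = max(0.0, margin - np.linalg.norm(z_far - z_a)) ** 2  # far frames pushed apart
    return slow + steady + far

# Features moving at constant velocity: the steadiness term vanishes.
z_a, z_b, z_c = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.2, 0.0])
z_far = np.array([5.0, 0.0])
print(slow_steady_loss(z_a, z_b, z_c, z_far))
```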
Related work on still images
Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context prediction."
ICCV 2015.
A surrogate task is defined by exploiting the spatial context.
What video-specific surrogate tasks can you think of?
Temporal coherence
(Slides by Xunyu Lin): Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]
The temporal order of frames is exploited as the supervisory signal for learning.
[Figure: shuffled sequences feed a binary classification: in order / not in order]
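The binary order-verification labels come for free from the frame indices. The sketch below shows one hypothetical way to build such training samples; the function name and the way negatives are formed (swapping the first two frames) are assumptions, not the sampling scheme of the cited paper.

```python
import numpy as np

def make_order_verification_samples(n_frames, rng, n_samples=8):
    """Return ((i, j, k), label) pairs: label 1 if frames are in order, else 0."""
    samples = []
    for _ in range(n_samples):
        a, b, c = (int(i) for i in np.sort(rng.choice(n_frames, size=3, replace=False)))
        if rng.random() < 0.5:
            samples.append(((a, b, c), 1))   # in order
        else:
            samples.append(((b, a, c), 0))   # first two frames swapped: not in order
    return samples

rng = np.random.default_rng(0)
for indices, label in make_order_verification_samples(n_frames=30, rng=rng, n_samples=4):
    print(indices, label)
```

A network would then receive the frames at these indices and be trained with a cross-entropy loss on `label`.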
Temporal verification
#Odd-one-out Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video
representation learning with odd-one-out networks." ICCV 2017
Train a network to detect which of the video sequences contains frames in the wrong order.
Temporal coherence
Lee, Hsin-Ying, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. "Unsupervised representation learning by sorting
sequences." ICCV 2017.
Sort the sequence of frames.
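Sorting can be cast as classification over permutations. In the sketch below, each of the 24 permutations of 4 frames is its own class; this is an assumption made for illustration, and the cited paper may define the classes differently. `PERMS` and `make_sorting_sample` are hypothetical names.

```python
import itertools
import numpy as np

PERMS = list(itertools.permutations(range(4)))   # 24 candidate orderings of 4 frames

def make_sorting_sample(frames, rng):
    """Shuffle 4 frames with a random permutation; the permutation index is the label."""
    label = int(rng.integers(len(PERMS)))
    shuffled = [frames[i] for i in PERMS[label]]
    return shuffled, label

rng = np.random.default_rng(0)
frames = ["f0", "f1", "f2", "f3"]                # placeholders for real frames
shuffled, label = make_sorting_sample(frames, rng)

# Knowing the label is enough to restore the original order.
restored = [None] * 4
for position, source_index in enumerate(PERMS[label]):
    restored[source_index] = shuffled[position]
print(shuffled, label, restored)
```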
Temporal coherence
#T-CAM Wei, Donglai, Joseph J. Lim, Andrew Zisserman, and William T. Freeman. "Learning and using
the arrow of time." CVPR 2018.
Predict whether the video moves forward or backward.
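Training pairs for the forward/backward task need no annotation either: a clip is reversed at random and the flip decision becomes the label. A minimal sketch, with a hypothetical function name and a tiny synthetic clip:

```python
import numpy as np

def make_arrow_of_time_sample(clip, rng):
    """clip: array of shape (T, H, W). Returns (clip or its reversal, label)."""
    if rng.random() < 0.5:
        return clip, 1          # played forward
    return clip[::-1], 0        # played backward

rng = np.random.default_rng(0)
clip = np.arange(5 * 2 * 2).reshape(5, 2, 2)     # toy clip: 5 frames of 2 x 2 pixels
sample, label = make_arrow_of_time_sample(clip, rng)
print(sample[0].ravel(), label)
```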
Predictive Learning
Slide credit: Yann LeCun
Frame Prediction
Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github]
Learning video representations (features) by...
(1) frame reconstruction (AE)
(2) frame prediction
Features learned without supervision (lots of data) are fine-tuned for activity recognition (small data).
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016 [project] [code]
Video frame prediction with a ConvNet.
The blurry predictions from MSE (ℓ2) are improved with a multi-scale architecture, adversarial training and an image gradient difference loss (GDL) function.
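To see why a gradient difference loss discourages blur, consider the following sketch. It compares the absolute spatial gradients of prediction and target; the function name, the `alpha` exponent and the toy images are assumptions for illustration, and the exact formulation should be checked against the cited paper.

```python
import numpy as np

def gradient_difference_loss(pred, target, alpha=1.0):
    """Penalize mismatched spatial gradients between two 2-D images."""
    dy_p, dy_t = np.abs(np.diff(pred, axis=0)), np.abs(np.diff(target, axis=0))
    dx_p, dx_t = np.abs(np.diff(pred, axis=1)), np.abs(np.diff(target, axis=1))
    return np.sum(np.abs(dy_t - dy_p) ** alpha) + np.sum(np.abs(dx_t - dx_p) ** alpha)

# Target with a sharp vertical edge; a uniformly grey "blurry" prediction.
target = np.zeros((4, 4))
target[:, 2:] = 1.0
blurry = np.full((4, 4), 0.5)

print(np.mean((blurry - target) ** 2))            # MSE: 0.25
print(gradient_difference_loss(blurry, target))   # GDL: 4.0, one missed edge per row
```

The grey image is a reasonable answer under MSE alone, but it is penalized by the GDL because it contains none of the target's edges.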
Frame Prediction + Disentangled features
#DrNet Denton, Emily L. "Unsupervised learning of disentangled representations from video." NIPS 2017.
The model learns to disentangle (“separate”) the visual features that correspond to the:
● Object pose (wrt the camera)
● Object content (class)
#MCNet R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee. Decomposing motion and content for natural video sequence prediction. In ICLR, 2017
100-step video generation on KTH, where green frames indicate conditioned input and red frames indicate generations.
Generations from the MCNet of Villegas et al. (2017) are shown for comparison.
#DisNet Carlos Arenas, Victor Campos, Sebastian Palacio, Xavier Giro-i-Nieto, “Video Understanding through the Disentanglement of Appearance and Motion” MSc thesis, ETSETB TelecomBCN 2018.
A CNN video architecture learns disentangled features for appearance & motion when trained for frame prediction.
[Figure: DisNet architecture. Labels: input clip (20 frames); encoder (backbone for the two streams); 2D spatial conv over frames [t-N, t]; temporal conv over frames [t-N, t] (w_size: N); a_feat; m_feat; FC layer F(a_feat, m_feat); decoder; 2D spatial deconv for Frame t; 2D spatial deconv for Frame t+K; pixel_loss for Frame t; pixel_loss for Frame t+K; block gradients; total_loss (sum of both pixel losses).]
A synthetic dataset of moving MNIST digits was built to have access to a virtually infinite amount of data.
‐ Train:
○ 5000 clips: (0) Horizontal
○ 5000 clips: (3) Vertical
‐ Validation:
○ 500 clips: (3) Horizontal
○ 500 clips: (0) Vertical
‐ Bounding angle: 180º
‐ Speed: 8 pixels/frame
‐ Size: original scale 1:1
[Figure: example clip, 20 frames of 64 × 64 pixels]
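A clip generator with the listed settings (20 frames, 64 × 64 pixels, 8 pixels/frame) could be sketched as below. Several points are assumptions: a square patch of ones stands in for an MNIST digit, the "180º" setting is interpreted as reversing direction at the border, and the function name is hypothetical.

```python
import numpy as np

def make_moving_clip(patch, n_frames=20, size=64, speed=8, horizontal=True):
    """Move `patch` back and forth across a size x size canvas, one step per frame."""
    h, w = patch.shape
    clip = np.zeros((n_frames, size, size), dtype=patch.dtype)
    limit = size - (w if horizontal else h)          # last valid start position
    fixed = (size - (h if horizontal else w)) // 2   # centered on the other axis
    pos, step = 0, speed
    for t in range(n_frames):
        if horizontal:
            clip[t, fixed:fixed + h, pos:pos + w] = patch
        else:
            clip[t, pos:pos + h, fixed:fixed + w] = patch
        nxt = pos + step
        if nxt < 0 or nxt > limit:                   # bounce: reverse direction
            step = -step
            nxt = pos + step
        pos = nxt
    return clip

patch = np.ones((28, 28), dtype=np.float32)          # stand-in for a 28 x 28 MNIST digit
clip_h = make_moving_clip(patch, horizontal=True)
clip_v = make_moving_clip(patch, horizontal=False)
print(clip_h.shape, clip_v.shape)
```

Because clips are generated on the fly, new training samples can be drawn indefinitely by varying the digit and the motion direction.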
Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning features by watching
objects move." CVPR 2017
Noisy labels from motion (optical flow)
Noisy labels can be built with optical flow computed with a handcrafted tool.
NN somehow regularizes the noise present in the annotations
Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by
colorizing videos." ECCV 2018. [blog]
Noisy labels from color
A NN is trained to colorize a video frame, given the color of the first frame of the
video sequence.
Learned embeddings can be clustered to track objects.
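One way to connect colorization with tracking is to let the network copy colors from the reference frame, using similarity between learned embeddings as a pointer. The sketch below loosely follows that idea with hand-made embeddings; the function name, shapes and values are assumptions, and it should not be read as the cited paper's implementation.

```python
import numpy as np

def copy_colors(ref_feats, ref_colors, tgt_feats):
    """Color each target pixel as a softmax-weighted average of reference colors."""
    sim = tgt_feats @ ref_feats.T                     # (n_tgt, n_ref) similarities
    sim = sim - sim.max(axis=1, keepdims=True)        # numerical stability
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ ref_colors

# Three reference pixels with distinct (hypothetical) embeddings and 2-channel colors.
ref_feats = 10.0 * np.eye(3)
ref_colors = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
# Two target pixels whose embeddings match reference pixels 2 and 0.
tgt_feats = ref_feats[[2, 0]]
print(copy_colors(ref_feats, ref_colors, tgt_feats))
```

The same attention weights that copy colors can be used to copy any other per-pixel label from the first frame, which is why the embeddings are useful for tracking.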
Temporal + Multiview Weak Labels
Sermanet, Pierre, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google Brain.
"Time-contrastive networks: Self-supervised learning from video." ICRA 2018.
Questions ?
Deep Learning courses @ UPC TelecomBCN:
● MSc course [2017] [2018]
● BSc course [2018] [2019]
● 1st edition (2016)
● 2nd edition (2017)
● 3rd edition (2018)
● 4th edition (2019)
● 1st edition (2017)
● 2nd edition (2018)
● 3rd edition - NLP (2019)
Next edition: Autumn 2019. Registration open for 2019.
Deep Learning for Professionals @ UPC School
Next edition starts November 2019. Sign up here.

More Related Content

PPTX
Machine learning seminar ppt
RAHUL DANGWAL
 
PDF
An approach towards sotif with ansys medini analyze
Bernhard Kaiser
 
PDF
Computer Vision for autonomous driving
Bill Liu
 
PPT
Multisensor data fusion in object tracking applications
Sayed Abulhasan Quadri
 
PPTX
Image classification using convolutional neural network
KIRAN R
 
PPTX
Lstm
Mehrnaz Faraz
 
PPTX
Human Pose Estimation by Deep Learning
Wei Yang
 
PPTX
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Simplilearn
 
Machine learning seminar ppt
RAHUL DANGWAL
 
An approach towards sotif with ansys medini analyze
Bernhard Kaiser
 
Computer Vision for autonomous driving
Bill Liu
 
Multisensor data fusion in object tracking applications
Sayed Abulhasan Quadri
 
Image classification using convolutional neural network
KIRAN R
 
Human Pose Estimation by Deep Learning
Wei Yang
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Simplilearn
 

What's hot (20)

PDF
MISRA C in an ISO 26262 context
AdaCore
 
PPTX
STPA Analysis of Automotive Safety Using Arcadia and Capella
David Hetherington
 
PDF
Autoware Architecture Proposal
Tier_IV
 
PPT
HCI 3e - Ch 20: Ubiquitous computing and augmented realities
Alan Dix
 
PPTX
Machine Learning & Predictive Maintenance
Arnab Biswas
 
PDF
Multimodal Deep Learning
Universitat Politècnica de Catalunya
 
PDF
Deep learning - Conceptual understanding and applications
Buhwan Jeong
 
PDF
Traffic Prediction for Intelligent Transportation System using Machine Learning
OmSuryawanshi9
 
PPTX
Machine learning ppt.
ASHOK KUMAR
 
PPTX
Classical Sets & fuzzy sets
Dr.Ashvini Chaudhari Bhongade
 
PDF
Final Project Report on Image processing based intelligent traffic control sy...
Louise Antonio
 
PPT
Fuzzy logic
Babu Appat
 
PPTX
Fuzzy logic lec 1
GAFAR ZEN ALABDEEN SALH
 
PPTX
Unit2 hci
pradeepgupta266
 
PPTX
Fuzzy Logic ppt
Ritu Bafna
 
PDF
Face Recognition Methods based on Convolutional Neural Networks
Elaheh Rashedi
 
PPT
Introduction to soft computing
Ankush Kumar
 
DOCX
Report (1)
Arun Kumar
 
PPTX
Deep Learning With Neural Networks
Aniket Maurya
 
PDF
Single Image Super Resolution Overview
LEE HOSEONG
 
MISRA C in an ISO 26262 context
AdaCore
 
STPA Analysis of Automotive Safety Using Arcadia and Capella
David Hetherington
 
Autoware Architecture Proposal
Tier_IV
 
HCI 3e - Ch 20: Ubiquitous computing and augmented realities
Alan Dix
 
Machine Learning & Predictive Maintenance
Arnab Biswas
 
Multimodal Deep Learning
Universitat Politècnica de Catalunya
 
Deep learning - Conceptual understanding and applications
Buhwan Jeong
 
Traffic Prediction for Intelligent Transportation System using Machine Learning
OmSuryawanshi9
 
Machine learning ppt.
ASHOK KUMAR
 
Classical Sets & fuzzy sets
Dr.Ashvini Chaudhari Bhongade
 
Final Project Report on Image processing based intelligent traffic control sy...
Louise Antonio
 
Fuzzy logic
Babu Appat
 
Fuzzy logic lec 1
GAFAR ZEN ALABDEEN SALH
 
Unit2 hci
pradeepgupta266
 
Fuzzy Logic ppt
Ritu Bafna
 
Face Recognition Methods based on Convolutional Neural Networks
Elaheh Rashedi
 
Introduction to soft computing
Ankush Kumar
 
Report (1)
Arun Kumar
 
Deep Learning With Neural Networks
Aniket Maurya
 
Single Image Super Resolution Overview
LEE HOSEONG
 
Ad

Similar to Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019 (20)

PDF
Deep Learning from Videos (UPC 2018)
Universitat Politècnica de Catalunya
 
PDF
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Universitat Politècnica de Catalunya
 
PDF
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
PDF
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
PDF
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
Universitat Politècnica de Catalunya
 
PDF
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Universitat Politècnica de Catalunya
 
PDF
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
PDF
lec_11_self_supervised_learning.pdf
AlamgirAkash3
 
PDF
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Universitat Politècnica de Catalunya
 
PPTX
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Luba Elliott
 
PDF
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
PDF
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
PDF
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
PDF
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Universitat Politècnica de Catalunya
 
PDF
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
PDF
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Universitat Politècnica de Catalunya
 
PDF
Introduction talk to Computer Vision
Chen Sagiv
 
PDF
Dividing and Aggregating Network for Multi-view Action Recognition [Poster in...
Dongang (Sean) Wang
 
PDF
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Data Science Milan
 
PDF
598_WI2022_lecture22.pdf data analysis and data prediction
aitaghavi
 
Deep Learning from Videos (UPC 2018)
Universitat Politècnica de Catalunya
 
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Universitat Politècnica de Catalunya
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Universitat Politècnica de Catalunya
 
Learning with Videos (D4L4 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
Universitat Politècnica de Catalunya
 
Deep Video Object Tracking 2020 - Xavier Giro - UPC TelecomBCN Barcelona
Universitat Politècnica de Catalunya
 
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
lec_11_self_supervised_learning.pdf
AlamgirAkash3
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Universitat Politècnica de Catalunya
 
Emily Denton - Unsupervised Learning of Disentangled Representations from Vid...
Luba Elliott
 
Deep Video Object Tracking - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Universitat Politècnica de Catalunya
 
Neural Architectures for Still Images - Xavier Giro- UPC Barcelona 2019
Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Universitat Politècnica de Catalunya
 
Introduction talk to Computer Vision
Chen Sagiv
 
Dividing and Aggregating Network for Multi-view Action Recognition [Poster in...
Dongang (Sean) Wang
 
Continual/Lifelong Learning with Deep Architectures, Vincenzo Lomonaco
Data Science Milan
 
598_WI2022_lecture22.pdf data analysis and data prediction
aitaghavi
 
Ad

More from Universitat Politècnica de Catalunya (20)

PDF
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Universitat Politècnica de Catalunya
 
PDF
Deep Generative Learning for All
Universitat Politècnica de Catalunya
 
PDF
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
Universitat Politècnica de Catalunya
 
PDF
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Universitat Politècnica de Catalunya
 
PDF
The Transformer - Xavier Giró - UPC Barcelona 2021
Universitat Politècnica de Catalunya
 
PDF
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Universitat Politècnica de Catalunya
 
PDF
Open challenges in sign language translation and production
Universitat Politècnica de Catalunya
 
PPTX
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 
PPTX
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Universitat Politècnica de Catalunya
 
PDF
Learn2Sign : Sign language recognition and translation using human keypoint e...
Universitat Politècnica de Catalunya
 
PDF
Intepretability / Explainable AI for Deep Neural Networks
Universitat Politècnica de Catalunya
 
PDF
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
PDF
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
PDF
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Universitat Politècnica de Catalunya
 
PDF
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Universitat Politècnica de Catalunya
 
PDF
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Universitat Politècnica de Catalunya
 
PDF
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Universitat Politècnica de Catalunya
 
PDF
Curriculum Learning for Recurrent Video Object Segmentation
Universitat Politècnica de Catalunya
 
PDF
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Universitat Politècnica de Catalunya
 
PDF
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Universitat Politècnica de Catalunya
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Universitat Politècnica de Catalunya
 
Deep Generative Learning for All
Universitat Politècnica de Catalunya
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
Universitat Politècnica de Catalunya
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Universitat Politècnica de Catalunya
 
The Transformer - Xavier Giró - UPC Barcelona 2021
Universitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Universitat Politècnica de Catalunya
 
Open challenges in sign language translation and production
Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Universitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Universitat Politècnica de Catalunya
 
Intepretability / Explainable AI for Deep Neural Networks
Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Universitat Politècnica de Catalunya
 
Curriculum Learning for Recurrent Video Object Segmentation
Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Universitat Politècnica de Catalunya
 

Recently uploaded (20)

PPTX
Probability systematic sampling methods.pptx
PrakashRajput19
 
PPTX
Future_of_AI_Presentation for everyone.pptx
boranamanju07
 
PPTX
Data Security Breach: Immediate Action Plan
varmabhuvan266
 
PPTX
Complete_STATA_Introduction_Beginner.pptx
mbayekebe
 
PPTX
INFO8116 -Big data architecture and analytics
guddipatel10
 
PDF
TIC ACTIVIDAD 1geeeeeeeeeeeeeeeeeeeeeeeeeeeeeer3.pdf
Thais Ruiz
 
PPTX
short term internship project on Data visualization
JMJCollegeComputerde
 
PDF
Research about a FoodFolio app for personalized dietary tracking and health o...
AustinLiamAndres
 
PPTX
White Blue Simple Modern Enhancing Sales Strategy Presentation_20250724_21093...
RamNeymarjr
 
PPTX
Data-Driven Machine Learning for Rail Infrastructure Health Monitoring
Sione Palu
 
PDF
oop_java (1) of ice or cse or eee ic.pdf
sabiquntoufiqlabonno
 
PDF
Practical Measurement Systems Analysis (Gage R&R) for design
Rob Schubert
 
PDF
Mastering Financial Analysis Materials.pdf
SalamiAbdullahi
 
PPTX
Blue and Dark Blue Modern Technology Presentation.pptx
ap177979
 
PDF
Classifcation using Machine Learning and deep learning
bhaveshagrawal35
 
PPT
Real Life Application of Set theory, Relations and Functions
manavparmar205
 
PPT
Grade 5 PPT_Science_Q2_W6_Methods of reproduction.ppt
AaronBaluyut
 
PPTX
lecture 13 mind test academy it skills.pptx
ggesjmrasoolpark
 
PPTX
Pipeline Automatic Leak Detection for Water Distribution Systems
Sione Palu
 
PPTX
Presentation (1) (1).pptx k8hhfftuiiigff
karthikjagath2005
 
Probability systematic sampling methods.pptx
PrakashRajput19
 
Future_of_AI_Presentation for everyone.pptx
boranamanju07
 
Data Security Breach: Immediate Action Plan
varmabhuvan266
 
Complete_STATA_Introduction_Beginner.pptx
mbayekebe
 
INFO8116 -Big data architecture and analytics
guddipatel10
 
TIC ACTIVIDAD 1geeeeeeeeeeeeeeeeeeeeeeeeeeeeeer3.pdf
Thais Ruiz
 
short term internship project on Data visualization
JMJCollegeComputerde
 
Research about a FoodFolio app for personalized dietary tracking and health o...
AustinLiamAndres
 
White Blue Simple Modern Enhancing Sales Strategy Presentation_20250724_21093...
RamNeymarjr
 
Data-Driven Machine Learning for Rail Infrastructure Health Monitoring
Sione Palu
 
oop_java (1) of ice or cse or eee ic.pdf
sabiquntoufiqlabonno
 
Practical Measurement Systems Analysis (Gage R&R) for design
Rob Schubert
 
Mastering Financial Analysis Materials.pdf
SalamiAbdullahi
 
Blue and Dark Blue Modern Technology Presentation.pptx
ap177979
 
Classifcation using Machine Learning and deep learning
bhaveshagrawal35
 
Real Life Application of Set theory, Relations and Functions
manavparmar205
 
Grade 5 PPT_Science_Q2_W6_Methods of reproduction.ppt
AaronBaluyut
 
lecture 13 mind test academy it skills.pptx
ggesjmrasoolpark
 
Pipeline Automatic Leak Detection for Water Distribution Systems
Sione Palu
 
Presentation (1) (1).pptx k8hhfftuiiigff
karthikjagath2005
 

Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019

  • 1. @DocXavi Module 6 - Day 8 - Lecture 1 Self-supervised Learning from Video Sequences 28th March 2019 [http://pagines.uab.cat/mcv/] Xavier Giro-i-Nieto [email protected] Associate Professor Universitat Politècnica de Catalunya
  • 2. 2 Outline 1. Unsupervised Learning 2. Self-supervised Learning a. Autoencoder b. Temporal regularisations c. Temporal verifications d. Predictive Learning e. Miscellaneous: optical flow, color & multiview
  • 3. Types of machine learning Yann Lecun’s Black Forest cake 3 Slide credit: Yann LeCun
  • 6. 6 Types of machine learning We can categorize three types of learning procedures: 1. Supervised Learning: 𝐲 = ƒ(𝐱) 2. Unsupervised Learning: ƒ(𝐱) 3. Reinforcement Learning (RL): 𝐲 = ƒ(𝐱) 𝐳 Predict label y corresponding to observation x Estimate the distribution of observation x Predict action y based on observation x, to maximize a future reward z
  • 7. 7 Types of machine learning We can categorize three types of learning procedures: 1. Supervised Learning: 𝐲 = ƒ(𝐱) 2. Unsupervised Learning: ƒ(𝐱) 3. Reinforcement Learning (RL): 𝐲 = ƒ(𝐱) 𝐳
  • 8. Why Unsupervised Learning? ● It is how intelligent beings naturally perceive the world. ● It can save a great deal of effort in building a human-like intelligent agent, compared with a fully supervised approach. ● Vast amounts of unlabelled data are available.
  • 10. Assumptions for Unsupervised Learning Slide: Kevin McGuinness (DLCV UPC 2017) To model P(X) given data, it is necessary to make some assumptions. “You can’t do inference without making assumptions” -- David MacKay, Information Theory, Inference, and Learning Algorithms. Typical assumptions: ● Smoothness assumption ○ Points which are close to each other are more likely to share a label. ● Cluster assumption ○ The data form discrete clusters; points in the same cluster are likely to share a label. ● Manifold assumption ○ The data lie approximately on a manifold of much lower dimension than the input space.
  • 11. The manifold hypothesis Slide: Kevin McGuinness (DLCV UPC 2017) [Figure: two plots with axes x1 and x2. Left: linear manifold, wᵀx + b. Right: non-linear manifold.]
  • 12. The manifold hypothesis Slide: Kevin McGuinness (DLCV UPC 2017) The data distribution lies close to a low-dimensional manifold. Example: consider image data ● Very high dimensional (1,000,000D) ● A randomly generated image will almost certainly not look like any real-world scene ○ The space of images that occur in nature is almost completely empty ● Hypothesis: real-world images lie on a smooth, low-dimensional manifold ○ Manifold distance is a good measure of similarity. The same holds for audio and text.
  • 13. Video lectures on Unsupervised Learning: Kevin McGuinness, UPC DLCV 2016; Xavier Giró, UPC DLAI 2017
  • 15. Acknowledgements: Víctor Campos, Junting Pan, Xunyu Lin, Sebastian Palacio, Carlos Arenas
  • 16. Self-supervised learning Reference: Andrew Zisserman (PAISS 2018) Self-supervised learning is a form of unsupervised learning where the data provides the supervision. ● A surrogate task must be invented by withholding a part of the unlabeled data and training the NN to predict it. Figure label: unlabeled data (X).
  • 17. Self-supervised learning Reference: Andrew Zisserman (PAISS 2018) Self-supervised learning is a form of unsupervised learning where the data provides the supervision. ● By defining a proxy loss, the NN learns representations that should be valuable for the actual target task. Figure labels: ŷ, loss, representations learned without labels.
  • 19. Autoencoder (AE) Fig: “Deep Learning Tutorial” Stanford Autoencoders: ● Predict at the output the same input data. ● Do not need labels.
  • 20. Autoencoder (AE) Fig: “Deep Learning Tutorial” Stanford What is the use of an autoencoder?
  • 21. Autoencoder (AE) Fig: “Deep Learning Tutorial” Stanford Dimensionality reduction: Use the hidden layer as a feature extractor of any desired size.
  • 22. Autoencoder (AE) Slide: Kevin McGuinness (DLCV UPC 2017) Figure: data → Encoder W1 → h, latent variables (representation/features) → Decoder W2 → reconstruction → Loss (reconstruction error). Pretraining: 1. Initialize a NN by solving an autoencoding problem.
  • 23. Autoencoder (AE) Slide: Kevin McGuinness (DLCV UPC 2017) Figure: data → Encoder W1 → h, latent variables (representation/features) → Classifier WC → prediction y → Loss (cross entropy). Pretraining: 1. Initialize a NN by solving an autoencoding problem. 2. Train for the final task with “few” labels.
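The first pretraining step can be sketched in a few lines of NumPy. This is only an illustrative toy, not the setup used in the lecture: the data, the linear encoder/decoder, the learning rate and the names `W1`, `W2` and `reconstruction_loss` are assumptions chosen to mirror the labels on the slide.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy unlabeled data: 200 samples in 8-D that really live on a 2-D subspace.
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 8))

# Encoder W1 and decoder W2 of a linear one-hidden-layer autoencoder.
W1 = rng.normal(scale=0.1, size=(8, 2))
W2 = rng.normal(scale=0.1, size=(2, 8))

def reconstruction_loss(X, W1, W2):
    h = X @ W1                      # latent variables (representation/features)
    return float(np.mean((h @ W2 - X) ** 2))

loss_before = reconstruction_loss(X, W1, W2)
lr = 0.01
for _ in range(2000):
    h = X @ W1
    err = h @ W2 - X                # reconstruction error
    # Gradients of the squared error, up to a constant factor.
    grad_W2 = h.T @ err / len(X)
    grad_W1 = X.T @ (err @ W2.T) / len(X)
    W1 -= lr * grad_W1
    W2 -= lr * grad_W2
loss_after = reconstruction_loss(X, W1, W2)

# Step 2 would reuse the pretrained encoder: these features feed a classifier
# that is trained with the "few" available labels.
features = X @ W1
```

No labels are used at any point of the loop above; only the second step, training a classifier on `features`, would need them.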
  • 25. Temporal regularization: ISA Features are learned with Independent Subspace Analysis (ISA), using convolution and pooling operations. Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011.
  • 26. Temporal regularization: ISA Features are learned without supervision by considering 3D (space+time) video blocks. Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011.
  • 27. Temporal regularization: ISA Feature visualizations. Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011.
  • 28. Temporal regularization: Slowness Assumption: adjacent video frames contain semantically similar information. An autoencoder is trained with slowness and sparsity regularizations. Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015.
  • 29. Temporal regularization: Slowness Slow feature analysis ● Temporal coherence assumption: features should change slowly over time in video. Steady feature analysis ● Second-order changes are also small: changes in the past should resemble changes in the future. Train on triplets of frames from video. The loss encourages nearby frames to have slow and steady features, and far frames to have different features. Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video]
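The slow and steady terms described above can be written down as a rough sketch over frame embeddings. The function names, the margin and the use of a separate far frame for the contrastive term are assumptions made for illustration; the exact formulation in the paper may differ.

```python
import numpy as np

def slow_term(z_a, z_b):
    """First-order term: embeddings of nearby frames should be close."""
    return float(np.sum((z_b - z_a) ** 2))

def steady_term(z_a, z_b, z_c):
    """Second-order term: the change a->b should resemble the change b->c."""
    return float(np.sum(((z_b - z_a) - (z_c - z_b)) ** 2))

def far_term(z_a, z_far, margin=1.0):
    """Contrastive term: embeddings of distant frames should differ by at least `margin`."""
    dist = float(np.linalg.norm(z_far - z_a))
    return max(0.0, margin - dist) ** 2

def slow_and_steady_loss(z_a, z_b, z_c, z_far, weight=1.0, margin=1.0):
    """Hypothetical combination of the three terms for one triplet plus a far frame."""
    return (slow_term(z_a, z_b) + slow_term(z_b, z_c)
            + weight * steady_term(z_a, z_b, z_c)
            + far_term(z_a, z_far, margin))

# Embeddings that move at constant velocity are perfectly "steady".
z_a, z_b, z_c = np.array([0.0, 0.0]), np.array([1.0, 2.0]), np.array([2.0, 4.0])
```

In this toy example the steady term vanishes because both steps have the same displacement, while the slow term still penalizes the size of each step.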
  • 30. Outline 1. Unsupervised Learning 2. Self-supervised Learning a. Autoencoder b. Temporal regularisations c. Temporal verification d. Predictive Learning e. Miscellaneous: optical flow, color & multiview
  • 31. Related work on still images A surrogate task is defined by exploiting the spatial context. Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context prediction." ICCV 2015.
  • 32. Related work on still images Which video-specific surrogate tasks can you think of? Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context prediction." ICCV 2015.
  • 33. Temporal coherence (Slides by Xunyu Lin) The temporal order of frames is exploited as the supervisory signal for learning. Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]
  • 34. Temporal coherence (Slides by Xunyu Lin) The temporal order is taken as the supervisory signal for learning: shuffled sequences are fed to a binary classifier that decides whether the frames are in order or not in order. Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code]
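A minimal sketch of how such binary training samples could be generated from a video without any annotation. The function name `order_verification_sample` and the way negatives are built (swapping the first two frames) are illustrative assumptions, not the sampling strategy of the paper.

```python
import numpy as np

def order_verification_sample(num_frames, rng):
    """Return (frame_indices, label): label 1 if the indices are in temporal order, else 0."""
    a, b, c = (int(i) for i in np.sort(rng.choice(num_frames, size=3, replace=False)))
    if rng.random() < 0.5:
        return (a, b, c), 1      # in order
    return (b, a, c), 0          # first two frames swapped: not in order

rng = np.random.default_rng(0)
samples = [order_verification_sample(30, rng) for _ in range(8)]
```

The labels come for free from the frame indices, which is what makes the task self-supervised.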
  • 37. Temporal verification #Odd-one-out Train a network to detect which of the video sequences contains frames in the wrong order. Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video representation learning with odd-one-out networks." ICCV 2017.
  • 38. Temporal coherence Sort the sequence of frames. Lee, Hsin-Ying, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. "Unsupervised representation learning by sorting sequences." ICCV 2017.
  • 39. Temporal coherence #T-CAM Predict whether the video moves forward or backward. Wei, Donglai, Joseph J. Lim, Andrew Zisserman, and William T. Freeman. "Learning and using the arrow of time." CVPR 2018.
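The three surrogate tasks above (odd-one-out, sequence sorting, arrow of time) differ mainly in how the free label is constructed. The sketch below shows one possible label generator for each; all function names are hypothetical, and treating all 24 permutations of 4 frames as separate classes is a simplification that may not match the class definition used in the sorting paper.

```python
import itertools
import numpy as np

PERMS = list(itertools.permutations(range(4)))  # all orders of a 4-frame sequence

def odd_one_out_sample(clips, rng):
    """Shuffle the frames of exactly one clip; the label is the index of that clip."""
    odd = int(rng.integers(len(clips)))
    out = [list(clip) for clip in clips]
    n = len(out[odd])
    perm = rng.permutation(n)
    while np.array_equal(perm, np.arange(n)):   # make sure the order really changes
        perm = rng.permutation(n)
    out[odd] = [out[odd][i] for i in perm]
    return out, odd

def sorting_sample(frames, rng):
    """Shuffle 4 frames; the label is the index of the permutation that was applied."""
    label = int(rng.integers(len(PERMS)))
    return [frames[i] for i in PERMS[label]], label

def arrow_of_time_sample(frames, rng):
    """Play the clip forward (label 1) or backward (label 0)."""
    if rng.random() < 0.5:
        return list(frames), 1
    return list(frames)[::-1], 0

rng = np.random.default_rng(0)
clips = [[0, 1, 2, 3], [10, 11, 12, 13], [20, 21, 22, 23]]  # frame ids stand in for frames
shuffled_clips, odd_index = odd_one_out_sample(clips, rng)
shuffled_frames, perm_label = sorting_sample(["f0", "f1", "f2", "f3"], rng)
played, direction = arrow_of_time_sample(["f0", "f1", "f2", "f3"], rng)
```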
  • 40. Outline 1. Unsupervised Learning 2. Self-supervised Learning a. Autoencoder b. Temporal regularisations c. Temporal verification d. Frame Prediction e. Miscellaneous: optical flow, color & multiview
  • 42. Frame Prediction Learning video representations (features) by... Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github]
  • 43. Frame Prediction Learning video representations (features) by... (1) frame reconstruction (AE). Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github]
  • 44. Frame Prediction Learning video representations (features) by... (2) frame prediction. Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github]
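Both objectives only differ in which frames serve as the training target. A small sketch of how the targets could be derived from a single unlabeled clip follows; the helper `make_targets` is hypothetical, and the reversed reconstruction target reflects the description in the cited paper as far as recalled here, so it should be read as an assumption.

```python
import numpy as np

def make_targets(clip, n_input):
    """Split a clip into encoder inputs and the two self-supervised targets."""
    inputs = clip[:n_input]
    reconstruction_target = inputs[::-1]   # (1) reconstruct the input frames (reversed order)
    prediction_target = clip[n_input:]     # (2) predict the frames that follow
    return inputs, reconstruction_target, prediction_target

clip = np.arange(6)  # frame indices stand in for real frames
inputs, reconstruction_target, prediction_target = make_targets(clip, 4)
```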
  • 45. Frame Prediction Features learned without supervision (lots of data) are fine-tuned for activity recognition (small data). Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github]
  • 46. Frame Prediction Video frame prediction with a ConvNet. Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]
  • 47. Frame Prediction The blurry predictions obtained with an MSE (ℓ2) loss are improved with a multi-scale architecture, adversarial training and an image gradient difference loss (GDL) function. Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016. [project] [code]
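To see why a gradient difference loss penalizes blur, the sketch below compares the absolute neighbour differences of the prediction with those of the ground truth, in both image directions. It is a single-channel NumPy approximation written for illustration; the function name `gdl_loss` and the default `alpha` are assumptions, and the original implementation may differ in details.

```python
import numpy as np

def gdl_loss(pred, target, alpha=1.0):
    """Gradient difference loss between two 2-D images of the same shape."""
    # Absolute differences between vertically and horizontally adjacent pixels.
    dy_pred = np.abs(pred[1:, :] - pred[:-1, :])
    dy_true = np.abs(target[1:, :] - target[:-1, :])
    dx_pred = np.abs(pred[:, 1:] - pred[:, :-1])
    dx_true = np.abs(target[:, 1:] - target[:, :-1])
    return float(np.sum(np.abs(dy_true - dy_pred) ** alpha)
                 + np.sum(np.abs(dx_true - dx_pred) ** alpha))

# Ground truth with a sharp vertical edge, and a "blurry" constant prediction.
target = np.zeros((4, 4))
target[:, 2:] = 1.0
blurry = np.full((4, 4), 0.5)
```

The constant image is the minimizer of the squared error among flat predictions, yet it is penalized by the GDL because the edge present in `target` has disappeared.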
  • 49. Frame Prediction + Disentangled features The model learns to disentangle (“separate”) the visual features that correspond to: ● Object pose (with respect to the camera) ● Object content (class). #DrNet Denton, Emily L. "Unsupervised learning of disentangled representations from video." NIPS 2017.
  • 50. Frame Prediction + Disentangled features 100-step video generation on KTH, where green frames indicate conditioned input and red frames indicate generations. Generations from the MCNet of Villegas et al. (2017) are shown for comparison. #DrNet Denton, Emily L. "Unsupervised learning of disentangled representations from video." NIPS 2017. #MCNet Villegas, R., J. Yang, S. Hong, X. Lin, and H. Lee. "Decomposing motion and content for natural video sequence prediction." ICLR 2017.
  • 51. Frame Prediction + Disentangled features A CNN video architecture learns disentangled features for appearance & motion when trained for frame prediction. Figure labels: Input clip (20 frames); Encoder (backbone for the two streams): 2D spatial conv over frames [t-N, t], temporal conv over frames [t-N, t] with w_size N; features a_feat and m_feat; FC layer F(a_feat, m_feat); Decoder: 2D spatial deconv for frame t and for frame t+K; pixel_loss for frame t and for frame t+K; total_loss: sum of both pixel losses; block gradients. #DisNet Carlos Arenas, Victor Campos, Sebastian Palacio, Xavier Giro-i-Nieto, “Video Understanding through the Disentanglement of Appearance and Motion” MSc thesis, ETSETB TelecomBCN 2018.
  • 52. Frame Prediction + Disentangled features A synthetic dataset of moving MNIST digits was built to have access to a virtually infinite amount of data. ‐ Train: ○ 5000 clips: (0) Horizontal ○ 5000 clips: (3) Vertical ‐ Validation: ○ 500 clips: (3) Horizontal ○ 500 clips: (0) Vertical ‐ Bounding angle: 180º ‐ Speed: 8 pixels/frame ‐ Size: original scale 1:1 ‐ Clip dimensions (from the figure): 20 frames of 64×64 pixels. #DisNet Carlos Arenas, Victor Campos, Sebastian Palacio, Xavier Giro-i-Nieto, “Video Understanding through the Disentanglement of Appearance and Motion” MSc thesis, ETSETB TelecomBCN 2018.
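A clip generator in the spirit of the dataset described above could look like the following sketch. It assumes that the 180º angle means the digit reverses direction when it reaches the border, uses a block of ones as a stand-in for a 28×28 MNIST digit, and the function name `make_moving_clip` is hypothetical; none of this is taken from the thesis code.

```python
import numpy as np

def make_moving_clip(digit, n_frames=20, size=64, speed=8, horizontal=True):
    """Render a digit moving along one axis and bouncing back at the image border."""
    h, w = digit.shape
    clip = np.zeros((n_frames, size, size), dtype=digit.dtype)
    limit = size - (w if horizontal else h)          # largest valid offset along the motion axis
    fixed = (size - (h if horizontal else w)) // 2   # centred offset on the other axis
    pos, step = 0, speed
    for t in range(n_frames):
        if horizontal:
            clip[t, fixed:fixed + h, pos:pos + w] = digit
        else:
            clip[t, pos:pos + h, fixed:fixed + w] = digit
        if not 0 <= pos + step <= limit:             # reverse direction at the border
            step = -step
        pos += step
    return clip

digit = np.ones((28, 28), dtype=np.float32)          # stand-in for an MNIST digit
train_clip = make_moving_clip(digit, horizontal=True)
val_clip = make_moving_clip(digit, horizontal=False)
```

Pairing each digit with a different motion direction in training and validation, as listed on the slide, is what allows checking whether appearance and motion features are really separated.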
  • 54. Noisy labels from motion (optical flow) Noisy labels can be built with optical flow computed with a handcrafted tool. The NN somehow regularizes the noise present in the annotations. Pathak, Deepak, Ross Girshick, Piotr Dollár, Trevor Darrell, and Bharath Hariharan. "Learning features by watching objects move." CVPR 2017.
  • 55. Noisy labels from color A NN is trained to colorize a video frame, given the color of the first frame of the video sequence. Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by colorizing videos." ECCV 2018. [blog]
  • 57. Noisy labels from color Learned embeddings can be clustered to track objects. Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by colorizing videos." ECCV 2018. [blog]
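One way to picture why tracking can emerge from colorization is a pointer-style mechanism: each pixel of the target frame copies colors from the reference frame, weighted by the similarity of the learned embeddings. The sketch below is a simplified NumPy illustration of that idea; the function name, shapes and toy values are assumptions, not the model from the paper.

```python
import numpy as np

def propagate_colors(ref_embed, ref_colors, tgt_embed):
    """Copy colors from a reference frame to a target frame via embedding similarity.

    ref_embed: (N, D) embeddings of reference pixels, ref_colors: (N, C) their colors,
    tgt_embed: (M, D) embeddings of target pixels.
    """
    logits = tgt_embed @ ref_embed.T                  # (M, N) similarity scores
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)     # softmax over reference pixels
    return weights @ ref_colors, weights

# Toy example: three reference pixels with well separated embeddings and distinct colors.
ref_embed = 10.0 * np.eye(3)
ref_colors = np.eye(3)
tgt_embed = ref_embed[[2, 0]]                         # target pixels resemble ref pixels 2 and 0
pred_colors, weights = propagate_colors(ref_embed, ref_colors, tgt_embed)
```

The same `weights` that copy colors can also copy any other label given in the first frame, such as a segmentation mask, which is how the learned embeddings can be reused for tracking.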
  • 58. Temporal + Multiview Weak Labels Sermanet, Pierre, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google Brain. "Time-contrastive networks: Self-supervised learning from video." ICRA 2018.
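The weak labels here come from time and viewpoint: frames captured at the same instant from different cameras should map to nearby embeddings, while frames from the same camera at a different time should map further away. A rough sketch of such a triplet objective follows; the function names, the margin and the `min_gap` sampling rule are illustrative assumptions.

```python
import numpy as np

def time_contrastive_loss(anchor, positive, negative, margin=0.2):
    """Triplet loss: pull the co-occurring view closer than the temporally distant frame."""
    d_pos = float(np.sum((anchor - positive) ** 2))
    d_neg = float(np.sum((anchor - negative) ** 2))
    return max(0.0, d_pos - d_neg + margin)

def sample_triplet(view1, view2, t, min_gap, rng):
    """view1, view2: (T, D) per-frame embeddings from two synchronized cameras."""
    candidates = [s for s in range(len(view1)) if abs(s - t) >= min_gap]
    t_neg = int(rng.choice(candidates))
    # Anchor and positive share the time step; the negative shares the view.
    return view1[t], view2[t], view1[t_neg]

rng = np.random.default_rng(0)
view1 = np.arange(10, dtype=float).reshape(10, 1)   # toy 1-D embeddings
view2 = view1 + 100.0
anchor, positive, negative = sample_triplet(view1, view2, t=5, min_gap=3, rng=rng)
```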
  • 63. Deep Learning courses @ UPC TelecomBCN: ● MSc course [2017] [2018] ● BSc course [2018] [2019] ● 1st edition (2016) ● 2nd edition (2017) ● 3rd edition (2018) ● 4th edition (2019) ● 1st edition (2017) ● 2nd edition (2018) ● 3rd edition - NLP (2019) Next edition: Autumn 2019. Registration open for 2019.
  • 64. Deep Learning for Professionals @ UPC School. Next edition starts November 2019.