[course site]
Xavier Giro-i-Nieto
xavier.giro@upc.edu
Associate Professor
Universitat Politecnica de Catalunya
Technical University of Catalonia
Image Classification
on ImageNet
#DLUPC
2
ImageNet Challenge
● 1,000 object classes (categories).
● Images:
○ 1.2 M training images.
○ 100k test images.
3
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. "Imagenet
large scale visual recognition challenge." International Journal of Computer Vision 115, no. 3 (2015): 211-252. [web]
ImageNet Dataset
Slide credit:
Rob Fergus (NYU)
-9.8%
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2014). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
4
Based on SIFT + Fisher Vectors
ImageNet Challenge: 2012
AlexNet (SuperVision)
5
Orange
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). "ImageNet classification with deep convolutional neural networks." NIPS 2012.
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
Slide credit:
Rob Fergus (NYU)
6
ImageNet Challenge: 2013
Without a clear understanding of how convnets work,
the development of better convnets is reduced to
trial and error.
7
Zeiler-Fergus (ZF)
Visualization can help in
proposing better architectures.
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
“A convnet model that uses the same
components (filtering, pooling) but in
reverse, so instead of mapping pixels
to features does the opposite.”
Zeiler, Matthew D., Graham W. Taylor, and Rob Fergus. "Adaptive deconvolutional networks for mid and high level feature learning." Computer Vision
(ICCV), 2011 IEEE International Conference on. IEEE, 2011.
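The "same components in reverse" idea can be illustrated on the pooling step. The sketch below is a simplified 1-D toy, not the authors' implementation: it assumes that max pooling records the position of each maximum (the "switches") and that the reverse pass places each pooled value back at that position. The helper names `max_pool_with_switches` and `unpool` are hypothetical.

```python
def max_pool_with_switches(signal, size=2):
    """Non-overlapping 1-D max pooling that also records where each max was."""
    pooled, switches = [], []
    for start in range(0, len(signal) - size + 1, size):
        window = signal[start:start + size]
        # Offset of the largest value inside this window.
        offset = max(range(size), key=lambda i: window[i])
        pooled.append(window[offset])
        switches.append(start + offset)
    return pooled, switches


def unpool(pooled, switches, length):
    """Reverse pass: put each pooled value back at its recorded position."""
    out = [0.0] * length
    for value, index in zip(pooled, switches):
        out[index] = value
    return out


signal = [1.0, 3.0, 2.0, 0.5, 4.0, 4.5]
pooled, switches = max_pool_with_switches(signal)
restored = unpool(pooled, switches, len(signal))
```

Positions that did not hold a maximum stay at zero, so the reverse pass recovers where the strong activations were, not the full input.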
8
Zeiler-Fergus (ZF)
Zeiler, M. D., & Fergus, R. (2014). Visualizing and understanding convolutional networks. In Computer Vision–ECCV 2014 (pp. 818-833). Springer
International Publishing.
9
Zeiler-Fergus (ZF)
10
Regularization with more
dropout: introduced in the
input layer.
Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. R. (2012). Improving neural networks by preventing co-adaptation of
feature detectors. arXiv preprint arXiv:1207.0580.
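A minimal sketch of dropout on a list of activations. It assumes the "inverted" variant, which rescales the surviving units at training time; the cited paper instead halves the outgoing weights at test time. The function name `dropout` and the drop probability of 0.5 are illustrative choices.

```python
import random


def dropout(values, drop_prob, rng, training=True):
    """Zero each value with probability drop_prob (inverted dropout)."""
    if not training or drop_prob == 0.0:
        # At test time the activations pass through unchanged.
        return list(values)
    keep_prob = 1.0 - drop_prob
    # Survivors are scaled by 1/keep_prob so the expected value is preserved.
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
activations = [1.0] * 10
dropped = dropout(activations, 0.5, rng)
unchanged = dropout(activations, 0.5, rng, training=False)
```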
Zeiler-Fergus (ZF): Dropout
ImageNet Classification 2013
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... & Fei-Fei, L. (2015). Imagenet large scale visual recognition challenge. arXiv
preprint arXiv:1409.0575. [web]
-5%
11
ImageNet Challenge: 2013
12
NVIDIA, “NVIDIA and IBM Cloud Support ImageNet Large Scale Visual Recognition Challenge” (2015)
ImageNet Challenge: 2014
13
ImageNet Challenge: 2014
GoogLeNet (Inception)
14
Movie: Inception (2010)
15
22 layers!
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov,
Dumitru Erhan, Vincent Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015.
GoogLeNet (Inception)
16
GoogLeNet (Inception)
17
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
GoogLeNet (Inception)
18
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
Multiple
scales
GoogLeNet (Inception)
GoogLeNet (NiN)
19
3x3 and 5x5 convolutions deal
with different scales.
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
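Filters of different sizes can run side by side on the same input only if their outputs share a spatial size. The sketch below checks this with the standard convolution output-size formula, assuming stride 1 and padding of (k - 1) / 2. The 28x28 input and the filter counts per branch are hypothetical numbers chosen for illustration.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1


SIZE = 28
# kernel size -> number of filters in that branch (illustrative values)
branches = {1: 64, 3: 128, 5: 32}

# With padding (k - 1) // 2 and stride 1, every branch keeps the input size.
spatial = {k: conv_output_size(SIZE, k, padding=(k - 1) // 2) for k in branches}

# Branch outputs are stacked along the channel axis.
concat_channels = sum(branches.values())
```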
20
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
Dimensionality
reduction
GoogLeNet (Inception)
21
1x1 convolutions perform dimensionality
reduction (c3 < c2) and include rectified
linear units (ReLU).
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014. [Slides]
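A 1x1 convolution mixes the channels of each pixel independently, so using fewer output channels than input channels reduces dimensionality. The plain-Python sketch below is for illustration only: the function name `conv1x1_relu`, the 2x2 feature map, and the weights are assumptions.

```python
def conv1x1_relu(feature_map, weights):
    """Apply a 1x1 convolution followed by ReLU.

    feature_map: H x W x C_in nested lists.
    weights:     C_out x C_in nested lists (one row per output channel).
    """
    out = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            # Each output channel is a weighted sum over the input channels
            # of this single pixel; no spatial neighbours are involved.
            mixed = [sum(w * x for w, x in zip(w_row, pixel)) for w_row in weights]
            out_row.append([max(0.0, v) for v in mixed])
        out.append(out_row)
    return out


# 2x2 map with 4 input channels, reduced to 2 output channels.
fmap = [[[1.0, 2.0, 3.0, 4.0], [0.0, 1.0, 0.0, 1.0]],
        [[2.0, 2.0, 2.0, 2.0], [1.0, 0.0, 0.0, 0.0]]]
w = [[0.5, 0.5, 0.0, 0.0],
     [0.0, 0.0, -1.0, 1.0]]
reduced = conv1x1_relu(fmap, w)
```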
GoogLeNet (Inception)
22
In GoogLeNet, cascaded 1x1 convolutions reduce the number of channels before the
expensive 3x3 and 5x5 convolutions.
GoogLeNet (Inception)
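The saving from reducing channels first can be estimated by counting multiply-accumulate operations. The sketch below assumes stride 1 and an output map the same size as the input. The sizes (28x28 map, 192 input channels, 16 reduced channels, 32 output filters) are illustrative.

```python
def conv_macs(height, width, c_in, c_out, kernel):
    """Multiply-accumulates of a stride-1 convolution with same-size output."""
    return height * width * c_in * c_out * kernel * kernel


H, W = 28, 28
C_IN, C_REDUCED, C_OUT = 192, 16, 32

# 5x5 convolution applied directly to all input channels.
direct = conv_macs(H, W, C_IN, C_OUT, 5)

# 1x1 reduction first, then the 5x5 convolution on the reduced channels.
reduced = conv_macs(H, W, C_IN, C_REDUCED, 1) + conv_macs(H, W, C_REDUCED, C_OUT, 5)

savings = direct / reduced
```

With these numbers the reduced path needs roughly one tenth of the operations of the direct 5x5 convolution.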
23
Lin, Min, Qiang Chen, and Shuicheng Yan. "Network in network." ICLR 2014.
GoogLeNet (Inception)
24
Two softmax classifiers at intermediate layers combat the vanishing gradient while
providing regularization at training time.
...and no fully connected layers needed
(12 times fewer parameters than AlexNet!)
GoogLeNet (Inception)
25
Szegedy, Christian, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent
Vanhoucke, and Andrew Rabinovich. "Going deeper with convolutions." CVPR 2015. [video] [slides] [poster]
GoogLeNet (Inception)
E2E: Classification: VGG
26
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." ICLR 2015.
[video] [slides] [project]
E2E: Classification: VGG
27
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition."
International Conference on Learning Representations (2015). [video] [slides] [project]
E2E: Classification: VGG: 3x3 Stacks
28
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
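The argument for stacking 3x3 convolutions can be checked with a little arithmetic: three stacked 3x3 layers cover the same 7x7 receptive field as one 7x7 layer, with fewer weights. The sketch assumes stride 1, the same number of channels C in and out of every layer, and no biases. C = 256 is an illustrative value.

```python
def stacked_receptive_field(kernel, layers):
    """Receptive field of `layers` stacked stride-1 convolutions."""
    return 1 + layers * (kernel - 1)


def conv_weights(kernel, channels, layers=1):
    """Weight count (biases ignored) with `channels` inputs and outputs per layer."""
    return layers * kernel * kernel * channels * channels


C = 256
rf_stack = stacked_receptive_field(3, 3)   # three 3x3 layers
rf_single = stacked_receptive_field(7, 1)  # one 7x7 layer
w_stack = conv_weights(3, C, layers=3)     # 27 * C^2
w_single = conv_weights(7, C)              # 49 * C^2
```

The stack also inserts a non-linearity after each 3x3 layer, which a single 7x7 layer does not have.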
E2E: Classification: VGG
29
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image
recognition." International Conference on Learning Representations (2015). [video] [slides] [project]
● No pooling between some convolutional layers.
● Convolution strides of 1 (no skipping).
30
3.6% top-5 error…
with 152 layers!
ImageNet Challenge: 2015
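Top-5 error counts a prediction as correct when the true label is among the five highest-scoring classes. The plain-Python sketch below uses made-up scores for three samples and six classes; the helper name `top_k_error` is hypothetical.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is not among the k highest scores."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data: 3 samples, 6 classes (illustrative numbers only).
scores = [
    [0.1, 0.5, 0.2, 0.05, 0.1, 0.05],
    [0.3, 0.1, 0.1, 0.2, 0.25, 0.05],
    [0.0, 0.1, 0.2, 0.3, 0.35, 0.05],
]
labels = [1, 3, 0]
top1 = top_k_error(scores, labels, k=1)
top5 = top_k_error(scores, labels, k=5)
```

The second sample is wrong at top-1 but right at top-5, which is why top-5 error is never higher than top-1 error.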
E2E: Classification: ResNet
31
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition."
CVPR 2016. [slides]
E2E: Classification: ResNet
32
● Deeper networks (34 is deeper than 18) are more difficult to train.
Thin curves: training error
Bold curves: validation error
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition."
CVPR 2016. [slides]
ResNet
33
● Residual learning: reformulate the layers as learning residual functions with
reference to the layer inputs, instead of learning unreferenced functions.
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition."
CVPR 2016. [slides]
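Residual learning can be sketched as y = ReLU(F(x) + x), where F is the residual function learned by the layers and x is passed through a shortcut. The toy below uses two small fully connected layers in place of convolutions, which is an assumption made to keep the example short. The names `linear` and `residual_block` are hypothetical.

```python
def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(v, weights):
    """Matrix-vector product; weights has one row per output unit."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """y = relu(F(x) + x), with F(x) = W2 * relu(W1 * x)."""
    residual = linear(relu(linear(x, w1)), w2)
    # Shortcut connection: the input is added back to the residual.
    return relu([r + xi for r, xi in zip(residual, x)])


x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]
# With all-zero weights the residual is zero and the block is the identity.
y = residual_block(x, zeros, zeros)
```

If the best mapping for a layer is close to the identity, the block only has to push its weights toward zero, which is the intuition behind the reformulation.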
E2E: Classification: ResNet
34
He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. "Deep residual learning for image recognition."
CVPR 2016. [slides]
35
Learn more
Li Fei-Fei, “How we’re teaching computers to understand
pictures” TEDTalks 2014.
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang et al. "Imagenet
large scale visual recognition challenge." International Journal of Computer Vision 115, no. 3 (2015): 211-252. [web]
36
The end of the challenge
http://image-net.org/challenges/beyond_ilsvrc
37
Thanks! Q&A?
Follow me at
https://imatge.upc.edu/web/people/xavier-giro
@DocXavi
/ProfessorXavi

Image classification on Imagenet (D1L4 2017 UPC Deep Learning for Computer Vision)
