[course site]
Object Detection
Day 3 Lecture 4
Amaia Salvador
amaia.salvador@upc.edu
Slide Credit: Xavier Giró
Images (global) Objects (local)
Deep ConvNets for Recognition for...
Video (2D+T)
2
Object Detection
CAT, DOG, DUCK
The task of assigning a
label and a bounding box
to all objects in the image
3
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
4
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
5
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? YES
Dog ? NO
Duck? NO
6
Object Detection as Classification
Classes = [cat, dog, duck]
Cat ? NO
Dog ? NO
Duck? NO
7
Object Detection as Classification
Problem:
Too many positions & scales to test
Solution: If your classifier is fast enough, go for it
8
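The sliding-window idea on the previous slides can be sketched in a few lines of Python. This is only an illustration: the window sizes, the stride, and the `toy_classifier` stand-in are assumptions, not values from the lecture. It mainly shows how quickly the number of classifier calls grows with positions and scales.

```python
def sliding_windows(img_w, img_h, win_sizes, stride):
    """Yield (x, y, w, h) for every window position at every window size."""
    for w, h in win_sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


def detect(image_size, classify, classes, win_sizes, stride=8):
    """Ask 'is this window a <class>?' for every window and every class."""
    img_w, img_h = image_size
    detections = []
    for box in sliding_windows(img_w, img_h, win_sizes, stride):
        for label in classes:
            if classify(box, label):
                detections.append((label, box))
    return detections


def toy_classifier(box, label):
    """Hypothetical stand-in for a real classifier: one window contains a cat."""
    return label == "cat" and box == (64, 64, 128, 128)


classes = ["cat", "dog", "duck"]
win_sizes = [(64, 64), (128, 128), (256, 256)]  # assumed scales

n_windows = sum(1 for _ in sliding_windows(800, 600, win_sizes, 8))
print("windows to test:", n_windows)
print("classifier calls:", n_windows * len(classes))
print(detect((800, 600), toy_classifier, classes, win_sizes))
```

Even with only three scales and a stride of 8 pixels, the classifier runs tens of thousands of times per image, which is why this approach needs a very fast classifier.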
HOG
Dalal and Triggs. Histograms of Oriented Gradients for Human Detection. CVPR 2005
9
Deformable Part Model
Felzenszwalb et al, Object Detection with Discriminatively Trained Part Based Models, PAMI 2010
10
Object Detection with CNNs?
CNN classifiers are computationally demanding. We can’t test all positions & scales !
Solution: Look at a tiny subset of positions. Choose them wisely :)
11
Region Proposals
● Find “blobby” image regions that are likely to contain objects
● “Class-agnostic” object detector
● Look for “blob-like” regions
Slide Credit: CS231n
12
Region Proposals
Selective Search (SS) Multiscale Combinatorial Grouping (MCG)
[SS] Uijlings et al. Selective search for object recognition. IJCV 2013
[MCG] Arbeláez, Pont-Tuset et al. Multiscale combinatorial grouping. CVPR 2014
13
Object Detection with CNNs: R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
14
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
1. Train network on proposals
2. Post-hoc training of SVMs & Box regressors on fc7 features
15
R-CNN
Girshick et al. Rich feature hierarchies for accurate object detection and semantic segmentation. CVPR 2014
16
R-CNN: Problems
1. Slow at test-time: need to run full forward pass of
CNN for each region proposal
2. SVMs and regressors are post-hoc: CNN features
not updated in response to SVMs and regressors
3. Complex multistage training pipeline
Slide Credit: CS231n
17
Fast R-CNN
R-CNN Problem #1: Slow at test time: need to run a full forward pass of the CNN for each region proposal
Solution: Share computation of convolutional layers between region proposals for an image
Girshick. Fast R-CNN. ICCV 2015
18
Fast R-CNN
Hi-res input image: 3 x 800 x 600, with region proposal
Convolution and Pooling
Hi-res conv features: C x H x W, with region proposal
Fully-connected layers
Max-pool within each grid cell
RoI conv features: C x h x w for region proposal
Fully-connected layers expect low-res conv features: C x h x w
Slide Credit: CS231n
Girshick. Fast R-CNN. ICCV 2015
19
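A minimal sketch of the "max-pool within each grid cell" step, in plain Python on nested lists. It assumes the region proposal has already been projected onto the C x H x W feature map, and it uses one common floor/ceil rule for the cell boundaries. Both are assumptions for illustration, not details taken from the slide.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def roi_max_pool(features, roi, out_h, out_w):
    """Pool one region of a C x H x W feature map down to C x out_h x out_w.

    features: list of C channels, each a list of H rows of W numbers.
    roi: (x0, y0, x1, y1) in feature-map coordinates, x1 and y1 exclusive,
         at least one cell wide and one cell high.
    """
    x0, y0, x1, y1 = roi
    roi_h, roi_w = y1 - y0, x1 - x0
    pooled = []
    for channel in features:
        out = []
        for i in range(out_h):
            # Row range of grid cell i (floor for the start, ceil for the end).
            ys = y0 + (i * roi_h) // out_h
            ye = y0 + ceil_div((i + 1) * roi_h, out_h)
            row = []
            for j in range(out_w):
                xs = x0 + (j * roi_w) // out_w
                xe = x0 + ceil_div((j + 1) * roi_w, out_w)
                # Max over all feature values that fall inside this cell.
                row.append(max(channel[y][x]
                               for y in range(ys, ye)
                               for x in range(xs, xe)))
            out.append(row)
        pooled.append(out)
    return pooled


# One channel, 4 x 4 feature map with values 0..15 in row-major order.
fmap = [[[r * 4 + c for c in range(4)] for r in range(4)]]
print(roi_max_pool(fmap, (0, 0, 4, 4), 2, 2))  # -> [[[5, 7], [13, 15]]]
```

Whatever the size of the proposal, the output always has the fixed C x h x w shape that the fully-connected layers expect, so the convolutional features are computed once per image and shared by all proposals.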
Fast R-CNN
R-CNN Problems #2 & #3: SVMs and regressors are post-hoc. Complex training.
Solution: Train it all together, end to end (E2E)
Girshick. Fast R-CNN. ICCV 2015
20
Fast R-CNN
Slide Credit: CS231n
                       R-CNN        Fast R-CNN
Training time          84 hours     9.5 hours
  (Speedup)            1x           8.8x
Test time per image    47 seconds   0.32 seconds
  (Speedup)            1x           146x
mAP (VOC 2007)         66.0         66.9
Using VGG-16 CNN on Pascal VOC 2007 dataset
Faster!
FASTER!
Better!
21
Fast R-CNN: Problem
Slide Credit: CS231n
                                            R-CNN        Fast R-CNN
Test time per image                         47 seconds   0.32 seconds
  (Speedup)                                 1x           146x
Test time per image with Selective Search   50 seconds   2 seconds
  (Speedup)                                 1x           25x
Test-time speeds don’t include region proposals
22
Faster R-CNN
Conv
layers
Region Proposal Network
FC6
Class probabilities
FC7
FC8
RPN Proposals
RoI
Pooling
Conv5_3
RPN Proposals
23
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
Faster R-CNN
Conv
layers
Region Proposal Network
FC6
Class probabilities
FC7
FC8
RPN Proposals
RoI
Pooling
Conv5_3
RPN Proposals
Fast R-CNN
24
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
Region Proposal Network
Objectness scores
(object/no object)
Bounding Box Regression
In practice, k = 9 (3 different scales and 3 aspect ratios)
25
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
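The k = 9 anchors at one sliding position can be enumerated as below. The slide only states "3 different scales and 3 aspect ratios"; the concrete scale values (128, 256, 512 pixels) and ratios (1:2, 1:1, 2:1) used here are assumed for illustration, as is the convention that each anchor keeps an area of scale squared.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return k = len(scales) * len(ratios) boxes (x0, y0, x1, y1) centred on (cx, cy).

    ratio is height / width; each anchor has area scale ** 2 (assumed convention).
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(0, 0)
print(len(anchors))  # k = 9
for box in anchors:
    print(tuple(round(v, 1) for v in box))
```

For each of these k anchors, the Region Proposal Network predicts an objectness score (object / no object) and a bounding-box regression that refines the anchor.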
Faster R-CNN
Ren et al. Faster R-CNN: Towards real-time object detection with region proposal networks. NIPS 2015
                                       R-CNN        Fast R-CNN   Faster R-CNN
Test time per image (with proposals)   50 seconds   2 seconds    0.2 seconds
  (Speedup)                            1x           25x          250x
mAP (VOC 2007)                         66.0         66.9         66.9
Slide Credit: CS231n
26
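The speedup row is simply the R-CNN time divided by each method's time. A small check using the test times (with proposals) from the table above; the variable names are illustrative only.

```python
# Test time per image, including region proposals, in seconds (from the table).
test_time = {"R-CNN": 50.0, "Fast R-CNN": 2.0, "Faster R-CNN": 0.2}


def speedups(times, baseline="R-CNN"):
    """Speedup of each method relative to the baseline method."""
    return {name: times[baseline] / t for name, t in times.items()}


for name, factor in speedups(test_time).items():
    # Frames per second is the inverse of the per-image time.
    print(f"{name}: {factor:.0f}x speedup, {1.0 / test_time[name]:.2f} images/s")
```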
Faster R-CNN
27
● Faster R-CNN is the basis of the winners of COCO and
ILSVRC 2015 object detection competitions.
He et al. Deep residual learning for image recognition. arXiv 2015
YOLO: You Only Look Once
Slide Credit: CS231n
Divide image into S x S grid
Within each grid cell predict:
B Boxes: 4 coordinates + confidence
Class scores: C numbers
Regression from image to
7 x 7 x (5 * B + C) tensor
Direct prediction using a CNN
Redmon et al. You Only Look Once: Unified, Real-Time Object Detection, CVPR 2016
28
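A quick way to see what the 7 x 7 x (5 * B + C) tensor contains. The values B = 2 and C = 20 are assumptions here (they are consistent with the 98 boxes listed for YOLO in the SSD comparison table, since 7 * 7 * 2 = 98), and the per-cell memory layout used in `split_cell` is a hypothetical one chosen for illustration.

```python
def yolo_output_shape(S=7, B=2, C=20):
    """Shape of the regression target: S x S cells, each with 5*B + C numbers."""
    return (S, S, 5 * B + C)


def split_cell(cell, B=2, C=20):
    """Split one cell's vector into B boxes (4 coordinates + confidence) and C class scores.

    Assumed layout: [box_0 (5 numbers), ..., box_{B-1} (5 numbers), class scores (C numbers)].
    """
    boxes = [tuple(cell[5 * b:5 * b + 5]) for b in range(B)]
    class_scores = list(cell[5 * B:5 * B + C])
    return boxes, class_scores


S, B, C = 7, 2, 20
print(yolo_output_shape(S, B, C))       # (7, 7, 30)
print("boxes per image:", S * S * B)    # 98

boxes, class_scores = split_cell(list(range(5 * B + C)), B, C)
print(boxes)
print(len(class_scores))
```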
SSD: Single Shot MultiBox Detector
Liu et al. SSD: Single Shot MultiBox Detector, arXiv 2015
29
SSD: Single Shot MultiBox Detector
Liu et al. SSD: Single Shot MultiBox Detector, arXiv 2015
System                    VOC2007 test mAP   FPS (Titan X)   Number of Boxes
Faster R-CNN (VGG16)      73.2               7               300
Faster R-CNN (ZF)         62.1               17              300
YOLO                      63.4               45              98
Fast YOLO                 52.7               155             98
SSD300 (VGG)              72.1               58              7308
SSD300 (VGG, cuDNN v5)    72.1               72              7308
SSD500 (VGG16)            75.1               23              20097
30
Training with Pascal VOC 07+12
Resources
● Related Lecture from CS231n @ Stanford [slides][video]
● Caffe Code for:
○ R-CNN
○ Fast R-CNN
○ Faster R-CNN [matlab][python]
● YOLO
○ Original (Darknet)
○ Tensorflow
○ Keras
● SSD (Caffe)
31

Deep Learning for Computer Vision: Object Detection (UPC 2016)
