Faster R-CNN
By: Deep Learning Team
Faster R-CNN(NIPS 2015)
Computer Vision Task
History(?) of R-CNN
• R-CNN: Rich feature hierarchies for accurate object detection and semantic segmentation (2013)
• Fast R-CNN (2015)
• Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (2015)
• Mask R-CNN (2017)
Is Faster R-CNN Really Fast?
• Generally, R-FCN and SSD models are faster on average, while Faster R-CNN models are more accurate
• Faster R-CNN models can be faster if we limit the number of regions proposed
R-CNN Architecture
R-CNN
Region Proposals – Selective Search
• Bottom-up segmentation, merging regions at multiple scales
Convert regions to boxes
R-CNN Training
• Pre-train a ConvNet (AlexNet) on the ImageNet classification dataset
• Fine-tune for object detection (softmax + log loss)
• Cache feature vectors to disk
• Train post hoc linear SVMs (hinge loss)
• Train post hoc linear bounding-box regressors (squared loss)
“Post hoc” means the parameters are learned after the ConvNet is fixed
Bounding-Box Regression
P^i = (P^i_x, P^i_y, P^i_w, P^i_h) specifies the pixel coordinates of the center of proposal P^i’s bounding box together with P^i’s width and height in pixels
G = (G_x, G_y, G_w, G_h) denotes the ground-truth bounding box
Bounding-Box Regression
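A minimal sketch of the regression targets, assuming the parameterization from the R-CNN paper: center offsets are scaled by the proposal's width and height, and size changes are expressed in log space. The function names `encode_box` and `decode_box` and the example boxes are illustrative, not from the slides.

```python
import math

def encode_box(p, g):
    """Targets (t_x, t_y, t_w, t_h) that map proposal p onto ground truth g.

    Both boxes are (center_x, center_y, width, height) in pixels.
    """
    px, py, pw, ph = p
    gx, gy, gw, gh = g
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))

def decode_box(p, t):
    """Apply offsets t to proposal p to obtain the refined box."""
    px, py, pw, ph = p
    tx, ty, tw, th = t
    return (px + pw * tx, py + ph * ty,
            pw * math.exp(tw), ph * math.exp(th))

# Hypothetical proposal and ground-truth box.
proposal = (100.0, 80.0, 50.0, 40.0)
ground_truth = (110.0, 84.0, 100.0, 20.0)

targets = encode_box(proposal, ground_truth)
recovered = decode_box(proposal, targets)
```

The regressor is trained to predict `targets` from the proposal's features; at test time `decode_box` turns a prediction back into pixel coordinates.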
Problems of R-CNN
• Slow at test time: needs a full forward pass of the CNN for each region proposal
  ◦ 13s/image on a GPU (K40)
  ◦ 53s/image on a CPU
• SVMs and regressors are post hoc: CNN features are not updated in response to the SVMs and regressors
• Complex multistage training pipeline (84 hours using a K40 GPU)
  ◦ Fine-tune network with softmax classifier (log loss)
  ◦ Train post hoc linear SVMs (hinge loss)
  ◦ Train post hoc bounding-box regressors (squared loss)
Fast R-CNN
• Fix most of what’s wrong with R-CNN and SPP-net
• Train the detector in a single stage, end-to-end
  ◦ No caching features to disk
  ◦ No post hoc training steps
• Train all layers of the network
Fast R-CNN Architecture
Fast R-CNN
RoI Pooling
RoI Pooling
RoI in Conv feature map: 21x14 → 3x2 max pooling with stride (3, 2) → output: 7x7
RoI in Conv feature map: 35x42 → 5x6 max pooling with stride (5, 6) → output: 7x7
VGG-16
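The two cases above can be sketched in plain Python. This is a simplified single-channel illustration, assuming the RoI has already been cropped from the feature map; the bin boundaries use integer floor/ceil division, which reduces to the 3x2 and 5x6 windows when the RoI size divides evenly. The name `roi_max_pool` is hypothetical.

```python
def roi_max_pool(roi, out_h=7, out_w=7):
    """Max-pool a 2-D RoI (list of rows) into a fixed out_h x out_w grid."""
    h, w = len(roi), len(roi[0])
    pooled = []
    for i in range(out_h):
        r0 = (i * h) // out_h                          # floor of bin start
        r1 = max(((i + 1) * h + out_h - 1) // out_h,   # ceil of bin end
                 r0 + 1)
        row = []
        for j in range(out_w):
            c0 = (j * w) // out_w
            c1 = max(((j + 1) * w + out_w - 1) // out_w, c0 + 1)
            row.append(max(roi[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# 21x14 RoI (21 rows, 14 columns) filled with increasing values.
roi_a = [[r * 14 + c for c in range(14)] for r in range(21)]
# 35x42 RoI.
roi_b = [[r * 42 + c for c in range(42)] for r in range(35)]

pooled_a = roi_max_pool(roi_a)
pooled_b = roi_max_pool(roi_b)
```

Both RoIs come out as 7x7 grids regardless of their input size, which is what lets the fully connected layers that follow accept proposals of any shape.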
Training & Testing
1. Takes an input image and a set of bounding boxes (region proposals)
2. Generates convolutional feature maps
3. For each bbox, gets a fixed-length feature vector from the RoI pooling layer
4. Produces two outputs per bbox
  ◦ Class scores over K+1 classes (K object classes plus background)
  ◦ Bounding box locations
• Loss function
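For reference, the multi-task loss of Fast R-CNN is usually written as below. The notation follows the Fast R-CNN paper and is an assumption about what this slide showed: p is the predicted distribution over the K+1 classes, u the true class (0 for background), t^u the predicted box offsets for class u, v the regression targets, and λ a balancing weight.

```latex
L(p, u, t^u, v) = L_{cls}(p, u) + \lambda \, [u \ge 1] \, L_{loc}(t^u, v)

L_{cls}(p, u) = -\log p_u

L_{loc}(t^u, v) = \sum_{i \in \{x, y, w, h\}} \mathrm{smooth}_{L_1}(t^u_i - v_i),
\qquad
\mathrm{smooth}_{L_1}(x) =
\begin{cases}
0.5\,x^2 & \text{if } |x| < 1 \\
|x| - 0.5 & \text{otherwise}
\end{cases}
```

The indicator [u ≥ 1] switches off the localization term for background RoIs, which have no ground-truth box.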
R-CNN vs SPP-net vs Fast R-CNN
Runtime dominated by region proposals!
Problems of Fast R-CNN
• Out-of-network region proposals are the test-time computational bottleneck
• Is it fast enough?
Faster R-CNN(RPN + Fast R-CNN)
• Insert a Region Proposal Network (RPN) after the last convolutional layer → proposals are computed on the GPU!
• RPN is trained to produce region proposals directly; no need for external region proposals
• After the RPN, use RoI Pooling and a downstream classifier and bbox regressor, just like Fast R-CNN
Training Goal : Share Features
RPN
• Slide a small (3 x 3) window over the feature map
• Build a small network for
  ◦ Classifying object or not-object
  ◦ Regressing bbox locations
• The position of the sliding window provides localization information with reference to the image
• Box regression provides finer localization information with reference to this sliding window
• Each window is mapped to a lower-dimensional feature: 256-d for ZF, 512-d for VGG
RPN
• Use k anchor boxes at each location
• Anchors are translation invariant: use the same ones at every location
• Regression gives offsets from anchor boxes
• Classification gives the probability that each (regressed) anchor shows an object
RPN (Fully Convolutional Network)
• Intermediate layer: 256 (ZF) or 512 (VGG) 3x3 filters, stride 1, padding 1
• Cls layer: 18 (9 x 2) 1x1 filters, stride 1, padding 0
• Reg layer: 36 (9 x 4) 1x1 filters, stride 1, padding 0
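The layer sizes above can be checked with a little arithmetic. The sketch below assumes the VGG setting (512 input and intermediate channels, k = 9 anchors) and a hypothetical 38 x 50 feature map; `conv_out` and `rpn_head_shapes` are illustrative helper names, and parameter counts include one bias per filter.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def rpn_head_shapes(feat_h, feat_w, in_ch=512, mid_ch=512, k=9):
    """Output shape (channels, height, width) and parameter count per layer."""
    layers = [
        # name, input channels, filters, kernel, stride, padding
        ("intermediate", in_ch, mid_ch, 3, 1, 1),
        ("cls", mid_ch, 2 * k, 1, 1, 0),   # object / not-object per anchor
        ("reg", mid_ch, 4 * k, 1, 1, 0),   # 4 box offsets per anchor
    ]
    shapes = {}
    for name, cin, cout, kernel, stride, pad in layers:
        # The intermediate layer keeps the spatial size, so cls and reg
        # see the same height and width as the input feature map.
        h = conv_out(feat_h, kernel, stride, pad)
        w = conv_out(feat_w, kernel, stride, pad)
        params = cout * (cin * kernel * kernel + 1)
        shapes[name] = {"out": (cout, h, w), "params": params}
    return shapes

shapes = rpn_head_shapes(38, 50)
for name, info in shapes.items():
    print(name, info["out"], info["params"])
```

Because every layer is a convolution with stride 1, the head produces scores and offsets for all k anchors at every feature-map position in a single pass.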
Anchors as references
• Anchors: pre-defined reference boxes
• Multi-scale/size anchors:
  ◦ Multiple anchors are used at each position: 3 scales (128x128, 256x256, 512x512) and 3 aspect ratios (2:1, 1:1, 1:2) yield 9 anchors
  ◦ Each anchor has its own prediction function
  ◦ Single-scale features, multi-scale predictions
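A minimal sketch of how the 9 anchors at one position could be generated. It assumes that each ratio is read as width:height and that the box area is kept at scale x scale for every ratio; the function name `make_anchors` and the example center are hypothetical.

```python
import math

def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(2.0, 1.0, 0.5)):
    """Return anchors as (x1, y1, x2, y2) centered at (cx, cy)."""
    anchors = []
    for s in scales:
        for r in ratios:              # r = width / height (assumed convention)
            w = s * math.sqrt(r)      # area stays s * s for every ratio
            h = s / math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = make_anchors(300.0, 300.0)
for x1, y1, x2, y2 in anchors:
    print(round(x2 - x1), "x", round(y2 - y1))
```

The same 9 shapes are reused at every sliding-window position, so only the center changes from one location to the next.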
Positive/Negative Samples
• An anchor is labeled as positive if either
  ◦ It is the anchor with the highest IoU overlap with a ground-truth box, or
  ◦ It has an IoU overlap higher than 0.7 with any ground-truth box
• Negative labels are assigned to anchors whose IoU is lower than 0.3 for all ground-truth boxes
• Anchors that are neither positive nor negative do not contribute to training
• Positive and negative anchors are sampled at a ratio of up to 1:1 (50%/50%) in a minibatch
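The labeling rules can be sketched as follows, assuming boxes are given as (x1, y1, x2, y2). The label values 1, 0 and -1 (positive, negative, ignored) and the function names are illustrative choices; minibatch sampling is left out.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, hi=0.7, lo=0.3):
    """Return 1 (positive), 0 (negative) or -1 (ignored) for each anchor."""
    overlaps = [[iou(a, g) for g in gt_boxes] for a in anchors]
    labels = []
    for row in overlaps:
        best = max(row)
        labels.append(1 if best > hi else 0 if best < lo else -1)
    # The anchor with the highest IoU for each ground-truth box is
    # positive even if its IoU is below the 0.7 threshold.
    for j in range(len(gt_boxes)):
        best_i = max(range(len(anchors)), key=lambda i: overlaps[i][j])
        if overlaps[best_i][j] > 0:
            labels[best_i] = 1
    return labels

# Hypothetical ground-truth boxes and anchors.
gt_boxes = [(0, 0, 100, 100), (400, 400, 500, 500)]
anchors = [(0, 0, 100, 100),      # IoU 1.0 with the first box
           (50, 0, 150, 100),     # IoU 1/3: between the thresholds
           (200, 200, 300, 300),  # no overlap
           (400, 400, 500, 560)]  # IoU 0.625, best match for the second box
labels = label_anchors(anchors, gt_boxes)
```

The last anchor shows why the first rule exists: without it, a ground-truth box whose best anchor falls below 0.7 would have no positive sample at all.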
RPN Loss Function
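For reference, the RPN loss in the Faster R-CNN paper has the form below; the notation is taken from the paper and is an assumption about what this slide showed. Here i indexes the anchors in a minibatch, p_i is the predicted probability that anchor i is an object, p_i^* is 1 for a positive anchor and 0 for a negative one, t_i and t_i^* are the predicted and ground-truth box parameters, L_cls is the log loss over the two classes, and L_reg is the smooth L1 loss.

```latex
L(\{p_i\}, \{t_i\}) =
\frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
+ \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*)
```

The factor p_i^* means the regression term is active only for positive anchors.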
4-Step Alternating Training
Results
Experiments
Is It Enough?
• RoI Pooling involves quantization (rounding) operations
• These quantizations introduce misalignments between the RoI and the extracted features
• While this may not impact classification, it can have a negative effect on predicting the bbox
Mask R-CNN
Mask R-CNN
• Mask R-CNN extends Faster R-CNN by adding a branch for predicting segmentation masks on each Region of Interest (RoI), in parallel with the existing branch for classification and bounding box regression
Loss Function, Mask Branch
• The mask branch has a K x m x m-dimensional output for each RoI, which encodes K binary masks of resolution m x m, one for each of the K classes
• A per-pixel sigmoid is applied, and Lmask is the average binary cross-entropy loss
• For an RoI associated with ground-truth class k, Lmask is only defined on the k-th mask
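A minimal sketch of the mask loss for one RoI, assuming the mask branch outputs raw scores (logits) and the ground-truth mask holds 0/1 values. The function name `mask_loss` and the toy 2 x 2 masks are illustrative, and the sigmoid is written in its naive form without safeguards for very large scores.

```python
import math

def mask_loss(logits, gt_mask, k):
    """Average per-pixel binary cross-entropy on the k-th mask only.

    logits: K masks, each an m x m list of raw scores.
    gt_mask: m x m list of 0/1 values for the ground-truth class k.
    """
    total, count = 0.0, 0
    for row_logits, row_gt in zip(logits[k], gt_mask):
        for z, y in zip(row_logits, row_gt):
            p = 1.0 / (1.0 + math.exp(-z))          # per-pixel sigmoid
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count

# K = 2 classes, m = 2. Only the mask for class k = 0 should matter.
gt_mask = [[1, 0], [0, 1]]
logits_a = [[[0.0, 0.0], [0.0, 0.0]], [[5.0, 5.0], [5.0, 5.0]]]
logits_b = [[[0.0, 0.0], [0.0, 0.0]], [[-3.0, 2.0], [1.0, -4.0]]]

loss_a = mask_loss(logits_a, gt_mask, k=0)
loss_b = mask_loss(logits_b, gt_mask, k=0)
```

Changing the scores of the other class leaves the loss untouched, so the masks of different classes do not compete with each other.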
RoI Align
• RoI Align does not quantize the RoI boundaries
• Bilinear interpolation is used to compute the exact values of the input features
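The idea can be sketched on a single-channel feature map. This is a simplification that takes one bilinear sample at the center of each output bin; Mask R-CNN samples several points per bin and aggregates them. RoI coordinates are given as fractional (y_min, x_min, y_max, x_max) in feature-map units, and the function names are hypothetical.

```python
import math

def bilinear(feature, y, x):
    """Sample a 2-D feature map (list of rows) at a fractional (y, x)."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy

def roi_align(feature, roi, out_size=2):
    """One sample per bin, taken at the bin center; roi is never rounded."""
    y_min, x_min, y_max, x_max = roi
    bin_h = (y_max - y_min) / out_size
    bin_w = (x_max - x_min) / out_size
    return [[bilinear(feature,
                      y_min + (i + 0.5) * bin_h,
                      x_min + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]

# Feature value is 3 * y + x, so exact interpolation is easy to verify.
feature = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0], [6.0, 7.0, 8.0]]
aligned = roi_align(feature, (0.5, 0.5, 1.5, 1.5))
```

RoI Pooling would first round this RoI to whole cells, shifting the sampled region by up to half a cell; here the fractional coordinates are used as they are.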
Results – MS COCO
Thank You
