visionNoob
(Jaewon Lee)
PR-110
An Analysis of Scale Invariance in Object Detection – SNIP
Singh, B., & Davis, L. S. CVPR’18
https://arxiv.org/abs/1711.08189
References for Object Detection
PR-002: Deformable Convolutional Networks (2017)
PR-012: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
PR-016: You only look once: Unified, real-time object detection
PR-023: YOLO9000: Better, Faster, Stronger
PR-033: PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
PR-057: Mask R-CNN
PR-084: MegDet: A Large Mini-Batch Object Detector (CVPR2018)
MS COCO Results
- MegDet: A Large Mini-Batch Object Detector (https://arxiv.org/abs/1711.07240)
- Path Aggregation Network for Instance Segmentation (https://arxiv.org/abs/1803.01534)
- Deformable ConvNets + Xception
- Mask R-CNN + Feature Pyramid Networks (FPN) + ResNeXt
- Ensemble of multiple models using unlabeled data with multiple scales
- (today!) An Analysis of Scale Invariance in Object Detection – SNIP
What makes object detection harder than image classification?
MS COCO (http://cocodataset.org/) vs. ImageNet (http://www.image-net.org/)
In less than five years, the top-5 error on ImageNet dropped from 15% [20] to 2% [16]. Yet the mAP of the best-performing detector [18] on COCO [25] is only 62%, even at 50% overlap, although COCO has only 80 classes compared with 1000 in ImageNet.
[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems, pages 1097–1105, 2012.
[16] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks.
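$\text{Relative scale} = \frac{\sqrt{\text{area}_{\text{object}}}}{\sqrt{\text{area}_{\text{image}}}}$
The MS COCO dataset has:
- Mostly small objects (median relative scale 0.106, vs. 0.554 in ImageNet)
- Large scale variation (about 20x)
- A large domain shift from the pre-trained classification network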
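The relative-scale definition can be computed directly from box and image sizes. The sketch below is a minimal illustration: `relative_scale` and `scale_spread` are hypothetical helper names. Summarizing scale variation as the ratio between the 90th and 10th percentiles is an assumption about how a figure like "20x" might be derived. The slides do not specify the method.

```python
import math
import statistics


def relative_scale(obj_w, obj_h, img_w, img_h):
    """sqrt(object area) / sqrt(image area), as defined on the slide."""
    return math.sqrt(obj_w * obj_h) / math.sqrt(img_w * img_h)


def scale_spread(scales, low_pct=10, high_pct=90):
    """Ratio between a high and a low percentile of relative scales.

    Hypothetical summary of 'scale variation' as a single factor.
    Needs at least two values.
    """
    # 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(scales, n=100)
    return cuts[high_pct - 1] / cuts[low_pct - 1]


# A 32 x 32 box in a typical 640 x 480 COCO image.
print(round(relative_scale(32, 32, 640, 480), 4))  # 0.0577
```

Feeding `scale_spread` the relative scales of every annotated object in a dataset gives one number for how far apart small and large instances are.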
Current Practices for Object Detection
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." IEEE CVPR. Vol. 4. 2017.
Convolutional Neural Networks for Classification
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems. 2012.
The deep feature maps that carry high-level semantic features have much lower spatial resolution, which makes object detection harder.
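To see how quickly spatial resolution shrinks, you can trace the standard output-size formula floor((n + 2p - k) / s) + 1 through an AlexNet-style stack. The layer list below follows the commonly used AlexNet configuration with a 227 x 227 input. That configuration is an assumption, because the slide does not list layer parameters. The helper names are hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for an AlexNet-style stack.
ALEXNET_LAYERS = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]


def trace_resolution(input_size, layers):
    """Return (layer name, output size, cumulative stride) after each layer."""
    size, stride = input_size, 1
    trace = []
    for name, k, s, p in layers:
        size = conv_out(size, k, s, p)
        stride *= s
        trace.append((name, size, stride))
    return trace


for name, size, stride in trace_resolution(227, ALEXNET_LAYERS):
    print(f"{name}: {size}x{size} (cumulative stride {stride})")
```

With these settings the last pooling layer is 6 x 6 with a cumulative stride of 32. An object that is 32 pixels wide therefore maps to roughly a single cell of the semantically strongest feature map.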
(PR-002)
Tutorial: Deep Learning for Objects and Scenes
Current Practices for Object Detection
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." IEEE CVPR. Vol. 4. 2017.
Higher-resolution inputs lead to significantly better mAP on small objects (note that a typical COCO image is 640 x 480).
Are CNNs robust to up-sampling?
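One way to probe this question is to feed a classifier images that were shrunk and then enlarged back to the network's input size. The input keeps its size but loses detail. The sketch below builds such inputs in pure Python for a square grayscale image stored as a list of rows. It uses average pooling for down-sampling and nearest-neighbour for up-sampling, chosen for simplicity. This is an assumption and not necessarily the interpolation used in the paper. The function names are hypothetical.

```python
def downsample(img, factor):
    """Average-pool a square grayscale image (list of rows) by an integer factor."""
    n = len(img)
    if n % factor != 0:
        raise ValueError("image size must be divisible by factor")
    m = n // factor
    out = []
    for i in range(m):
        row = []
        for j in range(m):
            block = [img[i * factor + di][j * factor + dj]
                     for di in range(factor) for dj in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upsample_nearest(img, factor):
    """Nearest-neighbour up-sampling: repeat each pixel factor x factor times."""
    return [[v for v in row for _ in range(factor)]
            for row in img for _ in range(factor)]


def degrade(img, factor):
    """Low-resolution version of img, brought back to the original size."""
    return upsample_nearest(downsample(img, factor), factor)
```

For example, `degrade(img, 4)` on a 224 x 224 image returns a 224 x 224 image that carries only 56 x 56 worth of detail.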
(pose, appearance, etc.)
Pre-trained classification network: 224 x 224
Original image: 640 x 480
Inference: 1400 x 2000
(figure: AP on small objects)
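The scale factor between the original and inference resolutions shows how much small objects are enlarged. The sketch reads "1400 x 2000" as "resize so the shorter side is 1400 pixels, capping the longer side at 2000". That convention is common in Faster R-CNN-style detectors but is assumed here. `resize_scale` is a hypothetical helper.

```python
def resize_scale(width, height, short_side=1400, max_long_side=2000):
    """Scale so the shorter side becomes short_side, capped so the longer side <= max_long_side."""
    scale = short_side / min(width, height)
    if max(width, height) * scale > max_long_side:
        scale = max_long_side / max(width, height)
    return scale


s = resize_scale(640, 480)
print(round(s, 3), round(640 * s), round(480 * s))  # 2.917 1867 1400
print(round(32 * s, 1))  # a 32-pixel object becomes about 93.3 pixels
```

Under this reading, a typical COCO image is up-sampled by almost 3x at inference. This is far from the 224 x 224 resolution the backbone was pre-trained on.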
Reduce scale variation without reducing the total number of training samples!
Valid object-size ranges (one per image resolution): [0, 80], [40, 160], [120, ∞]
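In SNIP, each image resolution is paired with one of these ranges. At that resolution, only objects whose size falls inside the range count as valid, and the rest are ignored during training. This is a minimal sketch. It assumes object size is measured as sqrt(w * h) in pixels and that the bounds are inclusive. The slides do not say which range pairs with which resolution, and the helper names are hypothetical.

```python
import math

# Valid ranges from the slide; math.inf stands in for the open upper bound.
VALID_RANGES = [(0, 80), (40, 160), (120, math.inf)]


def box_size(w, h):
    """Object size as sqrt(w * h) in pixels (assumed size measure)."""
    return math.sqrt(w * h)


def valid_range_ids(size, ranges=VALID_RANGES):
    """Indices of the ranges (one per resolution) in which this object size is valid."""
    return [i for i, (lo, hi) in enumerate(ranges) if lo <= size <= hi]


def valid_mask(boxes, lo, hi):
    """Per-box flags for one resolution: True = used in training, False = ignored."""
    return [lo <= box_size(w, h) <= hi for (w, h) in boxes]


print(valid_range_ids(50), valid_range_ids(100), valid_range_ids(500))
print(valid_mask([(10, 10), (100, 100), (300, 300)], 40, 160))
```

Because the ranges overlap, an object of size 50 is valid at two resolutions, while an object of size 100 is valid at only one. Every object stays valid at some resolution, so no training samples are thrown away.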
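Object scale: too small | small | medium | large | too large
Training on all scales: large data variation, but also large scale variation
- Too large: the object falls outside the receptive field
- Too small: the spatial resolution is too low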
Normalize scale (keep objects in the medium range): large data variation, small scale variation
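The idea of normalizing scale can be illustrated by choosing, for each object, the pyramid scale that brings it closest to a chosen "medium" size. This is a conceptual sketch only. `best_scale`, the log-distance criterion, the pyramid factors and the target size are all assumptions. SNIP itself gets a similar effect through per-resolution valid ranges rather than per-object scale selection.

```python
import math


def best_scale(obj_size, scales, target):
    """Return the scale s minimizing |log(obj_size * s / target)|.

    obj_size and target are in pixels (e.g. sqrt(w * h)); both must be > 0.
    """
    return min(scales, key=lambda s: abs(math.log(obj_size * s / target)))


PYRAMID = [0.5, 1.0, 2.0]  # hypothetical image-pyramid scale factors
TARGET = 100.0             # hypothetical "medium" object size in pixels

for size in (30, 100, 400):
    print(size, "->", best_scale(size, PYRAMID, TARGET))
```

Measuring distance in log space treats "twice too large" and "half too small" as equally far from the target. That matches the multiplicative nature of image-pyramid scales.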
Singh, Bharat, Mahyar Najibi, and Larry S. Davis. "SNIPER: Efficient Multi-Scale Training." arXiv preprint arXiv:1805.09300(2018).
Q&A