PR-110: An Analysis of Scale Invariance in Object Detection – SNIP


These slides summarize the paper "An Analysis of Scale Invariance in Object Detection – SNIP", which proposes Scale Normalization for Image Pyramids (SNIP) to address extreme scale variation in object detection. SNIP trains on an image pyramid and, at each resolution, back-propagates gradients only for objects whose size falls within a predefined range, so the detector sees objects at a normalized scale. This reduces both scale variation and the domain shift from classification networks pre-trained on 224 x 224 images. The slides list three size ranges, [0, 80], [40, 160] and [120, ∞]; objects outside the range used at a given resolution are ignored during training at that resolution. Scale variation is thereby reduced without reducing the number of training samples.

visionNoob
(Jaewon Lee)
PR-110
An Analysis of Scale Invariance in Object Detection – SNIP
Singh, B., & Davis, L. S. CVPR’18
https://arxiv.org/abs/1711.08189

References for Object Detection
PR-002: Deformable Convolutional Networks (2017)
PR-012: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
PR-016: You only look once: Unified, real-time object detection
PR-023: YOLO9000: Better, Faster, Stronger
PR-033: PVANet: Lightweight Deep Neural Networks for Real-time Object Detection
PR-057: Mask R-CNN
PR-084: MegDet: A Large Mini-Batch Object Detector (CVPR2018)

MS COCO Results
MegDet: A Large Mini-Batch Object Detector (https://arxiv.org/abs/1711.07240)
Path Aggregation Network for Instance Segmentation (https://arxiv.org/abs/1803.01534)
Deformable ConvNets + Xception
Mask R-CNN + Feature Pyramid Networks (FPN) + ResNeXt
Ensemble of multiple models using unlabeled data with multiple scales.
(today!) An Analysis of Scale Invariance in Object Detection – SNIP

What makes object detection harder
than image classification?

ImageNet (http://www.image-net.org/): # of classes = 1000
In less than five years, the top-5 error on ImageNet dropped from 15% [20] to 2% [16].

MS COCO (http://cocodataset.org/): # of classes = 80
The mAP of the best-performing detector [18] on COCO [25] is only 62%, even at 50% overlap.

[20] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in neural information processing systems, pages 1097–1105, 2012.
[16] J. Hu, L. Shen, and G. Sun. Squeeze-and-excitation networks.

Relative Scale = $\frac{\sqrt{\mathrm{area}(\mathrm{Object})}}{\sqrt{\mathrm{area}(\mathrm{Image})}}$
MS COCO dataset has
- Mostly small objects (median relative scale 0.106)
- Large scale variation (20x)
- Large domain shift from the pre-trained classification network
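
The relative scale defined above can be computed directly from a bounding box and the image size. The snippet below is a minimal sketch; the function name and the example box are hypothetical and chosen only for illustration.

```python
import math


def relative_scale(box_w, box_h, img_w, img_h):
    """Relative scale = sqrt(area(object)) / sqrt(area(image))."""
    return math.sqrt(box_w * box_h) / math.sqrt(img_w * img_h)


# Hypothetical 64 x 48 box in a 640 x 480 image: the box area is 1/100 of
# the image area, so the relative scale is about 0.1, close to the COCO
# median of 0.106 quoted on the slide.
example = relative_scale(64, 48, 640, 480)
print(f"{example:.3f}")
```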

Current Practices for Object Detection
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." IEEE CVPR. Vol. 4. 2017.

Convolutional Neural Networks for Classification
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
The feature maps that carry high-level semantic features have much lower spatial resolution
-> This makes object detection harder

(PR-002)

Tutorial: Deep Learning for Objects and Scenes

Current Practices for Object Detection
Huang, Jonathan, et al. "Speed/accuracy trade-offs for modern convolutional object detectors." IEEE CVPR. Vol. 4. 2017.
High-resolution models lead to significantly better mAP on small objects
(note that the typical image resolution in COCO is 640 x 480)

Are CNNs robust to up-sampling?

(pose, appearance, etc)

Pretrained classification network: 224 x 224
Original: 640 x 480
Inference: 1400 x 2000
AP (small objects)
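
To get a feel for how much up-sampling these resolutions imply, the sketch below computes the resize factor for a 640 x 480 image at the 1400 x 2000 inference setting. It assumes the common detection convention of an aspect-preserving resize (fit the shorter side to 1400, cap the longer side at 2000); the slide does not state the resize rule, and the function name and the 32 px object are hypothetical.

```python
def resize_scale(height, width, target_short, max_long):
    """Aspect-preserving scale: fit the short side, cap the long side."""
    short_side, long_side = min(height, width), max(height, width)
    return min(target_short / short_side, max_long / long_side)


# Slide values: a 640 x 480 COCO image, inference at 1400 x 2000.
scale = resize_scale(480, 640, target_short=1400, max_long=2000)

# Hypothetical object with a 32 px side in the original image.
object_px = 32
print(f"scale = {scale:.3f}, object side after resize = {object_px * scale:.1f} px")
```

Under this assumption the image is enlarged by roughly 2.9x, which moves small objects closer to the object sizes a 224 x 224 pre-trained classifier typically sees.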

Reduce variation in scale without reducing the total number of training samples!

[0, 80]
[40, 160]
[120, ∞]
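
The three intervals can be read as valid object-size ranges, one per level of the image pyramid. The sketch below works under that assumption and measures object size as the square root of the box area in original-image pixels; the slide lists only the numbers, so the size measure, the helper names, and the example boxes are all hypothetical.

```python
import math

# Intervals from the slide, interpreted (assumption) as valid object sizes,
# measured as sqrt(box area) in original-image pixels, one per pyramid level.
VALID_RANGES = [(0.0, 80.0), (40.0, 160.0), (120.0, math.inf)]


def object_size(box):
    """Size of an (x1, y1, x2, y2) box as the square root of its area."""
    x1, y1, x2, y2 = box
    return math.sqrt(max(0.0, x2 - x1) * max(0.0, y2 - y1))


def valid_mask(boxes, size_range):
    """True for each box whose size lies inside the given range."""
    lo, hi = size_range
    return [lo <= object_size(b) <= hi for b in boxes]


# Hypothetical boxes with sizes 20, 100 and 300 pixels.
boxes = [(0, 0, 20, 20), (0, 0, 100, 100), (0, 0, 300, 300)]
masks = [valid_mask(boxes, r) for r in VALID_RANGES]
for size_range, mask in zip(VALID_RANGES, masks):
    print(size_range, mask)
```

Because the ranges overlap, an object of size 60 would be valid at the first two levels, so no object is dropped from training entirely.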

Object scale: too small | small | medium | large | too large
Large data variation
Large scale variation
Out of receptive field
Too low spatial resolution

Normalize Scale
Object scale: medium
Large data variation
Small scale variation
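
One way to realize this normalization during training is to let only the objects that fall in the valid size range at the current resolution contribute to the loss, so the others produce no gradient. The slide gives no implementation; the following is a minimal sketch with a hypothetical function name and made-up per-object loss values.

```python
def masked_mean_loss(per_object_losses, valid):
    """Average the loss over valid objects only; invalid ones are skipped."""
    kept = [loss for loss, ok in zip(per_object_losses, valid) if ok]
    # With no valid object at this resolution, the loss is zero.
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical per-object losses; the second object is outside the valid
# size range at this resolution and is therefore ignored.
losses = [1.0, 2.0, 3.0]
valid = [True, False, True]
print(masked_mean_loss(losses, valid))
```

Every object still contributes at whichever resolution brings it into range, which is how the full variation in pose and appearance is kept while the scale variation seen by the network shrinks.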

MS COCO dataset has
- Mostly small objects (median relative scale 0.106)
- Large scale variation (20x)
- Large domain shift from the pre-trained classification network
[0, 80]
[40, 160]
[120, ∞]

Singh, Bharat, Mahyar Najibi, and Larry S. Davis. "SNIPER: Efficient Multi-Scale Training." arXiv preprint arXiv:1805.09300 (2018).

Q&A

