Mobility Technologies Co., Ltd.
Tackling Open Images Challenge
- presented at the 26th Symposium on Sensing via Image Information
June 12, 2020
Hiroto Honda, Mobility Technologies Co., Ltd.
1 About Me
About Me
Hiroto Honda
https://hirotomusiker.github.io/
Kaggle name: Schwert (‘Schwert’ means ‘sword’ in German)
Career: R&D of imaging devices at a Japanese electronics company → DeNA computer vision team → Mobility Technologies
Check out my Blog Series!
Digging into Detectron 2 (object detection)
https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd
2 Kaggle and Open Images Challenge
How to Try Kaggle
How can you maximize your model’s score on the HIDDEN test data?
Evaluation metrics are described in each competition’s ‘Evaluation’ section: mean average precision, Dice coefficient, and so on. Sometimes non-standard metrics are employed and discussed in the ‘Discussion’ threads.
[Figure: Cross Validation and Test Data. Labels: Train Data and Val Data (repeated per fold); Test data → public leaderboard, → private leaderboard]
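The train/val rotation in the cross-validation figure can be sketched as a K-fold split. This is a minimal illustration and not code from the talk; the function name `k_fold_splits`, the fold count, and the seed are assumptions made for the example.

```python
import random


def k_fold_splits(sample_ids, n_folds=5, seed=0):
    """Yield (train_ids, val_ids) pairs; every sample lands in val exactly once."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed keeps the folds reproducible
    folds = [ids[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        val_ids = folds[k]
        train_ids = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train_ids, val_ids


if __name__ == "__main__":
    for fold, (train_ids, val_ids) in enumerate(k_fold_splits(range(10))):
        print(f"fold {fold}: {len(train_ids)} train, {len(val_ids)} val")
```

The hidden test data never enters this loop, so the val score is the only proxy you have for the private leaderboard.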
Open Images Challenge
Open Images Dataset (v5):
9 million images collected from Flickr
・16M bounding-box annotations of 600 classes on 1.9M images
・Segmentation polygons on instances of 350 classes
・329 inter-object relationships
https://storage.googleapis.com/openimages/web/challenge.html
https://www.kaggle.com/c/open-images-2019-object-detection/
How Huge Is the Open Images Dataset?
1GB of bounding box data!! (on 500GB of image data)
3 How to Tackle Object Detection Challenges
Object Detection
- detects object positions, sizes and classes from an image
- tremendous success of deep-learning-based approaches (e.g. Faster R-CNN, YOLO, and EfficientDet)
Okay, Why Not Code Object Detectors?
NOT RECOMMENDED!
What an Object Detector Looks Like
[Figure: detector components: Backbone Network, Region Proposal Network, ROI Head]
The accuracy reported in papers is achieved by managing more than 100 config parameters.
https://medium.com/@hirotoschwert/digging-into-detectron-2-47b2e794fabd
How Hard It Was to Reproduce YOLOv3 in PyTorch
It took months to fully reproduce the original repo’s accuracy.
Implementation details such as weight initialization, loss definition, and learning-rate schedule are critical.
https://github.com/DeNA/PyTorch_YOLOv3
blog: https://medium.com/@hirotoschwert/reproducing-training-performance-of-yolov3-in-pytorch-part-0-a792e15ac90d
You Should Care About Tiny Accuracy Differences
Model Name | Venue | AP
A: Faster R-CNN Res50 | NIPS’15 | 34.8
B: Faster R-CNN Res50 + Feature Pyramid Network | CVPR’17 | 36.7
C: RetinaNet (single-shot) Res50 Feature Pyramid Network + Focal Loss | ICCV’17 | 35.7
A model B from a non-official repo that reaches only AP=33.0 is less accurate than the official model A (AP=34.8).
Popular and Reliable Detection Frameworks
The authors’ official repos are generally recommended.
・MMDetection (CUHK): https://github.com/open-mmlab/mmdetection
・Detectron 2 (Facebook): https://github.com/facebookresearch/detectron2
・automl/efficientdet (Google): https://github.com/google/automl/tree/master/efficientdet
・tpu/models (Google): https://github.com/tensorflow/tpu/tree/master/models/official
・R. Wightman repos (tf->pytorch, non-official): https://github.com/rwightman
Schwert used maskrcnn-benchmark for the competition.
How to Choose Approaches for a Large-scale Detection Competition
It takes 1 GPU-month to train one model!
One attempt is so costly...
Good resources for finding “An Exclusive Feature that Apparently Contributes to the score” (EFAC):
1: Last year’s solutions
2: Detection papers (CVPR, ICCV, …)
3: Benchmark websites such as Papers with Code
Example of a Bad Experiment
“Looks like ResNet50 works... OK, let’s try ResNeXt101... and why not add Random Cropping?”
[Figure: model 1 (baseline) + new feature A + new feature B → model 2]
It is important to add or remove one exclusive feature at a time!
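The one-feature-at-a-time rule can be sketched as a small config generator. This is only an illustration under assumed names: the keys `backbone` and `random_crop` and the helpers `single_change_experiments` and `changed_keys` are hypothetical, not taken from any framework.

```python
BASELINE = {"backbone": "ResNet50", "random_crop": False}  # model 1 (baseline)


def single_change_experiments(baseline, candidates):
    """Return configs that each differ from the baseline in exactly one key."""
    experiments = []
    for key, value in candidates.items():
        if baseline.get(key) == value:
            continue  # same as baseline, nothing to measure
        config = dict(baseline)
        config[key] = value
        experiments.append(config)
    return experiments


def changed_keys(baseline, config):
    """List the keys whose values differ from the baseline."""
    return sorted(k for k in config if config[k] != baseline.get(k))


if __name__ == "__main__":
    candidates = {"backbone": "ResNeXt101", "random_crop": True}
    for config in single_change_experiments(BASELINE, candidates):
        print(changed_keys(BASELINE, config), config)
```

With one changed key per run, any score difference against the baseline can be attributed to that single feature, which matters when each run costs about one GPU-month.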
4 Schwert’s Solution
Results of Open Images Competition (2019)
Schwert’s ranks:
Detection Track: 6th / 558 (Gold) [1] [2]
Segmentation Track: 11th / 193 (Silver) [3]
Relationship Track: 30th / 201 (Silver)
Detection Track leaderboard:
# | Team Name | # of members | score
1 | MMfruit | 5 | 0.65887
2 | imagesearch | 7 | 0.65337
3 | Prisms | 6 | 0.64214
4 | PFDet | 6 | 0.62221
5 | Omni-Detection | 3 | 0.60406
6 | Schwert | 1 (solo) | 0.60231
7 | Team 5 | 5 | 0.60210
8 | pudae | 1 (solo) | 0.59727
Got a solo gold medal in my first Kaggle competition!
“An Exclusive Feature that Apparently Contributes to the score” (EFAC)
EFAC examples from the solution writeups of Open Images 2018 [4][5][6]:
・class balancing (3rd, 5pts↑)
・ensemble (1st / 3rd, 5pts↑)
・voting NMS (1st / 3rd)
・long cosine annealing (2nd)
・parent class expansion
・ResNeXt152 + SE (1st, 2nd, 3rd)
Class balancing and model ensembling are essential.
Evaluation Metrics
mean Average Precision (mAP) at IoU > 0.5, averaged over 500 classes
1: EVERY class counts equally, even if it’s extremely rare.
      images including ‘person’ instances: 250,000
      ‘torch’ instances: 18
2: Strict localization is not required; classification matters...
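The metric can be sketched in a few lines: IoU decides whether a detection matches a ground-truth box, AP summarizes one class, and mAP is the plain mean over classes. This is a simplified sketch and not the official evaluator, which additionally handles group-of boxes and the class hierarchy; the function names and data layout are assumptions for this example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_thr=0.5):
    """AP for ONE class.

    detections: list of (image_id, score, box)
    ground_truth: dict mapping image_id -> list of boxes
    """
    n_gt = sum(len(boxes) for boxes in ground_truth.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truth.items()}
    tp_flags = []
    for image_id, _score, box in sorted(detections, key=lambda d: -d[1]):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth.get(image_id, [])):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j
        # A detection is a true positive only if its best match is still unclaimed.
        if best_iou > iou_thr and not matched[image_id][best_j]:
            matched[image_id][best_j] = True
            tp_flags.append(True)
        else:
            tp_flags.append(False)
    precisions, recalls, tp = [], [], 0
    for rank, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / rank)
        recalls.append(tp / n_gt)
    # Make precision non-increasing from the right, then integrate over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


def mean_average_precision(per_class_ap):
    """Unweighted mean: a class with 18 instances weighs as much as 'person'."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

Because the mean is unweighted, improving a rare class by 10 points moves the score exactly as much as improving ‘person’ by 10 points.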
Method 1: Class Balancing [1]
- Equal probability for a model to encounter a certain class.
- Rare classes: increase sampling rate.
- Non-rare classes: limit number of images.
- Total number of images: 4k × 500 classes = 2M → efficient training
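A rough sketch of the class-balancing idea: cap frequent classes and oversample rare ones so that every class contributes the same number of training images. The exact procedure in [1] may differ; the function name `balanced_image_list` and the per-class sampling below are assumptions for illustration only.

```python
import random


def balanced_image_list(images_by_class, per_class=4000, seed=0):
    """Build a training list with `per_class` images for every class.

    images_by_class: dict mapping class name -> list of image ids
    Returns a shuffled list of (class_name, image_id) pairs.
    """
    rng = random.Random(seed)
    selected = []
    for cls in sorted(images_by_class):
        images = images_by_class[cls]
        if not images:
            continue  # nothing to sample from
        if len(images) >= per_class:
            # Non-rare class: limit the number of images.
            picked = rng.sample(images, per_class)
        else:
            # Rare class: sample with replacement to raise its sampling rate.
            picked = [rng.choice(images) for _ in range(per_class)]
        selected.extend((cls, img) for img in picked)
    rng.shuffle(selected)
    return selected


if __name__ == "__main__":
    toy = {"person": [f"p{i}" for i in range(100)], "torch": ["t0", "t1"]}
    balanced = balanced_image_list(toy, per_class=4)
    print(len(balanced), balanced)
```

With 500 classes and 4k images per class, the list holds 2M entries, matching the total on the slide.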
Method 2: Ensembling Pipeline of Multiple Models [1]
・Baseline model: ResNeXt152 [7] + Deformable ConvNets v2 [8] + Feature Pyramid Network [9]
・Train different types of models on training data with different seeds
・8 models are ensembled
Ablation Study
Contribution of each exclusive feature to val and leaderboard accuracies
Backbone | Deformable Convolutions | Parent Expansion | Data Size | val AP | private LB
ResNeXt101 | None | Inference Time | 4k per class | 69.8 | 54.0
ResNeXt101 | DCN v2 | Inference Time | 4k per class | 72.2 (+2.4) |
ResNeXt152 | None | Inference Time | 4k per class | 72.2 (+2.4) |
ResNeXt152 | None | Inference Time | 16k per class | 72.4 (+2.6) |
ResNeXt152 | DCN v2 | Inference Time | 4k per class | 73.2 (+3.4) | 56.4 (best single model)
ResNeXt152 | None | Training Time | 4k per class | 72.4 (+2.6)* |
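Parent expansion at inference time can be sketched as follows, assuming it means duplicating each detected box under the labels of its ancestor classes so that the box also counts for the parent classes in the label hierarchy. This reading and the hierarchy fragment below are illustrative assumptions, not taken from the slides.

```python
# Illustrative child -> parent fragment; the real hierarchy is much larger.
PARENT_OF = {"Cat": "Carnivore", "Carnivore": "Mammal", "Mammal": "Animal"}


def expand_parents(detections, parent_of):
    """Duplicate each detection for every ancestor class.

    detections: list of (label, score, box)
    parent_of: dict mapping a class to its parent (root classes are absent)
    """
    expanded = []
    for label, score, box in detections:
        expanded.append((label, score, box))
        ancestor = parent_of.get(label)
        while ancestor is not None:
            # Same box and score, re-labeled with the ancestor class.
            expanded.append((ancestor, score, box))
            ancestor = parent_of.get(ancestor)
    return expanded


if __name__ == "__main__":
    for det in expand_parents([("Cat", 0.9, (0, 0, 5, 5))], PARENT_OF):
        print(det)
```

Doing this after inference leaves training untouched, which is presumably what the "Inference Time" entries in the Parent Expansion column refer to.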
Method 3: Enhanced (Voting) NMS [6]
Non-Maximum Suppression for Model Ensembling
When multiple boxes from different models overlap, the resulting box earns added confidence scores.
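A simplified sketch of voting NMS for one class in one image: boxes from all models are pooled, the highest-scoring box is kept, and every overlapping box adds its score to the kept box instead of being silently discarded. The exact scoring rule in [6] may differ; the plain score sum and the names here are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def voting_nms(boxes, iou_thr=0.5):
    """boxes: list of (score, box) pooled from all models, one class, one image.

    Returns kept (score, box) pairs; scores are sums and may exceed 1.
    """
    remaining = sorted(boxes, key=lambda b: -b[0])
    kept = []
    while remaining:
        top_score, top_box = remaining.pop(0)
        survivors = []
        for score, box in remaining:
            if iou(top_box, box) > iou_thr:
                top_score += score  # the overlapping box votes for the kept box
            else:
                survivors.append((score, box))
        remaining = survivors
        kept.append((top_score, top_box))
    return kept


if __name__ == "__main__":
    pooled = [
        (0.9, (0, 0, 10, 10)),    # model 1
        (0.8, (1, 1, 10, 10)),    # model 2, overlaps the box above
        (0.6, (50, 50, 60, 60)),  # found by a single model only
    ]
    for score, box in voting_nms(pooled):
        print(round(score, 2), box)
```

Boxes that several models agree on end up ranked above boxes found by only one model, which is the behavior an ensemble wants from its suppression step.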
Result of 8 Model Ensembling
Backbone | Deformable Convolutions | Parent Expansion | Data Size | val AP | private LB
ResNeXt152 | DCN v2 | Inference Time | 4k per class | 73.2 (+3.4) | 56.4 (best single model, ~13th place)
Ensemble of 8 models + NMS tuned | | | | | 60.23 (6th place!)
Visualization Demo of the Best Single Model
Schwert’s Approach on Segmentation Track (11th Place) [3]
・Independently train detection and segmentation
・Inference results using detection model
5 Take-Home Messages
Take-Home Messages
・Kaggle is a wonderful platform where you can learn cutting-edge computer vision methods and implementations. Discussion with great Kagglers is always fun.
・Like research, it’s a tough but fun job to develop (or surpass) state-of-the-art methods.
・Choosing a reliable framework is a must for object detection competitions.
・Understand the past solutions and pick an Exclusive Feature that Apparently Contributes to the score (EFAC).
References
[1] Hiroto Honda, “The 6th Place Solution for the Open Images 2019 Object Detection Track,” presented at ICCVW 2019, https://hirotomusiker.github.io/files/schwert_open_images_6th_solution_v1.pdf
[2] Hiroto Honda, “6th place solution,” discussion in Open Images 2019 Object Detection Track, https://www.kaggle.com/c/open-images-2019-object-detection/discussion/110953
[3] Hiroto Honda, “11th place solution,” discussion in Open Images 2019 Instance Segmentation Track, https://www.kaggle.com/c/open-images-2019-instance-segmentation/discussion/111351
[4] kivajok, 1st place writeup, https://storage.googleapis.com/openimages/web/challenge.html
[5] Takuya Akiba et al., “PFDet: 2nd Place Solution to Open Images Challenge 2018 Object Detection Track,” arXiv:1809.00778
[6] Yuan Gao et al., “Solution for Large-Scale Hierarchical Object Detection Datasets with Incomplete Annotation and Data Imbalance,” arXiv:1810.06208
[7] Saining Xie et al., “Aggregated Residual Transformations for Deep Neural Networks,” CVPR 2017
[8] Xizhou Zhu et al., “Deformable ConvNets v2: More Deformable, Better Results,” CVPR 2019
[9] Tsung-Yi Lin et al., “Feature Pyramid Networks for Object Detection,” CVPR 2017
* All the photos used in this presentation were taken by Hiroto Honda
Please refrain from unauthorized reproduction or copying of the text, images, and other content.
