SIGNATE
National Diet Library Image Data Layout Recognition (国立国会図書館の画像データレイアウト認識)
1st place solution
coz.a
@coz_a_
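Task
• Object Detection
• Metrics: mean IoU
• Dataset: two subsets, classical books (古典籍) and publications from the Meiji era onward (明治期以降刊行)

# of Images
subset              train   test
Classical books      1219    211
Meiji era onward     1175    252
Total                2394    463

# of Boxes (train)
category         Classical books   Meiji era onward   Total
1_overall                   1219               1175    2394
2_handwritten              13851                  -   13851
3_typography                9262                  -    9262
4_illustration              1119               1207    2326
5_stamp                      369                 78     447
6_headline                     -               3150    3150
7_caption                      -               1462    1462
8_textline                     -              60447   60447

1_overall is one-to-one with images: one box per image.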
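As a reference for the metric, here is a minimal sketch of pairwise box IoU. The slides only say "mean IoU"; how SIGNATE matched predicted and ground-truth boxes before averaging is not stated, so this shows the pairwise quantity only. The function name `box_iou` and the (x1, y1, x2, y2) box format are assumptions for illustration.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 10x10 boxes shifted by half their width overlap on 50 of 150 covered units, giving an IoU of 1/3.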
CNN Architecture
• Input: image [b, 3, h, w]
• Backbone: EfficientNet (ImageNet pretrained) -> BiFPN
• Margin Regression output:
  • margin (*) (1_overall): [b, 4]
• CenterNet outputs:
  • keypoint heatmap (category 2~8): [b, 7, h/4, w/4], multiplied (×) by the category mask [b, 7, 1, 1]
  • box size (*): [b, 2, h/4, w/4]
  • local offset: [b, 2, h/4, w/4]
• Category mask:
  • Classical books (古典籍): [1, 1, 1, 1, 0, 0, 0]
  • Meiji era onward (明治期以降刊行): [0, 0, 1, 1, 1, 1, 1]
(*) normalized by input image width
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks https://arxiv.org/abs/1905.11946
EfficientDet: Scalable and Efficient Object Detection https://arxiv.org/abs/1911.09070
Objects as Points https://arxiv.org/abs/1904.07850
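The category mask and the box outputs can be illustrated with a small sketch in plain Python. The mask values and the stride of 4 come from the slide. Everything else is an assumption for illustration: the dictionary keys, the function names, and the decoding formula, which follows the usual CenterNet convention (center = (cell index + local offset) x stride) rather than anything the slide confirms.

```python
# Heatmap channels correspond to categories 2~8, in this order.
CATEGORIES = ["2_handwritten", "3_typography", "4_illustration", "5_stamp",
              "6_headline", "7_caption", "8_textline"]

# Mask values from the slide; the keys are hypothetical names.
CATEGORY_MASK = {
    "classical": [1, 1, 1, 1, 0, 0, 0],  # 古典籍
    "meiji":     [0, 0, 1, 1, 1, 1, 1],  # 明治期以降刊行
}

def apply_category_mask(heatmap, mask):
    """Zero out channels that cannot occur in a subset.

    heatmap: nested lists shaped [7][h/4][w/4]; mask: 7 values (0 or 1).
    Equivalent to broadcasting a [7, 1, 1] mask over the heatmap.
    """
    return [[[v * m for v in row] for row in channel]
            for channel, m in zip(heatmap, mask)]

def decode_box(col, row, offset, size, image_width, stride=4):
    """Assumed CenterNet-style decoding of one heatmap peak to (x1, y1, x2, y2).

    offset: (dx, dy) in feature-map cells; size: (w, h) normalized by image width.
    """
    cx = (col + offset[0]) * stride
    cy = (row + offset[1]) * stride
    w = size[0] * image_width
    h = size[1] * image_width
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

With the "classical" mask, the last three channels (6_headline, 7_caption, 8_textline) are forced to zero, which matches the box counts in the dataset table.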
Training Parameters
• 5-Fold CV
• Batch Size: 6 (2 GPUs, GTX 1080 Ti x 2)
• Epochs: 104
• Optimizer: RAdam, LR=1.2e-3 (x0.1 at epoch=[64, 96])
• Data Augmentation:
  • Random Crop & Scale
  • Gray Scale / Thresholding (cv2.adaptiveThreshold)
  • Random Rotate (±0.2 degrees)
  • Cutout (side edge)
• Loss Function:
  • keypoint heatmap: Focal Loss (weight=1.0)
  • box size: L1 Loss (weight=5.0)
  • local offset: L1 Loss (weight=0.2)
  • margin: L1 Loss (weight=12.5)
On the Variance of the Adaptive Learning Rate and Beyond https://arxiv.org/abs/1908.03265
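The step schedule and the loss weighting above can be written down as a short sketch. The numbers are from the slide. The function names, the 0-indexed epoch, and the choice to apply each x0.1 step from the milestone epoch onward (as a multi-step schedule typically does) are assumptions, and the per-term loss values are taken as already computed.

```python
# Loss weights from the slide.
LOSS_WEIGHTS = {"heatmap": 1.0, "size": 5.0, "offset": 0.2, "margin": 12.5}

def learning_rate(epoch, base_lr=1.2e-3, milestones=(64, 96), gamma=0.1):
    """Step schedule: multiply the LR by gamma once per milestone reached."""
    steps = sum(epoch >= m for m in milestones)
    return base_lr * gamma ** steps

def total_loss(losses):
    """Weighted sum of the four loss terms (values assumed precomputed)."""
    return sum(LOSS_WEIGHTS[name] * losses[name] for name in LOSS_WEIGHTS)
```

Under these assumptions the LR is 1.2e-3 for epochs 0 to 63, 1.2e-4 for epochs 64 to 95, and 1.2e-5 for the rest of the 104 epochs.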
Prediction
Input -> Pad & Scale -> Detect (5model x 5fold) -> Boxes (3scale x 5model x 5fold) -> Weighted Boxes Fusion (iou_threshold=0.38) -> Output
• Pad & Scale sizes shown: 640x480, 768x576, 896x672, 1024x768, 1152x864, 1280x960, 1408x1056
• Models:
  • Φ=4, train on 724x576
  • Φ=3, train on 896x672
  • Φ=2, train on 1024x768
  • Φ=1, train on 1152x864
  • Φ=0, train on 1280x960
(*) Φ is the model scaling parameter. ref. EfficientDet: Scalable and Efficient Object Detection
Weighted Boxes Fusion: ensembling boxes for object detection models https://arxiv.org/abs/1910.13302
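To make the fusion step concrete, here is a simplified single-class sketch of Weighted Boxes Fusion in plain Python. It follows the general idea of the WBF paper (cluster boxes by IoU against the running fused box, average coordinates weighted by score, then scale the score by how many models agreed). It is not the author's code, and the function names and the `num_models` rescaling choice are assumptions; only iou_threshold=0.38 comes from the slide.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def weighted_boxes_fusion(boxes, scores, num_models, iou_threshold=0.38):
    """Fuse pooled boxes from several models; returns [(box, score), ...]."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    clusters = []  # member indices per cluster
    fused = []     # (fused box, fused score) per cluster
    for i in order:
        # Find the fused box that overlaps this box the most.
        best, best_iou = None, iou_threshold
        for k, (fbox, _) in enumerate(fused):
            v = iou(boxes[i], fbox)
            if v > best_iou:
                best, best_iou = k, v
        if best is None:
            clusters.append([i])
            fused.append((tuple(boxes[i]), scores[i]))
            continue
        # Recompute the fused box as a score-weighted average of its members.
        clusters[best].append(i)
        members = clusters[best]
        total = sum(scores[j] for j in members)
        box = tuple(sum(scores[j] * boxes[j][c] for j in members) / total
                    for c in range(4))
        fused[best] = (box, total / len(members))
    # Down-weight clusters that only a few models voted for.
    result = [(box, score * min(len(m), num_models) / num_models)
              for m, (box, score) in zip(clusters, fused)]
    return sorted(result, key=lambda r: r[1], reverse=True)
```

Unlike NMS, which keeps one box and discards its neighbours, every overlapping box contributes to the fused coordinates, which is one plausible reason for the gain when many scales, models and folds are pooled.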
(*) Detect: CNN -> build boxes -> score threshold=0.005 -> NMS (IoU threshold=0.18)
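The per-model post-processing (score threshold, then NMS) can be sketched as greedy NMS in plain Python. The two thresholds are from the slide. The function names, the (x1, y1, x2, y2) box format, the single-class treatment, and the inclusive comparison at the score threshold are assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def filter_and_nms(boxes, scores, score_threshold=0.005, iou_threshold=0.18):
    """Return indices of kept boxes, highest score first."""
    # Drop low-score boxes, then visit the rest in descending score order.
    candidates = sorted((i for i, s in enumerate(scores) if s >= score_threshold),
                        key=lambda i: scores[i], reverse=True)
    keep = []
    for i in candidates:
        # Keep a box only if it does not overlap any already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

The very low score threshold leaves most candidates in play, so the final filtering is effectively deferred to the ensemble stage.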
Score History
configuration                                                             public    private
single model (Φ=4), 5-Fold CV, NMS ensemble                               0.79143   0.82140
single model (Φ=4), 5-Fold CV, TTA (3 scale), NMS ensemble                0.80315   0.82782
3 model (Φ=[0, 2, 4]), 5-Fold CV, TTA (3 scale), NMS ensemble             0.80468   0.82961
3 model (Φ=[0, 2, 4]), 5-Fold CV, TTA (3 scale), WBF ensemble             0.82226   0.84791
5 model (Φ=[0, 1, 2, 3, 4]), 5-Fold CV, TTA (3 scale), WBF ensemble       0.82340   0.84978
