SIGNATE National Diet Library (国立国会図書館) Image Data Layout Recognition: 1st place solution

coz.a
@coz_a_

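Task
• Object Detection
• Metric: mean IoU
• Dataset: two book types, 古典籍 (classical Japanese books, "Classical" below) and 明治期以降刊行 (books published in the Meiji era or later, "Meiji or later" below)

Number of images
                train   test
Classical        1219    211
Meiji or later   1175    252
Total            2394    463

Number of boxes (train)
Category        Classical   Meiji or later   Total
1_overall            1219             1175    2394
2_handwritten       13851                -   13851
3_typography         9262                -    9262
4_illustration       1119             1207    2326
5_stamp               369               78     447
6_headline              -             3150    3150
7_caption               -             1462    1462
8_textline              -            60447   60447

1_overall is one-to-one with images: every image has exactly one 1_overall box.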
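The metric is given only as mean IoU; how predictions are matched to ground-truth boxes before averaging is not described on the slide. As a minimal sketch of the building block, here is the IoU of two axis-aligned boxes, assuming (x1, y1, x2, y2) coordinates:

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2) with x1 < x2, y1 < y2."""
    # Overlapping rectangle (may be empty).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7
```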

CNN Architecture: CenterNet + Margin Regression
• Input: image [b, 3, h, w]
• Backbone: EfficientNet (ImageNet pretrained) -> BiFPN
• Output heads:
  • keypoint heatmap (categories 2 to 8): [b, 7, h/4, w/4], multiplied (×) by a category mask [b, 7, 1, 1]
  • box size (*): [b, 2, h/4, w/4]
  • local offset: [b, 2, h/4, w/4]
  • margin (*) (1_overall): [b, 4]
• Category mask (channels 2_handwritten ... 8_textline):
  • 古典籍 (classical books): [1, 1, 1, 1, 0, 0, 0]
  • 明治期以降刊行 (published in the Meiji era or later): [0, 0, 1, 1, 1, 1, 1]
(*) normalized by the input image width
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks https://arxiv.org/abs/1905.11946
EfficientDet: Scalable and Efficient Object Detection https://arxiv.org/abs/1911.09070
Objects as Points https://arxiv.org/abs/1904.07850
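A minimal NumPy sketch of the two pieces that differ from a plain CenterNet: multiplying the per-book-type category mask onto the heatmap, and decoding the 4-value margin output into the single 1_overall box. It assumes the book type of each image is known, that heatmap channels follow the order 2_handwritten ... 8_textline, and that margins are (left, top, right, bottom) distances from the image edges; the dictionary keys and function names are hypothetical.

```python
import numpy as np

# Channel order assumed to be 2_handwritten ... 8_textline (the 7 heatmap categories).
# Keys are hypothetical names for the two book types.
CATEGORY_MASKS = {
    "classical": np.array([1, 1, 1, 1, 0, 0, 0], dtype=np.float32),       # 古典籍
    "meiji_or_later": np.array([0, 0, 1, 1, 1, 1, 1], dtype=np.float32),  # 明治期以降刊行
}


def apply_category_mask(heatmap, book_types):
    """Zero the heatmap channels of categories that cannot occur in each image's book type.

    heatmap: [b, 7, h/4, w/4]; book_types: list of b keys into CATEGORY_MASKS.
    """
    mask = np.stack([CATEGORY_MASKS[t] for t in book_types])[:, :, None, None]  # [b, 7, 1, 1]
    return heatmap * mask  # broadcasts over the spatial dimensions


def decode_overall_box(margin, img_w, img_h):
    """Turn the [4] margin output into the 1_overall box (x1, y1, x2, y2) in pixels.

    Assumes margins are ordered (left, top, right, bottom), measured inward from the
    image edges, and (as on the slide) normalized by the input image width.
    """
    left, top, right, bottom = np.asarray(margin, dtype=np.float64) * img_w
    return (left, top, img_w - right, img_h - bottom)


heat = np.ones((2, 7, 4, 4), dtype=np.float32)
masked = apply_category_mask(heat, ["classical", "meiji_or_later"])
box = decode_overall_box([0.05, 0.02, 0.05, 0.04], img_w=1000, img_h=750)
```

Applying the mask as a multiplication keeps the network shared across both book types while making it impossible to emit, for example, a 6_headline box on a classical-book page.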

Training Parameters
• 5-fold CV
• Batch size: 6 (2 GPUs, GTX 1080 Ti × 2)
• Epochs: 104
• Optimizer: RAdam, LR = 1.2e-3 (× 0.1 at epochs 64 and 96)
• Data augmentation:
  • Random crop & scale
  • Grayscale / thresholding (cv2.adaptiveThreshold)
  • Random rotation (±0.2 degrees)
  • Cutout (side edge)
• Loss function:
  • keypoint heatmap: focal loss (weight = 1.0)
  • box size: L1 loss (weight = 5.0)
  • local offset: L1 loss (weight = 0.2)
  • margin: L1 loss (weight = 12.5)
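The loss weights above combine four terms. A sketch of the weighted sum, assuming the standard penalty-reduced focal loss from Objects as Points (alpha=2, beta=4) for the heatmap and L1 terms evaluated only at object centers for size and offset; the normalization choices and function names are assumptions, not taken from the slide:

```python
import numpy as np

# Loss weights from the slide.
LOSS_WEIGHTS = {"heatmap": 1.0, "size": 5.0, "offset": 0.2, "margin": 12.5}


def centernet_focal_loss(pred, gt, alpha=2.0, beta=4.0, eps=1e-6):
    """Penalty-reduced pixel-wise focal loss (Objects as Points).

    pred: heatmap after sigmoid, values in (0, 1); gt: Gaussian-splatted target with peaks == 1.
    """
    pred = np.clip(pred, eps, 1.0 - eps)
    pos = gt == 1.0
    pos_loss = ((1.0 - pred) ** alpha * np.log(pred))[pos].sum()
    neg_loss = ((1.0 - gt) ** beta * pred ** alpha * np.log(1.0 - pred))[~pos].sum()
    num_pos = max(int(pos.sum()), 1)  # normalize by the number of object centers
    return float(-(pos_loss + neg_loss) / num_pos)


def center_l1(pred, target, center_mask):
    """L1 loss on [b, 2, H, W] maps, evaluated only at object centers (center_mask: [b, H, W])."""
    diff = np.abs(pred - target) * center_mask[:, None, :, :]
    return float(diff.sum() / max(center_mask.sum(), 1.0))


def total_loss(heat_pred, heat_gt, size_pred, size_gt, off_pred, off_gt,
               center_mask, margin_pred, margin_gt):
    """Weighted sum of the four head losses; returns (total, per-head dict)."""
    parts = {
        "heatmap": centernet_focal_loss(heat_pred, heat_gt),
        "size": center_l1(size_pred, size_gt, center_mask),
        "offset": center_l1(off_pred, off_gt, center_mask),
        "margin": float(np.abs(margin_pred - margin_gt).mean()),  # [b, 4], one box per image
    }
    return sum(LOSS_WEIGHTS[k] * v for k, v in parts.items()), parts
```

The large margin weight (12.5) plausibly compensates for the margin targets being small fractions of the image width, but the slide does not explain how the weights were chosen.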
On the Variance of the Adaptive Learning Rate and Beyond https://arxiv.org/abs/1908.03265
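The grayscale / thresholding augmentation uses cv2.adaptiveThreshold. To show what that step does without an OpenCV dependency, here is a NumPy sketch of its mean variant (ADAPTIVE_THRESH_MEAN_C with THRESH_BINARY: a pixel becomes max_value if it exceeds the local block mean minus C). The block size, C, the BT.601 grayscale weights, and the application probabilities are assumptions, and border handling may differ slightly from OpenCV:

```python
import numpy as np


def to_gray(img):
    """RGB uint8 [h, w, 3] -> uint8 [h, w] using BT.601 luma weights."""
    gray = img[..., 0] * 0.299 + img[..., 1] * 0.587 + img[..., 2] * 0.114
    return np.clip(np.round(gray), 0, 255).astype(np.uint8)


def adaptive_threshold_mean(gray, block_size=15, c=10, max_value=255):
    """Pixel -> max_value if it is brighter than (local block mean - c), else 0."""
    if block_size % 2 != 1:
        raise ValueError("block_size must be odd")
    r = block_size // 2
    padded = np.pad(gray.astype(np.float64), r, mode="edge")  # replicate borders
    # Integral image with a leading zero row/column gives O(1) box sums.
    integral = np.pad(padded.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w, k = gray.shape[0], gray.shape[1], block_size
    box_sum = (integral[k:k + h, k:k + w] - integral[:h, k:k + w]
               - integral[k:k + h, :w] + integral[:h, :w])
    local_mean = box_sum / (k * k)
    return np.where(gray > local_mean - c, max_value, 0).astype(np.uint8)


def random_gray_or_threshold(img, rng, p_gray=0.3, p_thresh=0.3):
    """Hypothetical policy: grayscale or binarize the page, keeping 3 channels for the CNN."""
    u = rng.random()
    if u < p_thresh:
        out = adaptive_threshold_mean(to_gray(img))
    elif u < p_thresh + p_gray:
        out = to_gray(img)
    else:
        return img
    return np.repeat(out[..., None], 3, axis=2)
```

Local (rather than global) thresholding keeps faint characters readable on unevenly aged or stained paper, which is a common reason to prefer it for scanned pages.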

Prediction
Input -> Pad & Scale -> Detect (5 models x 5 folds) -> Boxes (3 scales x 5 models x 5 folds) -> Weighted Boxes Fusion (iou_threshold=0.38) -> Output
• Input sizes for 3-scale TTA (each model uses 3 of these): 640x480, 768x576, 896x672, 1024x768, 1152x864, 1280x960, 1408x1056
• Φ=4: trained on 724x576
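Every input size above has a 4:3 aspect ratio, so each page image has to be padded and scaled to fit. A sketch of one way to do that and to map predicted boxes back, assuming uniform scaling with padding on the right and bottom only (the slide does not say where padding goes); nearest-neighbour resizing and white padding are also simplifying assumptions:

```python
import numpy as np

# Candidate test-time sizes (width, height) from the slide.
TTA_SIZES = [(640, 480), (768, 576), (896, 672), (1024, 768),
             (1152, 864), (1280, 960), (1408, 1056)]


def pad_and_scale(img, target_w, target_h, fill=255):
    """Resize img [h, w, c] uniformly to fit (target_w, target_h), then pad right/bottom.

    Nearest-neighbour resizing stands in for an interpolating resize such as cv2.resize.
    Returns the padded image and the scale factor needed to map boxes back.
    """
    h, w = img.shape[:2]
    scale = min(target_w / w, target_h / h)
    new_w, new_h = round(w * scale), round(h * scale)
    ys = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    xs = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    out = np.full((target_h, target_w) + img.shape[2:], fill, dtype=img.dtype)
    out[:new_h, :new_w] = img[ys][:, xs]
    return out, scale


def boxes_to_original(boxes, scale):
    """Map (x1, y1, x2, y2) boxes on the padded input back to original-image pixels.

    Because padding is only on the right/bottom, undoing the scale is enough.
    """
    return np.asarray(boxes, dtype=np.float64) / scale
```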
Weighted Boxes Fusion: ensembling boxes for object detection models https://arxiv.org/abs/1910.13302
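A simplified sketch of Weighted Boxes Fusion as described in the paper above: boxes from all prediction sets are sorted by confidence, clustered by IoU against each cluster's current fused box (iou_threshold=0.38 here), fused by confidence-weighted coordinate averaging, and the averaged confidence is scaled by min(T, N)/N, where T is the cluster size and N the number of prediction sets. The ensemble_boxes package provides a full implementation (weighted_boxes_fusion, which expects coordinates normalized to [0, 1]); whether the author used it is not stated, and per-set weights are omitted here:

```python
import numpy as np


def _iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def weighted_boxes_fusion(boxes_list, scores_list, labels_list, iou_thr=0.38):
    """Simplified WBF over N prediction sets (e.g. 3 scales x 5 models x 5 folds = 75).

    boxes_list[i]: (x1, y1, x2, y2) boxes of set i; scores_list[i], labels_list[i] aligned with it.
    Returns a list of (fused_box, fused_score, label).
    """
    n_sets = len(boxes_list)
    entries = [(float(s), l, tuple(map(float, b)))
               for boxes, scores, labels in zip(boxes_list, scores_list, labels_list)
               for b, s, l in zip(boxes, scores, labels)]
    entries.sort(key=lambda e: -e[0])  # highest confidence first

    clusters = []  # each: {"label", "scores", "boxes", "fused"}
    for score, label, box in entries:
        # Match against the current fused box of each same-label cluster; keep the best IoU.
        best, best_iou = None, iou_thr
        for c in clusters:
            if c["label"] == label:
                iou = _iou(c["fused"], box)
                if iou > best_iou:
                    best, best_iou = c, iou
        if best is None:
            clusters.append({"label": label, "scores": [score], "boxes": [box], "fused": box})
            continue
        best["scores"].append(score)
        best["boxes"].append(box)
        w = np.array(best["scores"])
        # Fused coordinates: confidence-weighted average of the member boxes.
        best["fused"] = tuple((w[:, None] * np.array(best["boxes"])).sum(0) / w.sum())

    results = []
    for c in clusters:
        t = len(c["scores"])
        # Average confidence, down-weighted when fewer than n_sets boxes support the cluster.
        conf = float(np.mean(c["scores"])) * min(t, n_sets) / n_sets
        results.append((c["fused"], conf, c["label"]))
    return results
```

Unlike NMS, which keeps only the top box of each overlapping group, WBF averages all of them, so every scale, model and fold contributes to the final coordinates.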
Φ is the model scaling parameter (ref. EfficientDet: Scalable and Efficient Object Detection).
Detect (per model): CNN -> build boxes -> score threshold=0.005 -> NMS (IoU threshold=0.18)
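A sketch of the per-model Detect step: CenterNet-style peak extraction with a 3x3 max filter, box construction from the size and offset heads, the 0.005 score threshold, and NMS at IoU 0.18. The channel orders, the mapping from heatmap channel k to category k + 2, and running NMS separately per category are assumptions:

```python
import numpy as np

STRIDE = 4  # heatmap, size and offset maps are at h/4 x w/4


def local_max_3x3(x):
    """3x3 max filter over the spatial dims of x [C, H, W] (the CenterNet peak test)."""
    H, W = x.shape[1:]
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    return np.max([p[:, dy:dy + H, dx:dx + W] for dy in range(3) for dx in range(3)], axis=0)


def build_boxes(heatmap, size, offset, input_w, score_thr=0.005):
    """Decode one image: heatmap [7, H, W] (masked, after sigmoid), size/offset [2, H, W].

    Assumes channel 0/1 = x/y for offset and w/h for size, with size normalized
    by the input image width as stated on the architecture slide.
    """
    peaks = (heatmap == local_max_3x3(heatmap)) & (heatmap > score_thr)
    cls, ys, xs = np.nonzero(peaks)
    cx = (xs + offset[0, ys, xs]) * STRIDE
    cy = (ys + offset[1, ys, xs]) * STRIDE
    w = size[0, ys, xs] * input_w
    h = size[1, ys, xs] * input_w
    boxes = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1)
    return boxes, heatmap[cls, ys, xs], cls + 2  # channel k -> category label k + 2


def nms(boxes, scores, iou_thr=0.18):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = np.argsort(-scores)
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i, rest = order[0], order[1:]
        keep.append(int(i))
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_thr]
    return keep


def nms_per_class(boxes, scores, labels, iou_thr=0.18):
    """Run NMS separately for each category."""
    keep = []
    for c in np.unique(labels):
        idx = np.nonzero(labels == c)[0]
        keep.extend(int(idx[k]) for k in nms(boxes[idx], scores[idx], iou_thr))
    return sorted(keep)
```

The very low score threshold keeps many weak candidates alive so that the later box fusion across scales, models and folds has enough evidence to work with.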

Score History
Setup                                                                   Public    Private
Single model (Φ=4), 5-fold CV, NMS ensemble                             0.79143   0.82140
Single model (Φ=4), 5-fold CV, TTA (3 scales), NMS ensemble             0.80315   0.82782
3 models (Φ=[0, 2, 4]), 5-fold CV, TTA (3 scales), NMS ensemble         0.80468   0.82961
3 models (Φ=[0, 2, 4]), 5-fold CV, TTA (3 scales), WBF ensemble         0.82226   0.84791
5 models (Φ=[0, 1, 2, 3, 4]), 5-fold CV, TTA (3 scales), WBF ensemble   0.82340   0.84978
