IMPROVING CNN-RNN HYBRID NETWORKS
FOR HANDWRITING RECOGNITION
Kartik Dutta, Praveen Krishnan, Minesh Mathew
and C.V. Jawahar
CVIT, IIIT Hyderabad, India
Problem
(Sample handwritten text: "conference in london will it end with")
Word Recognition
Line Recognition
Prior works
• Using BLSTMs
 Recognition formulated as a
sequence-to-sequence problem.
 Bidirectional LSTM networks
with a CTC layer for recognition
[Bluche ICDAR'15, Sueiras
Neurocomputing'18].
• Shi et al. (2016) proposed a
state-of-the-art hybrid CNN-RNN
architecture for scene text recognition.
Prior works
• Variations of BLSTMs
 MDLSTM
[Voigtlaender et al.,
ICFHR'16]
 SepMDLSTM
[Chen et al., ICDAR'17]
• Puigcerver et al.,
ICDAR 2017, analyze
the effectiveness of
BLSTMs vs. MDLSTMs.
Prior works
• Wigington et al., ICDAR'17 introduce new pre-processing
& augmentation strategies:
Profile Normalization
Elastic Distortion
Our prior work (DAS'18)
CNN-RNN hybrid network
The feature sequence from the last CNN layer is fed to the RNN
STN
• Corrects geometric distortions in the input
• End-to-end trainable
• Components
 Localization Network
 Grid Generator
 Sampler
Jaderberg et al., NIPS, 2015
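The grid generator and sampler above can be sketched in NumPy. This is an illustrative sketch only, not the paper's implementation: the actual STN is trained end-to-end inside the network, Jaderberg et al. use a differentiable bilinear sampler, and the nearest-neighbour lookup here is a simplification.

```python
import numpy as np

def affine_grid(theta, h, w):
    """Grid generator: map each output pixel, with coordinates
    normalized to [-1, 1], through a 2x3 affine matrix theta to get
    the location in the input where it should be sampled."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w),
                         indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)  # (h, w, 3)
    return coords @ theta.T                                 # (h, w, 2)

def sample(img, grid):
    """Sampler: read the input image at the grid locations
    (nearest-neighbour here; the real STN samples bilinearly)."""
    h, w = img.shape
    xs = np.clip(np.rint((grid[..., 0] + 1) * (w - 1) / 2), 0, w - 1).astype(int)
    ys = np.clip(np.rint((grid[..., 1] + 1) * (h - 1) / 2), 0, h - 1).astype(int)
    return img[ys, xs]

# The identity transform reproduces the input; in the STN, a
# localization network predicts theta from the image instead.
identity = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
```

Because every step is a smooth function of theta (with bilinear sampling), gradients flow back to the localization network, which is what makes the module end-to-end trainable.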
Contributions
• Pre-training, Data Augmentation & Normalization
 Word/Line Normalization
 Multi-Scale
 Elastic Distortion
 Synthetic Data
Pre-processing
• We use the algorithm of
Vinciarelli et al., PR, 2001
• Shear the input image and
evaluate a histogram over the
contours of nearly vertical
strokes
• No parameter tuning required
Pre-processing
• Image de-slanting
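As a rough illustration (not the exact Vinciarelli formulation), shear-based deslanting can be sketched as follows, assuming a binary NumPy image and a simplified score that rewards columns whose ink forms a single contiguous vertical run:

```python
import numpy as np

def deslant(img, shears=None):
    """Try several horizontal shears and keep the one that best
    straightens near-vertical strokes. Score: each column whose ink
    pixels form one contiguous vertical run contributes
    (run length)^2, rewarding long unbroken vertical strokes."""
    if shears is None:
        shears = np.linspace(-0.5, 0.5, 11)  # candidate slant range
    h, w = img.shape
    best, best_score = img, -1.0
    for s in shears:
        # Horizontal shear: row i is shifted by s * (h - 1 - i) pixels.
        offset = s * (h - 1 - np.arange(h))
        cols = np.clip(np.rint(np.arange(w) - offset[:, None]),
                       0, w - 1).astype(int)
        sheared = img[np.arange(h)[:, None], cols]
        score = 0.0
        for col in sheared.T:
            idx = np.flatnonzero(col)
            if idx.size and idx[-1] - idx[0] + 1 == idx.size:
                score += float(idx.size) ** 2
        if score > best_score:
            best, best_score = sheared, score
    return best
```

Slanted strokes get split across columns under the wrong shear, so the score peaks at the shear that makes strokes vertical, which is why no per-image parameter tuning is needed.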
Multi-Scale Training
• Trains the network to predict
characters at multiple
scales.
• Fix a 2D canvas size
 Scale the input image to larger
or smaller sizes
 Translate the
transformed image.
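A minimal sketch of the scale-and-translate idea, assuming a grayscale NumPy image and a white (255) background; the function name and parameters are illustrative, not from the paper:

```python
import numpy as np

def multi_scale_place(img, box_h, box_w, scale, dy=0, dx=0, fill=255):
    """Rescale img by `scale` (nearest-neighbour) and paste it into a
    fixed-size canvas at offset (dy, dx), padding with the background
    value; anything that overflows the canvas is cropped."""
    h, w = img.shape
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    ys = np.arange(nh) * h // nh  # nearest source row per target row
    xs = np.arange(nw) * w // nw  # nearest source column per target column
    scaled = img[ys[:, None], xs[None, :]]
    canvas = np.full((box_h, box_w), fill, dtype=img.dtype)
    ch, cw = min(nh, box_h - dy), min(nw, box_w - dx)
    canvas[dy:dy + ch, dx:dx + cw] = scaled[:ch, :cw]
    return canvas
```

During training, `scale`, `dy`, and `dx` would be drawn at random per sample so that the same word is seen at several sizes and positions inside the fixed canvas.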
Data Augmentation
• Affine Transformation
 Combination of Rotation, Scaling & Translation
• Elastic Distortion
 Each pixel is resampled through a random displacement field.
 The field is smoothed using a Gaussian filter.
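The displacement-field recipe above can be sketched in pure NumPy. Assumptions in this sketch: a separable Gaussian convolution stands in for the usual `scipy.ndimage.gaussian_filter`, resampling is nearest-neighbour rather than interpolated, and `alpha`/`sigma` defaults are illustrative.

```python
import numpy as np

def _gauss1d(sigma):
    """Normalized 1-D Gaussian kernel with radius 3*sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def _smooth(field, sigma):
    """Separable Gaussian blur: filter rows, then columns."""
    k = _gauss1d(sigma)
    field = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, field)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, field)

def elastic_distort(img, alpha=8.0, sigma=4.0, seed=0):
    """Elastic distortion: a random displacement field, smoothed with a
    Gaussian filter, tells each output pixel where to read from."""
    rng = np.random.default_rng(seed)
    h, w = img.shape
    dx = _smooth(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = _smooth(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Nearest-neighbour resampling at the displaced coordinates
    # (real implementations interpolate, e.g. bilinearly).
    yy = np.clip(np.rint(ys + dy), 0, h - 1).astype(int)
    xx = np.clip(np.rint(xs + dx), 0, w - 1).astype(int)
    return img[yy, xx]
```

The smoothing step is what separates a plausible handwriting wobble from pixel-level noise: without it, neighbouring pixels are displaced independently and the image just looks corrupted.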
Pre-Training
• IIIT-HWS dataset
 10M word images
 10K vocabulary words
• Rendered using open-source handwritten
fonts.
• Rendering parameters:
 kerning level, stroke width
 foreground and background pixel intensities
sampled from Gaussian distributions
P Krishnan and CV Jawahar, Generating Synthetic Data for Text Recognition, arXiv 2016
Datasets
Dataset | Historical | #Lines | #Words  | #Writers
Rimes   | No         | 12,093 | 66,982  | 1,300
GW      | Yes        | 656    | 4,894   | 1
IAM     | No         | 13,353 | 115,320 | 657
Sample word images
Evaluation Protocol
• Lexicon-based (constrained) and lexicon-free decoding
• Evaluation
 Mean word error rate (WER)
 Mean character error rate (CER), based on the Levenshtein
distance between the predicted and ground-truth words.
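The two error rates can be computed as below (a minimal sketch: WER for a single word pair is 0/1 depending on an exact match, and both metrics are averaged over the test set outside these functions):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between strings a and b via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def cer(gt: str, pred: str) -> float:
    """Character error rate for one word pair."""
    return levenshtein(gt, pred) / max(len(gt), 1)

def wer(gt: str, pred: str) -> float:
    """Word error rate for one word pair: 1 unless the words match."""
    return 0.0 if gt == pred else 1.0
```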
Ablation Study-I
• Let's check the performance of the original CRNN
network trained only on IAM train data
WER: 22.86
We know deep learning architectures are data hungry.
Let us try pre-training with synthetic data.
Ablation Study-II
• Let's check the performance of the original CRNN
network on IAM train data, pre-trained on
IIIT-HWS data
WER: 22.86 → 20.10
Let us try a few architectural improvements.
First, an STN layer.
Ablation Study-III
• Let us add an STN layer & use the same training
scheme as before
WER: 22.86 → 20.10 → 18.3
Let us add residual blocks to our network.
Ablation Study-IV
• Let us add more (residual) conv. layers & use the
same training scheme as before
WER: 22.86 → 20.10 → 18.3 → 16.19
Using a deeper network helps with HWR.
Let us try slant correction.
Ablation Study-V
• Let us add pre-processing to the previous
architecture and training strategy
WER: 22.86 → 20.10 → 18.3 → 16.19 → 15.79
A small improvement.
Now let us see the gain from our augmentation strategies.
Ablation Study-VI
• Let's check the performance of the previous network,
with the same strategy, but with our data
augmentation strategies added
WER: 22.86 → 20.10 → 18.3 → 16.19 → 15.79 → 13.16
Data augmentation makes a huge difference in HWR.
Ablation Study-VII
• Let's check the performance of the previous network,
with the same strategy, finally adding
test-time augmentation
WER: 22.86 → 20.10 → 18.3 → 16.19 → 15.79 → 13.16 → 12.61
Ablation: IAM Isolated HWR
Method                         | WER   | CER
CRNN                           | 22.86 | 11.08
CRNN-Synth                     | 20.10 | 9.31
SCRNN-Synth                    | 18.3  | 7.82
SDCRNN-Synth                   | 16.19 | 6.34
PP-SDCRNN-Synth                | 15.79 | 5.98
PP-SDCRNN-Synth + Augmentation | 12.61 | 4.88
Isolated Word Recognition-I (IAM)
Method (lexicon-free)             | WER   | CER
Krishnan et al., DAS'18           | 16.19 | 6.34
Wigington et al., ICDAR'17        | 19.07 | 6.07
Sueiras et al., Neurocomputing'18 | 23.8  | 8.8
This Work                         | 12.61 | 4.88
Method (full lexicon)             | WER   | CER
Sueiras et al., Neurocomputing'18 | 12.7  | 6.2
Stuner et al., CoRR'16            | 5.93  | 2.78
Poznanski et al., CVPR'16         | 6.45  | 3.44
Krishnan et al., DAS'18           | 5.1   | 2.66
Wigington et al., ICDAR'17        | 5.71  | 3.03
This Work                         | 4.8   | 2.52
Isolated Word Recognition-II (RIMES)
Method (lexicon-free)             | WER   | CER
Wigington et al., ICDAR'17        | 11.29 | 3.09
Sueiras et al., Neurocomputing'18 | 15.9  | 4.8
This Work                         | 7.04  | 2.32
Method (complete lexicon)         | WER   | CER
Sueiras et al., Neurocomputing'18 | 6.6   | 2.6
Poznanski et al., CVPR'16         | 3.9   | 1.9
Wigington et al., ICDAR'17        | 2.85  | 1.36
Stuner et al., CoRR'16            | 3.48  | 1.34
This Work                         | 1.86  | 0.65
Line Level Recognition-I (IAM)
Method                      | WER   | CER
Pham et al., ICFHR'14       | 35.1  | 10.8
Krishnan et al., DAS'18     | 32.89 | 9.78
Chen et al., ICDAR'17       | 34.55 | 11.15
Puigcerver et al., ICDAR'17 | 18.4  | 5.8
This Work                   | 17.82 | 5.7
Filter Visualizations
• First column is the input; the rest are activations,
taken from the 2nd conv layer
Qualitative Results-IAM
Conclusion
• We present a state-of-the-art deep learning architecture for
handwriting recognition
 CNN-RNN encoder-decoder with an STN module
 Pre-training with synthetic data
 Pre-processing with slant correction
 Various augmentations
o Multi-scale
o Elastic + Affine
o Test-time
Thank You


Editor's Notes

  • #2: Good Evening! Hi, I am
  • #4: In the space of offline handwritten word recognition, RNNs (recurrent neural networks), especially the BLSTM with a CTC layer proposed by Graves et al., have been the most successful, with the underlying problem of text recognition formulated as a sequence-to-sequence mapping. Many follow-up works, such as Bluche et al., further improved its applicability to word recognition. Shi 2016 hybrid architecture: a very successful hybrid CNN+RNN scene text recognition architecture which was successfully adapted for HWR.
  • #5: Various variations of BLSTMs, such as MDLSTM, MDirLSTM, etc., have been proposed to further improve word recognition results. Recently, Puigcerver et al., ICDAR'17 questioned the effectiveness of these variations over BLSTMs for HWR. Also, language models are used to aid line-level recognition, along with a lexicon for word-level recognition.
  • #6: Stuner: a cascade of LSTMs, similar to a cascade of weak classifiers. If a word is rejected through the whole cascade, Viterbi decoding is used. However, it can only do lexicon-based decoding. Wigington used a network similar to the original CRNN network, but added profile normalization and a variation of the elastic distortion augmentation scheme to achieve the previous state-of-the-art results for unconstrained HWR.
  • #7: Now we come to our architecture. The top figure shows the main components of our model. We first have a spatial transformer network, to remove geometric distortion in the input. Then we have a convolutional block arranged like ResNet-18. Refer to the bottom figure now. Suppose we had many 2D feature maps as shown here, and we reshaped each feature map into a 1D vector. The feature maps from the CNN then become a temporal sequence of feature vectors. Coming back to the CNN-RNN hybrid network, the feature sequence from the last conv layer is given as input to the BLSTM layer. At the end of our network we have the CTC loss function, which we backpropagate to train our network.
  • #8: The Spatial Transformer Network, or STN module, was introduced by Jaderberg et al. in 2015, who showed how it could be useful for removing distortions in scene text. Since handwritten data also has distortion due to variable hand movements, we decided to include this layer in our architecture. It is an end-to-end trainable layer and does not require a separate loss function. It consists of the 3 components listed: the localization network gives us the parameters of the transformation, say affine, that we wish to apply; the grid generator and the sampler generate the output feature map by applying that transformation.
  • #10: We use the image de-slanting and de-sloping technique proposed by Vinciarelli et al. in PR, 2001. We use it during both word- and line-level recognition. This method requires no parameter tuning and is applied directly to both isolated word-level and line-level images.
  • #12: The idea of the multi-scale transformation is to learn to predict characters at multiple scales. The scale of a character depends on the context in which it occurs in a word. If the initial input image is larger than the fixed box size, we can only augment it at smaller scales. If it is smaller than the fixed box size, we do larger or smaller size augmentation and pad the remaining space. Wigington et al. also present a method which addresses this issue by normalizing the scale of all images in a dataset using profile normalization. Our approach addresses the same issue through data augmentation while training the network.
  • #13: Human handwriting has a high degree of oscillation. These variations can be captured to a certain extent using elastic distortions. The basic idea is to generate a random displacement field which dictates the computation of a new location for each pixel through interpolation. The top row shows the input images, the 2nd row shows the output after applying the affine transformation, the 3rd row shows the output after applying elastic distortion, and the last row shows the output after applying affine + elastic distortion.
  • #16: There are two ways in which we decode the output of our network. In the lexicon-based or constrained setting, the network chooses the word from the provided dictionary that minimizes the CTC loss. In the unconstrained or lexicon-free setting, the network is not constrained by any such dictionary. We use 2 evaluation metrics, the mean character and word error rate. The CER is based on the Levenshtein distance between the ground-truth and predicted sequences. The WER is 1 for a prediction if the CER is non-zero. We take the mean of both across all the test samples.
  • #17: Let's perform an ablation study using the original CRNN architecture, training it from scratch on the IAM train set and testing on the IAM test corpus. The performance is reported using WER for word recognition, with decoding in an unconstrained setting.
  • #18: Let's keep the architecture the same but pre-train it on synthetic data before fine-tuning on the IAM train set, testing on the corresponding test corpus. The performance is reported using WER in an unconstrained setting. There is a clear improvement in performance.
  • #19: Continuing the story
  • #20: Continuing the story
  • #21: Continuing the story
  • #22: In addition to the above: from the original CRNN model we progressively reduce the error, obtaining an absolute reduction in WER and CER of more than 45% and 56% respectively.
  • #23: In addition to the above: from the original CRNN model we progressively reduce the error, obtaining an absolute reduction in WER and CER of more than 45% and 56% respectively.
  • #24: This table shows the summary of the last 6 slides where we talk about the ablation study that was performed by us in this paper.
  • #27: Clarify that all are unconstrained
  • #28: To get further insight into the workings of the convolutional layers, we visualize the activations of an initial convolution layer on passing an image through the trained network. The first column shows the pre-processed input image taken from the IAM dataset. The next two columns show filters which activate on the foreground and background respectively, and the following two columns show filters which act as horizontal and vertical line detectors respectively.
  • #29: Here we see qualitative results on the IAM dataset for isolated HWR. We achieve accurate results even on ambiguously written words. Most failures suffer from ambiguity in the visual space, improper segmentation, or the presence of an extra character at the end of a word.
  • #30: To recap, we presented a state-of-the-art model for handwriting recognition. There are 6 main components in our model: the CNN-RNN hybrid network, the STN module, a deep network with residual layers, and pre-training with synthetic data. We also apply the slant and slope correction pre-processing technique, along with multi-scale, affine, elastic, and test-time augmentation.
  • #31: Any questions?