Text Localization in Scientific Figures using Fully Convolutional Neural Networks on Limited Training Data
Morten Jessen, Falk Böschen, Ansgar Scherp
DocEng, September 2019
Outline: Introduction, Datasets, Overview: Supervised Approach, Results, Outlook
Motivation
Figures are widely used in scientific papers, media, and other
Figures often contain information that is not present in the
surrounding text and convey the core message(s) of a document
Extracted text can be used for
improving existing retrieval systems
building (better) figure retrieval systems
making figures available to visually impaired people
. . .
However, common Optical Character Recognition (OCR)
engines struggle to process figures
So far, research has focused on unsupervised approaches due to
the lack of training data
Our Previous Unsupervised Approach [MTAP’18]
Observation: text localization (steps (1) to (4)) is the most challenging part
Propose a supervised approach for text localization that can
work with limited training data ⇒ DocEng’19
Datasets of Scholarly Figures
Dataset          | CHIME-R | CHIME-S | DeGruyter | EconBiz | DeTEXT
Number of images |     115 |      85 |       120 |     121 |    192
Text elements    |      14 |      12 |        24 |      25 |     14
Words            |      18 |      18 |        34 |      35 |     20
Characters       |      76 |      69 |       149 |     151 |    120
Available datasets are small (85 to 192 images each)
This makes training supervised methods difficult
CHIME-R Examples
115 real images
Bar, pie, and line charts
CHIME-S Examples
85 synthetic images
Bar, pie, and line charts
DeGruyter Examples
120 figures from academic books
Additional content: scatter plots, flow/process charts, histograms
EconBiz Examples
121 randomly extracted scholarly figures
Additional content: maps
DeTEXT Examples
192 biomedical images
Additional content: medical images (real and abstract)
Overview
Focus on a neural-network-based approach for text
localization in scientific figures
Evaluate two approaches to address the challenge of
limited training data:
Pre-training on large datasets
Artificial dataset extension
Use a common Optical Character Recognition engine (Tesseract)
for text recognition
Faster R-CNN
Figure: Faster R-CNN Architecture [Ren et al., 2015].
Pre-Training on COCO-Text
COCO-Text: images from MS-COCO with text annotations
We use images with English, machine-printed, legible text
145,000 text annotations on 63,686 images
(avg.: 2.28 annotations/image)
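The selection step can be sketched as a simple filter. This is a minimal sketch assuming COCO-Text-style annotation dicts with `image_id`, `bbox`, `language`, `class`, and `legibility` keys and the values `"english"`, `"machine printed"`, and `"legible"`; verify these against the COCO-Text release you use. The helper names are hypothetical.

```python
from collections import defaultdict

# Assumed COCO-Text attribute values; check them against your annotation file.
WANTED = {"language": "english", "class": "machine printed", "legibility": "legible"}


def filter_coco_text(annotations):
    """Group the boxes of English, machine-printed, legible text by image id."""
    kept = defaultdict(list)
    for ann in annotations:
        if all(ann.get(key) == value for key, value in WANTED.items()):
            kept[ann["image_id"]].append(ann["bbox"])  # bbox assumed as [x, y, w, h]
    return dict(kept)


def avg_annotations_per_image(kept):
    """Average number of kept annotations over images with at least one of them."""
    n_images = len(kept)
    return sum(len(boxes) for boxes in kept.values()) / n_images if n_images else 0.0


# The slide's figures: 145,000 annotations on 63,686 images.
print(round(145_000 / 63_686, 2))  # 2.28
```

Images without any matching annotation simply drop out of the dictionary, so the average is taken only over images that are actually used for pre-training.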
Artificial Dataset Extension
Extend the figure datasets with transformed versions
(rotation, noise, translation, flipping, rescaling) of each figure
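Because the transformed figures serve as training data for localization, each geometric transformation must also be applied to the ground-truth boxes. A minimal sketch, assuming boxes as (x_min, y_min, x_max, y_max) in pixels with y pointing down and rotation about a given center without enlarging the canvas; the function names are hypothetical. Additive noise changes pixels only, so boxes stay as they are.

```python
import math


def _clip(v, hi):
    """Clamp a coordinate to the range [0, hi]."""
    return max(0, min(hi, v))


def hflip_box(box, img_w):
    """Mirror the box horizontally in an image of width img_w."""
    x0, y0, x1, y1 = box
    return (img_w - x1, y0, img_w - x0, y1)


def scale_box(box, factor):
    """Rescaling the image by `factor` scales every coordinate by the same factor."""
    return tuple(c * factor for c in box)


def translate_box(box, dx, dy, img_w, img_h):
    """Shift the box and clip it to the canvas; a box shifted fully outside
    collapses to zero area and should be dropped from the annotations."""
    x0, y0, x1, y1 = box
    return (_clip(x0 + dx, img_w), _clip(y0 + dy, img_h),
            _clip(x1 + dx, img_w), _clip(y1 + dy, img_h))


def rotate_box(box, angle_deg, cx, cy):
    """Axis-aligned box enclosing the box rotated counter-clockwise (on screen)
    by angle_deg around (cx, cy). For angles that are not multiples of 90
    degrees the result is looser than the rotated text itself."""
    x0, y0, x1, y1 = box
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    xs, ys = [], []
    for x, y in ((x0, y0), (x1, y0), (x1, y1), (x0, y1)):
        dx, dy = x - cx, y - cy
        # With y pointing down, a counter-clockwise screen rotation maps
        # (dx, dy) to (dx*cos + dy*sin, -dx*sin + dy*cos).
        xs.append(cx + dx * cos_a + dy * sin_a)
        ys.append(cy - dx * sin_a + dy * cos_a)
    return (min(xs), min(ys), max(xs), max(ys))
```

For example, `hflip_box((10, 5, 30, 15), 100)` mirrors a box near the left edge to `(70, 5, 90, 15)`.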
Text Extraction with Tesseract 4.0
OCR engine based on an LSTM neural network
Text extraction process:
Generate multiple input images from each bounding box
(provided by Faster R-CNN)
Stop as soon as Tesseract’s confidence score is ≥ 96%;
otherwise, take the result with the highest confidence
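The early-stopping rule can be sketched as a loop over the preprocessed variants of one box. This is a sketch under stated assumptions: `ocr` is a hypothetical callable that returns the recognized text and a confidence in percent for one image (with pytesseract, one option would be to aggregate the per-word `conf` values from `image_to_data`); how the confidence is aggregated is not specified on the slide.

```python
from typing import Callable, Iterable, Optional, Tuple

OcrResult = Tuple[str, float]  # (recognized text, confidence in percent)


def extract_text(variants: Iterable[object],
                 ocr: Callable[[object], OcrResult],
                 threshold: float = 96.0) -> Optional[OcrResult]:
    """Run OCR on each variant; stop at the first result whose confidence
    reaches `threshold`, otherwise return the most confident result seen."""
    best: Optional[OcrResult] = None
    for image in variants:
        text, conf = ocr(image)
        if best is None or conf > best[1]:
            best = (text, conf)
        if conf >= threshold:
            break  # good enough: skip the remaining variants
    return best
```

Ordering the variants from most to least likely (for example, unrotated first) lets the loop stop early for most boxes.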
Preprocessing for OCR
Enlarge the bounding box by 5px
Add a 25px white border
Rotations: 0°, 90°, 270°, 45°, 315°, 30°, 60°, 300°, 330°
Resize the shortest side to 100px and 200px
Binarization
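The preprocessing parameters above can be sketched in plain Python. Several details are assumptions: the 5px enlargement is applied per side, every rotation is combined with every target size, and Otsu's method stands in for the binarization, which the slide does not name. All function names are hypothetical; a real pipeline would apply these operations with an image library.

```python
from itertools import product

PAD_PX = 5        # enlarge the detected box (assumed: on every side)
BORDER_PX = 25    # white border added around the crop
ROTATIONS = [0, 90, 270, 45, 315, 30, 60, 300, 330]
SHORT_SIDES = [100, 200]


def enlarge_box(box, pad, img_w, img_h):
    """Grow (x0, y0, x1, y1) by `pad` pixels per side, clipped to the image."""
    x0, y0, x1, y1 = box
    return (max(0, x0 - pad), max(0, y0 - pad), min(img_w, x1 + pad), min(img_h, y1 + pad))


def bordered_size(w, h, border=BORDER_PX):
    """Crop size after adding a border of `border` pixels on every side."""
    return w + 2 * border, h + 2 * border


def resize_shortest_side(w, h, target):
    """New (w, h) with min(w, h) == target, keeping the aspect ratio."""
    short = min(w, h)
    return round(w * target / short), round(h * target / short)


def otsu_threshold(pixels):
    """Otsu's threshold for 8-bit gray values: maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total, sum_all = len(pixels), sum(i * n for i, n in enumerate(hist))
    w_bg, sum_bg, best_t, best_var = 0, 0, 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0:
            continue
        if w_fg == 0:
            break
        diff = sum_bg / w_bg - (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * diff * diff
        if var > best_var:
            best_t, best_var = t, var
    return best_t


def binarize(pixels, t):
    """Pixels above the threshold become white (255), the rest black (0)."""
    return [255 if p > t else 0 for p in pixels]


# One OCR input per (rotation, target size) pair: 9 x 2 = 18 variants per box.
VARIANTS = list(product(ROTATIONS, SHORT_SIDES))
```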
Evaluation Measures
Text localization: detection of bounding boxes
Average Precision (AP), AP50, AP75 over
“Intersection over Union” (IoU)
Precision, Recall
Text recognition: extraction of text from bounding boxes
Levenshtein distance: minimum number of single-character edits
(insertions, deletions, substitutions) needed to correct the extracted word
Gestalt pattern matching: similarity of the extracted and the correct
word, relative to the word lengths
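Both recognition measures can be computed with a few lines of Python. The Levenshtein function below is the standard dynamic-programming formulation; for gestalt pattern matching, the standard library's `difflib.SequenceMatcher` implements the Ratcliff/Obershelp algorithm, and its `ratio()` returns 2·M/T, where M is the number of matched characters and T the total length of both strings. The helper names are illustrative.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def gestalt_similarity(a: str, b: str) -> float:
    """Ratcliff/Obershelp similarity in [0, 1], normalized by both word lengths."""
    return SequenceMatcher(None, a, b).ratio()


print(levenshtein("kitten", "sitting"))       # 3
print(gestalt_similarity("abcd", "abce"))     # 0.75
```

The two measures complement each other: the edit count is absolute, while the gestalt score puts errors in relation to word length, so one wrong character weighs less in a long word.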
Intersection over Union (IoU)
Figure: (a) Definition of IoU. (b) Examples of IoU.
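IoU divides the area shared by a predicted box A and a ground-truth box B by the area they cover together: IoU = area(A ∩ B) / area(A ∪ B). A minimal sketch, assuming axis-aligned boxes given as (x_min, y_min, x_max, y_max):

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)  # zero if boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Two 2×2 boxes offset by one pixel diagonally share one unit of area out of seven, so `iou((0, 0, 2, 2), (1, 1, 3, 3))` is 1/7 ≈ 0.14, well below the 0.5 threshold used for AP50.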
Average Precision (AP) over IoU
Figure: Visualization of different IoU Values.
AP50: average precision when a prediction counts as correct if its
IoU with a ground-truth box is ≥ 0.5
AP75: the same with an IoU threshold of 0.75
AP: summary metric, the mean AP over ten equally spaced IoU
thresholds (0.50, 0.55, 0.60, ..., 0.90, 0.95)
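The AP measures can be sketched for a single image as follows. This is a simplification, not the COCO evaluation code: it matches predictions greedily in descending score order and uses all-point interpolation, whereas the official COCO tooling accumulates over all images and samples precision at 101 recall points, so values can differ slightly. Function names are hypothetical.

```python
def iou(a, b):
    """Intersection over Union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, thr):
    """AP at one IoU threshold; preds are (score, box), gts are boxes.
    Each ground-truth box can be matched by at most one prediction."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best_j, best_iou = -1, thr
        for j, gt in enumerate(gts):
            overlap = iou(box, gt)
            if not matched[j] and overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j >= 0:
            matched[best_j] = True
        hits.append(best_j >= 0)
    precisions, recalls, tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / rank)
        recalls.append(tp / len(gts))
    # All-point interpolation: precision at recall r is the max precision at recall >= r.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_recall)
        prev_recall = r
    return ap


THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]  # 0.50, 0.55, ..., 0.95


def coco_style_ap(preds, gts):
    """Mean AP over the ten IoU thresholds."""
    return sum(average_precision(preds, gts, t) for t in THRESHOLDS) / len(THRESHOLDS)
```

A single prediction with IoU 0.8 counts as correct at the seven thresholds from 0.50 to 0.80 and as wrong at the remaining three, so it scores AP50 = AP75 = 1.0 but AP = 0.7. This is why AP is much lower than AP50 in the result tables: it rewards tight boxes.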
Effect of Pre-Training on COCO-Text
Pre-training | none   | on COCO-Text
AP50         | 91.35% | 95.21%
AP75         | 63.49% | 76.33%
AP           | 58.37% | 65.98%
Table: Comparison of training with and without pre-training on
COCO-Text.
Effect of Dataset Augmentation
                     | AP     | AP75   | AP50
without augmentation | 52.90% | 53.02% | 90.34%
with augmentation    | 60.81% | 67.57% | 92.88%
Table: Effect of the artificially extended dataset on ResNet101.
Generalization Experiments: Train on Four Datasets, Test on the Fifth
Tested on | AP50   | AP75   | AP
CHIME-R   | 80.45% | 45.19% | 45.82%
CHIME-S   | 87.73% | 30.59% | 41.12%
DeGruyter | 86.63% | 35.88% | 43.06%
EconBiz   | 84.61% | 15.88% | 34.03%
DeTEXT    | 70.32% | 29.49% | 34.46%
Table: Generalization: training on four of the datasets for 200,000
iterations and testing on the fifth dataset.
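The protocol behind this table is a leave-one-dataset-out split. A minimal sketch; `train_fn` and `eval_fn` are hypothetical placeholders for the actual Faster R-CNN training and evaluation, and only the split logic and the 200,000 iterations come from the slide.

```python
DATASETS = ["CHIME-R", "CHIME-S", "DeGruyter", "EconBiz", "DeTEXT"]


def leave_one_dataset_out(datasets):
    """Yield (training datasets, held-out dataset) for each dataset in turn."""
    for held_out in datasets:
        yield [d for d in datasets if d != held_out], held_out


def run_generalization(train_fn, eval_fn, datasets=DATASETS, iterations=200_000):
    """Train on four datasets, evaluate on the fifth; returns {held_out: metrics}.
    train_fn(train_sets, iterations) and eval_fn(model, test_set) are placeholders."""
    results = {}
    for train_sets, test_set in leave_one_dataset_out(datasets):
        model = train_fn(train_sets, iterations)
        results[test_set] = eval_fn(model, test_set)
    return results
```

Because the held-out dataset never appears in training, each row measures how well the model transfers to figures from an unseen source.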
Comparison to Unsupervised Approach: Localization
   | Precision | Recall | F1 (SD)
TX | 0.66      | 0.55   | 0.56 (0.25)
NN | 0.86      | 0.83   | 0.87 (0.12)
Table: Comparison of the unsupervised approach (TX) with our proposed
supervised approach (NN) for text localization in scientific figures.
Comparison to Unsupervised Approach: Recognition
   | Avg. Levenshtein (SD) | Global Levenshtein (SD)
TX | 6.23 (4.93)           | 108.81 (108.53)
NN | 3.44 (4.42)           | 39.11 (41.75)
Table: Comparison of text recognition of the unsupervised approach (TX)
with our proposed supervised approach (NN).
Summary
Proposed a supervised approach for extracting text from scientific
figures using neural networks
Showed that dataset extension and pre-training on natural
images alleviate the problem of limited training data
The supervised approach outperforms the best previously known
unsupervised approach(es)
Handles different datasets: generalizes to new
datasets if they contain figures of the same type
Thank you! Any questions? Email: ansgar.scherp@essex.ac.uk