Scene Text Understanding

Motivation

Recognizing scene text is a challenging problem, even more so than recognizing text in scanned documents. With the rapid growth of camera-based applications on mobile phones, understanding scene text is more important than ever. One could, for instance, foresee an application that answers questions such as “What does this sign say?” This is related to the problem of Optical Character Recognition (OCR), which has a long history in the computer vision community. However, the success of OCR systems is largely restricted to text from scanned documents. Scene text exhibits large variability in appearance and can prove challenging even for state-of-the-art OCR methods. Many scene understanding methods successfully recognize objects and regions such as roads, trees, and sky, but tend to ignore the text on sign boards. Our goal is to fill this gap in scene understanding.

Highlights

Binarization as a labelling problem (see our ICDAR'11 paper)
Both open and closed vocabulary word recognition (see our CVPR'12 paper and BMVC'12 paper)
Use of both top-down cues (lexicons) and bottom-up cues (character detection)
Holistic approach to lexicon-driven scene text recognition (see our ICDAR'13 paper)
Applications in image retrieval (see our ICCV'13 paper)
Word recognition for large lexicons and cropped word image retrieval (see our ACCV'14 paper)

Code

Generating exemplars (based on our ICDAR'13 paper)

README

Coming Soon:

Scene Text Binarization

Scene Character Recognition

Datasets

IIIT Scene Text Retrieval (IIIT STR)

Video Scene Text Retrieval Datasets (Sports-10K and TV series-1M)

Related Publications

1. Udit Roy, Anand Mishra, Karteek Alahari and C. V. Jawahar, Scene Text Recognition and Retrieval for Large Lexicons, ACCV 2014. [pdf][Abstract][Poster][Lexicons][bibtex]

2. Anand Mishra, Karteek Alahari and C. V. Jawahar, Image Retrieval using Textual Cues, IEEE ICCV 2013. [pdf][Abstract][Project page][bibtex]

3. Vibhor Goel, Anand Mishra, Karteek Alahari and C. V. Jawahar, Whole is Greater than Sum of Parts: Recognizing Scene Text Words, ICDAR 2013. [pdf][Abstract][bibtex]

4. Anand Mishra, Karteek Alahari and C. V. Jawahar, Scene Text Recognition using Higher Order Language Priors, BMVC 2012 (Oral). [pdf][Abstract][Slides][bibtex]

5. Anand Mishra, Karteek Alahari and C. V. Jawahar, Top-down and Bottom-up cues for Scene Text Recognition, IEEE CVPR 2012. [pdf][Abstract][Poster][bibtex]

6. Anand Mishra, Karteek Alahari and C. V. Jawahar, An MRF model for Binarization of Natural Scene Texts, ICDAR 2011 (Oral). [pdf][Abstract][Slides][bibtex]

People

Udit Roy
Anand Mishra
Karteek Alahari
C. V. Jawahar

Acknowledgements

Anand Mishra is partly supported by the MSR India PhD Fellowship 2012.

Copyright Notice

The documents contained in these directories are included by the contributing authors as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author's copyright.