SEMINAR
on
O.C.R
(OPTICAL CHARACTER RECOGNITION)
Presented by
Rahul Kumar Mallik
Student of B.Tech. (CSE), 4th Year / VIII Semester (2018-19)
Introduction!
• Suppose you wanted to digitize a magazine article or a printed contract.
• You could spend hours retyping it and then correcting misprints, or you could
convert all the required material into digital format in several minutes using a
scanner and OCR software.
Here we introduce O.C.R!
O.C.R!
• Optical character recognition (also optical character reader, OCR) is
the mechanical or electronic conversion of images of typed, handwritten
or printed text into machine-encoded text, whether from a scanned
document, a photo of a document, a scene photo (for example, the text on
signs and billboards in a landscape photo) or from subtitle text
superimposed on an image.
• It is a common method of digitising printed texts so that they can be
electronically edited, searched, stored more compactly, displayed online,
and used in machine processes such as cognitive computing, machine
translation, (extracted) text-to-speech, key data and text mining. OCR
is a field of research in pattern recognition, artificial
intelligence and computer vision.
INVENTION OF OCR!
• Early O.C.R technology grew out of work on telegraphy and on
reading devices for the blind.
o Emanuel Goldberg developed a machine that read
characters and converted them into standard
telegraph code.
o In the late 1920s and into the 1930s, Goldberg
developed what he called a "Statistical Machine"
for searching microfilm archives using an optical code recognition system.
o In 1931 he was granted US Patent No. 1,838,389
for the invention.
o The patent was acquired by IBM.
EMANUEL GOLDBERG (31 Aug 1881 – 13 Sep 1970)
Techniques Used for Developing O.C.R
OCR software often pre-processes the image to improve the chances of successful recognition:
• De-skew – if the document was not aligned properly when scanned, it may need to be tilted a
few degrees clockwise or counterclockwise to make lines of text perfectly horizontal
or vertical.
• Despeckle – removes positive and negative spots and smooths edges.
• Line and word detection – establishes the baseline for word and character shapes and
separates words if necessary.
• Script recognition – in multilingual documents, the script may change at the level of
individual words, so the script must be identified before the right OCR
engine can be invoked to handle it.
• Character isolation or "segmentation" – separates the image into individual characters.
• Normalise aspect ratio and scale.
• Line removal – cleans up non-glyph boxes and lines.
• Layout analysis or "zoning" – identifies columns, paragraphs, captions, etc. as
distinct blocks. Especially important in multi-column layouts and tables.
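Two of the steps above, despeckling and line detection, can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not how any particular OCR engine works: the page is assumed to be already binarised into a grid of 0 (background) and 1 (ink), only isolated ink specks are removed, and the function names `despeckle` and `find_text_lines` are made up for this example.

```python
def despeckle(image):
    """Remove isolated ink pixels (ink with no ink among its 8 neighbours).

    `image` is a list of rows, each a list of 0 (background) or 1 (ink).
    Only "positive" specks are handled in this sketch.
    """
    height, width = len(image), len(image[0])
    cleaned = [row[:] for row in image]
    for y in range(height):
        for x in range(width):
            if image[y][x] != 1:
                continue
            # Count ink in the 3x3 window, then subtract the pixel itself.
            ink_around = sum(
                image[ny][nx]
                for ny in range(max(0, y - 1), min(height, y + 2))
                for nx in range(max(0, x - 1), min(width, x + 2))
            ) - 1
            if ink_around == 0:
                cleaned[y][x] = 0
    return cleaned


def find_text_lines(image):
    """Return (top, bottom) row ranges, inclusive, that contain any ink.

    This is a horizontal projection: rows with ink belong to a text line,
    and empty rows separate one line from the next.
    """
    lines = []
    start = None
    for y, row in enumerate(image):
        has_ink = any(row)
        if has_ink and start is None:
            start = y
        elif not has_ink and start is not None:
            lines.append((start, y - 1))
            start = None
    if start is not None:
        lines.append((start, len(image) - 1))
    return lines


page = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1],  # a single speck of noise
    [0, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

print(find_text_lines(page))             # [(1, 2), (4, 4), (6, 6)]
print(find_text_lines(despeckle(page)))  # [(1, 2), (6, 6)]
```

Without despeckling, the noise pixel is mistaken for a one-row text line, which is why clean-up steps usually run before line detection.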
BINARISATION
• Converts an image from colour or greyscale to black-and-white
(called a "binary image" because there are two colours).
• Binarisation is a simple way of
separating the text (or any other desired image component) from
the background. It is necessary because
most commercial recognition algorithms work only on binary images,
which are simpler to process. The effectiveness
of the binarisation step also significantly influences the quality
of the character recognition stage, so the binarisation method must be
chosen carefully for a given input image type
(scanned document, scene text image, historical degraded
document, etc.).
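As a rough illustration of binarisation, the sketch below picks a single global threshold with Otsu's method and marks every pixel at or below it as ink. Otsu's method is one common choice and is an assumption here, since the slides do not name a specific algorithm. The greyscale image is assumed to be a grid of integers from 0 (black) to 255 (white), and `otsu_threshold` and `binarise` are names invented for this example.

```python
def otsu_threshold(image):
    """Pick the grey level that best separates dark and light pixels.

    Otsu's method tries every threshold and keeps the one that maximises
    the between-class variance of the two resulting pixel groups.
    """
    histogram = [0] * 256
    for row in image:
        for value in row:
            histogram[value] += 1

    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += histogram[level]
        if weight_dark == 0:
            continue
        weight_light = total - weight_dark
        if weight_light == 0:
            break
        sum_dark += level * histogram[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarise(image):
    """Return a grid of 1 (ink) and 0 (background) using one global threshold."""
    threshold = otsu_threshold(image)
    return [[1 if value <= threshold else 0 for value in row] for row in image]


grey = [
    [200, 220, 210],
    [30, 20, 215],
    [205, 25, 210],
]
print(binarise(grey))  # [[0, 0, 0], [1, 1, 0], [0, 1, 0]]
```

A single global threshold suits evenly lit scans. Degraded historical documents and scene photos often need locally adaptive thresholds instead, which is the kind of per-input choice the slide refers to.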
FORCING BETTER INPUT!
• Special fonts like OCR-A, OCR-B, or MICR fonts, with precisely specified
sizing, spacing, and distinctive character shapes, allow a higher accuracy
rate during transcription. These were often used in early matrix-matching
systems.
• "Comb fields" are pre-printed boxes that encourage humans to write more
legibly, one glyph per box. These are often printed in a "dropout colour"
which can be easily removed by the OCR system.
• Palm OS used a special set of glyphs, known as "Graffiti", which are
similar to printed English characters but simplified or modified for easier
recognition on the platform's computationally limited hardware. Users
needed to learn how to write these special glyphs.
• Zone-based OCR restricts recognition to a specific part of a document. This
is often referred to as "Template OCR".
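The idea behind zone-based ("template") OCR can be sketched as cropping fixed rectangles out of the page before any recognition happens. Everything specific below is hypothetical: the zone names, their coordinates, and the `crop_zone` helper are made up for illustration, and the recognition step itself is left out.

```python
def crop_zone(image, zone):
    """Cut a rectangle out of a page image.

    `image` is a list of pixel rows; `zone` is (top, left, bottom, right)
    with `bottom` and `right` exclusive, like Python slices.
    """
    top, left, bottom, right = zone
    return [row[left:right] for row in image[top:bottom]]


# Hypothetical template for one document type: each field lives in a known box.
TEMPLATE = {
    "invoice_number": (0, 3, 2, 6),
    "total": (3, 0, 5, 4),
}

# Stand-in page: the value of each "pixel" encodes its row and column.
page = [[row * 10 + col for col in range(6)] for row in range(5)]

zones = {name: crop_zone(page, box) for name, box in TEMPLATE.items()}
print(zones["invoice_number"])  # [[3, 4, 5], [13, 14, 15]]
print(zones["total"])           # [[30, 31, 32, 33], [40, 41, 42, 43]]
```

Each cropped zone would then be passed to the recogniser on its own, so text elsewhere on the page cannot interfere with the field being read.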
Digital Library!
A digital library is a special library with a collection of digital objects that can
include text, visual material, audio material and video material, stored
in electronic media formats (as opposed to print or other media),
along with means for organizing, storing, and retrieving the files and media
contained in the library collection. Digital libraries can vary immensely in size
and scope, and can be maintained by individuals or organizations, or be affiliated
with established physical library buildings or institutions, or with academic
institutions. The digital content may be stored locally or accessed remotely
via computer networks. An electronic library is a type of information
retrieval system. These information retrieval systems are able to exchange
information with each other through interoperability and sustainability.
Applications of O.C.R!
OCR engines have been developed into many kinds of domain-specific OCR
applications, such as receipt OCR, invoice OCR, check OCR and legal billing document
OCR.
They can be used for:
• Data entry for business documents, e.g. checks, passports, invoices, bank statements
and receipts
• Automatic number plate recognition
• Automatic extraction of key information from insurance documents
• Extracting business card information into a contact list
• Making textual versions of printed documents more quickly, e.g. book
scanning for Project Gutenberg
• Making electronic images of printed documents searchable, e.g. Google Books
• Converting handwriting in real time to control a computer (pen computing)
• Defeating CAPTCHA anti-bot systems, though these are specifically designed to
prevent OCR. The purpose can also be to test the robustness of CAPTCHA anti-bot
systems.
ACCURACY!
Accuracy rates can be measured in several ways, and how they are measured can
greatly affect the reported accuracy rate. For example, if word context (basically a
lexicon of words) is not used to correct software finding non-existent words, a
character error rate of 1% (99% accuracy) may result in an error rate of 5% (95%
accuracy) or worse if the measurement is based on whether each whole word was
recognized with no incorrect letters.
• The Information Science Research Institute (ISRI) had a mission to foster the improvement of
automated technologies for understanding machine-printed documents, and it conducted the most
authoritative of the Annual Tests of OCR Accuracy from 1992 to 1996.
• Recognition of Latin-script text is still not 100% accurate, even when the image is clear.
• Total accuracy can be achieved by human review or Data Dictionary Authentication.
• Other areas, including recognition of hand printing, cursive handwriting, and
printed text in other scripts (especially those East Asian language characters
which have many strokes for a single character), are still the subject of active
research. The MNIST database is commonly used for testing systems' ability to
recognise handwritten digits.
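The gap between character-level and word-level accuracy described at the top of this slide can be checked with a little arithmetic. The sketch below makes two assumptions that are not in the slides: character errors are independent of each other, and a typical word has five letters. It also shows one common way of measuring character error rate, via edit distance. The function names are invented for this example.

```python
def word_accuracy(char_accuracy, word_length):
    """Chance that every character of a word is right, assuming independent errors."""
    return char_accuracy ** word_length


def edit_distance(reference, recognised):
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(recognised) + 1))
    for i, ref_char in enumerate(reference, 1):
        current = [i]
        for j, rec_char in enumerate(recognised, 1):
            current.append(min(
                previous[j] + 1,                           # deletion
                current[j - 1] + 1,                        # insertion
                previous[j - 1] + (ref_char != rec_char),  # substitution or match
            ))
        previous = current
    return previous[-1]


def character_error_rate(reference, recognised):
    """Edit distance divided by the length of the reference text."""
    return edit_distance(reference, recognised) / len(reference)


# 99% per-character accuracy on five-letter words gives about 95% per word.
print(round(word_accuracy(0.99, 5), 3))  # 0.951

reference = "optical character recognition"
recognised = "optical charactor recognition"
print(edit_distance(reference, recognised))  # 1
wrong_words = sum(a != b for a, b in zip(reference.split(), recognised.split()))
print(wrong_words, "of", len(reference.split()), "words wrong")  # 1 of 3 words wrong
```

One wrong letter out of 29 characters is a character error rate of roughly 3%, yet it spoils one word in three, which is why the reported figure depends so strongly on the unit being measured.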
Benefits of OCR!
• Documents become text-searchable after OCR processing, so you can
search your database by document name, reference number, address,
etc.
• Working with a digital file rather than paper documents saves a lot of time.
• OCR processing can greatly improve customer service. If you take
incoming calls that require you to access documents, having those
documents available instantly in digital form improves the overall customer
experience, because the files can be found quickly and their contents
edited easily.
• Documents become editable with OCR. Files can be converted to MS
Word and other editable digital formats.
• OCR allows you to copy and paste from the document itself, whether it is in PDF
format or MS Word format.
• OCR processing is low-cost, and it can improve how your
business operates.
• OCR is also said to boost staff morale, since the working environment becomes easier
to work in and less paper-centric.
Thank You For
Paying Attention !
