CA2192023A1 - Separation of Touching Characters in Optical Character Recognition

Info

Publication number
CA2192023A1
Authority
CA
Canada
Prior art keywords
characters
touching
separation
classified
modules
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CA2192023A
Other languages
French (fr)
Other versions
CA2192023C (en)
Inventor
Hamadi Jamali
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc
Publication of CA2192023A1
Application granted
Publication of CA2192023C
Anticipated expiration
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/15 Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

Method and apparatus for separating touching characters within an optical character recognition (OCR) computer (1). An input document (20) is scanned by a scanner (2), forming a set of scan lines (3). A segmentation process (4) is performed on the scan lines (3) to create a set of segmented image boxes (5). Candidate characters within the image boxes (5) are classified by a classification module (6), based upon a library of stored models (7). Candidate characters that are recognized with a high degree of confidence are classified and coded into a binary form (8), such as ASCII. Candidate characters that are not classified are processed by a touching character decision module (9), which determines whether a series of separation modules (10-14) is to be invoked. The execution of modules (10-13), followed by the re-execution of modules (4) and (6), may or may not cause all of the touching characters to be separated. Any touching characters that remain are subjected to one or more reprocessing cycles. The reprocessing can entail examination (14) of adjacent scan lines (3), shifting of the separation threshold T by the separation threshold determination module (10), or re-execution of the deconvolution step (12) with changed parameters or structure.
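The control flow described in the abstract (classify, fall back to separation when classification fails, then reprocess with a shifted threshold T) can be illustrated with a minimal sketch. This is a toy under stated assumptions, not the patented method: an image box is reduced to a list of per-column ink counts, the model library is a small hypothetical dictionary, and separation is done by a simple column-threshold split rather than the deconvolution step the abstract mentions. All function and variable names are invented for illustration.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Toy stand-in for a segmented image box (5): ink count per pixel column.
Box = List[int]

# Hypothetical library of stored models (7): column profile -> character.
MODELS: Dict[Tuple[int, ...], str] = {
    (5, 2, 5): "H",
    (4, 2, 4): "O",
}


def classify(box: Box, models: Dict[Tuple[int, ...], str] = MODELS) -> Optional[str]:
    """Toy classification module (6): an exact profile match is treated as a
    high-confidence result; anything else stays unclassified (None)."""
    return models.get(tuple(box))


def split_columns(box: Box, T: int) -> List[Box]:
    """Assumed separation rule: cut the box at every column whose ink count
    is <= T and return the remaining runs of columns as sub-boxes."""
    parts: List[Box] = []
    run: Box = []
    for ink in box:
        if ink <= T:
            if run:
                parts.append(run)
                run = []
        else:
            run.append(ink)
    if run:
        parts.append(run)
    return parts


def recognize(box: Box, thresholds: Sequence[int] = (0, 1, 2)) -> str:
    """Classify a box; if that fails, try separation with a progressively
    shifted threshold T, re-classifying the pieces on each cycle."""
    ch = classify(box)
    if ch is not None:
        return ch  # confident result, coded as text
    for T in thresholds:  # one reprocessing cycle per threshold value
        parts = split_columns(box, T)
        if len(parts) < 2:
            continue  # nothing was separated at this threshold
        chars = [classify(p) for p in parts]
        if all(c is not None for c in chars):
            return "".join(chars)
    return "?"  # still unresolved after all cycles
```

With these toy models, `recognize([5, 2, 5, 1, 4, 2, 4])` fails to classify the whole box, finds nothing to cut at T = 0, and then separates the two characters at T = 1, returning "HO". A box that matches no model and cannot be split, such as `[9, 9]`, comes back as "?".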
CA002192023A 1995-12-22 1996-12-04 Separation of touching characters in optical character recognition Expired - Fee Related CA2192023C (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US08/577,727 US5768414A (en) 1995-12-22 1995-12-22 Separation of touching characters in optical character recognition

Publications (2)

Publication Number Publication Date
CA2192023A1 (en) 1997-06-23
CA2192023C (en) 2000-04-04

Family

ID=24309920

Family Applications (1)

Application Number Title Priority Date Filing Date
CA002192023A Expired - Fee Related CA2192023C (en) 1995-12-22 1996-12-04 Separation of touching characters in optical character recognition

Country Status (5)

Country Link
US (1) US5768414A (en)
EP (1) EP0780782B1 (en)
JP (1) JPH1027214A (en)
CA (1) CA2192023C (en)
DE (1) DE69626182T2 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6055336A (en) * 1996-11-18 2000-04-25 Canon Kabushiki Kaisha Image processing system which converts multi-value image data into binary image data
US6487311B1 (en) * 1999-05-04 2002-11-26 International Business Machines Corporation OCR-based image compression
US7400768B1 (en) 2001-08-24 2008-07-15 Cardiff Software, Inc. Enhanced optical recognition of digitized images through selective bit insertion
US7283669B2 (en) * 2003-01-29 2007-10-16 Lockheed Martin Corporation Fine segmentation refinement for an optical character recognition system
JP4834351B2 (en) * 2005-08-22 2011-12-14 株式会社東芝 Character recognition device and character recognition method
US7454063B1 (en) 2005-09-22 2008-11-18 The United States Of America As Represented By The Director National Security Agency Method of optical character recognition using feature recognition and baseline estimation
US7856142B2 (en) * 2007-01-26 2010-12-21 Sharp Laboratories Of America, Inc. Methods and systems for detecting character content in a digital image
CN101354746B (en) * 2007-07-23 2011-08-31 夏普株式会社 Device and method for extracting character image
EP2225700A4 (en) * 2007-11-30 2014-04-23 Lumex As A method for processing optical character recognition (ocr) output data, wherein the output data comprises double printed character images
US8761511B2 (en) * 2009-09-30 2014-06-24 F. Scott Deaver Preprocessing of grayscale images for optical character recognition
US8345978B2 (en) 2010-03-30 2013-01-01 Microsoft Corporation Detecting position of word breaks in a textual line image
WO2013164849A2 (en) * 2012-04-12 2013-11-07 Tata Consultancy Services Limited A system and method for detection and segmentation of touching characters for ocr
CN106874906B (en) * 2017-01-17 2023-02-28 腾讯科技(上海)有限公司 Image binarization method and device and terminal

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4562594A (en) * 1983-09-29 1985-12-31 International Business Machines Corp. (Ibm) Method and apparatus for segmenting character images
US5048100A (en) * 1988-12-15 1991-09-10 Michael Kuperstein Self organizing neural network method and system for general classification of patterns
US5040229A (en) * 1990-02-02 1991-08-13 Eastman Kodak Company Contour feature-based method for identification and segmentation of touching characters
US5500905A (en) * 1991-06-12 1996-03-19 Microelectronics And Computer Technology Corporation Pattern recognition neural network with saccade-like operation
US5440651A (en) * 1991-06-12 1995-08-08 Microelectronics And Computer Technology Corp. Pattern recognition neural network
US5542006A (en) * 1994-06-21 1996-07-30 Eastman Kodak Company Neural network based character position detector for use in optical character recognition

Also Published As

Publication number Publication date
EP0780782A2 (en) 1997-06-25
CA2192023C (en) 2000-04-04
EP0780782B1 (en) 2003-02-12
JPH1027214A (en) 1998-01-27
EP0780782A3 (en) 1998-07-08
US5768414A (en) 1998-06-16
DE69626182D1 (en) 2003-03-20
DE69626182T2 (en) 2003-11-13

Similar Documents

Publication Publication Date Title
US4817171A (en) Pattern recognition system
US7233697B2 (en) Character recognition device and a method therefor
US5995659A (en) Method of searching and extracting text information from drawings
CA2192023A1 (en) Separation of Touching Characters in Optical Character Recognition
KR950001551A (en) Image segmentation and how to classify image elements
WO1997015026A1 (en) Processor based method for extracting tables from printed documents
JPH0721319A (en) Automatic determination device of asian language
EP1357508B1 (en) Layout analysis
CN111931769A (en) Invoice processing device, invoice processing apparatus, invoice computing device and invoice storage medium combining RPA and AI
Rege et al. Text-image separation in document images using boundary/perimeter detection
Chowdhury et al. Automated segmentation of math-zones from document images
Jamil et al. Multilingual artificial text extraction and script identification from video images
Rodrigues et al. Cursive character recognition–a character segmentation method using projection profile-based technique
Sun Page segmentation for Manhattan and non-Manhattan layout documents via selective CRLA
Aparna et al. A complete OCR system development of Tamil magazine documents
Kwak et al. Video caption image enhancement for an efficient character recognition
Vijayarani et al. MULTI-SCRIPT LANGUAGE IDENTIFICATION FROM DOCUMENT IMAGES
Li An implementation of ocr system based on skeleton matching
Wang et al. Hierarchical content classification and script determination for automatic document image processing
O'Keefe et al. Image labelling using an associative memory
Kurdy et al. Omnifont Arabic optical character recognition system
Wang et al. Document segmentation and classification with top-down approach
Dzyubanenko et al. Method of the optical recognition of technical documentation and the transformation of graphic information into machine-readable form for cognitive analysis
Bailey et al. Electronic schematic recognition
Liu et al. Pixel-Level Segmentation of Handwritten and Printed Texts in Document Images with Deep Learning

Legal Events

Date Code Title Description
EEER Examination request
MKLA Lapsed, effective date: 2016-12-05