CA2192023A1 - Separation of Touching Characters in Optical Character Recognition
- Publication number
- CA2192023A1
- Authority
- CA
- Canada
- Prior art keywords
- characters
- touching
- separation
- classified
- modules
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
- G06V30/15—Cutting or merging image elements, e.g. region growing, watershed or clustering-based techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
Method and apparatus for separating touching characters within an optical character recognition (OCR) computer (1). An input document (20) is scanned by a scanner (2), forming a set of scan lines (3). A segmentation process (4) is performed on the scan lines (3) to create a set of segmented image boxes (5). Candidate characters within the image boxes (5) are classified by a classification module (6), based upon a library of stored models (7). Candidate characters that are recognized with a high degree of confidence are classified and coded into a binary form (8), such as ASCII. Candidate characters that are not classified are processed by a touching character decision module (9), which determines whether a series of separation modules (10-14) is to be invoked. Executing modules (10-13) and then re-executing modules (4) and (6) may or may not separate all of the touching characters. Any touching characters that remain are subjected to one or more reprocessing cycles. The reprocessing can entail examination (14) of adjacent scan lines (3), shifting of the separation threshold T by the separation threshold determination module (10), or re-execution of the deconvolution step (12) with changed parameters or structure.
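The control flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the one-dimensional column-darkness profile, the width-based `classify` test (standing in for both the model library and the touching character decision), and the names `segment`, `recognize`, `step`, and `max_cycles` are all assumptions made for illustration. Only the threshold-shifting form of reprocessing is shown; deconvolution and the examination of adjacent scan lines are omitted.

```python
from typing import List, Tuple

Box = Tuple[int, int]  # half-open column range [start, end)


def segment(profile: List[int], threshold: int) -> List[Box]:
    """Split a column-darkness profile into runs darker than the threshold."""
    boxes: List[Box] = []
    start = None
    for x, value in enumerate(profile):
        if value > threshold and start is None:
            start = x                      # a dark run begins
        elif value <= threshold and start is not None:
            boxes.append((start, x))       # the dark run ends
            start = None
    if start is not None:
        boxes.append((start, len(profile)))
    return boxes


def classify(box: Box, max_width: int) -> bool:
    """Hypothetical stand-in for model matching: accept single-character widths."""
    return box[1] - box[0] <= max_width


def recognize(profile: List[int], threshold: int = 50, max_width: int = 3,
              step: int = 40, max_cycles: int = 4) -> Tuple[List[Box], List[Box]]:
    """Segment and classify, then re-segment rejected boxes at shifted thresholds."""
    accepted: List[Box] = []
    pending: List[Box] = []
    for box in segment(profile, threshold):
        (accepted if classify(box, max_width) else pending).append(box)

    for cycle in range(1, max_cycles + 1):
        if not pending:
            break
        shifted = threshold + cycle * step  # assumed rule for shifting threshold T
        unresolved: List[Box] = []
        for start, end in pending:
            pieces = [(start + a, start + b)
                      for a, b in segment(profile[start:end], shifted)]
            if len(pieces) > 1 and all(classify(p, max_width) for p in pieces):
                accepted.extend(pieces)     # the box split into classifiable parts
            else:
                unresolved.append((start, end))
        pending = unresolved
    return sorted(accepted), pending


# Two characters joined by a lighter bridge (darkness 120) at column 8.
profile = [0, 200, 210, 200, 0, 0, 220, 230, 120, 225, 215, 0]
print(recognize(profile))  # ([(1, 4), (6, 8), (9, 11)], [])
```

In this toy run the wide box spanning columns 6 to 10 is rejected at the initial threshold, survives the first shift to 90, and is split at the second shift to 130, where the bridge column falls below the threshold. Boxes that never split are returned in the second list, mirroring the abstract's statement that separation may or may not succeed.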
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US08/577,727 | 1995-12-22 | ||
US08/577,727 US5768414A (en) | 1995-12-22 | 1995-12-22 | Separation of touching characters in optical character recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2192023A1 (en) | 1997-06-23 |
CA2192023C (en) | 2000-04-04 |
Family
ID=24309920
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002192023A, Expired - Fee Related, CA2192023C (en) | 1995-12-22 | 1996-12-04 | Separation of touching characters in optical character recognition |
Country Status (5)
Country | Link |
---|---|
US (1) | US5768414A (en) |
EP (1) | EP0780782B1 (en) |
JP (1) | JPH1027214A (en) |
CA (1) | CA2192023C (en) |
DE (1) | DE69626182T2 (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6055336A (en) * | 1996-11-18 | 2000-04-25 | Canon Kabushiki Kaisha | Image processing system which converts multi-value image data into binary image data |
US6487311B1 (en) * | 1999-05-04 | 2002-11-26 | International Business Machines Corporation | OCR-based image compression |
US7400768B1 (en) | 2001-08-24 | 2008-07-15 | Cardiff Software, Inc. | Enhanced optical recognition of digitized images through selective bit insertion |
US7283669B2 (en) * | 2003-01-29 | 2007-10-16 | Lockheed Martin Corporation | Fine segmentation refinement for an optical character recognition system |
JP4834351B2 (en) * | 2005-08-22 | 2011-12-14 | 株式会社東芝 | Character recognition device and character recognition method |
US7454063B1 (en) | 2005-09-22 | 2008-11-18 | The United States Of America As Represented By The Director National Security Agency | Method of optical character recognition using feature recognition and baseline estimation |
US7856142B2 (en) * | 2007-01-26 | 2010-12-21 | Sharp Laboratories Of America, Inc. | Methods and systems for detecting character content in a digital image |
CN101354746B (en) * | 2007-07-23 | 2011-08-31 | 夏普株式会社 | Device and method for extracting character image |
EP2225700A4 (en) * | 2007-11-30 | 2014-04-23 | Lumex As | A method for processing optical character recognition (ocr) output data, wherein the output data comprises double printed character images |
US8761511B2 (en) * | 2009-09-30 | 2014-06-24 | F. Scott Deaver | Preprocessing of grayscale images for optical character recognition |
US8345978B2 (en) | 2010-03-30 | 2013-01-01 | Microsoft Corporation | Detecting position of word breaks in a textual line image |
WO2013164849A2 (en) * | 2012-04-12 | 2013-11-07 | Tata Consultancy Services Limited | A system and method for detection and segmentation of touching characters for ocr |
CN106874906B (en) * | 2017-01-17 | 2023-02-28 | 腾讯科技(上海)有限公司 | Image binarization method and device and terminal |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4562594A (en) * | 1983-09-29 | 1985-12-31 | International Business Machines Corp. (Ibm) | Method and apparatus for segmenting character images |
US5048100A (en) * | 1988-12-15 | 1991-09-10 | Michael Kuperstein | Self organizing neural network method and system for general classification of patterns |
US5040229A (en) * | 1990-02-02 | 1991-08-13 | Eastman Kodak Company | Contour feature-based method for identification and segmentation of touching characters |
US5500905A (en) * | 1991-06-12 | 1996-03-19 | Microelectronics And Computer Technology Corporation | Pattern recognition neural network with saccade-like operation |
US5440651A (en) * | 1991-06-12 | 1995-08-08 | Microelectronics And Computer Technology Corp. | Pattern recognition neural network |
US5542006A (en) * | 1994-06-21 | 1996-07-30 | Eastman Kodak Company | Neural network based character position detector for use in optical character recognition |
1995
- 1995-12-22: US application US08/577,727, patent US5768414A (en), not active, Expired - Lifetime
1996
- 1996-11-19: DE application DE69626182T, patent DE69626182T2 (en), not active, Expired - Lifetime
- 1996-11-19: EP application EP96308353A, patent EP0780782B1 (en), not active, Expired - Lifetime
- 1996-12-04: CA application CA002192023A, patent CA2192023C (en), not active, Expired - Fee Related
- 1996-12-20: JP application JP8354733A, patent JPH1027214A (en), active, Pending
Also Published As
Publication number | Publication date |
---|---|
EP0780782A2 (en) | 1997-06-25 |
CA2192023C (en) | 2000-04-04 |
EP0780782B1 (en) | 2003-02-12 |
JPH1027214A (en) | 1998-01-27 |
EP0780782A3 (en) | 1998-07-08 |
US5768414A (en) | 1998-06-16 |
DE69626182D1 (en) | 2003-03-20 |
DE69626182T2 (en) | 2003-11-13 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US4817171A (en) | Pattern recognition system | |
US7233697B2 (en) | Character recognition device and a method therefor | |
US5995659A (en) | Method of searching and extracting text information from drawings | |
CA2192023A1 (en) | Separation of Touching Characters in Optical Character Recognition | |
KR950001551A (en) | Image segmentation and how to classify image elements | |
WO1997015026A1 (en) | Processor based method for extracting tables from printed documents | |
JPH0721319A (en) | Automatic determination device of asian language | |
EP1357508B1 (en) | Layout analysis | |
CN111931769A (en) | Invoice processing device, invoice processing apparatus, invoice computing device and invoice storage medium combining RPA and AI | |
Rege et al. | Text-image separation in document images using boundary/perimeter detection | |
Chowdhury et al. | Automated segmentation of math-zones from document images | |
Jamil et al. | Multilingual artificial text extraction and script identification from video images | |
Rodrigues et al. | Cursive character recognition–a character segmentation method using projection profile-based technique | |
Sun | Page segmentation for Manhattan and non-Manhattan layout documents via selective CRLA | |
Aparna et al. | A complete OCR system development of Tamil magazine documents | |
Kwak et al. | Video caption image enhancement for an efficient character recognition | |
Vijayarani et al. | MULTI-SCRIPT LANGUAGE IDENTIFICATION FROM DOCUMENT IMAGES | |
Li | An implementation of ocr system based on skeleton matching | |
Wang et al. | Hierarchical content classification and script determination for automatic document image processing | |
O'Keefe et al. | Image labelling using an associative memory | |
Kurdy et al. | Omnifont Arabic optical character recognition system | |
Wang et al. | Document segmentation and classification with top-down approach | |
Dzyubanenko et al. | Method of the optical recognition of technical documentation and the transformation of graphic information into machine-readable form for cognitive analysis | |
Bailey et al. | Electronic schematic recognition | |
Liu et al. | Pixel-Level Segmentation of Handwritten and Printed Texts in Document Images with Deep Learning |
Legal Events
Code | Title | Description |
---|---|---|
EEER | Examination request | |
MKLA | Lapsed | Effective date: 2016-12-05 |