WO2001040918A1 - Method for identifying a data region of a document - Google Patents
- Publication number
- WO2001040918A1 (application PCT/US2000/032676)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data region
- data
- logical
- footholder
- coordinate system
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- The present invention relates generally to data processing and, more particularly, to an apparatus, method and article of manufacture for defining a data region of a document.
- There are several well-known methods for identifying and extracting data from selected regions of document images.
- Data regions of a document image are typically selected in order to acquire meaningful data located therein.
- In one such method, template forms are utilized.
- Predefined regions within a given document are identified by matching the document to a database of template forms.
- The template forms consist of constant data. This constant data is stripped from the document, and the data contained in the data regions is acquired.
- In another such method, marks are set in certain places of the document in order to properly identify the coordinates of data regions. The marks are utilized as footholders.
- A footholder is any graphical object having some persistent properties, e.g. coordinates, text, or a child window identification number (ID).
- A footholder can be recognized at run-time according to its properties.
- A graphical object is any object displayed on the screen, for example windows and their elements (such as borders and captions), strings of text, parts of strings (such as words), bitmaps, etc. Each graphical object is characterized by a surrounding geometrical region. The geometrical region does not have to be rectangular in shape. This method of data region identification is generally hard coded in document recognition systems. Data is then acquired from the data region.
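As an illustration of recognizing a footholder by its persistent properties rather than by where it happens to be drawn, the following is a minimal Python sketch. The names `GraphicalObject` and `find_footholder`, and the choice of a rectangular bounding box, are assumptions made for this example only; they do not come from the patent text, which also allows non-rectangular regions.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class GraphicalObject:
    """Hypothetical record for one object drawn on the screen."""
    text: str
    child_id: Optional[int]             # child window ID, if the object has one
    bounds: Tuple[int, int, int, int]   # (left, top, right, bottom); assumed rectangular


def find_footholder(objects: Iterable[GraphicalObject], *,
                    text: Optional[str] = None,
                    child_id: Optional[int] = None) -> Optional[GraphicalObject]:
    """Return the first object whose persistent properties match, or None.

    Matching ignores the current bounds, so the footholder is still found
    after the window has been scrolled or resized.
    """
    if text is None and child_id is None:
        raise ValueError("at least one persistent property is required")
    for obj in objects:
        if text is not None and obj.text != text:
            continue
        if child_id is not None and obj.child_id != child_id:
            continue
        return obj
    return None
```

In this sketch the lookup key is the text or child ID, and the bounds are only read after the footholder has been found.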
- One object of the present invention is to identify data regions in documents that can change their visual representation. This is achieved as follows. At least one region with informative data is selected on the base area, for example a full screen. At least one footholder for identification of the region is then selected, so that the position of the region is related to the position of the footholder by an algorithm that does not depend on the coordinates of the footholder and the region on the base area. An algorithm for automated identification of the region, based on the footholder's position, is constructed. Automated identification of the region on the base area is then executed, and data from the region is acquired.
- An apparatus, method and product are provided for selecting at least one footholder having a fixed relative position with respect to a data region to provide a footholder location identifier; providing a logical coordinate system related to the location of the footholder, the coordinate system being in the document, resizing therewith and surrounding the data region; locating the data region in relationship to the logical coordinate system; and transforming the preceding steps into a set of commands, thereby creating an algorithm for locating the data region.
- An apparatus, method and product are provided for executing, preferably automatically, the previously created algorithm for locating the data region.
- An apparatus, method and product are provided for acquiring the data within the data region.
- Figure 1 is a block diagram of one embodiment of a data processing system constructed in accordance with the teachings herein.
- Figure 2 is a block diagram of one memory layout of the data processing system of Figure 1.
- Figure 3 is a flow diagram of an exemplary embodiment of the overall data region identification process.
- Figure 4 illustrates a document with data to be processed according to the teachings herein.
- Figure 5 illustrates the document of Figure 4 wherein several footholders are selected.
- Figure 6 illustrates the document of Figure 5 wherein a logical reference grid based on the footholders is also shown.
- The present invention is embodied in the system configuration, method of operation and product or computer-readable medium, such as floppy disks, conventional hard disks, CD-ROMs, Flash ROMs, nonvolatile ROM, RAM and any other equivalent computer memory device, generally shown in Figures 1-6. It will be appreciated that the system, method of operation and product may vary as to the details of their configuration and operation without departing from the basic concepts disclosed herein.
- The system 1 comprises memory 10; a central processing unit (CPU) 12; an input device 16, such as a keyboard and mouse; and a display device 18, such as a screen.
- The system may also include a secondary storage device 14, such as a hard disk drive.
- A representative system 1 is a stationary or portable computer designed according to widely understood principles, e.g., a desktop computer, a laptop computer, a hand-held computer, or any other known or future device having at least the physical and processing capabilities required to implement aspects of the present invention.
- a multitasking operating system 32 such as Microsoft® Windows
- Client application 40 gains information 44 about graphical objects, which are drawn on the screen 18.
- Information about the graphical objects drawn by the applications 34, 36 and 38 is not essential to the present description and, accordingly, will not be discussed further.
- Client application 40 identifies data regions of a document during run-time even if the visual representation of the document has changed.
- In order to acquire data from a desired data region of a document, the client application 40 must automatically identify the data region during run-time. For the client application 40 to do so, a user, whether virtual or real, must first identify the data region during design time by performing a set of operations. These operations, when combined, create an algorithm for accurately identifying the data region. Preferably, the algorithm is stored in memory to be used by the client application 40 during run-time.
- Design time refers to the time spent on the manual operations for region identification and on creating the algorithm in accordance with these operations.
- Run-time refers to the period of time during which a client application executes the identification algorithm that was constructed at design time.
- In executing the algorithm during run-time, client application 40 always identifies the previously identified data region in order to, for example, acquire the data contained within it. Therefore, even if a window containing the data region is manipulated, e.g. scrolled, closed, minimized, opened or restored, and/or the screen resolution changes, client application 40 will still be able to identify the data region.
- A user selects one or more footholders that surround the desired data region. This selection corresponds to one or more operations that provide the first code segment of the data region identification algorithm.
- A logical coordinate system is constructed based on the footholders. This construction corresponds to one or more operations that provide a second code segment of the data region identification algorithm.
- The foregoing code segments are combined to create the data region identification algorithm.
- The previously constructed data region identification algorithm executes and, at 58, the data region is identified.
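The split between design time and run-time described above can be sketched as recording the user's operations as an ordered list of commands, storing it, and replaying it later. The following minimal Python sketch assumes a JSON command list and the operation names `select_footholder`, `set_grid_line` and `locate_region`; these names, the storage format and the handler interface are all hypothetical and are not specified in the patent text.

```python
import json


def record_algorithm(footholder_texts, grid_lines, region):
    """Design time: turn the user's operations into a stored command list.

    The first code segment selects footholders, the second builds the
    logical grid, and the last command names the region on that grid.
    """
    commands = [{"op": "select_footholder", "text": t} for t in footholder_texts]
    commands += [{"op": "set_grid_line", **line} for line in grid_lines]
    commands.append({"op": "locate_region", **region})
    return json.dumps(commands)


def run_algorithm(stored, handlers):
    """Run time: replay the stored commands, in order, through the handlers.

    `handlers` maps each operation name to a callable supplied by the
    client application; the return values are collected in order.
    """
    results = []
    for command in json.loads(stored):
        args = dict(command)
        op = args.pop("op")
        results.append(handlers[op](**args))
    return results
```

Because the stored commands refer to footholders and grid-line numbers rather than screen coordinates, the same list can be replayed after the window has moved.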
- Figure 4 illustrates a portion of a display device, such as a computer screen, whereupon a document containing data is displayed to the user.
- A user wishing to define a particular data region of the displayed document must select one or more footholders surrounding the data region.
- The user causes a logical coordinate system to be constructed based on the relative position of the footholders 62, 64 and 66 to the region 60.
- The logical coordinate system is a logical reference grid having enumerated grid lines along the sides of the region 60. It should be noted that the actual placement of the grid lines depends on multiple factors. However, the grid lines should serve to localize the region 60. Thus, as depicted, the user sets the grid lines 70, 72 and 74 along the sides of the footholders 62, 64 and 66.
- The grid lines are enumerated with coordinate values in order to uniquely identify the region 60 in the logical coordinate system.
- Horizontal grid lines 72 and 74 are numbered "1" and "2", respectively, and vertical grid line 70 is numbered "1".
- The region 60 is now determined by its logical coordinates, i.e. logical coordinates
- The manual choosing of footholders and the manual creation of the logical reference grid based on the boundaries of the selected footholders are considered a set of commands, which are transformed into a data region identification algorithm 54 (See Figure 3 A).
- The data region identification algorithm 54 does not depend on the screen coordinates of the selected footholders and the identified data region 60. Instead, the algorithm 54 relates the identified region 60 to the positions of footholders 62, 64 and 66.
- Although the screen coordinates of the identified region 60 may change due to window scrolling, resizing and/or other events, the logical coordinates of the identified region 60 remain constant due to the fixed relative positions of the footholders to the region 60. In other words, the logical coordinate system moves with and changes with the window. See Figure 7.
- Client application 40 gets information about all graphical objects drawn by applications 34, 36 and 38, as mentioned hereinabove. Footholders with the words "site" 62 (Figure 5), "address" 64 and "Click to continue" 66 are identified inside the text on the screen. Algorithms for searching for predefined words in text are known in the art. The coordinate grid in screen coordinates is formed based on the boundaries of these footholders. The grid is constructed automatically from its prototype, which was constructed at design time. Region 60 (Figure 6) is determined by its logical coordinates on the grid (logical coordinates ("1", "2"), ("1")).
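The run-time step described above (rebuild the grid from its design-time prototype, then resolve the region from its logical coordinates) can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: which footholder edge each grid line follows is not given in the text, so the pairings in `PROTOTYPE` are invented for the example, as are the names `Box`, `build_grid` and `locate_region` and the `right_limit` parameter used to close the region on its open side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: int
    top: int
    right: int
    bottom: int


# Hypothetical design-time prototype: each numbered grid line is tied to one
# edge of a footholder identified by its text (the pairings are assumptions).
PROTOTYPE = {
    "horizontal": {1: ("site", "bottom"), 2: ("Click to continue", "top")},
    "vertical": {1: ("address", "right")},
}


def build_grid(prototype, footholders):
    """Convert the prototype into grid-line positions in current screen coordinates.

    `footholders` maps footholder text to the Box where it is drawn right now.
    """
    return {
        axis: {number: getattr(footholders[text], edge)
               for number, (text, edge) in lines.items()}
        for axis, lines in prototype.items()
    }


def locate_region(grid, rows, column, right_limit):
    """Resolve logical coordinates (rows, column) to a screen rectangle.

    The region lies between two horizontal lines and starts at one vertical
    line; `right_limit` (an assumption) supplies the missing right edge.
    """
    top = grid["horizontal"][rows[0]]
    bottom = grid["horizontal"][rows[1]]
    left = grid["vertical"][column]
    return Box(left, top, right_limit, bottom)
```

With this layout, scrolling the window changes every footholder's `Box` and therefore the returned rectangle, while the logical coordinates passed to `locate_region` stay `(1, 2)` and `1`.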
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2001541915A JP2004500630A (en) | 2000-11-30 | 2000-11-30 | Method for identifying the data area of a document |
CA002392988A CA2392988A1 (en) | 1999-11-30 | 2000-11-30 | Method for identifying a data region of a document |
AU19383/01A AU1938301A (en) | 1999-11-30 | 2000-11-30 | Method for identifying a data region of a document |
EP00982334A EP1242862A1 (en) | 1999-11-30 | 2000-11-30 | Method for identifying a data region of a document |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
IL133230 | 1999-11-30 | ||
IL13323099A IL133230A0 (en) | 1999-11-30 | 1999-11-30 | Method for identifying regions in documents on the screen |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/028,072 Continuation US20030004311A1 (en) | 1997-03-31 | 2001-12-19 | Secreted and transmembrane polypeptides and nucleic acids encoding the same |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2001040918A1 (en) | 2001-06-07 |
Family
ID=11073551
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2000/032676 WO2001040918A1 (en) | 1999-11-30 | 2000-11-30 | Method for identifying a data region of a document |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP1242862A1 (en) |
AU (1) | AU1938301A (en) |
CA (1) | CA2392988A1 (en) |
IL (1) | IL133230A0 (en) |
WO (1) | WO2001040918A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080037051A1 (en) * | 2006-08-10 | 2008-02-14 | Fuji Xerox Co., Ltd. | Document display processor, computer readable medium storing document display processing program, computer data signal and document display processing method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5313581A (en) * | 1990-09-14 | 1994-05-17 | Digital Equipment Corporation | System and method for communication between windowing environments |
US5438661A (en) * | 1990-11-16 | 1995-08-01 | Fujitsu Limited | Version management method and apparatus in multi-window environment |
US5606674A (en) * | 1995-01-03 | 1997-02-25 | Intel Corporation | Graphical user interface for transferring data between applications that support different metaphors |
US5625809A (en) * | 1988-04-25 | 1997-04-29 | Hewlett-Packard Company | Method for constructing a data structure which allows data to be shared between programs |
US5680151A (en) * | 1990-06-12 | 1997-10-21 | Radius Inc. | Method and apparatus for transmitting video, data over a computer bus using block transfers |
US5694561A (en) * | 1994-12-12 | 1997-12-02 | Microsoft Corporation | Method and system for grouping and manipulating windows |
US5694544A (en) * | 1994-07-01 | 1997-12-02 | Hitachi, Ltd. | Conference support system which associates a shared object with data relating to said shared object |
-
1999
- 1999-11-30 IL IL13323099A patent/IL133230A0/en unknown
-
2000
- 2000-11-30 WO PCT/US2000/032676 patent/WO2001040918A1/en not_active Application Discontinuation
- 2000-11-30 AU AU19383/01A patent/AU1938301A/en not_active Abandoned
- 2000-11-30 EP EP00982334A patent/EP1242862A1/en not_active Withdrawn
- 2000-11-30 CA CA002392988A patent/CA2392988A1/en not_active Abandoned
Also Published As
Publication number | Publication date |
---|---|
EP1242862A1 (en) | 2002-09-25 |
CA2392988A1 (en) | 2001-06-07 |
IL133230A0 (en) | 2001-03-19 |
AU1938301A (en) | 2001-06-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR102134309B1 (en) | System and method for automated conversion of interactive sites and applications to support mobile and other display environments | |
US20210349615A1 (en) | Resizing graphical user interfaces | |
US6356281B1 (en) | Method and apparatus for displaying translucent overlapping graphical objects on a computer monitor | |
JP4637455B2 (en) | User interface utilization method and product including computer usable media | |
JP5792287B2 (en) | Spin control user interface for selecting options | |
US10031900B2 (en) | Range adjustment for text editing | |
US20130187948A1 (en) | 2d line data cursor | |
US10976899B2 (en) | Method for automatically applying page labels using extracted label contents from selected pages | |
JP2007503032A (en) | Document scanner | |
US20100158379A1 (en) | Image background removal | |
US7065705B1 (en) | Automatic labeling in images | |
CN113361525A (en) | Page generation method and device based on OCR, computer equipment and storage medium | |
JPH09231393A (en) | Instruction input device | |
JPWO2016170691A1 (en) | Input processing program, input processing apparatus, input processing method, character specifying program, character specifying apparatus, and character specifying method | |
EP1242862A1 (en) | Method for identifying a data region of a document | |
US20070263010A1 (en) | Large-scale visualization techniques | |
JP5066877B2 (en) | Image display device, image display method, and program | |
JP6696119B2 (en) | Conversion device, conversion method, and conversion program | |
JP2004500630A (en) | Method for identifying the data area of a document | |
EP4027253A1 (en) | Information processing apparatus and program | |
CN113609410B (en) | Method, device, equipment and computer readable storage medium for information navigation | |
US20220254141A1 (en) | Image processing device, image processing method, and program | |
JPH0816805A (en) | Graphic generation device | |
JPH0816800A (en) | Information processor | |
CN115509665A (en) | Method, device, medium and equipment for recording control in window |
Legal Events
Code | Title | Description |
---|---|---|
AK | Designated states | Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE TR BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
WWE | Wipo information: entry into national phase | Ref document number: IN/PCT/2002/00542/DE Country of ref document: IN |
ENP | Entry into the national phase | Ref document number: 2001 541915 Country of ref document: JP Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 2392988 Country of ref document: CA |
WWE | Wipo information: entry into national phase | Ref document number: 2000982334 Country of ref document: EP |
ENP | Entry into the national phase | Ref document number: 2002 2002114573 Country of ref document: RU Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 008187088 Country of ref document: CN |
WWP | Wipo information: published in national office | Ref document number: 2000982334 Country of ref document: EP |
REG | Reference to national code | Ref country code: DE Ref legal event code: 8642 |
WWW | Wipo information: withdrawn in national office | Ref document number: 2000982334 Country of ref document: EP |