US8897565B1 - Extracting documents from a natural scene image - Google Patents
Extracting documents from a natural scene image
- Publication number
- US8897565B1 (application US13/538,145)
- Authority
- US
- United States
- Prior art keywords
- quadrilateral
- rectangular
- shaped
- shaped region
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Definitions
- the subject matter described herein generally relates to extracting documents from a natural scene image taken with a mobile device.
- aspects of the present technology may be advantageous for rapidly extracting forms and other types of documents from a natural scene image without human intervention or costly and specialized equipment.
- the natural scene image may be transformed resulting in an extracted document image that may be upright and properly aligned.
- text fields of the extracted document may be upright, aligned and locatable at predictable points.
- the method includes receiving a digital image that includes at least one document with a background scene, detecting with a processor a number of edges in the image, and selecting regions of the image corresponding to the detected edges. For each selected region, mapping coordinates may be computed based on characteristics of the selected region. The method also includes rectifying with the processor the selected regions based on the mapping coordinates and normalizing the selected regions that are rectified. In that regard, the background scene may be removed from the selected regions.
- detecting the number of edges may further include selecting a first edge and identifying other edges having a brightness level lower than the first edge that are within a predetermined range.
- the first edge has a brightness level above the predetermined range and the other edges are joined to the first edge.
- the method may also include determining with the processor that text is present in the selected regions and extracting the text.
- rectifying the selected regions may further include calculating a quadrilateral based on a given selected region, comparing with the processor an area of the quadrilateral to an area of the given selected region, and multiplying, based on the comparison, pixels of the given selected region by pixels of the quadrilateral if a ratio of the areas does not meet a threshold value.
- the quadrilateral may include at least four corners that consist of intersecting edges. The results of the multiplication may be mapped using the mapping coordinates so as to remove the background scene in the given selected region.
- Another aspect of the present technology provides a system that includes a memory storing a digital image that includes at least one document with a background scene and a processor coupled to the memory.
- the processor may be configured to detect a number of edges in the image stored in memory and select regions of the image corresponding to the detected edges. For each selected region, mapping coordinates may be computed based on characteristics of the selected region.
- the processor may be further configured to rectify selected regions based on the mapping coordinates and normalize the selected regions that are rectified. In that regard, the background scene may be removed from the selected regions.
- Yet another aspect of the present technology provides a tangible computer-readable storage medium that includes instructions of a program that, when executed by a processor, cause the processor to perform a method.
- the method includes receiving a digital image that includes at least one document with a background scene, detecting with a processor a number of edges in the image, and selecting regions of the image corresponding to the detected edges. For each selected region, mapping coordinates may be computed based on characteristics of the selected region.
- the method also includes rectifying with the processor the selected regions based on the mapping coordinates and normalizing the selected regions that are rectified. In that regard, the background scene may be removed from the selected regions.
- FIG. 1 is a block diagram of a system in accordance with one aspect of the present technology.
- FIG. 2 is an image of a document in a natural scene.
- FIG. 3 is a flow diagram of a method for extracting quadrilaterals from an image in accordance with one aspect of the present technology.
- FIG. 4 illustrates an example of extracting quadrilaterals from a source image according to aspects of the present technology.
- FIG. 5 is a flow diagram illustrating a method for extracting text from an image in accordance with one aspect of the present technology.
- FIG. 6 illustrates an example of a rectified document image according to aspects of the present technology.
- FIG. 7 illustrates an example of a rectified document image that has been first normalized.
- a processor may be used to extract forms and other types of documents from a natural scene image, e.g., an image of a document that includes a background scene.
- OCR: optical character recognition
- FIG. 1 is a block diagram of a system 100 .
- the system 100 may include a server 110 coupled to a network 120 .
- the system may also include a client device 130 capable of wireless communication with the server 110 over the network 120 .
- the server 110 can contain a processor 112 , memory 114 , and other components typically present in general purpose computers.
- the memory 114 of server 110 can store information accessible by the processor 112 , including instructions 116 that may be executed by the processor 112 .
- Memory may also include data 118 that can be retrieved, manipulated or stored by the processor 112 .
- the memory 114 can be a type of non-transitory computer readable medium capable of storing information accessible by the processor 112 , such as a hard-drive, memory card, ROM, RAM, DVD, CD-ROM, write-capable, and read-only memories.
- the processor 112 can be a well-known processor, or other lesser-known types of processors. Alternatively, the processor 112 can be a dedicated controller such as an ASIC.
- the instructions 116 can be a set of instructions executed directly, such as machine code, or indirectly, such as scripts, by the processor 112 .
- the terms “instructions,” “steps” and “programs” may be used interchangeably herein.
- the instructions 116 can be stored in object code format for direct processing by the processor 112 , or other types of computer language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. Functions, methods and routines of the instructions are explained in more detail below.
- the data 118 can be retrieved, stored or modified by the processor 112 in accordance with the instructions 116 .
- the data 118 can be stored in computer registers, in a relational database as a table having a number of different fields and records, or XML documents.
- the data 118 can also be formatted in a computer-readable format such as, but not limited to, binary values, ASCII or Unicode.
- the data 118 can include information sufficient to identify relevant information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories including other network locations or information that is used by a function to calculate relevant data.
- the data 118 can include image data that may be encoded into various digital formats based on the instructions 132 .
- FIG. 1 functionally illustrates the processor 112 and memory 114 as being within the same block
- the processor 112 and memory 114 may actually include multiple processors and memories that may or may not be stored within the same physical housing.
- some of the instructions and data can be stored on a removable CD-ROM and others within a read-only computer chip. Some or all of the instructions and data can be stored in a location physically remote from, yet still accessible by, the processor 112 .
- the processor 112 may actually include a collection of processors, which may or may not operate in parallel.
- the server 110 can be at one node of network 120 and capable of directly and indirectly communicating with other nodes of the network 120 .
- the server 110 can include a web server that may be capable of communicating with the client device 130 using network 120 such that it uses the network 120 to transmit and display information to a user on display 138 of the client device 130 .
- Server 110 can also include a number of computers, e.g., a load balanced server farm, that exchange information with different nodes of a network for the purpose of receiving, processing and transmitting data to client devices. In this instance, the client devices will typically still be at different nodes of the network 120 than the computers making up server 110 .
- Each client device 130 can be configured similarly to server 110 , with a processor 132 , memory 134 , instructions 135 , and data 136 .
- Each client 130 may be a personal computer having all the internal components normally found in a computer, such as a central processing unit (CPU); a display device 138 , for example a monitor having a screen, a projector, a touch-screen, a small LCD screen, a television, or another electrical device operable to display information processed by the processor; a CD-ROM; a hard drive; user input 137 , for example a mouse, keyboard, touch screen or microphone; speakers; a modem and/or network interface device, such as a telephone, cable or otherwise; and all of the components used for connecting these elements to one another.
- computers in accordance with the subject matter described herein may include devices capable of processing instructions and transmitting data to and from humans and other computers, including general purpose computers, PDAs, network computers lacking local storage capability, set top boxes for televisions, and other networked devices.
- the device 130 can include a full-sized personal computer, the subject matter described herein may also be used in connection with mobile devices capable of wirelessly exchanging data over a network such as the Internet.
- client device 130 may be a wireless-enabled PDA, tablet PC, or a cellular phone capable of sending information using the Internet.
- the user can input information, for example, using a small keyboard, a keypad, or a touch screen.
- the operations described herein may be performed by the client device 130 , the server 110 , or by some combination thereof.
- the client device 130 may include a camera module 139 , which can be used to capture images of an object, such as a document in a natural scene.
- the client device 130 may be connected to a digital camera that may operate in conjunction with the client device 130 .
- the camera module 139 may also operate in conjunction with other image capturing systems known in the art, such as a camera in a mobile phone or other devices with image capture features.
- the client device 130 is shown coupled to memory 134 , which can store captured natural scene images 133 . Images can also be stored on a removable medium, such as a disk, tape, SD Card or CD-ROM, which can be connected to system 100 .
- the client device 130 may digitally format the captured images 133 . More specifically, captured images 133 may be passed to the client device 130 where the processor 132 may convert the captured images 133 to a digital format that includes a large number of pixels.
- the system can include a large number of connected servers, with each different server being at a different node of the network 120 .
- the network 120 and intervening nodes, can include various configurations and protocols including the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local networks, private networks using communication protocols proprietary to one or more companies, Ethernet, WiFi, such as 802.11, 802.11b, g, n, or other such standards, and HTTP, and various combinations of the foregoing.
- Such communication may be facilitated by a device capable of transmitting data to and from other computers, such as modems, dial-up, cable, fiber-optic, and wireless interfaces.
- FIG. 2 is an image 210 of a document 215 in a natural scene.
- the image 210 may have been captured using, for example, an image capturing device such as the client device described with respect to FIG. 1 .
- the image may include a document 215 that may be under some type of perspective deformation.
- the digital image may have been taken at an angle. Accordingly, text in the document can appear skewed, or larger in some portions with respect to others.
- a background scene and/or noise 220 , e.g., variation of brightness or color information, can appear in the image 210 .
- the noise can appear, for example, as random speckles and lines 218 on an otherwise smooth surface 219 and may significantly degrade image quality.
- a processor may analyze the image 210 in order to identify portions that appear to include quadrilaterals, such as document 215 .
- the subject matter described below can process this image to extract zero or more images based on the detected quadrilaterals. For example, by analyzing both gradient strength and line orientation, multiple quadrilaterals can be detected where intersecting line segments have four corner vertices in close proximity to each other, such as the respective four corners of the noisy quad and the clean quad in FIG. 2 .
- the image processing techniques are described in further detail with respect to FIG. 3 .
- FIG. 3 is a flow diagram of a method 300 for extracting quadrilaterals from an image 310 .
- a source image 310 may be processed through an image processing pipeline resulting in zero or more outputted images 362 , 364 , 366 that may be possible quadrilaterals detected in the image 310 .
- Each block 320 - 360 of the image processing performed in the pipeline may include a series of discrete operations. This may mean that the output of one process can be the input of the next process.
- the image processing method 300 described below can be performed in part or in its entirety on a mobile client device, such as a mobile phone, on a remote computing device such as a server, or on some combination thereof.
- a digital image may be received.
- the image can arrive in various supported formats, such as a string-encoded image format, PIX, e.g., a native pixel format, and a data structure, such as CvMAT.
- the digital images can be organized by various compression techniques and stored in a number of different formats. If an image arrives in an unsupported format, it can be converted into a supported format, such as CvMAT, using various image conversion tools.
- the input image may be converted to grayscale, to remove shot and mosquito noise, e.g., noticeable digital image distortions and artifacts, caused by technical features of an image capturing device such as its charge coupled device (CCD) and/or image compression technology.
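The grayscale conversion mentioned above can be sketched in pure Python. The ITU-R BT.601 luma weights used here are a common illustrative choice; the text does not specify a particular formula, and the function name is hypothetical:

```python
def to_grayscale(rgb_pixels):
    """Convert a list of (R, G, B) pixels to 8-bit grayscale using the
    BT.601 luma weights (an assumed, common choice)."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in rgb_pixels]
```

In practice this would be applied per pixel across the whole image before the denoising step.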
- CCD: charge-coupled device
- the image may be segmented into regions.
- a source image may be analyzed in one or more parallel sub-operations also known as segmenters.
- the segmenters may output a set of regions from the image where region outlines or edges may roughly correspond to quadrilaterals in the image.
- To detect a wide range of edges within an input image, various techniques can be employed, such as Canny edge detection. For example, by using Canny edge detection, points can be identified in an image where the image's brightness changes sharply or discontinues.
- one way of discerning between edge strengths can be with threshold values. For example, edges with pixel intensity values higher than an upper threshold may be marked as strong, edges with values below a lower threshold can be suppressed, and edges between the two thresholds can be marked as weak.
- very strong edges are located and used as seeds, i.e., starting points for creating a quadrilateral outline. Other nearby edges can be joined through a morphological closing operation on the edges.
- the seed edges can be adjusted by joining weaker edges that may neighbor existing edges, thus effectively tracing an outline of objects and quadrilaterals located in the image.
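The seed-and-join behavior described above (strong edges seeding an outline, weaker neighboring edges joined to them) can be sketched as the hysteresis stage of Canny edge detection on a gradient-magnitude grid. The function name and the concrete thresholds are illustrative assumptions, not values from the patent:

```python
from collections import deque

def hysteresis_threshold(grad, low, high):
    """Classify edge pixels as in Canny hysteresis: pixels at or above
    `high` are strong seeds; pixels between `low` and `high` are kept
    only if connected (8-neighborhood) to a seed; the rest are suppressed."""
    rows, cols = len(grad), len(grad[0])
    keep = [[False] * cols for _ in range(rows)]
    queue = deque()
    for r in range(rows):
        for c in range(cols):
            if grad[r][c] >= high:
                keep[r][c] = True
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not keep[nr][nc] and grad[nr][nc] >= low):
                    keep[nr][nc] = True  # weak edge joined to a seed
                    queue.append((nr, nc))
    return keep
```

A weak edge with no path to a strong seed is suppressed, which is what lets isolated noise responses drop out while document outlines survive.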
- regions with low intensity gradients may be erroneously detected as quadrilaterals. These may be regions affected by noise or distortions in an image. For example, variations in brightness or color information can make regions appear as if the distortions represent an actual object in the image. It is possible that these regions are output by segmenters as potential quadrilaterals.
- Contiguous sets of edges can also be outputted by segmenters as potential quadrilateral regions. This may happen in situations where a detected quadrilateral is heavily textured, e.g., producing many tiny, irregular inter-edge regions, but the image background may not be. According to aspects, the outputted regions may be fairly quadrilateral in shape.
- coarse regions, e.g., non-quadrilaterals, may be filtered out based on predetermined filter criteria such as whether the regions are too large, too small or too oblong. As discussed above, some of the regions outputted in block 330 may not be quadrilaterals. Typically, non-quadrilateral regions can be quickly discarded based on their size and oblongness.
- regions may be further analyzed to determine differences between a pair of regions based on, for example, their position and shape. If the differences reach a threshold value, one of the regions may be discarded, for example, this can be the more complex or larger of the two regions.
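A minimal sketch of the size and oblongness filter on candidate bounding boxes. The fraction and aspect-ratio thresholds are invented for illustration, since the text gives no concrete values:

```python
def filter_regions(regions, img_w, img_h,
                   min_frac=0.01, max_frac=0.95, max_aspect=8.0):
    """Discard candidate regions that are too small, too large, or too
    oblong to plausibly be a document quadrilateral. Each region is an
    axis-aligned bounding box (x, y, w, h); thresholds are illustrative."""
    kept = []
    img_area = img_w * img_h
    for (x, y, w, h) in regions:
        area = w * h
        aspect = max(w, h) / max(min(w, h), 1)
        if min_frac * img_area <= area <= max_frac * img_area and aspect <= max_aspect:
            kept.append((x, y, w, h))
    return kept
```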
- quadrilaterals may be discovered by, for example, analyzing the regions output by the segmenters.
- for each region, a closest-fitting convex quadrilateral, such as an ideal fitting model quadrilateral, may be calculated.
- the region's external outline may be transformed using various techniques such as the Radon transform. For example, by using these techniques the region's strongest line features may be identified and extracted. The strongest line features may be the four sides of a region. At this point, it may still be possible that a selected region is not quadrilateral.
- intersections between all of the detected lines can be computed as well as their convex hull, e.g., points lying on an outer perimeter of the regions.
- Simplification techniques can be employed to reduce the convex hull to its salient points. For example, regions having fewer than four salient points may be discarded.
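The convex-hull step can be sketched with Andrew's monotone-chain algorithm (one standard method; the patent does not mandate a particular hull algorithm), followed by the fewer-than-four-salient-points test. Both function names are illustrative:

```python
def convex_hull(points):
    """Andrew's monotone-chain convex hull; returns the salient vertices
    lying on the outer perimeter of the point set."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def is_quad_candidate(points):
    # Regions whose hull has fewer than four salient points are discarded.
    return len(convex_hull(points)) >= 4
```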
- a difference map can be created.
- a binary operation such as XOR may be performed on corresponding bits from within a region and its corresponding model quadrilateral.
- the area of the difference map can be compared to the area of the model quadrilateral. If a ratio between the two areas does not meet a certain area threshold value, method 300 may determine that a selected region may be a quadrilateral and that it is possibly at a preferred location. Otherwise the selected regions may be discarded by method 300 because they are not quadrilaterals.
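A sketch of the difference-map test on binary masks. The pixel-wise XOR builds the difference map, and its area is compared to the model quadrilateral's area; the 10% cutoff is an assumed placeholder for the "certain area threshold value" mentioned above:

```python
def quad_fit_ratio(region_mask, model_mask):
    """Area of the XOR difference map divided by the area of the model
    quadrilateral; a small ratio means the region closely matches it."""
    diff_area = sum(
        r ^ m
        for row_r, row_m in zip(region_mask, model_mask)
        for r, m in zip(row_r, row_m))
    model_area = sum(m for row in model_mask for m in row)
    return diff_area / model_area

def is_quadrilateral(region_mask, model_mask, threshold=0.1):
    # Threshold value is illustrative, not from the patent.
    return quad_fit_ratio(region_mask, model_mask) <= threshold
```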
- Rectification is a process of calculating a homography matrix, which may be used to map homogenous coordinates between the extracted quadrilaterals and the calculated model quadrilaterals.
- the homography matrix can be used to determine an alignment necessary to correctly blend together the two regions.
- calculating a homography matrix can involve identifying common feature points between the regions, e.g., distinguishable image pixels.
- the homography matrix can be computed using OpenCV functions such as getPerspectiveTransform, which takes as inputs (1) four corners of a model quadrilateral and (2) four corners of an extracted quadrilateral.
- the function may attempt to find with least error a mapping between the two regions based on the inputted corners, e.g., common feature points.
- calculated results returned by the function may be a perspective transformation of the extracted quadrilateral represented by a homography matrix of 3 rows and 3 columns.
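The getPerspectiveTransform computation can be approximated in pure Python by solving the 8-by-8 linear system induced by the four point correspondences (OpenCV does essentially this internally; the solver below is an illustrative stand-in, not the patent's code, and assumes a non-degenerate quadrilateral):

```python
def find_homography(src, dst):
    """Solve for the 3x3 perspective transform mapping the four `src`
    corners onto the four `dst` corners (h33 fixed to 1, giving the
    eight degrees of freedom mentioned in the text)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
    # Gaussian elimination with partial pivoting on the 8x8 system.
    n = 8
    M = [row + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    h = [0.0] * n
    for r in range(n - 1, -1, -1):
        h[r] = (M[r][n] - sum(M[r][c] * h[c] for c in range(r + 1, n))) / M[r][r]
    h.append(1.0)  # fix h33 = 1
    return [h[0:3], h[3:6], h[6:9]]
```

Mapping a unit square onto a square of side 2, for instance, should recover a pure scaling matrix.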
- every pixel point in an original image may be multiplied by the calculated homography matrix.
- under the usual matrix-multiplication rule, an m-by-n matrix times an n-by-p matrix yields an m-by-p matrix, so the number of columns of the homography matrix must equal the number of rows of the point vector it multiplies.
- a homography matrix calculated in this block 360 may be a 3×3 matrix.
- a fixed third coordinate of 1 may be used, since the matrix multiplication requires each point of an original image to conform to a vector with 3 elements; otherwise the multiplication results may be undefined.
- each point in the original image in a typical (x,y) coordinate system may become (x,y,1).
- multiplying this point by the homography matrix may generate a position of that point in a rectified frame.
- an equation for the above described matrix multiplication can be represented as follows: [x′, y′, w]ᵀ = H · [x, y, 1]ᵀ, where H is the 3×3 homography matrix and the point's position in the rectified frame is (x′/w, y′/w).
- a perspective transformation of the extracted quadrilaterals can be specified by eight degrees of freedom or four (x,y) point mappings.
- a homography matrix may map every point in a quadrilateral to a corresponding point in a rectangular area in a source image, thus creating a rectified image of a quadrilateral.
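The per-point mapping described above, lifting (x, y) to (x, y, 1), multiplying by the 3×3 homography matrix, and dividing by the third component, can be sketched as follows (the function name is illustrative):

```python
def apply_homography(H, point):
    """Map an image point (x, y) into the rectified frame via the 3x3
    homography H: lift to homogeneous coordinates, multiply, then divide
    by the third component to return to pixel coordinates."""
    x, y = point
    vec = (x, y, 1.0)
    hx, hy, hw = (sum(H[r][c] * vec[c] for c in range(3)) for r in range(3))
    return (hx / hw, hy / hw)
```

Applying this to every pixel of a detected quadrilateral produces the rectified image; the bottom row of H is what makes the mapping perspective rather than merely affine.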
- Each rectified quadrilateral image 362 , 364 , 366 and its corresponding homography matrix may be outputted in block 360 .
- Method 300 can take approximately 35 ms to process a VGA-formatted image, for example.
- the homography matrix calculated in block 360 can also be used to build an Augmented Reality (AR) interface by rendering changes of a rectified image back onto an original image. For example, by enhancing the rectified image, such as replacing its text or drawing over it, and multiplying the enhancements by an inverse matrix, the enhancements may appear in the source image's perspective.
- AR: Augmented Reality
- FIG. 4 illustrates an example of extracting quadrilaterals 362 , 364 , 366 from a source image 310 .
- rectified images of quadrilaterals 362 , 364 , 366 may be returned from the source image 310 using the subject matter disclosed herein.
- a source image 310 may be captured wherein a document 315 detected in the image 310 appears askew. This can be caused by many factors such as positioning of an image capture device, lighting, image compression technology and use of a low resolution lens.
- the source image 310 can be sent to one or more segmenters, which may determine outlines in the image corresponding to possible quadrilaterals.
- the segmenters may produce one or more segmented quadrilateral candidates 330 .
- segmented quadrilateral candidates 330 can be seen in FIG. 4 as inverted outlined areas.
- “ideal” quadrilateral candidates 350 may be determined. For example, an ideal quadrilateral can be a detected region whose four corners are made up of intersecting lines. In this example, “ideal” candidates 350 are shown in FIG. 4 as solid outlines.
- a rectified image of each quadrilateral 362 , 364 and 366 may be produced by fitting a quadrilateral region of the source image 310 into a rectified image space.
- the entire background, e.g., anything but the actual quadrilateral, of the original image 310 can be removed.
- this may mean that only the form is visible in the rectified image.
- techniques can then be used to improve the image's contrast, such as thresholding pixels that are more than 50% white to 100% white.
- a powerful color and contrast normalization method can be applied to the rectified images of the quadrilaterals 362 , 364 , 366 , which can increase image clarity during other processing steps, such as attempting to extract text using OCR techniques.
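The contrast-improvement step of snapping pixels brighter than 50% white to full white can be sketched as a simple per-pixel threshold; the cutoff of 128 assumes 8-bit intensities, and the function name is illustrative:

```python
def snap_bright_pixels(gray, cutoff=128):
    """Push any pixel brighter than ~50% white to full white (255),
    leaving darker pixels, typically ink, unchanged."""
    return [[255 if p > cutoff else p for p in row] for row in gray]
```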
- FIG. 5 is a flow diagram illustrating a method 500 for extracting text from an image.
- By rectifying quadrilaterals from a natural scene image using the above described procedures, textual information may be more easily extracted from the images.
- a source image containing at least one document and background may be received, for example, by capturing the image with a mobile phone camera. Due to the nature of a hand-held mobile device, it may be difficult and sometimes impossible to take a perfect, straight-on picture of the document. Typically, OCR can often fail to recognize text when applied to such images.
- edges of the document within the source image may be detected.
- the edges of the document may be extracted from the image in order to map those edges to a rectified image.
- This stage may use characteristics such as size, shape and line features to select edges most likely to be outlines of the document against the image background.
- document corner locations can be estimated and used to generate a homography matrix.
- the source image may be rectified to produce an un-skewed view of the embedded document, thereby potentially making OCR easier.
- a rectified image may be produced by multiplying the source image by the homography matrix. By stretching corners of the rectified image to corners of the document image, background noise can be eliminated while keeping all of the document's area un-skewed and in view.
- coordinates in the document can be mapped from the rectified image, thus allowing for an extraction of text fields.
- the rectified image may be normalized. Because the rectified image may contain only a document, image optimization techniques can be employed at stage 530 to enhance OCR accuracy. For example, by stretching an image's histogram, or by changing the range of pixel intensity values in the image, its contrast may be normalized. This can have the desired effect of removing non-text noise and shading from the image, and may also facilitate the separation of the document image from its background.
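The histogram-stretching normalization mentioned above can be sketched as a linear remap of the observed intensity range onto the full 0 to 255 range. This is one simple form of contrast normalization; the text does not fix the exact method, and the function name is illustrative:

```python
def stretch_histogram(gray):
    """Normalize contrast by stretching the intensity histogram so the
    darkest observed pixel maps to 0 and the brightest to 255."""
    lo = min(min(row) for row in gray)
    hi = max(max(row) for row in gray)
    if hi == lo:
        return [[0 for _ in row] for row in gray]
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in gray]
```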
- method 500 may optionally prompt a user to determine whether an image should be saved.
- the prompt may direct a user to accept or reject an action to be employed on a set of images. For example, a display might indicate which images may be saved. If it is determined that an image may be saved then method 500 may proceed to stage 560 where the image may be saved, otherwise it may proceed to 570 .
- images can be saved.
- image saving can be employed by an image copying utility, a computer program or other types of image copying techniques.
- the saved images may be stored on a type of non-transitory computer readable medium capable of storing information that may be read by a processor such as a hard-drive, memory card, ROM, RAM, DVD, CD-ROM, write-capable, and read-only memories. Saving the rectified image may be beneficial because all background noise may have been removed, so only the interesting parts of an image are saved.
- the saved image data may be used by various applications, for example, an Augmented Reality (AR) interface.
- a saved image file may be reduced in size by roughly 88% without noticeably reducing quality, for example by saving the image as a 1-bit black-and-white image rather than in 8-bit grayscale.
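The quoted 88% figure is consistent with packing 1 bit per pixel instead of 8: the uncompressed saving is 1 − 1/8 = 87.5%. A sketch of that arithmetic, ignoring file headers, row padding and compression (function names are illustrative):

```python
def grayscale_bytes(width, height):
    """Uncompressed size of an 8-bit grayscale image: one byte per pixel."""
    return width * height

def onebit_bytes(width, height):
    """Uncompressed size of a 1-bit black-and-white image: pixels packed
    eight to a byte."""
    return (width * height + 7) // 8
```

For a VGA image this gives 307,200 versus 38,400 bytes, an 87.5% reduction, close to the roughly 88% quoted above.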
- various types of OCR techniques can be applied for extracting information from a rectified image. For example, a checkbox on a document image can be analyzed for marks, e.g., signs of ink, using an OCR to sample their locations, which may be at a constant position in a rectified image. Once a specific type of character is identified, it may be extracted from the document. To improve OCR accuracy when applied to a rectified image, rectification as described above should be employed on the image before it is optimized.
- FIG. 6 illustrates an example of a rectified document image 620 .
- the rectified image 620 may be produced from an original image 610 .
- the original image 610 illustrates a typical image that has been captured under some type of perspective deformation.
- the document image can be at an angle and text fields within the image can be slanted and distorted.
- the text fields in the rectified image 620 may appear upright, aligned and often locatable at predictable points. This can improve the overall accuracy of an OCR applied to the rectified image 620 .
- normalizing an image, e.g., engaging a process that changes the range of pixel intensity values, before rectification is applied to the image may produce undesired effects.
- FIG. 7 illustrates an example of a rectified document image 625 produced from an image that has been first normalized.
- the source image 610 from FIG. 6 has been normalized, thereby producing normalized image 615 .
- In the normalized image 615 , some portions of text have been erased. This may be due to, for example, changes in the ratios of dark to light pixels in the background. Rectification applied to this normalized image 615 can produce results which may cause multiple OCR errors. For example, after normalization an outline of the document may be gone, text can become blurred, and noise 617 may have been added in the background of the image 615 . As a result, the overall accuracy of an OCR can significantly decrease if applied to a rectified image 625 of the normalized photo.
- the above-described aspects of the present technology may be advantageous for rapidly extracting forms and other types of documents from a natural scene image. According to some aspects, this can be accomplished without human intervention or costly and specialized equipment.
- an input image under some type of perspective effect may be efficiently transformed resulting in an extracted document image that may be upright and properly aligned.
- the various techniques and parameters disclosed herein can be further reconfigured so that the overall runtime may be further decreased.
Abstract
Description
Claims (18)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/538,145 US8897565B1 (en) | 2012-06-29 | 2012-06-29 | Extracting documents from a natural scene image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/538,145 US8897565B1 (en) | 2012-06-29 | 2012-06-29 | Extracting documents from a natural scene image |
Publications (1)
Publication Number | Publication Date |
---|---|
US8897565B1 true US8897565B1 (en) | 2014-11-25 |
Family
ID=51901840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/538,145 Active 2032-08-23 US8897565B1 (en) | 2012-06-29 | 2012-06-29 | Extracting documents from a natural scene image |
Country Status (1)
Country | Link |
---|---|
US (1) | US8897565B1 (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030190145A1 (en) * | 1998-04-01 | 2003-10-09 | Max Copperman | Obtaining and using data associating annotating activities with portions of recordings |
US20030219149A1 (en) * | 2002-05-22 | 2003-11-27 | Aditya Vailaya | System and methods for extracting semantics from images |
US20040076342A1 (en) * | 2001-12-20 | 2004-04-22 | Ricoh Company, Ltd. | Automatic image placement and linking |
US20050201619A1 (en) | 2002-12-26 | 2005-09-15 | Fujitsu Limited | Video text processing apparatus |
US7085437B2 (en) * | 2000-01-20 | 2006-08-01 | Riso Kagaku Corporation | Document modification apparatus and image processing apparatus |
US7738706B2 (en) * | 2000-09-22 | 2010-06-15 | Sri International | Method and apparatus for recognition of symbols in images of three-dimensional scenes |
US8009928B1 (en) * | 2008-01-23 | 2011-08-30 | A9.Com, Inc. | Method and system for detecting and recognizing text in images |
US20120134588A1 (en) * | 2010-11-29 | 2012-05-31 | Microsoft Corporation | Rectification of characters and text as transform invariant low-rank textures |
US8320674B2 (en) | 2008-09-03 | 2012-11-27 | Sony Corporation | Text localization for image and video OCR |
US20130004076A1 (en) * | 2011-06-29 | 2013-01-03 | Qualcomm Incorporated | System and method for recognizing text information in object |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11321772B2 (en) | 2012-01-12 | 2022-05-03 | Kofax, Inc. | Systems and methods for identification document processing and business workflow integration |
US9710806B2 (en) | 2013-02-27 | 2017-07-18 | Fiserv, Inc. | Systems and methods for electronic payment instrument repository |
US10049354B2 (en) | 2013-02-27 | 2018-08-14 | Fiserv, Inc. | Systems and methods for electronic payment instrument repository |
US20210027431A1 (en) * | 2013-03-13 | 2021-01-28 | Kofax, Inc. | Content-based object detection, 3d reconstruction, and data extraction from digital images |
US11818303B2 (en) * | 2013-03-13 | 2023-11-14 | Kofax, Inc. | Content-based object detection, 3D reconstruction, and data extraction from digital images |
US11620733B2 (en) * | 2013-03-13 | 2023-04-04 | Kofax, Inc. | Content-based object detection, 3D reconstruction, and data extraction from digital images |
US20200394763A1 (en) * | 2013-03-13 | 2020-12-17 | Kofax, Inc. | Content-based object detection, 3d reconstruction, and data extraction from digital images |
US20140279303A1 (en) * | 2013-03-15 | 2014-09-18 | Fiserv, Inc. | Image capture and processing for financial transactions |
US9521270B1 (en) * | 2013-05-14 | 2016-12-13 | Google Inc. | Changing in real-time the perspective of objects captured in images |
US11481878B2 (en) * | 2013-09-27 | 2022-10-25 | Kofax, Inc. | Content-based detection and three dimensional geometric reconstruction of objects in image and video data |
US20200380643A1 (en) * | 2013-09-27 | 2020-12-03 | Kofax, Inc. | Content-based detection and three dimensional geometric reconstruction of objects in image and video data |
US20160139783A1 (en) * | 2014-11-13 | 2016-05-19 | Microsoft Technology Licensing, Llc | Detecting sidebar in document |
US10354162B2 (en) * | 2015-01-28 | 2019-07-16 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and storage medium |
US20170316275A1 (en) * | 2015-01-28 | 2017-11-02 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, and storage medium |
US11302109B2 (en) | 2015-07-20 | 2022-04-12 | Kofax, Inc. | Range and/or polarity-based thresholding for improved data extraction |
US11593585B2 (en) | 2017-11-30 | 2023-02-28 | Kofax, Inc. | Object detection and image cropping using a multi-detector approach |
US11640721B2 (en) | 2017-11-30 | 2023-05-02 | Kofax, Inc. | Object detection and image cropping using a multi-detector approach |
US11694456B2 (en) | 2017-11-30 | 2023-07-04 | Kofax, Inc. | Object detection and image cropping using a multi-detector approach |
CN117351495A (en) * | 2023-09-21 | 2024-01-05 | 山东睿芯半导体科技有限公司 | Text image correction method, device, chip and terminal |
Similar Documents
Publication | Title |
---|---|
US8897565B1 (en) | Extracting documents from a natural scene image |
US11107232B2 (en) | Method and apparatus for determining object posture in image, device, and storage medium |
US9071745B2 (en) | Automatic capturing of documents having preliminarily specified geometric proportions |
Tian et al. | Rectification and 3D reconstruction of curved document images |
US9235759B2 (en) | Detecting text using stroke width based text detection |
US8712188B2 (en) | System and method for document orientation detection |
US9412164B2 (en) | Apparatus and methods for imaging system calibration |
US8897598B1 (en) | Mosaicing documents for translation using video streams |
US8811751B1 (en) | Method and system for correcting projective distortions with elimination steps on multiple levels |
RU2631765C1 | Method and system of correcting perspective distortions in images occupying double-page spread |
US8897600B1 (en) | Method and system for determining vanishing point candidates for projective correction |
MX2008011002A | Model-based dewarping method and apparatus |
CN109948521B (en) | Image deviation rectifying method and device, equipment and storage medium |
KR20130066819A | Apparatus and method for character recognition based on photograph image |
US8913836B1 (en) | Method and system for correcting projective distortions using eigenpoints |
US20120082372A1 (en) | Automatic document image extraction and comparison |
CN111899270A (en) | Card frame detection method, device and equipment and readable storage medium |
US9094617B2 (en) | Methods and systems for real-time image-capture feedback |
Simon et al. | Correcting geometric and photometric distortion of document images on a smartphone |
WO2015092059A1 (en) | Method and system for correcting projective distortions |
WO2017113290A1 (en) | Method and device for positioning one-dimensional code |
CN114998347B (en) | Semiconductor panel corner positioning method and device |
WO2008156686A2 (en) | Applying a segmentation engine to different mappings of a digital image |
US20210281742A1 (en) | Document detections from video images |
Bhaskar et al. | Implementing optical character recognition on the android operating system for business cards |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: GOOGLE INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:PALM, LEON;ADAM, HARTWIG;REEL/FRAME:028554/0030; Effective date: 20120627 |
| STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA; Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044277/0001; Effective date: 20170929 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); Year of fee payment: 4 |
| MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY; Year of fee payment: 8 |