WO2010003192A1 - Integrated management and storage of hardcopy and softcopy documents - Google Patents

Integrated management and storage of hardcopy and softcopy documents


Publication number
WO2010003192A1
Authority
WO
WIPO (PCT)
Prior art keywords
document
hardcopy
softcopy
scanner
documents
Prior art date
Application number
PCT/AU2009/000894
Other languages
French (fr)
Inventor
Gil Hidas
Original Assignee
Mookika Pty Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2008903566A
Application filed by Mookika Pty Ltd
Publication of WO2010003192A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems

Definitions

  • the present invention relates to a system and method for integrated management and storage of hardcopy and softcopy documents.
  • the aim of the invention is to provide an improved method of handling hardcopy and softcopy documents that overcomes or at least ameliorates one or more problems of the prior art.
  • a method for integrated management and storage of hardcopy and softcopy documents including the steps of: scanning a hardcopy document into a softcopy document; receiving the hardcopy document in a document box after scanning; recognising information in the softcopy document; classifying and indexing both the softcopy document and the hardcopy document based on the recognised information; storing the classification and index of both the softcopy document and the hardcopy document in a memory.
  • the indexing of the hardcopy document can include allocating a physical address to the hardcopy document.
  • the physical address can be a sequential physical position or a unique and identified physical file position in the document box.
  • the method can further include the step of automatically feeding the hardcopy document for scanning into the softcopy document.
  • the step of classifying and indexing the softcopy document can include categorising the softcopy document in a predetermined category.
  • the steps of classifying and indexing both the hardcopy document and the softcopy document can be based on recognising information in the softcopy document.
  • the step of recognising information in the softcopy document can be performed by optical character recognition (OCR) and/or Intelligent Character Recognition (ICR) for hand written documents.
  • the method can further include the step of combining the recognised information with a template.
  • the template can be selected from a predetermined template, a user-defined template, an intelligent template based at least in part on user behaviour, and combinations thereof.
  • the method can further include the step of processing both the hardcopy document and the softcopy document based on the template.
  • the step of receiving the hardcopy document in the document box can include applying an algorithm to control physical positioning of the hardcopy document in the document box.
  • the method can further include the step of retrieving the hardcopy document from the document box based on the indexing of the hardcopy document stored in the memory.
  • the method can further include the step of correlating the softcopy document with the hardcopy document based on the indexing of both the softcopy document and the hardcopy document stored in memory.
  • the present invention also provides a processor program product including processor readable instructions executable by a processor to perform the above method.
  • the present invention also provides a system for integrated management and storage of hardcopy and softcopy documents, the system including a scanner to scan a hardcopy document into a softcopy document, and a document box removably connected to the scanner to receive the hardcopy document from the scanner after scanning, wherein the system further includes a memory in communication with a processor programmed to: control the scanner to scan the hardcopy document independently of a computer; receive the hardcopy document in the document box after scanning; recognise information in the softcopy document; classify and index both the softcopy document and the hardcopy document based on the recognised information; store the classification and index of both the softcopy document and the hardcopy document in the memory.
  • the indexing of the hardcopy document can include allocating a physical address to the hardcopy document.
  • the physical address can be a sequential physical position or a unique physical file position in the document box.
  • the document box can have an externally viewable index to visually indicate a plurality of sequential physical positions therein.
  • the system can further include a plurality of interchangeable document boxes, wherein the indexing of the hardcopy document is based on a unique identifier of a document box and a physical position therein.
  • the scanner can be removably connected above the document box, wherein the scanner and the document box form a cabinet when removably connected to one another.
  • the scanner can be a single-sided and/or a double-sided scanner.
  • the system can further include a user interface.
  • the system can further include a data interface.
  • the system can further include a document feeder to automatically feed the hardcopy document into the scanner.
  • the system can further include a desktop application executable on a computer.
  • the system can be accessible via a web browser.
  • the system can be accessible via a remote computer.
  • Figure 1 is a flow chart of a method for integrated management of hardcopy and softcopy documents of one embodiment of the invention
  • Figure 2 is a photographic front view of a system for implementing the method
  • Figure 3 is a diagrammatic view of the use of the method for integrated management of hardcopy and softcopy documents of one embodiment of the invention
  • Figure 4 is a block diagram of hardware and software of the system in accordance with a first embodiment of the invention
  • Figure 5 is a block diagram of hardware and software of the system in accordance with a second embodiment of the invention
  • Figure 6 is a block diagram of hardware and software of the system in accordance with a third embodiment of the invention.
  • Figure 7 is a block diagram of electronics of the system in accordance with the first embodiment of the invention.
  • Figure 8 is a block diagrammatic view of the software (web interface) of the system.
  • Figure 9 is a diagrammatic Directed Acyclic Graph (DAG) view of the treatment of various inputs to the system in accordance with the invention.
  • Figure 10 is a diagrammatic view of the hierarchical indexing of inputted material in the system in accordance with the invention.
  • Figure 11 is a diagrammatic view of the effect of the tagging of inputted material in the system in accordance with the invention.
  • Figure 12 is a diagrammatic view of the automatic collation of defined features of inputted material according to templates in the system in accordance with the invention.
  • one embodiment of a method 100 for integrated management of hardcopy and softcopy documents starts at step 102 by scanning a hardcopy document into a softcopy document.
  • the hardcopy document is, for example, a paper bill, invoice, bank statement, receipt, business card, notice, letter, photo, certificate, etc.
  • the scanning can be performed by, for example, optical scanning to provide an image of the hardcopy document.
  • the hardcopy document has, for example, one or more pages.
  • the method 100 moves to step 104 by receiving the hardcopy document in a document box after scanning.
  • the step 104 of receiving the hardcopy document in the document box is performed by applying an algorithm to control the physical placement of the hardcopy document in a physical position within the document box.
  • the document box provides physical storage of the hardcopy document, for example, for archival, record-keeping, tax, or accounting purposes.
  • at step 106, information is recognised in the softcopy document, for example, by performing optical character recognition (OCR) on the softcopy document.
  • the method 100 then moves to step 108 by classifying and indexing both the softcopy document and the hardcopy document based on the recognised information.
  • the indexing of the softcopy document includes allocating a logical address of the softcopy document in the memory.
  • the indexing of the hardcopy document includes allocating a physical address to the hardcopy document.
  • the physical address can be a sequential physical position or a physical file position (or compartment) in the document box.
  • the classification and index of both the softcopy document and the hardcopy document are then stored in a memory, for example, an electronic memory storage device.
  • both the softcopy document and the hardcopy document are classified and indexed in predetermined categories, for example, bills, accounts, work, superannuation, car, shopping, business cards, bank statements, credit cards, medical certificates, legal documents, tax documents, etc.
  • the storing of the indexing and classifying of the softcopy document in the memory allows it to be managed, stored, and processed.
  • the information recognised in the softcopy document is selectively arranged into a template to enable searching, managing, tagging and processing of the softcopy document based on the template.
  • the template is, for example, a predetermined template, a user-defined template, an intelligent template based at least in part on user behaviour, and combinations thereof.
  • a softcopy document is indexed and categorised as a bill based on the recognised information.
  • the recognised information is then selectively arranged in a template, for example a bill payment template to enable processing, for example electronic bill payment, based on the template.
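  • the arrangement of recognised information into a template can be illustrated with a minimal sketch. The field names, the regular expressions and the sample OCR text are all assumptions made for illustration; the source does not specify how a bill payment template is defined.

```python
import re
from typing import Dict, Optional, Pattern

# Hypothetical bill payment template: one pattern per field to be filled.
BILL_TEMPLATE: Dict[str, Pattern[str]] = {
    "biller": re.compile(r"^Biller:\s*(.+)$", re.MULTILINE),
    "amount_due": re.compile(r"^Amount due:\s*\$?([0-9]+(?:\.[0-9]{2})?)$", re.MULTILINE),
    "due_date": re.compile(r"^Due date:\s*(.+)$", re.MULTILINE),
}


def fill_template(recognised_text: str,
                  template: Dict[str, Pattern[str]]) -> Dict[str, Optional[str]]:
    """Selectively arrange recognised text into the template's fields."""
    filled: Dict[str, Optional[str]] = {}
    for field, pattern in template.items():
        match = pattern.search(recognised_text)
        # A field stays None when the recognised text does not contain it.
        filled[field] = match.group(1).strip() if match else None
    return filled


# Sample text standing in for OCR output (invented for this example).
ocr_text = "Biller: Example Telecom\nAmount due: $45.90\nDue date: 15 August 2008\n"
bill = fill_template(ocr_text, BILL_TEMPLATE)
```

  • a filled template of this kind could then be handed to a processing step such as electronic bill payment; fields left as None would need to be completed by the user.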
  • the storing of the indexing of the hardcopy document in the memory allows it to be managed, processed, and retrieved.
  • the hardcopy document is easily retrievable from the document box based on the indexing of the hardcopy document in the memory.
  • the storing of the indexing of both the hardcopy document and the softcopy document in the memory allows the hardcopy document to be easily retrieved from the document box and correlated with the softcopy document stored in the memory.
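  • the correlation of a softcopy document with its hardcopy counterpart can be sketched as a single index entry that carries both a logical address and a physical address. The class names, field names and identifiers here are hypothetical, and an in-memory dictionary stands in for the memory described above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PhysicalAddress:
    box_id: str    # identifier of the document box (assumed to be a string)
    position: int  # sequential physical position inside the box


@dataclass
class IndexEntry:
    category: str         # predetermined category, e.g. "bills"
    logical_address: str  # where the softcopy is stored
    # None models a hardcopy that bypassed the box and went back to the user.
    physical_address: Optional[PhysicalAddress]


class DocumentIndex:
    """In-memory stand-in for the stored classification and index."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}

    def store(self, doc_id: str, entry: IndexEntry) -> None:
        self._entries[doc_id] = entry

    def locate_hardcopy(self, doc_id: str) -> Optional[PhysicalAddress]:
        """Return where the paper original of a softcopy can be retrieved."""
        return self._entries[doc_id].physical_address

    def find_by_category(self, category: str) -> List[str]:
        return sorted(d for d, e in self._entries.items() if e.category == category)


index = DocumentIndex()
index.store("doc-1", IndexEntry("bills", "/softcopy/doc-1.pdf", PhysicalAddress("BOX-A", 17)))
index.store("doc-2", IndexEntry("bank statements", "/softcopy/doc-2.pdf", PhysicalAddress("BOX-A", 18)))
```

  • because both addresses live in one entry, finding a softcopy by category also yields the box and position of its paper original.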
  • Figure 2 illustrates one embodiment of a system 200 for implementing the method 100.
  • the system 200 includes a scanner 202 removably connected above a document box 204.
  • the scanner 202 and the document box 204 are sized and dimensioned to form, for example, a desk-sized cabinet when connected to one another.
  • the housings of the scanner 202 and the document boxes 204 are formed, for example, as mouldings in plastics.
  • a user interface 206 for example a liquid crystal display (LCD) touch screen, is provided on the scanner 202.
  • the scanner 202 is, for example, a double-sided (or duplex) scanner having an entry slot (not shown) fed by a document (or page) feeder 208.
  • the scanner 202 also has an exit slot (not shown) which feeds into the document box 204.
  • the document box 204 has an externally viewable index 210 to visually indicate sequential physical positions therein.
  • the system 200 includes a plurality of interchangeable document boxes 204 each having an identifier.
  • the interchangeable document boxes 204 have varying capacities depending on user requirements. For example, document boxes 204 for individual users have a storage capacity of, for example, 2000 hardcopy documents, while document boxes 204 for small businesses have a greater storage capacity of, for example, 5000 hardcopy documents.
  • the system 200 is configured to bypass the document box 204 after scanning to enable the hardcopy document to be returned to a user.
  • referring to Figure 3, there is shown a diagrammatic flow of the procedure steps of the user, the scanner and the supporting "backend" software.
  • the paper is received, collated and fed to the scanner.
  • the scanner undertakes its normal steps of feeding paper, scanning paper, processing and storage. The result is improved document processing and archiving not achievable by the scanner or the user alone.
  • the apparatus can use a scanner 202 which is standalone in the sense that it is operable independently of a computer but includes or is attachable to a unit 212 that is designed around a printed circuit board 212 that includes a processor 214, a touch screen user interface 206, memory storage 216, memory 218, as well as wired and/or wireless data, network, and hardware interfaces 220, 222, 224.
  • the processor-memory subsystem of the system 200 may be implemented with other equivalent processor, memory, user interface, and data interface hardware to provide standalone operation.
  • the unit 212 can be an add-on unit suitable for retrofitting or operating scanners presently on the market.
  • the software used by the system 200 includes optics control software 226, image compression and filtering software 228, optical character recognition (OCR) software 230, document indexing logic software 232, page placement algorithm 234, as well as touchscreen interface software 236, web services application programming interfaces (APIs) software 238, and web-based interface software 240.
  • a hardcopy (or paper) document 242 is automatically fed into the scanner 202 by the document feeder 208.
  • the hardcopy document 242 is scanned into a softcopy document by the processor 214 executing the optics control software 226 and the image compression and filtering software 228.
  • Both the hardcopy document and the softcopy document are then classified and indexed in a database in the memory 216 by the processor 214 executing the optical character recognition (OCR) software 230 and the document indexing logic software 232.
  • the softcopy document is classified, indexed and stored in the database stored in the memory 216 as a bill or a bank statement.
  • Information recognised in the softcopy document by OCR is also classified, indexed and stored in the database stored in the memory 216.
  • the processor 214 then executes the page placement algorithm 234 to control the physical placement of the hardcopy document 242 in a physical position within the document box 204.
  • the hardcopy document 242 is indexed in the document box 204 based on the physical position. Referring again to Figure 2, the indexing of the hardcopy document 242 in the document box 204 is visually indicated by the externally viewable index 210.
  • the processor 214 stores the indexing of the hardcopy document 242 in the database 246 to allow cross-indexing of the document box identifier and the hardcopy document 242 with the softcopy document.
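  • the source does not disclose the internals of the page placement algorithm 234. As a minimal sketch, assuming the simplest policy of allocating the next sequential position in the currently connected box, the cross-indexing of box identifier and position might look like this (all names and the box identifier are hypothetical):

```python
from typing import Dict, Tuple


class DocumentBox:
    """A removable box with a unique identifier and a fixed capacity."""

    def __init__(self, box_id: str, capacity: int) -> None:
        self.box_id = box_id
        self.capacity = capacity
        self.next_position = 1  # positions are numbered from 1 in this sketch

    def is_full(self) -> bool:
        return self.next_position > self.capacity


def place_document(box: DocumentBox, doc_id: str,
                   index: Dict[str, Tuple[str, int]]) -> Tuple[str, int]:
    """Allocate the next sequential position and record it against doc_id."""
    if box.is_full():
        raise RuntimeError(f"document box {box.box_id} is full; swap in another box")
    address = (box.box_id, box.next_position)
    box.next_position += 1
    index[doc_id] = address  # cross-index: document -> (box identifier, position)
    return address


hardcopy_index: Dict[str, Tuple[str, int]] = {}
box = DocumentBox("BOX-0001", capacity=2)
first = place_document(box, "doc-a", hardcopy_index)
second = place_document(box, "doc-b", hardcopy_index)
```

  • storing the box identifier alongside the position is what lets interchangeable boxes be swapped without losing track of where each hardcopy document sits.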
  • the respective classifying and indexing of the hardcopy and softcopy documents integrates and streamlines their management, storage, processing, and retrieval.
  • the softcopy document, and the respective classifying and indexing of the hardcopy and softcopy documents, stored in the database 246 is accessed and manipulated via the touch screen 206 and touch screen interface software 236.
  • the software of the system 200 generates reports by date or document type based on the indexing in the database 246.
  • the software of the system 200 also allows information in the database 246 to be packaged, shared, and sent to other users or applications, for example, information relating to medical documents is packaged and shared with a doctor, information relating to accounting documents is packaged and sent to an accountant, information relating to legal documents is packaged and sent to a lawyer, etc.
  • the software further allows information stored in the database 246 to be exported to, and synchronised with, and backed up by, a computer via the data interfaces 220, 222, 224.
  • Information is exported from the database 246 in, for example, Excel (xls) or comma separated value (csv) formats to an application program executing on the computer, for example, an accounting application or a web bill payment service.
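  • a comma separated value export of this kind can be sketched with Python's standard csv module. The column names and the sample records are assumptions; the source does not list which fields are exported.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical export columns; the real schema is not given in the source.
FIELDS = ["doc_id", "category", "date", "amount"]


def export_csv(records: Iterable[Dict[str, str]]) -> str:
    """Serialise index records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Invented sample records standing in for rows of the database.
records = [
    {"doc_id": "doc-1", "category": "bills", "date": "2008-07-01", "amount": "120.50"},
    {"doc_id": "doc-2", "category": "bank statements", "date": "2008-07-03", "amount": ""},
]
csv_text = export_csv(records)
```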
  • the system 200 thereby acts as a gateway between hardcopy and softcopy documents, and billing and accounting applications and web services.
  • the software of the system 200 allows faxing and emailing of softcopy documents and/or information recognised therefrom.
  • the software of the system 200 also includes a desktop application (not shown) executable on a computer (not shown) to synchronise and back up the contents of the memory storage 216 to a memory, for example a database, in the computer.
  • the system 200 is also accessible via a web browser executing on a computer.
  • the system 200 may be implemented with other equivalent software to provide standalone optical imaging, indexing, and web interface functionality.
  • the software of the system 200 provides virtual and logical storage capabilities for softcopy documents, similar to physical filing of hardcopy documents.
  • the software enables the indexing of softcopy documents in virtual cabinets having drawers and folders.
  • the software allows a user to create a cabinet called "Financial Year 2007/2008", with a "Bills" drawer and a "Bank Statement" drawer. Within the Bills drawer the user has a "Phone Bills" folder where softcopy phone bills are placed, and a "Utilities Bills" folder where softcopy utilities bills are placed.
  • the software of the system 200 learns the behaviour of the user, for example, so that when a hardcopy phone bill is inserted into the system 200, the software prompts the user for permission to automatically place the softcopy bill in the "Phone Bills" folder.
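  • the virtual cabinet, drawer and folder hierarchy, together with a simple form of learned behaviour, can be sketched as follows. The learning rule is an assumption: the sketch suggests the folder most often chosen for a document type, whereas the source does not say how user behaviour is learned.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

FolderPath = Tuple[str, str, str]  # (cabinet, drawer, folder)


class VirtualFiling:
    """Virtual cabinets with a frequency-based folder suggestion (assumed rule)."""

    def __init__(self) -> None:
        self.folders: Dict[FolderPath, List[str]] = defaultdict(list)
        self.history: Dict[str, Counter] = defaultdict(Counter)

    def file_document(self, doc_id: str, doc_type: str, path: FolderPath) -> None:
        self.folders[path].append(doc_id)
        self.history[doc_type][path] += 1  # remember the user's choice

    def suggest_folder(self, doc_type: str) -> Optional[FolderPath]:
        """Folder to propose to the user, or None if nothing has been learned."""
        counts = self.history.get(doc_type)
        if not counts:
            return None
        return counts.most_common(1)[0][0]


filing = VirtualFiling()
phone = ("Financial Year 2007/2008", "Bills", "Phone Bills")
filing.file_document("doc-1", "phone bill", phone)
suggestion = filing.suggest_folder("phone bill")
```

  • the suggestion is only a proposal; as described above, the user would still be prompted for permission before the softcopy is placed automatically.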
  • Figures 5 and 6 provide two particular alternatives to the integrated configuration described in Figure 4.
  • Figure 5 shows an intelligent scanner in which some aspects of the system are integrated, or provided in an attachable unit, to undertake identification of the document through the optical character recognition (OCR) software 230, the document indexing logic software 232, and the touch screen interface software 236.
  • the unit can connect through the web-based interface 240 to remote "backend" software, which fulfils the processing and management functions of the system and method for integrated management and storage of hardcopy and softcopy documents using the remote database 246, the web service APIs 238, and the soft copy files repository 250 with index/search engine 244.
  • backend software can therefore communicate back to the intelligent scanner and effect the page placement algorithm 234 for storage of the hardcopies in replaceable paper archive boxes 204.
  • Figure 6 allows the system and method for integrated management and storage of hardcopy and softcopy documents to be achieved by a standard scanner connected to communication means that allow a file transfer and synchronisation algorithm 250 to feed a remote server.
  • the remote server maintains the functions of assessing the hard copy, manipulating and storing the soft copy, and issuing instructions for, or recordal of, the manipulation of the hard copy.
  • at the remote server there are the image compression and filtering software 228, the optical character recognition (OCR) software 230 and the document indexing intelligent logic software 232.
  • the remote server also includes the remote database 246 for softcopy documents, with the web service APIs 238 and the soft copy files repository 250 with index/search engine 244.
  • Such backend software can therefore communicate back to the scanner and apparatus to effect the page placement algorithm 234 for storage of the hardcopies in replaceable paper archive boxes 204.
  • referring to Figure 7, there is shown a particular embodiment of the printed circuit board for the unit 212 that can be integral with or connectable to the scanner to provide the intelligent scanner.
  • the process that provides an improved method of handling hardcopy and softcopy documents includes the process steps of:
      a. Inputting
      b. Storing
          i. Identifying
          ii. Indexing
          iii. Tagging
      c. Placement (of hard copy document)
      d. Retrieval (of soft copy and hard copy document)
  • a scanner can be used in a modified form to provide an integrated hardcopy and softcopy positional storage.
  • display the Universal Serial Bus (USB) interface
  • this software program launches the scanadf program with the appropriate parameters in order to scan the document and place the results in the right directory within the file repository.
  • Figure 9 shows a Directed Acyclic Graph (DAG) which describes a number of types of documents that can have processing broken down into basic units (actions). Each action depends on the original file type or on the results of previous actions, and provides an input to the subsequent actions.
  • a range of visual formats can be inputted, and the process converts them to provide (but is not limited to):
      i. HTML form, which is used for text and search;
      ii. PDF form, which is used for presentation, communication and search;
      iii. Image form, which is used for presentation; and
      iv. Thumbnail form, which is used as a presentation tool.
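  • the dependency of each action on the results of earlier actions can be sketched with a topological ordering. The action names and the particular dependencies are hypothetical; they are chosen only to reflect the HTML, PDF, image and thumbnail outputs mentioned above, and Figure 9 may arrange them differently.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each key is an action; its set lists the actions whose results it needs.
# Names and edges are assumptions for illustration only.
dependencies = {
    "scan_to_image": set(),
    "make_thumbnail": {"scan_to_image"},
    "ocr_to_html": {"scan_to_image"},
    "make_pdf": {"scan_to_image", "ocr_to_html"},
    "index_text": {"ocr_to_html"},
}

# static_order() yields every action after all of its prerequisites.
order = list(TopologicalSorter(dependencies).static_order())
```

  • any ordering produced this way is valid for sequential execution; actions with no path between them (here, the thumbnail and the OCR step) could equally run in parallel.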
  • the inputted document is automatically processed by OCR or other means to determine content and likely category and classification of the document by fulfilling a predetermined assessment of the document.
  • This can include providing precedence to:
      o Headers;
      o Location of text;
      o Type of font;
      o Semantics of document content;
      o Other formatting.
  • the documents in the repository are all referenced using universal unique identifiers. These identifiers (UUIDs) are allocated by the process manager or the web application (see below). Files representing the same document in different formats share the same UUID. The files are named using their UUID and an extension according to their format. They are also placed in a directory according to their format and a few other criteria.
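  • the naming convention described above (one UUID per document, one file per format, one directory per format) can be sketched as follows. The repository root, the directory names and the file extensions are assumptions, and the "few other criteria" that also affect placement are not modelled.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical mapping from format to file extension.
FORMAT_EXTENSIONS = {"html": "html", "pdf": "pdf", "image": "png", "thumbnail": "png"}


def repository_path(root: str, doc_uuid: uuid.UUID, fmt: str) -> PurePosixPath:
    """Build <root>/<format>/<uuid>.<extension> for one format of a document."""
    return PurePosixPath(root) / fmt / f"{doc_uuid}.{FORMAT_EXTENSIONS[fmt]}"


doc_uuid = uuid.uuid4()  # allocated once per document
paths = {fmt: repository_path("/repository", doc_uuid, fmt) for fmt in FORMAT_EXTENSIONS}
```

  • because every format shares the same UUID, any one file name is enough to locate all other renderings of the same document.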
  • various different kinds of notification mechanisms can be employed by the document-source monitors in order to notify the process manager of new documents.
  • the two main options are sockets and the inotify mechanism.
  • the receiver asks for notifications about changes in certain directories.
  • the sender places the new files in these agreed-upon directories.
  • the main differences between these two options are:
  • Sockets have the advantage of being able to pass additional parameters together with the name of the new file (e.g. whether a scanned document is part of a two-sided (duplex) scan or not).
  • the inotify mechanism is simpler to implement.
  • the simplicity of inotify means, among other things, that certain sources can operate as is, without having to explicitly send notifications and without having to add additional modules or code to detect the presence of new files. For example - the gui can simply place new files in the incoming_gui directory. If sockets are used, the gui will have to send a message over the socket, or another monitor will need to detect new files from the gui (using inotify) and send messages over the socket. The same applies to a shared filesystem or ftp source that may be added in the future.
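  • the socket option can be sketched as a monitor sending the new file name plus extra parameters to the process manager. The newline-terminated JSON message format and the file name are assumptions; the source does not define a wire format. A connected socket pair stands in for the two processes so the example is self-contained.

```python
import json
import socket


def send_notification(sock: socket.socket, filename: str, duplex: bool) -> None:
    """Monitor side: announce a new file together with an extra parameter."""
    message = json.dumps({"file": filename, "duplex": duplex}) + "\n"
    sock.sendall(message.encode("utf-8"))


def receive_notification(sock: socket.socket) -> dict:
    """Process manager side: read one newline-terminated notification."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:  # peer closed the connection
            break
        data += chunk
    return json.loads(data.decode("utf-8"))


# socketpair() gives two connected endpoints within a single process.
monitor_end, manager_end = socket.socketpair()
send_notification(monitor_end, "scan_0001.pnm", duplex=True)
notification = receive_notification(manager_end)
monitor_end.close()
manager_end.close()
```

  • the duplex flag is exactly the kind of side information that a bare directory watch cannot carry, which is the trade-off against the simpler inotify approach.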
  • a document can have a "parent" document.
  • an email attachment will have the body of the email as its parent.
  • the system includes a "parent" task that creates and launches child tasks for handling email attachments, multiple page documents or a double-sided document.
  • a predefined structure and conventions for storing the unit's documents make it easy to locate different formats of the same document and, together with the database, hold all the data that the user has entered into the system for quick and easy retrieval.
  • tags are introduced to the softcopy and hardcopy documents.
  • the system is able to add, remove and associate multiple 'Tags' per document.
  • as shown in Figure 11, documents can be placed, searched and classified based on tag values.
  • the tag operations are Add Tags to document, Edit Tags, and List Tags.
  • the software of the system 200 allows searching of both OCR values and/or tags of softcopy documents that are electronically stored in the database 246 of the system 200 by use of the index/search engine 244.
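  • adding, removing, listing and searching tags can be sketched with a small in-memory store. The class and method names are hypothetical, tags are treated as case-insensitive by assumption, and the dictionary stands in for the database 246.

```python
from collections import defaultdict
from typing import Dict, List, Set


class TagStore:
    """Multiple tags per document, searchable by tag value."""

    def __init__(self) -> None:
        self._tags: Dict[str, Set[str]] = defaultdict(set)

    def add_tag(self, doc_id: str, tag: str) -> None:
        self._tags[doc_id].add(tag.lower())

    def remove_tag(self, doc_id: str, tag: str) -> None:
        self._tags[doc_id].discard(tag.lower())  # no error if the tag is absent

    def list_tags(self, doc_id: str) -> List[str]:
        return sorted(self._tags.get(doc_id, set()))

    def search(self, tag: str) -> List[str]:
        """Return the identifiers of all documents carrying the tag."""
        wanted = tag.lower()
        return sorted(d for d, tags in self._tags.items() if wanted in tags)


store = TagStore()
store.add_tag("doc-1", "Tax")
store.add_tag("doc-1", "2008")
store.add_tag("doc-2", "tax")
store.remove_tag("doc-1", "2008")
```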
  • embodiments of the invention bridge the disconnection between hardcopy and softcopy documents, improve the digital conversion process, and hence streamline the management and storage of both hardcopy and softcopy documents for individuals and small businesses.
  • the System consists of a plurality of modules including Image module, Search module, User module, View Document, Cabinet Management, Folder Management, Document Management, Intelligent Categorisation Module, Communication Module, Reporting Module, Application Programming Interface and Security Management.
  • the web-based interface software 240 can be an embedded device for smart document management. The unit can perform the following functions:
  • the web-based interface software 240 is composed of the following components:
  • Database - holds information about the documents, how they are grouped together, who they belong to, etc.
  • Repository - a directory structure with a convention for storing the documents.
  • User Interface - able to present the available documents to the user in a hierarchic fashion, and allows them to change data about the documents and their groupings, add tags, etc.
  • Backend - receives documents from various sources, applies a set of actions to them and places the results in the database and the repository.
  • the backend is composed of the following modules:
  • Document sources and document-source monitors (scanner-monitor, file system monitor and email-monitor). Each of these modules monitors a specific source of documents and, when a document is available, places it in a specific pre-defined directory within the repository. The process manager (see below) is then notified that a new document is ready to be processed.
  • the following sources/monitors are implemented:
      a. the scanner monitor (scanmond);
      b. the email monitor (email-monitor); and
      c.
  • the process manager is the heart of the backend. It receives notifications about new documents and performs a predefined set of operations (tasks) on each document depending on its source, type and possibly other attributes.
  • the repository is a directory tree with a predefined structure and conventions for storing the unit's documents. This structure and these conventions make it easy to locate different formats of the same document and, together with the database, hold all the data that the user has entered into the system.
  • the process manager is composed of the following components:
  • Tasks - these are represented by objects of a class derived from the SCTask class.
  • the task contains all the information that relates to a particular document during processing, except for information that only relates to a single stage of processing
  • Actions - these objects contain a reference to a task object and an "execute" method which performs a particular stage of processing - typically conversion from format A to format B, extraction of data X and storage on the task object, or insertion/modification of a database entry.
  • action classes should not depend on being used by a particular task class, and should be re-usable by different task classes.
  • Task factory - This is a function that receives all the information about a new document (i.e. its location, name, extension and source) when one arrives. This function then creates a task object that will handle the processing of the new document.
  • the SCQueue class implements a queue of actions (actually SCAction pointers) that are waiting for execution. This class also takes care of providing atomic access to the queue (i.e. it locks the queue when elements are being added or removed - in order to avoid race-conditions).
  • Main loop - This monitors the repository for new documents. When one arrives it calls the task factory to create a new task object for handling the document. It then calls the task's execute method, which takes care of the rest (basically, putting the relevant actions on the relevant queue). The main loop is not part of the process manager class.
  • Execution threads - These are implemented by the SCThread class. The SCThread constructor receives a SCQueue as a parameter.
  • Process manager class - The glue that connects all the above components, except for the main loop (the main loop works by calling process manager methods). It is a singleton that is implemented by the process manager class and provides the task factory, a thread factory, initialization code, schedule function, and database, repository and configuration file access functions.
  • the task factory determines what processing needs to be done for a document.
  • the specific task manages the dependencies of a particular set of actions. It also provides the context in which they operate, and determines how the processing will be done.
  • the process manager class, and the particular configuration of queues and threads determine the scheduling policy (i.e. when each action will be performed).
  • the process manager class and the particular configuration of queues and threads determine how the actions will be scheduled.
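  • the interplay of tasks, actions, a queue and an execution thread can be sketched in Python. Only the class names SCTask, SCAction, SCQueue and SCThread come from the source, whose reference to SCAction pointers suggests a different implementation language; the method bodies, the two example actions and the file names are assumptions. Python's queue.Queue stands in for SCQueue because it already provides locked access.

```python
import queue
import threading
from typing import Callable, Dict


class SCTask:
    """Holds everything known about one document while it is processed."""

    def __init__(self, name: str, extension: str, source: str) -> None:
        self.name, self.extension, self.source = name, extension, source
        self.results: Dict[str, str] = {}

    def execute(self, actions: "queue.Queue") -> None:
        # Put this task's actions on the queue (two invented stages).
        actions.put(SCAction(self, "thumbnail", lambda t: f"{t.name}.thumb.png"))
        actions.put(SCAction(self, "pdf", lambda t: f"{t.name}.pdf"))


class SCAction:
    """One stage of processing; refers to a task and exposes execute()."""

    def __init__(self, task: SCTask, label: str, work: Callable[[SCTask], str]) -> None:
        self.task, self.label, self._work = task, label, work

    def execute(self) -> None:
        self.task.results[self.label] = self._work(self.task)


class SCThread(threading.Thread):
    """Execution thread that drains the queue passed to its constructor."""

    def __init__(self, actions: "queue.Queue") -> None:
        super().__init__(daemon=True)
        self._actions = actions

    def run(self) -> None:
        while True:
            action = self._actions.get()
            if action is not None:
                action.execute()
            self._actions.task_done()
            if action is None:  # sentinel: stop the thread
                break


def task_factory(name: str, extension: str, source: str) -> SCTask:
    """Create the task object that will handle a newly arrived document."""
    return SCTask(name, extension, source)


actions: "queue.Queue" = queue.Queue()  # stands in for SCQueue
worker = SCThread(actions)
worker.start()
task = task_factory("scan_0001", "pnm", "scanner")
task.execute(actions)
actions.put(None)  # ask the worker to stop once the queue is drained
actions.join()
worker.join()
```

  • with a single queue and a single thread the actions run strictly in order; using several queues or threads would change the scheduling policy without touching the task or action classes.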
  • the web-based interface software 240 is structured in System Modules as shown in Figure
  • Image Module - Image manipulation i.e. crop
  • Implementation of basic image manipulation operations such as, i. Zoom In/Zoom Out (and Implementation of slider with percentage showing on it).
  • b. Implementation of search based on a document's properties, metadata and content.
  • c. Implementation for highlighting keywords in the returned search summary text.
  • d. Graphical user interface (GUI) implementation for "Advance Search".
  • e. Database implementation for Advance Search with the following criteria: i. Boolean search, which includes search with 'AND' and 'OR' operators: 'AND' returns documents matching all words in the criteria, while 'OR' returns documents matching one or more of the words; ii. Document type
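  • the 'AND' and 'OR' behaviour can be sketched as a word-level match over recognised text. The function name, the whitespace tokenisation and the sample documents are assumptions; a real implementation would query the database or the index/search engine rather than scan texts in memory.

```python
from typing import Dict, List


def boolean_search(documents: Dict[str, str], words: List[str], mode: str = "AND") -> List[str]:
    """Return identifiers of documents matching all (AND) or any (OR) of the words."""
    if mode not in ("AND", "OR"):
        raise ValueError("mode must be 'AND' or 'OR'")
    wanted = [w.lower() for w in words]
    matches = []
    for doc_id, text in documents.items():
        tokens = set(text.lower().split())
        hits = [w in tokens for w in wanted]
        if (mode == "AND" and all(hits)) or (mode == "OR" and any(hits)):
            matches.append(doc_id)
    return sorted(matches)


# Invented recognised text for three documents.
documents = {
    "doc-1": "phone bill July",
    "doc-2": "bank statement July",
    "doc-3": "phone contract",
}
all_words = boolean_search(documents, ["phone", "bill"], "AND")
any_word = boolean_search(documents, ["phone", "bill"], "OR")
```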
  • Thumbnail view to see document with it.
  • Database implementation for Thumbnail view.
  • Cabinet Management
      a. Add new Cabinet: database implementation for adding a new Cabinet.
      b. Remove existing Cabinet: database implementation to remove an existing Cabinet.
      c. Rename Cabinet: database implementation to rename a Cabinet.
  • Folder Management
      i. Add new Folder: database implementation to add a new Folder.
      ii. Remove existing Folder: database implementation for the removal of an existing Folder.
      iii. Rename Folder: database implementation to rename a Folder.
      iv. Change Folder order: database implementation to change the order of folders.
  • Document is added into system.
  • - c. Remove Document i. Database implementation in order to remove document from the system,
  • d. Rename Document -Edit Document Properties (part of Edit Doc Properties)
  • i. Database implementation to rename a document.
  • ii. Dragging and dropping of documents with Thumbnail view
  • iii. Implementation for dragging and dropping of document with Thumbnail view.
  • GUI graphical user interface

Abstract

A system (200) for integrated management and storage of hardcopy and softcopy documents including a scanner (202) removably connected above a document box (204) wherein the system includes scanning a hardcopy document into a softcopy document, receiving the hardcopy document in a document box after scanning, recognising information in the softcopy document, classifying and indexing and storing the classification and index of both the softcopy document and the hardcopy document in a memory.

Description

INTEGRATED MANAGEMENT AND STORAGE OF HARDCOPY AND
SOFTCOPY DOCUMENTS
FIELD OF THE INVENTION
The present invention relates to a system and method for integrated management and storage of hardcopy and softcopy documents.
BACKGROUND OF THE INVENTION
The widespread availability of scanners has enabled individuals and small businesses to digitally convert hardcopy documents - for example bills, invoices, bank statements, receipts, letters, certificates, notices, etc. - into softcopy documents that can be managed and stored digitally. However, individuals and small businesses must generally still manage and store a multitude of hardcopy documents individually for record-keeping purposes. This is time consuming and costly to the business.
Further, at present, an inefficient disconnection exists between softcopy and hardcopy documents. Managing, storing and retrieving hardcopy documents and matching them to softcopy documents involves time-consuming searches and paper shuffling. This places a heavy administrative burden on individuals and small businesses. In addition, the current process of digitising hardcopy documents is tedious, inefficient, and time consuming.
What is needed is a solution that bridges the disconnection between hardcopy and softcopy documents, improves the digital conversion process, and hence streamlines the management and storage of both hardcopy and softcopy documents for individuals and small businesses.
The aim of the invention is to provide an improved method of handling hardcopy and softcopy documents that overcomes or at least ameliorates one or more problems of the prior art.
SUMMARY OF THE INVENTION
According to one aspect of the present invention, there is provided a method for integrated management and storage of hardcopy and softcopy documents, the method including the steps of: scanning a hardcopy document into a softcopy document; receiving the hardcopy document in a document box after scanning; recognising information in the softcopy document; classifying and indexing both the softcopy document and the hardcopy document based on the recognised information; storing the classification and index of both the softcopy document and the hardcopy document in a memory.
The indexing of the hardcopy document can include allocating a physical address to the hardcopy document. The physical address can be a sequential physical position or a unique and identified physical file position in the document box.
The method can further include the step of automatically feeding the hardcopy document for scanning into the softcopy document.
The step of classifying and indexing the softcopy document can include categorising the softcopy document in a predetermined category.
The steps of classifying and indexing both the hardcopy document and the softcopy document can be based on recognising information in the softcopy document. The step of recognising information in the softcopy document can be performed by optical character recognition (OCR) and/or Intelligent Character Recognition (ICR) for hand written documents.
The method can further include the step of combining the recognised information with a template. The template can be selected from a predetermined template, a user-defined template, an intelligent template based at least in part on user behaviour, and combinations thereof.
The method can further include the step of processing both the hardcopy document and the softcopy document based on the template.
The step of receiving the hardcopy document in the document box can include applying an algorithm to control physical positioning of the hardcopy document in the document box.
The method can further include the step of retrieving the hardcopy document from the document box based on the indexing of the hardcopy document stored in the memory.
The method can further include the step of correlating the softcopy document with the hardcopy document based on the indexing of both the softcopy document and the hardcopy document stored in memory.
The present invention also provides a processor program product including processor readable instructions executable by a processor to perform the above method.
The present invention also provides a system for integrated management and storage of hardcopy and softcopy documents, the system including a scanner to scan a hardcopy document into a softcopy document, and a document box removably connected to the scanner to receive the hardcopy document from the scanner after scanning, wherein the system further includes a memory in communication with a processor programmed to: control the scanner to scan the hardcopy document independently of a computer; receive the hardcopy document in the document box after scanning; recognise information in the softcopy document; classify and index both the softcopy document and the hardcopy document based on the recognised information; store the classification and index of both the softcopy document and the hardcopy document in the memory.
The indexing of the hardcopy document can include allocating a physical address to the hardcopy document. The physical address can be a sequential physical position or a unique physical file position in the document box.
The document box can have an externally viewable index to visually indicate a plurality of sequential physical positions therein.
The system can further include a plurality of interchangeable document boxes, wherein the indexing of the hardcopy document is based on a unique identifier of a document box and a physical position therein.
The scanner can be removably connected above the document box, wherein the scanner and the document box form a cabinet when removably connected to one another.
The scanner can be a single-sided and/or a double-sided scanner. The system can further include a user interface. The system can further include a data interface. The system can further include a document feeder to automatically feed the hardcopy document into the scanner.
The system can further include a desktop application executable on a computer. The system can be accessible via a web browser. The system can be accessible via a remote computer.
BRIEF DESCRIPTION OF THE DRAWINGS
The invention will be further described by way of example only with reference to the accompanying drawings, in which:
Figure 1 is a flow chart of a method for integrated management of hardcopy and softcopy documents of one embodiment of the invention;
Figure 2 is a photographic front view of a system for implementing the method;
Figure 3 is a diagrammatic view of the use of the method for integrated management of hardcopy and softcopy documents of one embodiment of the invention;
Figure 4 is a block diagram of hardware and software of the system in accordance with a first embodiment of the invention;
Figure 5 is a block diagram of hardware and software of the system in accordance with a second embodiment of the invention;
Figure 6 is a block diagram of hardware and software of the system in accordance with a third embodiment of the invention;
Figure 7 is a block diagram of electronics of the system in accordance with the first embodiment of the invention;
Figure 8 is a block diagrammatic view of the software (web interface) of the system;
Figure 9 is a diagrammatic Directed Acyclic Graph (DAG) view of the treatment of various inputs to the system in accordance with the invention;
Figure 10 is a diagrammatic view of the hierarchical indexing of inputted material in the system in accordance with the invention;
Figure 11 is a diagrammatic view of the effect of the tagging of inputted material in the system in accordance with the invention; and
Figure 12 is a diagrammatic view of the automatic collation of defined features of inputted material according to templates in the system in accordance with the invention.
DETAILED DESCRIPTION
Referring to Figure 1, one embodiment of a method 100 for integrated management of hardcopy and softcopy documents starts at step 102 by scanning a hardcopy document into a softcopy document. The hardcopy document is, for example, a paper bill, invoice, bank statement, receipt, business card, notice, letter, photo, certificate, etc. The scanning can be performed by, for example, optical scanning to provide an image of the hardcopy document. The hardcopy document has, for example, one or more pages.
Next, the method 100 moves to step 104 by receiving the hardcopy document in a document box after scanning. The step 104 of receiving the hardcopy document in the document box is performed by applying an algorithm to control the physical placement of the hardcopy document in a physical position within the document box. The document box provides physical storage of the hardcopy document, for example, for archival, record- keeping, tax, or accounting purposes.
In step 106, information is recognised in the softcopy document, for example, by performing optical character recognition (OCR) on the softcopy document. The method 100 then moves to step 108 by classifying and indexing both the softcopy document and the hardcopy document based on the recognised information. The indexing of the softcopy document includes allocating a logical address of the softcopy document in the memory. The indexing of the hardcopy document includes allocating a physical address to the hardcopy document. The physical address can be a sequential physical position or a physical file position (or compartment) in the document box. At step 110, the classification and index of both the softcopy document and the hardcopy document are then stored in a memory, for example, an electronic memory storage device.
For example, based on the recognised information, both the softcopy document and the hardcopy document are classified and indexed in predetermined categories, for example, bills, accounts, work, superannuation, car, shopping, business cards, bank statements, credit cards, medical certificates, legal documents, tax documents, etc. The storing of the indexing and classifying of the softcopy document in the memory allows it to be managed, stored, and processed. The information recognised in the softcopy document is selectively arranged into a template to enable searching, managing, tagging and processing of the softcopy document based on the template. The template is, for example, a predetermined template, a user-defined template, an intelligent template based at least in part on user behaviour, and combinations thereof. For example, a softcopy document is indexed and categorised as a bill based on the recognised information. The recognised information is then selectively arranged in a template, for example a bill payment template to enable processing, for example electronic bill payment, based on the template. The storing of the indexing of the hardcopy document in the memory allows it to be managed, processed, and retrieved. For example, the hardcopy document is easily retrievable from the document box based on the indexing of the hardcopy document in the memory. The storing of the indexing of both the hardcopy document and the softcopy document in the memory allows the hardcopy document to be easily retrieved from the document box and correlated with the softcopy document stored in the memory.
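By way of illustration only, the stored classification and index can be pictured as one record per document that carries both a logical address (for the softcopy) and a physical address (for the hardcopy). The following is a minimal Python sketch under stated assumptions: the class names, field names, the "BOX-A" identifier and the repository path are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PhysicalAddress:
    box_id: str    # identifier of the document box (hypothetical format)
    position: int  # sequential physical position inside that box


@dataclass
class IndexEntry:
    doc_id: str            # key shared by the softcopy and the hardcopy
    category: str          # e.g. "bill" or "bank statement"
    logical_address: str   # where the softcopy is stored
    physical_address: Optional[PhysicalAddress]  # None if the paper bypassed the box


class DocumentIndex:
    """In-memory stand-in for the memory holding the classification and index."""

    def __init__(self) -> None:
        self._entries: Dict[str, IndexEntry] = {}

    def store(self, entry: IndexEntry) -> None:
        self._entries[entry.doc_id] = entry

    def locate_hardcopy(self, doc_id: str) -> Optional[PhysicalAddress]:
        # Correlates a softcopy with the place where its paper original sits.
        return self._entries[doc_id].physical_address

    def by_category(self, category: str) -> List[IndexEntry]:
        return [e for e in self._entries.values() if e.category == category]


index = DocumentIndex()
index.store(IndexEntry("doc-1", "bill", "/repository/doc-1.pdf",
                       PhysicalAddress("BOX-A", 17)))
where = index.locate_hardcopy("doc-1")
```

Because one record holds both addresses, retrieving the hardcopy that corresponds to a softcopy is a single lookup in this sketch.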
Figure 2 illustrates one embodiment of a system 200 for implementing the method 100. Referring to Figure 2, the system 200 includes a scanner 202 removably connected above a document box 204. The scanner 202 and the document box 204 are sized and dimensioned to form, for example, a desk-sized cabinet when connected to one another. The housings of the scanner 202 and the document boxes 204 are formed, for example, as mouldings in plastics. A user interface 206, for example a liquid crystal display (LCD) touch screen, is provided on the scanner 202.
The scanner 202 is, for example, a double-sided (or duplex) scanner having an entry slot (not shown) fed by a document (or page) feeder 208. The scanner 202 also has an exit slot (not shown) which feeds into the document box 204. The document box 204 has an externally viewable index 210 to visually indicate sequential physical positions therein. The system 200 includes a plurality of interchangeable document boxes 204 each having an identifier. The interchangeable document boxes 204 have varying capacities depending on user requirements. For example, document boxes 204 for individual users have a storage capacity of, for example, 2000 hardcopy documents, while document boxes 204 for small businesses have a greater storage capacity of, for example, 5000 hardcopy documents. Optionally, the system 200 is configured to bypass the document box 204 after scanning to enable the hardcopy document to be returned to a user.
Referring to Figure 3, there is shown a diagrammatic flow of the procedure steps of the user, the scanner and the supporting "backend" software. In particular, the paper is received, and the paper is collated and fed to the scanner. The scanner undertakes its normal steps of feeding paper, scanning paper, processing and storage. The result is improved document processing and archiving not achievable by the scanner or the user alone.
In one form of the integrated unit as shown in Figure 4, the apparatus can use a scanner 202 which is standalone in the sense that it is operable independently of a computer but includes or is attachable to a unit 212 that is designed around a printed circuit board that includes a processor 214, a touch screen user interface 206, memory storage 216, memory 218, as well as wired and/or wireless data, network, and hardware interfaces 220, 222, 224, for example, Ethernet, WiFi, universal serial bus (USB) interfaces, etc. The processor-memory subsystem of the system 200 may be implemented with other equivalent processor, memory, user interface, and data interface hardware to provide standalone operation.
Clearly it can be seen that the unit 212 can be an add-on unit suitable for retrofitting or operating scanners presently on the market.
The software used by the system 200 includes optics control software 226, image compression and filtering software 228, optical character recognition (OCR) software 230, document indexing logic software 232, page placement algorithm 234, as well as touchscreen interface software 236, web services application programming interfaces (APIs) software 238, and web-based interface software 240.
In use, a hardcopy (or paper) document 242 is automatically fed into the scanner 202 by the document feeder 208. The hardcopy document 242 is scanned into a softcopy document by the processor 214 executing the optics control software 226 and the image compression and filtering software 228. Both the hardcopy document and the softcopy document are then classified and indexed in a database in the memory 216 by the processor 214 executing the optical character recognition (OCR) software 230 and the document indexing logic software 232. For example, the softcopy document is classified, indexed and stored in the database stored in the memory 216 as a bill or a bank statement. Information recognised in the softcopy document by OCR is also classified, indexed and stored in the database stored in the memory 216. The processor 214 then executes the page placement algorithm 234 to control the physical placement of the hardcopy document 242 in a physical position within the document box 204. The hardcopy document 242 is indexed in the document box 204 based on the physical position. Referring again to Figure 2, the indexing of the hardcopy document 242 in the document box 204 is visually indicated by the externally viewable index 210. The processor 214 stores the indexing of the hardcopy document 242 in the database 246 to allow cross-indexing of the document box identifier and the hardcopy document 242 with the softcopy document. As discussed above, the respective classifying and indexing of the hardcopy and softcopy documents integrates and streamlines their management, storage, processing, and retrieval.
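The specification does not detail the page placement algorithm 234. One simple possibility, offered only as an assumption, is sequential allocation: each hardcopy document receives the next free position in the currently attached document box, and the pair of box identifier and position is what is recorded for cross-indexing. The class name, function name and the numbering of positions from 1 are hypothetical.

```python
class DocumentBox:
    """A document box with a fixed capacity and a running position counter."""

    def __init__(self, box_id: str, capacity: int) -> None:
        self.box_id = box_id
        self.capacity = capacity
        self.next_position = 1  # positions numbered from 1 (assumption)

    def is_full(self) -> bool:
        return self.next_position > self.capacity


def place_page(box: DocumentBox) -> tuple:
    """Allocate the next sequential position and return (box_id, position)."""
    if box.is_full():
        raise RuntimeError(f"document box {box.box_id} is full; attach another box")
    position = box.next_position
    box.next_position += 1
    return (box.box_id, position)


box = DocumentBox("BOX-A", capacity=2)
first = place_page(box)
second = place_page(box)
```

A further call to `place_page(box)` would raise an error in this sketch, which corresponds to the point at which a user would swap in another interchangeable document box.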
The softcopy document, and the respective classifying and indexing of the hardcopy and softcopy documents, stored in the database 246 are accessed and manipulated via the touch screen 206 and touch screen interface software 236. The software of the system 200 generates reports by date or document type based on the indexing in the database 246. The software of the system 200 also allows information in the database 246 to be packaged, shared, and sent to other users or applications, for example, information relating to medical documents is packaged and shared with a doctor, information relating to accounting documents is packaged and sent to an accountant, information relating to legal documents is packaged and sent to a lawyer, etc. The software further allows information stored in the database 246 to be exported to, synchronised with, and backed up by, a computer via the data interfaces 220, 222, 224. Information is exported from the database 246 in, for example, Excel (xls) or comma separated value (csv) formats to an application program executing on the computer, for example, an accounting application or a web bill payment service. The system 200 thereby acts as a gateway between hardcopy and softcopy documents, and billing and accounting applications and web services. In addition, the software of the system 200 allows faxing and emailing of softcopy documents and/or information recognised therefrom.
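As a hedged illustration of the reporting and export described above, the following Python sketch filters index records by document type, orders them by date and writes them as comma separated values. The record layout, field names and sample values are assumptions made for the example; the actual database schema is not given in the specification.

```python
import csv
import io

# Hypothetical index records; the real database schema is not specified.
records = [
    {"doc_id": "doc-1", "type": "bill", "date": "2008-07-03", "box": "BOX-A", "position": 1},
    {"doc_id": "doc-2", "type": "bank statement", "date": "2008-07-01", "box": "BOX-A", "position": 2},
    {"doc_id": "doc-3", "type": "bill", "date": "2008-06-15", "box": "BOX-A", "position": 3},
]


def export_csv(rows, doc_type=None):
    """Return a CSV report ordered by date, optionally limited to one document type."""
    fields = ["doc_id", "type", "date", "box", "position"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: r["date"]):
        if doc_type is None or row["type"] == doc_type:
            writer.writerow(row)
    return buffer.getvalue()


bills_csv = export_csv(records, doc_type="bill")
```

The resulting text could then be handed to an accounting application or similar program on the computer.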
The software of the system 200 also includes a desktop application (not shown) executable on a computer (not shown) to synchronise and back up the contents of the memory storage 216 to a memory, for example a database, in the computer. The system 200 is also accessible via a web browser executing on a computer. The system 200 may be implemented with other equivalent software to provide standalone optical imaging, indexing, and web interface functionality.
The software of the system 200, including the web application, provides virtual and logical storage capabilities for softcopy documents, similar to physical filing of hardcopy documents. For example, the software enables the indexing of softcopy documents in virtual cabinets having drawers and folders. For example, the software allows a user to create a cabinet called "Financial Year 2007/2008", with a "Bills" drawer and a "Bank Statement" drawer. Within the Bills drawer the user has a "Phone Bills" folder where softcopy phone bills are placed, and a "Utilities Bills" folder where softcopy utilities bills are placed. The software of the system 200 learns the behaviour of the user, for example, so that when a hardcopy phone bill is inserted into the system 200, the software prompts the user for permission to automatically place the softcopy bill in the "Phone Bills" folder.
Figures 5 and 6 provide two particular alternatives to the integrated configuration described in Figure 4. In particular, Figure 5 shows an intelligent scanner having some aspects of the system integrated or in an attachable unit that undertakes the functions of the identification of the document through the optical character recognition (OCR) software 230 as well as the document indexing logic software 232, and the touch screen interface software 236. However, the unit can connect to web based communication through the web-based interface 240, which connects to remote "backend" software that fulfils the processing and management of the system and method for integrated management and storage of hardcopy and softcopy documents by remote database 246, with web service APIs 238 and soft copy files repository 250 with index/search engine 244. Such backend software can therefore communicate back to the intelligent scanner and effect the page placement algorithm 234 for storage of the hardcopies in replaceable paper archive boxes 204.
Figure 6 allows the system and method for integrated management and storage of hardcopy and softcopy documents to be achieved by a standard scanner connected to communication means to allow file transfer and synchronisation algorithm 250 to feed to a remote server. The remote server maintains the functions of assessment of the hard copy, manipulation and storage of the soft copy, and instruction or recordal of the manipulation of the hard copy. In particular, at the remote server there is the image compression and filtering 228, optical character recognition (OCR) software 230 and document indexing intelligent logic software 232. Further, the remote server includes softcopy documents by remote database 246, with web service APIs 238 and soft copy files repository 250 with index/search engine 244. Such backend software can therefore communicate back to the scanner and apparatus to effect the page placement algorithm 234 for storage of the hardcopies in replaceable paper archive boxes 204.
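Returning to the virtual cabinets, drawers and folders described above, the learning behaviour by which the software proposes a folder for a newly scanned document is not specified. One very simple approximation, given purely as an assumption, is to remember where the user has previously filed each document type and to suggest the most frequent destination. The class and method names below are hypothetical.

```python
from collections import Counter, defaultdict


class FilingHistory:
    """Remembers where each document type was filed and suggests the usual place."""

    def __init__(self):
        # document type -> Counter of (cabinet, drawer, folder) destinations
        self._counts = defaultdict(Counter)

    def record(self, doc_type, destination):
        self._counts[doc_type][destination] += 1

    def suggest(self, doc_type):
        counts = self._counts.get(doc_type)
        if not counts:
            return None  # nothing learned yet, so the user must choose
        return counts.most_common(1)[0][0]


phone_folder = ("Financial Year 2007/2008", "Bills", "Phone Bills")
utilities_folder = ("Financial Year 2007/2008", "Bills", "Utilities Bills")

history = FilingHistory()
history.record("phone bill", phone_folder)
history.record("phone bill", phone_folder)
history.record("phone bill", utilities_folder)
suggestion = history.suggest("phone bill")
```

In keeping with the description, a suggestion of this kind would be presented to the user as a prompt for permission rather than applied silently.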
Referring to Figure 7 there is a particular embodiment of the printed circuit board for the unit 212 that can be integral or connectable to the scanner to provide the intelligent scanner.
The process that provides an improved method of handling hardcopy and softcopy documents includes the process steps of:
a. Inputting
b. Storing
   i. Identifying
   ii. Indexing
   iii. Tagging
c. Placement (of hard copy document)
d. Retrieval (of soft copy and hard copy document)
There are a number of types of documents that can be inputted into the system. These include:
I. scanner input where hard documents are scanned;
II. email input where emailed documents are provided; and
III. electronic input documents.
In particular a scanner can be used in a modified form to provide an integrated hardcopy and softcopy positional storage.
For a scanner input the system monitors and reads "message" options arriving on the Universal Serial Bus (USB) interface which contain a string whose first character is the value of the "function" button (display) on the scanner, followed by a separator and a "simplex" or "duplex" indicator, depending on the scanner button that was pressed. When such a press is detected, this software program launches the scanadf program with the appropriate parameters in order to scan the document and place the results in the right directory within the file repository.
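The handling of such a button message can be sketched as follows. This is an illustration under several assumptions: the separator character (":" here), the device name, the output directory and both function names are hypothetical. The `-d` (device) and `-o` (output file pattern) options are standard scanadf options; the option that selects duplex scanning depends on the scanner backend, so it is passed in by the caller rather than guessed. The sketch only builds the command line and does not launch the scan.

```python
def parse_button_message(message, separator=":"):
    """Split a message such as '3:duplex' into (function value, scan mode)."""
    function, _, mode = message.strip().partition(separator)
    if len(function) != 1 or mode not in ("simplex", "duplex"):
        raise ValueError(f"unrecognised scanner message: {message!r}")
    return function, mode


def build_scan_command(mode, device, output_dir, duplex_args=()):
    """Build (but do not run) a scanadf command line for the given mode."""
    command = ["scanadf", "-d", device, "-o", f"{output_dir}/page-%04d"]
    if mode == "duplex":
        # Backend-specific options supplied by the caller (assumption).
        command.extend(duplex_args)
    return command


function, mode = parse_button_message("3:duplex")
command = build_scan_command(mode, "hypothetical-device", "/repository/incoming_scanner")
```

The returned list could be handed to a process launcher once the backend-specific parameters are known.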
Figure 9 shows a Directed Acyclic Graph (DAG) which describes a number of types of documents that can have processing broken down into basic units (actions). Each action depends on the original file type or on the results of previous actions, and provides an input to the subsequent actions.
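To illustrate how a directed acyclic graph of actions yields a valid processing order, the following Python sketch uses the standard library `graphlib` module (Python 3.9 or later). The action names and their dependencies are assumptions chosen for the example and are not taken from Figure 9.

```python
from graphlib import TopologicalSorter

# Each action maps to the set of actions whose results it needs (hypothetical).
dependencies = {
    "ocr": {"image"},           # OCR runs on the scanned image
    "html": {"ocr"},            # the text/search form comes from the OCR result
    "pdf": {"image", "ocr"},    # the PDF combines the image with recognised text
    "thumbnail": {"image"},     # the thumbnail is derived from the image
    "index": {"html"},          # indexing uses the extracted text
}

# static_order() yields every action after all of the actions it depends on.
order = list(TopologicalSorter(dependencies).static_order())
```

Any order produced this way respects the dependencies, so each action finds the results of the earlier actions available when it runs.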
In particular a range of visual formats can be inputted and the process is to convert to provide (but not limited to):
i. Html form - which is used for text and search;
ii. Pdf form - which is used for presentation, communication and search;
iii. Image form - which is used for presentation; and
iv. Thumbnail form - which is used as a presentation tool.
In the identifying step the inputted document is automatically processed by OCR or other means to determine content and likely category and classification of the document by fulfilling a predetermined assessment of the document. This can include providing precedence to:
o Headers;
o Location of text;
o Type of font;
o Semantic of document content;
o Other formatting
However it can also include review of preexisting index, documents templates, categories and comparison of document wording to indexes or other semantic search engine determinations.
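The predetermined assessment is not spelt out in the specification. As a minimal sketch, assuming a keyword-based scheme in which words found in a header are given precedence over words in the body, classification might look like the following. The keyword lists, the weights and the function name are all hypothetical.

```python
# Hypothetical keyword lists per category.
CATEGORY_KEYWORDS = {
    "bill": {"invoice", "amount", "due", "bill"},
    "bank statement": {"statement", "balance", "account"},
}
HEADER_WEIGHT = 3  # header words take precedence over body words (assumption)
BODY_WEIGHT = 1


def classify(header_text, body_text):
    """Return the best-scoring category, or None when nothing matches."""
    header_words = header_text.lower().split()
    body_words = body_text.lower().split()
    scores = {}
    for category, keywords in CATEGORY_KEYWORDS.items():
        score = HEADER_WEIGHT * sum(word in keywords for word in header_words)
        score += BODY_WEIGHT * sum(word in keywords for word in body_words)
        scores[category] = score
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


category = classify("Tax Invoice", "amount due 42.00")
```

Location of text, font type and semantic analysis, which the description also mentions, would need additional signals that this sketch does not model.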
The documents in the repository are all referenced using universal unique identifiers. These identifiers (UUIDs) are allocated by the process manager or the web application (see below). Files representing the same document in different formats share the same UUID. The files are named using their UUID and an extension according to their format. They are also placed in a directory according to their format and a few other criteria.
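The naming convention described above can be sketched in Python as follows. The repository root, the per-format directory names and the file extensions are assumptions for illustration; the specification states only that files are named by UUID plus a format extension and placed in a directory according to format and other criteria.

```python
import uuid
from pathlib import PurePosixPath

# Hypothetical mapping from format name to file extension.
FORMAT_EXTENSIONS = {"html": ".html", "pdf": ".pdf", "image": ".png", "thumbnail": ".jpg"}


def repository_path(root, doc_uuid, fmt):
    """Build <root>/<format>/<uuid><extension> for one format of a document."""
    return PurePosixPath(root) / fmt / f"{doc_uuid}{FORMAT_EXTENSIONS[fmt]}"


doc_uuid = uuid.uuid4()  # one identifier shared by every format of the document
paths = {fmt: repository_path("/repository", doc_uuid, fmt) for fmt in FORMAT_EXTENSIONS}
```

Because every format shares the same UUID, all representations of a document can be located from the identifier alone.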
Conceptually, various different kinds of notification mechanisms can be employed by the document-source monitors in order to notify the process manager of new documents. The two main options are sockets and the inotify mechanism. When using the inotify mechanism, the receiver asks for notifications about changes in certain directories. The sender places the new files in these agreed-upon directories. The main differences between these two options are:
1. Sockets have the advantage of being able to pass additional parameters together with the name of the new file (e.g. whether a scanned document is part of a two-sided (duplex) scan or not). The inotify mechanism is simpler to implement.
2. The simplicity of inotify means, among other things, that certain sources can operate as is, without having to explicitly send notifications and without having to add additional modules or code to detect the presence of new files. For example, the GUI can simply place new files in the incoming_gui directory. If sockets are used, the GUI will have to send a message over the socket, or another monitor will need to detect new files from the GUI (using inotify) and send messages over the socket. The same applies to a shared filesystem or ftp source that may be added in the future.
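The first difference, namely that a socket message can carry parameters alongside the file name, can be illustrated with a short Python sketch. The message format (one JSON object per line), the function names and the file name are assumptions; a connected socket pair stands in for the link between a document-source monitor and the process manager.

```python
import json
import socket


def send_notification(sock, filename, **params):
    """Send the new file's name plus any extra parameters as one JSON line."""
    payload = json.dumps({"file": filename, **params}) + "\n"
    sock.sendall(payload.encode("utf-8"))


def receive_notification(sock):
    """Read one newline-terminated JSON message from the socket."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = sock.recv(4096)
        if not chunk:
            break
        data += chunk
    return json.loads(data.decode("utf-8"))


# A connected pair stands in for the monitor and the process manager.
monitor_end, manager_end = socket.socketpair()
send_notification(monitor_end, "incoming_scanner/page-0001.png", duplex=True)
message = receive_notification(manager_end)
monitor_end.close()
manager_end.close()
```

With a directory-watching mechanism such as inotify, the receiver would learn only that a file appeared, so a flag such as `duplex` would have to be conveyed some other way, for example through the directory or file name.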
In some cases (namely email and, possibly, duplex scanned documents) a document can have a "parent" document. For example an email attachment will have the body of the email as its parent. The system includes a "parent" task that creates and launches child tasks for handling email attachments, multiple page documents or a double-sided document.
A predefined structure and conventions for storing the unit's documents allow different formats of the same document to be easily located and, together with the database, hold all the data that the user has entered into the system for quick and easy retrieval.
Also in the classify and index step 108, tags are introduced to the softcopy and hardcopy documents. The system is able to add, remove and associate multiple 'Tags' per document. As shown in Figure 11, documents can be placed, searched and classified based on tag values. Tagging of documents involves sub use cases including Add Tags to document, Edit Tags, and List Tags.
The software of the system 200 allows searching of both OCR values and/or tags of softcopy documents that are electronically stored in the database 246 of the system 200 by use of the index/search engine 244.
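A minimal sketch of tagging and of searching across both OCR text and tags follows. It assumes a simple word-level match, with an "AND" mode requiring every search word and an "OR" mode requiring at least one. The class name, method names and sample documents are hypothetical, and a real index/search engine would be considerably more capable.

```python
class TaggedStore:
    """Keeps OCR text and tags per document and searches across both."""

    def __init__(self):
        self._text = {}  # doc_id -> recognised (OCR) text
        self._tags = {}  # doc_id -> set of tags

    def add_document(self, doc_id, ocr_text):
        self._text[doc_id] = ocr_text
        self._tags[doc_id] = set()

    def add_tags(self, doc_id, *tags):
        self._tags[doc_id].update(tags)

    def remove_tag(self, doc_id, tag):
        self._tags[doc_id].discard(tag)

    def list_tags(self, doc_id):
        return sorted(self._tags[doc_id])

    def search(self, words, mode="AND"):
        """Return ids of documents whose text or tags match the words."""
        terms = [w.lower() for w in words]
        match = all if mode == "AND" else any
        results = []
        for doc_id, text in self._text.items():
            haystack = set(text.lower().split()) | {t.lower() for t in self._tags[doc_id]}
            if match(term in haystack for term in terms):
                results.append(doc_id)
        return sorted(results)


store = TaggedStore()
store.add_document("d1", "Phone bill amount due")
store.add_document("d2", "Bank statement closing balance")
store.add_tags("d1", "2008", "phone")
store.add_tags("d2", "2008")
both = store.search(["phone", "2008"], mode="AND")
either = store.search(["phone", "balance"], mode="OR")
```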
It will be appreciated that embodiments of the invention bridge the disconnection between hardcopy and softcopy documents, improve the digital conversion process, and hence streamline the management and storage of both hardcopy and softcopy documents for individuals and small businesses.
Looking at the web-based interface software 240, the System consists of a plurality of modules including Image module, Search module, User module, View Document, Cabinet Management, Folder Management, Document Management, Intelligent Categorisation Module, Communication Module, Reporting Module, Application Programming Interface and Security Management. The web-based interface software 240 can be an embedded device for smart document management. The unit can perform the following functions:
1. receive documents from various sources
2. extract various pieces of information from each document
3. convert them to different formats
4. present them to the users through a web interface
5. allow the users to organize, search and manipulate information about these documents.
In order to achieve these goals the web-based interface software 240 is composed of the following components:
1. Database: holds information about the documents, how they are grouped together, who they belong to, etc.
2. Repository: a directory structure with a convention for storing the documents.
3. User Interface: able to present the available documents to the user in a hierarchic fashion, allows them to change data about the documents, their groupings, add tags, etc.
4. A search engine
5. The backend - receives documents from various sources, applies a set of actions to them and places the results in the database and the repository.
The backend is composed of the following modules:
1. Document-sources and document source monitors: (scanner-monitor, file system monitor and email-monitor). These modules monitor a specific source of documents and, when a document is available - it places it in a specific pre-defined directory within the repository. The process manager (see below) is then notified about a new document ready to be processed.
2. The following sources/monitors are implemented: a. The scanner monitor (scanmond); b. The email monitor (email-monitor); and c. The File System monitor.
3. The process manager is the heart of the backend. It receives notifications about new documents and performs a predefined set of operations (tasks) on each document depending on its source, type and possibly other attributes.
4. System scripts are used for starting and stopping the various modules, ensuring network connectivity, logging, periodic maintenance chores (crond) and any other general "glue" or supportive functions not done by one of the other modules.
The repository is a directory tree with a predefined structure and conventions for storing the unit's documents. This structure and these conventions allow different formats of the same document to be easily located, and together with the database the repository holds all the data that the user has entered into the system.
The process manager is composed of the following components:
1. Tasks - these are represented by objects of a class derived from the SCTask class. The task contains all the information that relates to a particular document during processing, except for information that only relates to a single stage of processing (a single action).
2. Actions - these objects contain a reference to a task object and an "execute" method which performs a particular stage of processing - typically conversion from format A to format B, extraction of data X and storage on the task object, or insertion/modification of a database entry. Ideally action classes should not depend on being used by a particular task class, and should be re-usable by different task classes.
3. Task factory - This is a function that receives all the information about a new document (i.e. its location, name, extension and source) when one arrives. This function then creates a task object that will handle the processing of the new document.
4. Queue - The SCQueue class implements a queue of actions (actually SCAction pointers) that are waiting for execution. This class also takes care of providing atomic access to the queue (i.e. it locks the queue when elements are being added or removed, in order to avoid race conditions).
5. Main loop - This monitors the repository for new documents. When one arrives it calls the task factory to create a new task object for handling the document. It then calls the task's execute method, which takes care of the rest (basically, putting the relevant actions on the relevant queue). The main loop is not part of the process manager class.
6. Execution threads - These are implemented by the SCThread class. The SCThread constructor receives a SCQueue as a parameter. The thread then removes actions from this queue and calls their execute() method. When the execute() method returns, the thread deletes the action object (this, in turn, causes the task to be notified). The task then schedules the next action(s) or deletes itself.
7. Process manager class - The glue that connects all the above components, except for the main loop (the main loop works by calling process manager methods). It is a singleton that provides the task factory, a thread factory, initialization code, schedule function, and database, repository and configuration file access functions.
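The interplay of tasks, actions, the queue and the execution threads can be sketched as follows. This is an illustrative Python approximation and not the implementation described: Python's thread-safe `queue.Queue` stands in for SCQueue, a plain worker function stands in for SCThread, and the two processing steps (`run_ocr`, `store_entry`) and the rule inside `task_factory` are hypothetical.

```python
import queue
import threading


class SCAction:
    """One stage of processing, holding a reference to its task."""

    def __init__(self, task, name, func):
        self.task, self.name, self.func = task, name, func

    def execute(self):
        self.func(self.task)


class SCTask:
    """Holds per-document context and schedules its actions one after another."""

    def __init__(self, document, steps, action_queue):
        self.document = document
        self.context = {}
        self._steps = list(steps)
        self._queue = action_queue
        self.done = threading.Event()

    def execute(self):
        self._schedule_next()

    def action_finished(self):
        # Called when an action completes; the described design notifies
        # the task when the thread deletes the action object.
        self._schedule_next()

    def _schedule_next(self):
        if self._steps:
            name, func = self._steps.pop(0)
            self._queue.put(SCAction(self, name, func))
        else:
            self.done.set()


def run_ocr(task):       # hypothetical action
    task.context["text"] = "recognised text placeholder"


def store_entry(task):   # hypothetical action
    task.context["stored"] = True


def task_factory(path, action_queue):
    """Decide what processing a new document needs (illustrative rule)."""
    steps = [("store", store_entry)]
    if path.endswith(".png"):
        steps.insert(0, ("ocr", run_ocr))
    return SCTask(path, steps, action_queue)


def worker(action_queue):
    """Execution thread: take actions off the queue and run them."""
    while True:
        action = action_queue.get()
        if action is None:  # sentinel used to stop the thread
            break
        action.execute()
        action.task.action_finished()


actions = queue.Queue()
thread = threading.Thread(target=worker, args=(actions,))
thread.start()
task = task_factory("incoming_scanner/page-0001.png", actions)
task.execute()
task.done.wait(timeout=5)
actions.put(None)
thread.join()
```

In this sketch the scheduling policy is simply one queue served by one thread; adding more queues or threads would change when actions run without changing the task or action classes.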
This design decouples the different process manager functionalities:
1. The task factory determines what processing needs to be done for a document.
2. The specific task manages the dependencies of a particular set of actions. It also provides the context in which they operate, and determines how the processing will be done.
3. The process manager class and the particular configuration of queues and threads determine the scheduling policy (i.e. when and how each action will be scheduled).
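The role of the task factory in this separation may be sketched as follows, by way of example only. The four input fields (location, name, extension and source) come from the description; the pipeline names, the action names, the list of image extensions and the rule for choosing between pipelines are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NewDocument:
    location: str
    name: str
    extension: str
    source: str


# Hypothetical pipelines: each is an ordered list of action names.
PIPELINES = {
    "scan": ["convert_to_pdf", "ocr", "classify", "store_index"],
    "upload": ["extract_text", "classify", "store_index"],
}
IMAGE_EXTENSIONS = {"tif", "tiff", "jpg", "png"}


def task_factory(doc):
    """Decide what processing a new document needs (assumed rule)."""
    ext = doc.extension.lower().lstrip(".")
    is_scan = doc.source == "scanner" or ext in IMAGE_EXTENSIONS
    kind = "scan" if is_scan else "upload"
    return {"document": doc, "kind": kind, "actions": list(PIPELINES[kind])}


scanned = task_factory(NewDocument("/repository/in", "invoice", "TIF", "scanner"))
uploaded = task_factory(NewDocument("/repository/in", "letter", ".docx", "web"))
print(scanned["actions"])
print(uploaded["actions"])
```

The factory only decides what is to be done; when each action runs remains a matter for the queues and threads.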
The web-based interface software 240 is structured in System Modules as shown in Figure 8. Some of the requirements of the modules of the web-based interface software 240 are as follows:
1. Image Module - Image manipulation (i.e. crop)
   a.
   b. Graphical user interface (GUI) implementation for documents with multiple images.
   c. Implementation of basic image manipulation operations such as:
      i. Zoom In/Zoom Out (and implementation of a slider with the percentage showing on it).
      ii. Print
      iii. Save
      iv. Implementation of different views (Thumbnail and List) to view images.
2. Search Module
   a. Graphical user interface (GUI) to make documents readily searchable.
   b. Implementation of search based on a document's properties, metadata and content.
   c. Implementation for highlighting keywords in the returned search summary text.
   d. Graphical user interface (GUI) implementation for "Advance Search".
   e. Database implementation for Advance Search with the following criteria:
      i. Boolean Search - This includes search with 'AND' and 'OR' tags. 'AND' performs a search in which all words match the criteria, while 'OR' performs a search in which one or more words match.
      ii. Document type
   f. Graphical user interface (GUI) implementation for a quick view/snapshot of the image, displayed next to the text search results.
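The Boolean search and keyword highlighting requirements may be illustrated by the following minimal sketch. It searches an in-memory dictionary of document text rather than a database, and the function names, the sample documents and the `**` highlight marker are assumptions made for the example.

```python
import re


def tokenize(text):
    """Lower-case a text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(documents, words, mode="AND"):
    """Return ids of documents matching all ('AND') or any ('OR') words."""
    wanted = [w.lower() for w in words]
    if not wanted:
        return []
    match = all if mode.upper() == "AND" else any
    return [
        doc_id
        for doc_id, text in documents.items()
        if match(w in tokenize(text) for w in wanted)
    ]


def highlight(text, words, marker="**"):
    """Wrap each whole-word occurrence of a search word in a marker."""
    if not words:
        return text
    alternatives = "|".join(re.escape(w) for w in words)
    pattern = re.compile(r"\b(" + alternatives + r")\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{marker}{m.group(0)}{marker}", text)


docs = {
    "d1": "Electricity invoice for March",
    "d2": "Water invoice",
    "d3": "Insurance policy",
}
print(search(docs, ["invoice", "water"], "AND"))  # ['d2']
print(search(docs, ["invoice", "water"], "OR"))   # ['d1', 'd2']
print(highlight(docs["d2"], ["invoice"]))         # Water **invoice**
```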
3. View Documents
   a. Graphical user interface (GUI) implementation of a Thumbnail view for viewing documents.
   b. Database implementation for the Thumbnail view.
   c. Implementation to set a view as the default option when the application first starts up. If the user does not select a default option, the application starts with the view that the user last visited.
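The start-up rule in item c may be expressed as a simple order of precedence, as in the sketch below. The function name, the view identifiers and the final fallback to the List view (for a first start with no history) are assumptions that are not stated in the description.

```python
VIEWS = ("thumbnail", "list")


def startup_view(default_view=None, last_visited=None, fallback="list"):
    """Pick the view to show at start-up.

    Order of precedence: the user's chosen default, then the last
    visited view, then an assumed fallback.
    """
    for candidate in (default_view, last_visited):
        if candidate in VIEWS:
            return candidate
    return fallback


print(startup_view(default_view="thumbnail", last_visited="list"))
print(startup_view(last_visited="thumbnail"))
print(startup_view())
```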
4. Cabinet Management
   a. Add new Cabinet - Database implementation for adding a new Cabinet.
   b. Remove existing Cabinet - Database implementation to remove an existing Cabinet.
   c. Rename Cabinet - Database implementation to rename a Cabinet.
5. Folder Management
   i. Add new Folder - Database implementation to add a new Folder.
   ii. Remove existing Folder - Database implementation for the removal of an existing Folder.
   iii. Rename Folder - Database implementation to rename a Folder.
   iv. Change Folder order - Database implementation to change the order of folders.
6. Document Management
   a. Add Document - Add Document from File
      i. Database implementation to add a new Document.
      ii. Graphical user interface (GUI) implementation to add a new document's properties.
      iii. Implementation to upload files.
   b. Implementation for getting the OCR value and intelligent tags while a new Document is added into the system.
   c. Remove Document
      i. Database implementation to remove a document from the system.
   d. Rename Document - Edit Document Properties (part of Edit Doc Properties)
      i. Database implementation to rename a document.
      ii. Dragging and dropping of documents with Thumbnail view.
      iii. Implementation for dragging and dropping of documents with Thumbnail view.
7. User Module - Access Administration
   a. Graphical user interface (GUI) implementation for managing a user account, so that the user can manage her/his virtual Cabinets/Folders and Documents.
   b. Implementation of user types
      i. Two types of user:
         - Admin
         - Staff
      ii. Assigning user rights according to type, such as 'Admin' or 'Staff'.
   c. Database implementation to manage user accounts, such as:
      i. Add new User
      ii. Delete existing User
      iii. Edit profile
      iv. Change Password
   Note: Set user rights according to the type of user, such as 'Admin' or 'Staff'.
8. Security Management - Security Administration
   a. Session check and other security measures before giving access to a document (URL).
   b. Password security management
      i. Implementation of a password encryption algorithm.
   c. Forgot password management
      i. Graphical user interface (GUI) implementation for "Forgot password".
      ii. Database implementation for Forgot Password.
      iii. Email implementation for sending a new password via email.
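The user rights and password requirements above may be sketched as follows, by way of example only. The description names the two user types but does not list their rights, so the rights table is hypothetical. The description also does not name a password algorithm; the sketch substitutes salted PBKDF2 hashing from the Python standard library, which is a one-way hash rather than reversible encryption. The iteration count and the function names are likewise assumptions.

```python
import hashlib
import hmac
import secrets

# Hypothetical rights per user type; only the type names are from the text.
RIGHTS = {
    "Admin": {"view", "add", "edit", "delete", "manage_users"},
    "Staff": {"view", "add", "edit"},
}


def is_allowed(user_type, operation):
    """Check an operation against the rights of the given user type."""
    return operation in RIGHTS.get(user_type, set())


def hash_password(password, salt=None, iterations=200_000):
    """Return (salt, digest) for storage instead of the plain password."""
    if salt is None:
        salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, digest


def verify_password(password, salt, digest, iterations=200_000):
    """Recompute the digest and compare it in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, digest)


def new_temporary_password():
    """Generate a random password for the 'Forgot password' email."""
    return secrets.token_urlsafe(12)


salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))
print(verify_password("wrong", salt, digest))
print(is_allowed("Staff", "delete"))
```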
The embodiments have been described by way of example only and modifications within the spirit and understanding of the invention are included within the scope of the claims which follow.

Claims

1. A method for integrated management and storage of hardcopy and softcopy documents, the method including the steps of: scanning a hardcopy document into a softcopy document; placing the hardcopy document in a document box after scanning; recognising information in the softcopy document; classifying and indexing both the softcopy document and the hardcopy document based on the recognised information; storing the classification and index of both the softcopy document and the hardcopy document in a memory; and allowing for retrieving of hard copy document from document box due to the classifying and indexing of both the softcopy document and the hardcopy document.
2. A method according to claim 1, wherein the indexing of the hardcopy document includes allocating a physical address to the hardcopy document.
3. A method according to claim 2, wherein the physical address is a sequential physical position or a unique physical file position in the document box.
4. A method according to any preceding claim, further including the step of automatically feeding the hardcopy document for scanning into the softcopy document.
5. A method according to any preceding claim, wherein the step of classifying and indexing the softcopy document includes categorising and classifying the softcopy document in predetermined categories.
6. A method according to any preceding claim, wherein the steps of classifying and indexing both the hardcopy document and the softcopy document are based on recognising information in the softcopy document.
7. A method according to claim 6, wherein the step of recognising information in the softcopy document is performed by optical character recognition (OCR).
8. A method according to claim 6 or 7, further including the step of combining the recognised information with a template.
9. A method according to claim 8, wherein the template is selected from a predetermined template, a user-defined template, an intelligent template based at least in part on learnt user behaviour, and combinations thereof.
10. A method according to claim 8 or 9, further including the step of processing both the hardcopy document and the softcopy document based on the template.
11. A method according to any preceding claim, wherein the step of receiving the hardcopy document in the document box includes applying an algorithm to control physical positioning of the hardcopy document in the document box.
12. A method according to any preceding claim, further including the step of retrieving the hardcopy document from the document box based on the indexing of the hardcopy document stored in the memory.
13. A method according to any preceding claim, further including the step of correlating the softcopy document with the hardcopy document based on the indexing of both the softcopy document and the hardcopy document stored in the memory.
14. A processor program product including processor readable instructions executable by a processor to perform a method according to any preceding claim.
15. A system for integrated management and storage of hardcopy and softcopy documents, the system including a scanner to scan a hardcopy document into a softcopy document, and a document box removably connected to the scanner to receive the hardcopy document from the scanner after scanning, wherein the system further includes a memory in communication with a processor programmed to: control the scanner to scan the hardcopy document independently of a computer; place the hardcopy document in the document box after scanning; recognise information in the softcopy document; classify and index both the softcopy document and the hardcopy document based on the recognised information; store the classification and index of both the softcopy document and the hardcopy document in the memory.
16. A system according to claim 15, wherein the recognising, classifying and storing of softcopies is integral with the scanner.
17. A system according to claim 15, wherein the recognising, classifying and storing of softcopies is by an attachment to the scanner.
18. A system according to claim 15, wherein the recognising, classifying and storing of softcopies is by communication with a remote server wirelessly or over the internet or a telecommunication line.
19. A system according to claim 15, wherein the indexing of the hardcopy document includes allocating a physical address to the hardcopy document.
20. A system according to claim 19, wherein the physical address is a sequential physical position or a unique physical file position in the document box.
21. A system according to any one of claims 15 to 20, wherein the document box has an externally viewable index to visually indicate a plurality of sequential physical positions therein.
22. A system according to any one of claims 15 to 21, further including a plurality of interchangeable document boxes, wherein the indexing of the hardcopy document is based on a unique identifier of a document box and a physical position therein.
23. A system according to any one of claims 15 to 22, wherein the scanner is removably connected above the document box, and wherein the scanner and the document box form a cabinet when removably connected to one another.
24. A system according to any one of claims 15 to 23, wherein the scanner is a single and/or double-sided scanner.
25. A system according to any one of claims 15 to 24, further including a user interface.
26. A system according to any one of claims 15 to 25, further including a data interface.
27. A system according to any one of claims 15 to 26, further including a document feeder to automatically feed the hardcopy document into the scanner.
28. A system according to any one of claims 15 to 27, further including a desktop application executable on a computer.
29. A system according to any one of claims 15 to 28, wherein the system is accessible via a web browser.
30. An apparatus that can enable any scanner to become an 'intelligent' scanner according to claims 15-29.
31. A system according to any one of claims 15 to 29 that can perform method for integrated management and storage of hardcopy and softcopy documents according to any one of claims 1 to 14 on a 'system cloud' or remote server.
32. A system according to any one of claims 15 to 29 that can perform method for integrated management and storage of hardcopy and softcopy documents according to any one of claims 1 to 14 to be performed substantially on a standalone apparatus.
33. A method for integrated management and storage of hardcopy and softcopy documents substantially as hereinbefore described with reference to the drawings.
34. A system for integrated management and storage of hardcopy and softcopy documents substantially as hereinbefore described with reference to the drawings.
35. An apparatus for integrated management and storage of hardcopy and softcopy documents substantially as hereinbefore described with reference to the drawings.
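By way of illustration only, the indexing recited in claims 2, 3, 12, 13 and 22 may be sketched as an index that allocates each hardcopy document a sequential physical position within an identified document box and correlates that address with the softcopy document. The class names, the method names and the sample identifiers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhysicalAddress:
    box_id: str    # unique identifier of the document box
    position: int  # sequential physical position inside the box


class DocumentIndex:
    """Correlates each softcopy with the address of its hardcopy."""

    def __init__(self):
        self._last_position = {}  # box_id -> last allocated position
        self._entries = {}        # softcopy_id -> (category, address)

    def register(self, softcopy_id, box_id, category):
        """Allocate the next sequential position in the given box."""
        position = self._last_position.get(box_id, 0) + 1
        self._last_position[box_id] = position
        address = PhysicalAddress(box_id, position)
        self._entries[softcopy_id] = (category, address)
        return address

    def locate(self, softcopy_id):
        """Return where the hardcopy of a softcopy is stored."""
        return self._entries[softcopy_id][1]

    def find_by_category(self, category):
        """Return softcopy ids classified under a category."""
        return [
            sid for sid, (cat, _) in self._entries.items() if cat == category
        ]


index = DocumentIndex()
index.register("scan-0001", "BOX-A", "invoice")
index.register("scan-0002", "BOX-A", "contract")
index.register("scan-0003", "BOX-B", "invoice")
print(index.locate("scan-0002"))
print(index.find_by_category("invoice"))
```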
PCT/AU2009/000894 2008-07-11 2009-07-13 Integrated management and storage of hardcopy and softcopy documents WO2010003192A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2008903566 2008-07-11
AU2008903566A AU2008903566A0 (en) 2008-07-11 System and method for integrated management and storage of hardcopy and softcopy documents

Publications (1)

Publication Number Publication Date
WO2010003192A1 (en) 2010-01-14

Family

ID=41506598

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2009/000894 WO2010003192A1 (en) 2008-07-11 2009-07-13 Integrated management and storage of hardcopy and softcopy documents

Country Status (1)

Country Link
WO (1) WO2010003192A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5344132A (en) * 1990-01-16 1994-09-06 Digital Image Systems Image based document processing and information management system and apparatus
US6263121B1 (en) * 1998-09-16 2001-07-17 Canon Kabushiki Kaisha Archival and retrieval of similar documents
US6744936B2 (en) * 1997-12-30 2004-06-01 Imagetag, Inc. Apparatus and method for simultaneously managing paper-based documents and digital images of the same
US6775422B1 (en) * 1996-06-27 2004-08-10 Papercomp, Inc. Systems, processes, and products for storage and retrieval of physical paper documents, electro-optically generated electronic documents, and computer generated electronic documents
US20040202386A1 (en) * 2003-04-11 2004-10-14 Pitney Bowes Incorporated Automatic paper to digital converter and indexer
US7373365B2 (en) * 2004-04-13 2008-05-13 Satyam Computer Services, Ltd. System and method for automatic indexing and archiving of paper documents

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015154946A1 (en) * 2014-04-11 2015-10-15 Fileee Gmbh Method for electronically and physically archiving documents
CN106164910A (en) * 2014-04-11 2016-11-23 弗立有限公司 The electronically and physically archiving method of document
WO2016173571A1 (en) * 2015-04-28 2016-11-03 Qbo Gmbh Archiving system
US11138658B2 (en) * 2018-03-02 2021-10-05 Ranieri Ip, Llc Methods and apparatus for mortgage loan securitization based upon blockchain verified ledger entries
US20220027988A1 (en) * 2018-03-02 2022-01-27 Ranieri Ip, Llc Methods and apparatus for mortgage loan securitization based upon mortgage servicing stored on blockchain
US11244391B2 (en) 2018-03-02 2022-02-08 Ranier IP, LLC Methods and apparatus for ingestion of legacy records into a mortgage servicing blockchain
US11727484B2 (en) 2018-03-02 2023-08-15 Ranieri Ip, Llc Methods and apparatus for mortgage loan securitization based upon mortgage servicing stored on blockchain

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 09793718

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 09793718

Country of ref document: EP

Kind code of ref document: A1