WO2004063863A2 - Document management apparatus, system and method - Google Patents
Document management apparatus, system and method
- Publication number
- WO2004063863A2 (application PCT/US2004/000168)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- document
- native
- user
- documents
- search
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/34—Browsing; Visualisation therefor
- G06F16/345—Summarisation for human users
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
Definitions
- the invention relates to an apparatus and method for storing, searching, retrieving, and delivering electronic documents, and a program product for implementing the same, for the purpose of managing a multiplicity of documents.
- a method for managing a plurality of native documents to be uploaded to a document management computer system includes determining a file type for each native document of the plurality of native documents, creating a fingerprint for each native document, de-duplicating each native document in accordance with the fingerprint, extracting data from each native document, associating extracted data with a corresponding native document, and distributing the plurality of native documents and extracted data substantially equally amongst a plurality of nodes of the document management computer system. By distributing native documents and extracted data substantially equally amongst the nodes, search processing time may be reduced.
- Another novel aspect includes a method for searching a plurality of native documents stored in a document management computer system having a plurality of computer nodes storing the plurality of native documents.
- the steps include defining search criteria for searching the plurality of native documents, executing in parallel searches in accordance with the search criteria for each computer cluster of the plurality of nodes, wherein each computer cluster scores each search result in accordance with the search criteria, ranking the search results in accordance with the score determined in each computer cluster, omitting certain documents represented by the search results in accordance with a user's predefined permission level, and displaying final search results to a user.
- a user's predefined permission level may protect documents that should not be viewed by the user conducting the search.
- a method for managing attributes of at least one native document produced from a search of a plurality of native documents stored in a document management computer system includes defining search criteria for searching the plurality of native documents, executing a search in accordance with the defined search criteria, displaying search results, and modifying document attributes of at least one document represented by the search results, and storing modified document attributes associated with the at least one document, wherein the modified document attributes are maintained for future searches.
- a user may apply a user-defined classification to be displayed when the corresponding document(s) is subsequently viewed.
- a method is disclosed for searching a plurality of native documents stored in a document management computer system.
- the steps include defining search criteria for searching the plurality of native documents, executing a search in accordance with the defined search criteria, displaying search results as links to data files representative of associated native documents, and selectively viewing a native document represented by at least one link of the search results displayed to the user. Accordingly, because information may be lost when the native document is converted to a data file, the native document nevertheless may be viewed for its original format.
- Other novel aspects include a method for producing search results of a plurality of native documents stored in a computer system in accordance with a user-defined search query.
- the server receives the user-defined search query, and sends a search query to the computer system in accordance with the user-defined search query. Search results are received from the computer system corresponding to the user-defined search query. Therefore, by attributing at least one user-defined classification to at least one document represented by the search results received, the user-defined classification is displayed when the at least one document is later viewed.
- a method for producing search results of a plurality of native documents stored in a computer system in accordance with a user-defined search query. There is provided a website hosted by a server that interfaces with the computer system and a user connected via a user interface over a communication network. Under control of the user interface, search results of the plurality of native documents are displayed in accordance with the user-defined search query. In response to at least one user-defined classification selected by the user, the user-defined classification is attributed to at least one native document represented by the search results.
- an electronic document management system includes a plurality of computer nodes for storing a plurality of native documents, and a computer in communication with the plurality of computer nodes for receiving a plurality of input files to be uploaded to the plurality of computer nodes.
- the computer is configured to determine the type of native document for each of the plurality of input files, to assign a unique identification tag to each native document, and to eliminate duplicate native documents based on the unique identification tags, for producing a subset of input files to be uploaded to the plurality of computer nodes. Also, the subset of input files are distributed substantially equally amongst the plurality of computer nodes.
- an electronic document management system comprising a PC type computer connected in a parallel cluster, said computer using an operating system that stores electronic documents in a hard disk drive throughout the cluster, said operating system defining a document identification tag where each document is identified by its file extension that is converted to ASCII text and given a unique identification number, each of a plurality of documents having at least one of either meta-data, text or attachments identified for retrieval that are indexed for web-based retrieval from the cluster database, said identification of the plurality of documents forming a cluster database that is web-searchable by use of a predetermined descriptive term.
- FIG. 1 is a schematic diagram of a computer system used to implement the disclosed concepts.
- FIG. 2 illustrates a system for managing a plurality of documents to be loaded in the computer system of Fig. 1.
- FIG. 3 illustrates a flow diagram of a search to be implemented by the computer system of Fig. 1.
- Fig. 4 illustrates an exemplary webpage in which search criteria may be entered.
- FIG. 5 illustrates another exemplary webpage.
- Figs. 6a-c illustrate pull-down menus of an exemplary webpage.
- Fig. 7 illustrates a flow diagram of a user initiated search.
- Fig. 8 illustrates an exemplary webpage and a search to be conducted.
- Fig. 9 illustrates an exemplary webpage displaying search results in accordance with the search criteria entered in the webpage of Fig. 8.
- Figs. 10a-b illustrate a document selected from results of a search.
- Fig. 11 illustrates a flow diagram of various user-defined classifications that may be applied to document(s) represented from a search.
- FIG. 1 illustrates an example of a computer system 10 in a cluster arrangement.
- the hardware of computer 12, computer 22, server 20, processor 18 and RAID-5 arrays N1-Nn, each of which is connected to the computer system 10, is general purpose in nature, albeit with an appropriate network connection for communication via an intranet, the internet and/or other data networks.
- each such general-purpose computer typically comprises a central processor, an internal communication bus, various types of memory (RAM, ROM, EEPROM, cache memory, etc.), disk drives or other code and data storage systems, and one or more network interface cards or ports for communication purposes.
- RAID-5 arrays may be best suited for storing and managing a multiplicity of documents for at least one client. While the computer system 10 may include only one RAID-5 disk array, Fig. 1 illustrates the computer system 10 with one or more RAID-5 disk arrays, node N1 - node Nn, each of which includes a plurality of disk drives 14. In the alternative, each node N1-Nn may be a single disk drive 14 or a grouping of disk drives 14 from one or more nodes N1 - Nn. Databases 16a-c may also be connected to the computer system 10. Other types of devices may be included in the computer system 10 that are not specifically shown in Fig. 1. The diversity of data storage devices used in data storage management systems lends itself to different user designs, specifications and customization. The computer system 10 illustrated by Fig. 1 shall not be limiting to the concepts discussed herein.
- Computer 12 and processor 18 may employ a Linux operating system, an open source code operating system.
- Processors 18 are connected to RAID-5 arrays, nodes N1-Nn, in a parallel manner, and each controls a respective RAID array.
- the total combined processing speed may be increased to super-computing levels by increasing the number of processors 18.
- Software operating on each node N1 - Nn functions in such a manner that each hard disk drive 14 processes information as if it were part of a single large disk drive, and each computer processor functions as if it were a single processor. As a result, any data that may be lost due to malfunction of any one computer disk is automatically recovered by the other disks 14 of the RAID array.
- the software functionalities of the computer system 10 involve programming, including executable code.
- the software code is executable by the general-purpose computer, explained above.
- the code and possibly the associated data records are stored within the general-purpose computer platform.
- the software may be stored at other locations and/or transported for loading into the appropriate general-purpose computer systems.
- the embodiments discussed further herein involve one or more software products in the form of one or more modules of code carried by at least one machine-readable medium. Execution of such code by a processor of the computer system 10 enables the platform to implement the catalog and/or software downloading functions, in essentially the manner performed in the embodiments discussed and illustrated herein.
- Non-volatile media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) operating as one of the server platform, discussed above.
- Volatile media include dynamic memory, such as main memory of such a computer platform.
- Physical transmission media include coaxial cables, copper wire and fiber optics, including the wires that comprise a bus within a computer system.
- Carrier-wave transmission media can take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD, any other optical medium, less commonly used media such as punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
- the computer system 10 may be accessible by an administrator via a stand-alone work station represented by computer 12.
- An internet server 20 interfaces with the computer system 10 to permit end-users to access the system via the internet 24 through at least one user terminal 22.
- the computer system 10 is configured to manage large sets of documents for multiple clients, but limits user access to documents supplied by the associated client.
- Documents supplied by a client are uploaded to the computer system 10 using work station 12.
- Documents may be supplied in electronic form or in hard copy form. If in electronic form, a suitable drive 26 corresponding to the medium type is used to upload electronic documents to the computer system 10. Also, if documents are in hard copy, they may be scanned using scanner 28 and uploaded to the computer system 10.
- Fig. 2 illustrates a system for managing documents uploaded to the computer system 10.
- data is loaded into the computer system 10 via workstation 12.
- the file type discriminator 212 determines file types based on the file extension of each input file 210. If the file type is an archive, such as .zip, .tar, etc., archive extractor 214 extracts the archived file(s). Again, the file type of each extracted document is determined by the file type discriminator 212.
- each document is de-duplicated. More particularly, de-duplicator 218 compares the fingerprint of each input file 210 with the fingerprints of the other input files 210, and with the fingerprints of documents already stored in the computer system 10. If a match is found, the document to be uploaded is discarded, so as to prevent multiple copies of the same document from residing in the computer system 10.
- extractor 220 converts each native document 222 (corresponding to the input files 210 in original format) to at least a text file 224.
- Other files that may be generated include meta data files 226, XML files 228, and HTML files 230.
- Well known third party software packages may be used in this conversion process.
- Indexer 232 creates a file association table for each native document that maintains the associations between each native document 222, converted documents 224-230, and attachments, if any, to the native document. These attachments are commonly referred to as "children files." While the file association table may be stored in any of the nodes N1-Nn, other databases 16a-c may be used to maintain file association tables. Distributor 234 distributes native documents and converted documents substantially equally amongst the nodes of the computer system 10, after which time, the documents may be searched.
- Referring back to Fig. 1, a three cluster arrangement is shown. In this example, about a third of the documents to be uploaded would be distributed to each node N1 - Nn of the computer system 10.
- Each processor 18 interfacing within each node Nn executes a search daemon for searching files in each node. Therefore, when a search is initiated by server 20, multiple processors 18 execute the search in parallel.
- the search daemon scores each document based on search criteria specified. Results from each search daemon can be compared against results from other search daemons.
- Table 1 provides an example of search results produced by each search daemon.
- Server 20 receives search results from each processor 18 and merges the search results accordingly. Assuming that only the top five search results were requested, the search results may be compiled in the following manner.
- Fig. 3 illustrates a flow diagram of the search process initiated by server 20.
- server 20 receives a search query from a user via a user interface 22 over the internet 24.
- server 20 initiates the parallel query tool, i.e., server 20 causes each processor 18 to execute respective search queries in accordance with the search criteria received by server 20.
- server 20 receives the search results from each processor 18 of each cluster, e.g., as shown in Table 1.
- Users accessing the computer system may have pre-defined permission levels, e.g., on a scale of 1 to 5; 1 being the lowest level and 5 being the highest. Also, document classifications may be assigned to each document on the same scale. Therefore, only documents that have a document classification equal to or less than the user's pre-defined permission level may be viewed by the user. This allows one to restrict access to certain documents, especially those that are highly confidential.
- Table 3 provides an example of search results identical to those of Table 1, but with document classifications for each document.
- In Step 316, server 20 compares each document classification with the user's predefined permission level, and in Step 318 determines whether or not the user is permitted to view the document. If the user is restricted from viewing a respective document, the document is ignored (Step 320). Conversely, if the user is permitted to view the document, the search result is categorized in Step 322. Steps 316 - 322 are repeated until the document classification for each document has been compared against the user permission level.
- Table 4 lists the search results compiled by server 20 after comparison with the document classifications. Comparison with Table 2, discussed above, reveals starkly different search results due to the pre-defined user permission level. The italicized search results shown in Table 3 identify the documents that would be ignored in Step 320 because of the user permission level.
- Table 2 provides an example of the search results that would be sent to a user with a permission level 5 in Step 324.
- In Step 326, a user may request to modify document attributes or display associated file types.
- In Step 328, if such a request is received, an attribute table is modified accordingly and/or the associated file type, e.g. a native document, may be sent to the user.
- the attribute table may be created by the file type categorizer 216 of Fig. 2 when uploading native documents. In the alternative, the attribute table may be created when an attribute is first modified. Attribute tables may be stored in databases 16a-c or RAID arrays
- Fig. 4 illustrates a webpage displayed on a user interface 22 once a user has logged onto the computer system 10 via the internet 24 and server 20.
- the webpage includes field
- tabs may always be displayed and may include a Search tab 416, My Files tab 418, Inbox 420, Outbox 422 and Case
- FIG. 5 illustrates an example of a webpage displayed when the My Files tab 418 has been selected. As shown, both user-associated files and files categorized in public folders are displayed.
- FIG. 6a illustrates criteria specified in the "My Files" pull down menu 610.
- document(s) may be associated with public folders.
- Fig. 6b shows selections for "Send copy to" pull down menu 612.
- various users are listed. By selecting another user, a link to the document will be sent to the other user's inbox for future viewing.
- Fig. 6c shows the attribute menu.
- various attributes may be assigned to documents selected.
- Fig. 7 illustrates a flow chart of a search from the end-user perspective.
- an end-user accesses the document management website, and downloads to a browser the webpage such as shown in Fig. 4.
- In Step 712, an end-user enters search criteria in field 410.
- In Step 714, the search criteria are sent by the end-user interface 22 to server 20.
- Upon executing the query, server 20 produces search results in accordance with Steps 310 - 324 of Fig. 3 described above.
- In Step 716, the search results are displayed to the end-user.
- the end-user has various options for categorizing, forwarding, or assigning an attribute to each document produced from the search.
- the end-user may select one or more documents from the search results (Step 718), and categorize the selected documents from the pull-down menu illustrated in Fig. 6a.
- the end-user may send selected documents to another end-user's inbox for future viewing, by selecting an end-user from the pull-down menu illustrated in Fig. 6b.
- the end-user may assign one or more attributes to the selected documents from the pull-down menu illustrated in Fig. 6c. In this manner, the end-user need not select individual documents for each modification.
- End-user actions at least represented by Figs. 6a-6c are each generally referred to as a "user defined classification."
- Fig. 8 provides an example of a search for documents concerning "split and business plan," entered by an end-user in the search criteria field 410.
- This search would be implemented in accordance with steps 710-714 of Fig. 7.
- Fig. 9 illustrates the search results displayed to the user in accordance with Step 716 of Fig. 7, and in accordance with Steps 310-324 of Fig. 3. Three links are displayed.
- a user may check one or more of the documents, and categorize, send a copy to another user, and/or assign attributes to the one or more checked documents using the pulldown menus. This is a highly effective way to manage large sets of documents without the need to view each individual document.
- a user may link to a document by selecting an associated link.
- Figs. 10a-b illustrate a document entitled "Compete and Privacy.doc" selected from a search.
- the converted text, html, or xml file is displayed.
- Fig. 11 illustrates a flow chart for attributing a user defined classification. More particularly, the user may add a comment (Step 1110) to be displayed when the document is later viewed.
- the user may designate the comment as either public or private, so that it may be viewed by all users associated with the respective account, or only by the user entering the comment, respectively (Step 1112). Also shown are the attributes already assigned to the document, 1010. In Step 1114, the user may modify already assigned attributes 1010 or designate new attributes 1012. The user may send a link to the selected document to one's inbox using the "Send copy to" pull-down menu. Also, the user may categorize the selected document using the "My Files" pull-down menu.
- the attribute table discussed above may be updated with user defined classifications. Subsequent searches and document retrieval will identify user defined classifications previously designated. As a result, large sets of documents may be searched and classified accordingly. In this manner, the need to repeatedly review each and every document, during a litigation, can be limited.
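The attribute table and the public/private comments described above can be pictured with a minimal Python sketch. The source does not specify a schema, so the class names (`Comment`, `AttributeRecord`, `AttributeTable`) and the in-memory dictionary are hypothetical; scoping of public comments to one client account is omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    public: bool  # True: visible to all users; False: visible to the author only


@dataclass
class AttributeRecord:
    """Hypothetical attribute-table row for one document."""
    attributes: set[str] = field(default_factory=set)
    comments: list[Comment] = field(default_factory=list)


class AttributeTable:
    """In-memory stand-in for the attribute table (assumed structure)."""

    def __init__(self):
        self._rows: dict[str, AttributeRecord] = {}

    def classify(self, doc_ids, attribute):
        """Apply one attribute to every selected document in a single action."""
        for doc_id in doc_ids:
            self._rows.setdefault(doc_id, AttributeRecord()).attributes.add(attribute)

    def add_comment(self, doc_id, author, text, public):
        """Attach a public or private comment to a document."""
        self._rows.setdefault(doc_id, AttributeRecord()).comments.append(
            Comment(author, text, public))

    def view(self, doc_id, viewer):
        """Return the stored attributes plus the comments this viewer may see."""
        row = self._rows.get(doc_id, AttributeRecord())
        visible = [c.text for c in row.comments if c.public or c.author == viewer]
        return sorted(row.attributes), visible
```

Because the table persists between calls, a classification applied after one search is returned whenever the same document is viewed later, which is the behavior the passage describes.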
Abstract
The concepts herein address a novel document management computer system (10) including a cluster computer arrangement in which native documents may be stored substantially equally amongst each node. Also disclosed are methods for performing a search based on user-defined search criteria, as well as user-defined classifications that may be applied to documents represented by the search results.
Description
53217-013
DOCUMENT MANAGEMENT APPARATUS, SYSTEM AND METHOD
Related Applications
[01] This application claims priority from Provisional Application Serial No. 60/438,508 filed on January 8, 2003, entitled: "ELECTRONIC DOCUMENT MANAGEMENT", the entire disclosure of which is hereby incorporated by reference herein.
Field of the Invention
[02] The invention relates to an apparatus and method for storing, searching, retrieving, and delivering electronic documents, and a program product for implementing the same, for the purpose of managing a multiplicity of documents.
Background
[03] Many of today's businesses employ sophisticated document management systems for managing existing electronic documents. Despite this, there has not been developed a document management system for providing management services to both existing electronic documents and paper documents. Of particular importance is the need to provide an effective search tool for documents, for example, produced during litigation. Current products on the market permit users to scan paper-based documents or convert electronic documents to a standard format, such as TIFF. However, conversion of tremendous amounts of documents can be time consuming, and expensive. Moreover, document conversion does not reliably maintain all information in a respective document across the many types of file types that may be examined.
[04] Also, in court litigation and regulatory proceedings, prior electronic document management structures and methods to store, search, retrieve and deliver electronic documents generally require a constrained format to accomplish the necessary functions to achieve effective electronic document management. This is particularly the case in litigation matters where a party before a court needs to organize a multiplicity of documents into a manageable electronic document system. In such litigation, the documents take a variety of formats and structures ranging from letters to detailed reports so that a rigid format may not provide the accessibility and precise recall of critical information for litigation.
[05] A document management system is needed to alleviate the above mentioned problems.
Summary
[06] The concepts disclosed herein alleviate the above noted problems.
[07] More particularly, a method for managing a plurality of native documents to be uploaded to a document management computer system includes determining a file type for each native document of the plurality of native documents, creating a fingerprint for each native document, de-duplicating each native document in accordance with the fingerprint, extracting data from each native document, associating extracted data with a corresponding native document, and distributing the plurality of native documents and extracted data substantially equally amongst a plurality of nodes of the document management computer system. By distributing native documents and extracted data substantially equally amongst the nodes, search processing time may be reduced.
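The fingerprinting, de-duplication and distribution steps of paragraph [07] can be illustrated with a short Python sketch. The source does not say how a fingerprint is computed or how documents are apportioned, so SHA-256 hashing and round-robin placement are assumptions made here for illustration, and the function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(data: bytes) -> str:
    """Return a content fingerprint; SHA-256 is an assumed choice."""
    return hashlib.sha256(data).hexdigest()


def deduplicate(files: dict[str, bytes], known: set[str]) -> dict[str, bytes]:
    """Keep one file per fingerprint, skipping fingerprints already stored."""
    seen = set(known)
    unique = {}
    for name, data in files.items():
        fp = fingerprint(data)
        if fp in seen:
            continue  # duplicate of an earlier input file or a stored document
        seen.add(fp)
        unique[name] = data
    return unique


def distribute(names: list[str], node_count: int) -> dict[int, list[str]]:
    """Assign documents round-robin so node sizes differ by at most one."""
    nodes = defaultdict(list)
    for i, name in enumerate(names):
        nodes[i % node_count].append(name)
    return dict(nodes)
```

Round-robin placement balances document counts only; a system that balances by file size or index size would need a different rule.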
[08] Another novel aspect includes a method for searching a plurality of native documents stored in a document management computer system having a plurality of computer nodes storing the plurality of native documents. The steps include defining search criteria for searching the plurality of native documents, executing in parallel searches in accordance with the search criteria for each computer cluster of the plurality of nodes, wherein each computer cluster scores each search result in accordance with the search criteria, ranking the search results in accordance with the score determined in each computer cluster, omitting certain documents represented by the search results in accordance with a user's predefined permission level, and displaying final search results to a user. As a result, a user's predefined permission level may protect documents that should not be viewed by the user conducting the search.
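The search method of paragraph [08] can be sketched in Python as follows. The node contents, the term-count scoring rule and the function names are all hypothetical; the source states only that each node scores its own results, that results are ranked by score, and that documents above the user's permission level are omitted. In this sketch the permission filter runs before the list is cut to the top N, which is one possible ordering.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node indexes: (document id, text, classification 1-5).
NODES = [
    [("doc1", "business plan split", 2),
     ("doc2", "split business business plan plan", 5)],
    [("doc3", "business plan draft business", 1)],
    [("doc4", "plan", 4)],
]


def search_node(node, terms):
    """Score each document on one node by counting term occurrences (assumed rule)."""
    results = []
    for doc_id, text, classification in node:
        words = text.split()
        score = sum(words.count(t) for t in terms)
        if score > 0:
            results.append((score, doc_id, classification))
    return results


def search(terms, permission_level, top_n=5):
    """Search all nodes in parallel, drop restricted documents, rank the rest."""
    with ThreadPoolExecutor(max_workers=len(NODES)) as pool:
        per_node = list(pool.map(lambda n: search_node(n, terms), NODES))
    permitted = [
        hit for hits in per_node for hit in hits
        if hit[2] <= permission_level  # classification must not exceed the user's level
    ]
    return heapq.nlargest(top_n, permitted, key=lambda hit: hit[0])
```

With this data, a user at permission level 3 never sees `doc2` or `doc4`, regardless of how well they score.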
[09] In yet another novel aspect, disclosed is a method for managing attributes of at least one native document produced from a search of a plurality of native documents stored in a document management computer system. The steps include defining search criteria for searching the plurality of native documents, executing a search in accordance with the defined search criteria, displaying search results, and modifying document attributes of at least one document represented by the search results, and storing modified document attributes associated with the at least one document, wherein the modified document attributes are maintained for future searches. As a result, a user may apply a user-defined classification to be displayed when the corresponding document(s) is subsequently viewed.
[10] In even yet another novel aspect, a method is disclosed for searching a plurality of native documents stored in a document management computer system. The steps include defining search criteria for searching the plurality of native documents, executing a search in accordance with the defined search criteria, displaying search results as links to data files representative of associated native documents, and selectively viewing a native document represented by at least one link of the search results displayed to the user. Accordingly, because information may be lost when the native document is converted to a data file, the native document nevertheless may be viewed for its original format.
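The link-to-native lookup of paragraph [10] might be modelled as below. This is a sketch under stated assumptions: the `FileAssociation` record, its field names, the file paths and the preference order among converted formats are hypothetical and are not taken from the source.

```python
from dataclasses import dataclass, field


@dataclass
class FileAssociation:
    """Hypothetical association record for one native document."""
    native_path: str
    converted: dict[str, str] = field(default_factory=dict)  # format -> path
    children: list[str] = field(default_factory=list)        # attachments


def result_link(assoc: FileAssociation, preferred=("html", "xml", "text")) -> str:
    """Return the converted data file to display for a search hit."""
    for fmt in preferred:
        if fmt in assoc.converted:
            return assoc.converted[fmt]
    return assoc.native_path  # nothing converted: fall back to the native file


def native_for(link: str, table: list[FileAssociation]) -> str:
    """Map a displayed link back to its native document."""
    for assoc in table:
        if link == assoc.native_path or link in assoc.converted.values():
            return assoc.native_path
    raise KeyError(link)
```

Keeping the native path alongside every converted file is what lets a user open the original when the conversion has lost formatting or content.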
[11] Other novel aspects include a method for producing search results of a plurality of native documents stored in a computer system in accordance with a user-defined search query. There is provided at least one server in communication with the computer system for storing the plurality of native documents to be searched. The server receives the user-defined search query, and sends a search query to the computer system in accordance with the user-defined search query. Search results are received from the computer system corresponding to the user-defined search query. Therefore, by attributing at least one user defined
WDC99 862289-1.053217.0011
classification to at least one document represented by the search results received, the user defined classification is displayed when the at least one document is later viewed.
[12] Moreover, there is disclosed a method for producing search results of a plurality of native documents stored in a computer system in accordance with a user-defined search query. There is provided a website hosted by a server that interfaces with the computer system and a user connected via a user interface over a communication network. Under control of the user interface, search results of the plurality of native documents are displayed in accordance with the user-defined search query. In response to at least one user-defined classification selected by the user, the user-defined classification is attributed to at least one native document represented by the search results. Thus, the user-defined attribute is displayed when the link representing the at least one native document is later viewed.
[13] In still another novel aspect, an electronic document management system is disclosed. It includes a plurality of computer nodes for storing a plurality of native documents, and a computer in communication with the plurality of computer nodes for receiving a plurality of input files to be uploaded to the plurality of computer nodes. The computer is configured to determine the type of native document for each of the plurality of input files, to assign a unique identification tag to each native document, and to eliminate duplicate native documents based on the unique identification tags, for producing a subset of input files to be uploaded to the plurality of computer nodes. Also, the subset of input files is distributed substantially equally amongst the plurality of computer nodes.
[14] In yet another novel aspect, an electronic document management system comprising a PC type computer connected in a parallel cluster, said computer using an operating system that stores electronic documents in a hard disk drive throughout the cluster, said operating system defining a document identification tag where each document is identified by its file extension that is converted to ASCII text and given a unique identification number, each of a plurality of documents having at least one of either meta-data, text or attachments identified for retrieval that are indexed for web-based retrieval from the cluster database, said identification of the plurality of documents forming a cluster database that is web-searchable by use of a predetermined descriptive term.
[15] The foregoing and other features, aspects, and advantages of the present invention will become more apparent from the following detailed description of the present invention when taken in conjunction with the accompanying drawings.
Brief Description of the Drawings
[16] Fig. 1 is a schematic diagram of a computer system used to implement the disclosed concepts.
[17] Fig. 2 illustrates a system for managing a plurality of documents to be loaded in the computer system of Fig. 1.
[18] Fig. 3 illustrates a flow diagram of a search to be implemented by the computer system of Fig. 1.
[19] Fig. 4 illustrates an exemplary webpage in which search criteria may be entered.
[20] Fig. 5 illustrates another exemplary webpage.
[21] Figs. 6a-c illustrate pull-down menus of an exemplary webpage.
[22] Fig. 7 illustrates a flow diagram of a user initiated search.
[23] Fig. 8 illustrates an exemplary webpage and a search to be conducted.
[24] Fig. 9 illustrates an exemplary webpage displaying search results in accordance with the search criteria entered in the webpage of Fig. 8.
[25] Figs. 10a-b illustrate a document selected from results of a search.
[26] Fig. 11 illustrates a flow diagram of various user-defined classifications that may be applied to document(s) represented in the results of a search.
Description
[27] Management of a large number of documents may require a sophisticated computer system. While a PC or server may be used to manage a relatively small set of documents, storage and computing capacity become major limitations when managing a large set of documents, especially if enhanced searching capabilities are implemented. In accordance with the novel concepts discussed herein, electronic documents may be maintained by a computer cluster. Computer systems of this nature are easily scalable, allowing the addition of new nodes, including one or more computer clusters, when more storage capacity and computing power are needed. Also, these types of computing systems are redundant: if a cluster fails, the computer system remains functional. Other advantages of cluster computing will be discussed further herein.
[28] Fig. 1 illustrates an example of a computer system 10 in a cluster arrangement. The hardware of computer 12, computer 22, server 20, processor 18 and RAID-5 arrays N1-Nn, each of which is connected to the computer system 10, is general purpose in nature, albeit with an appropriate network connection for communication via an intranet, the internet and/or other data networks. As known in the data processing and communications arts, each such general-purpose computer typically comprises a central processor, an internal communication bus, various types of memory (RAM, ROM, EEPROM, cache memory, etc.), disk drives or other code and data storage systems, and one or more network interface cards or ports for communication purposes.
[29] RAID-5 arrays may be best suited for storing and managing a multiplicity of documents for at least one client. While the computer system 10 may include only one RAID-5 disk array, Fig. 1 illustrates the computer system 10 with one or more RAID-5 disk arrays, node N1 - node Nn, each of which includes a plurality of disk drives 14. In the alternative, each node N1-Nn may be a single disk drive 14 or a grouping of disk drives 14 from one or more nodes N1-Nn. Databases 16a-c may also be connected to the computer system 10. Other types of devices may be included in the computer system 10 that are not specifically shown in Fig. 1. The diversity of data storage devices used in data storage management systems lends itself to different user designs, specifications and customization. The computer system 10 illustrated by Fig. 1 shall not be limiting to the concepts discussed herein.
[30] Computer 12 and processor 18 may employ a Linux operating system, an open source operating system. Processors 18 are connected to the RAID-5 arrays, nodes N1-Nn, in a parallel manner, and each controls a respective RAID array. The total combined processing speed may be increased to super-computing levels by increasing the number of processors 18. Software operating on each node N1-Nn functions in such a manner that each hard disk drive 14 processes information as if it were part of a single large disk drive, and each computer processor functions as if it were a single processor. As a result, any data that may be lost due to malfunction of any one computer disk is automatically recovered by the other disks 14 of the RAID array.
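By way of illustration only, the single-disk recovery attributed to the RAID array above can be sketched with the XOR parity that RAID-5 arrays generally rely on. The block contents and function name below are hypothetical, and the sketch ignores the rotation of parity across disks that a real RAID-5 array performs.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks striped across three disks, plus one parity block.
data_blocks = [b"docA", b"docB", b"docC"]
parity = xor_blocks(data_blocks)

# Simulate losing the second disk and rebuilding its block from the survivors.
survivors = [data_blocks[0], data_blocks[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR is its own inverse, combining the surviving data blocks with the parity block yields the missing block, which is why the loss of any one disk can be tolerated.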
[31] The software functionalities of the computer system 10 involve programming, including executable code. The software code is executable by the general-purpose computer, explained above. In operation, the code and possibly the associated data records are stored within the general-purpose computer platform. At other times, however, the software may be stored at other locations and/or transported for loading into the appropriate general-purpose computer systems. Hence, the embodiments discussed further herein involve one or more software products in the form of one or more modules of code carried by at least one machine-readable medium. Execution of such code by a processor of the computer system 10 enables the platform to implement the catalog and/or software downloading functions, in essentially the manner performed in the embodiments discussed and illustrated herein.
[32] As used herein, terms such as computer or machine "readable medium" refer to any medium that participates in providing instructions to a processor for execution. Such a medium may take many forms, including but not limited to non-volatile media, volatile media, and transmission media. Non-volatile media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) operating as one of the server platforms discussed above. Volatile media include dynamic memory, such as the main memory of such a computer platform. Physical transmission media include coaxial cables, copper wire, and fiber optics, including the wires that comprise a bus within a computer system. Carrier-wave transmission media can take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications. Common forms of computer-readable media therefore include, for example: a floppy disk, a flexible disk, a hard disk, magnetic tape, any other magnetic medium, a CD-ROM, a DVD, any other optical medium, less commonly used media such as punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer can read programming code and/or data. Many of these forms of computer-readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
[33] Referring again to Fig. 1, the computer system 10 may be accessible by an administrator via a stand-alone work station represented by computer 12. An internet server 20 interfaces with the computer system 10 to permit end-user access to the system via the internet 24 through at least one user terminal 22.
[34] The computer system 10 is configured to manage large sets of documents for multiple clients, but limits user access to documents supplied by the associated client. Documents supplied by a client are uploaded to the computer system 10 using work station 12. Documents may be supplied in electronic form or in hard copy form. If in electronic form, a suitable drive 26 corresponding to the medium type is used to upload electronic documents to the computer system 10. Also, if documents are in hard copy, they may be scanned using scanner 28 and uploaded to the computer system 10.
[35] Fig. 2 illustrates a system for managing documents uploaded to the computer system 10. First, data is loaded into the computer system 10 via workstation 12. Next, the file type discriminator 212 determines file types based on the file extension of each input file 210. If the file type is an archive, such as .zip, .tar, etc., archive extractor 214 extracts the archived file(s). The file type of each extracted document is then determined by the file type discriminator 212.
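The extension-based discrimination and archive extraction described above might be sketched as follows. This is a minimal sketch under stated assumptions: the function names `file_type` and `expand` are hypothetical, files are handled as in-memory byte strings, and only .zip archives are unpacked (the text also mentions .tar).

```python
import io
import zipfile
from pathlib import PurePath

# Assumption: only zip archives are handled in this sketch.
ARCHIVE_EXTENSIONS = {".zip"}


def file_type(name):
    """Return the lower-cased file extension used to discriminate file types."""
    return PurePath(name).suffix.lower()


def expand(name, payload):
    """Yield (name, bytes) pairs, unpacking zip archives recursively.

    Each extracted member is passed back through the same discrimination
    step, mirroring the return to discriminator 212 after extraction.
    """
    if file_type(name) in ARCHIVE_EXTENSIONS:
        with zipfile.ZipFile(io.BytesIO(payload)) as archive:
            for member in archive.namelist():
                if member.endswith("/"):  # skip directory entries
                    continue
                yield from expand(member, archive.read(member))
    else:
        yield name, payload
```

Routing extracted members back through `expand` means that an archive nested inside another archive is unpacked as well.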
[36] Clients often upload documents to the computer system 10 periodically and provide large document sets to be uploaded at any one time. As a result, duplicate documents may be stored in the computer system 10. Also, duplicate documents may exist amongst the documents to be uploaded. Before distributing input files 210 in the computer system 10, file categorizer 216 creates a fingerprint of each file. Well known cryptographic algorithms, such as the MD5 checksum, may be used to create a fingerprint unique to each file. In accordance with the fingerprint, each document is de-duplicated. More particularly, de-duplicator 218 compares the fingerprint of each input file 210 with the fingerprints of the other input files 210 and with the fingerprints of documents already stored in the computer system 10. If a match is found, the document to be uploaded is discarded, so as to prevent multiple copies of the same document from residing in the computer system 10.
[37] After the documents to be uploaded have been de-duplicated, extractor 220 converts each native document 222 (corresponding to the input files 210 in original format) to at least a text file 224. Other files that may be generated include meta data files 226, XML files 228, and HTML files 230. Well known third party software packages may be used in this conversion process.
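The fingerprinting and de-duplication steps performed before conversion can be illustrated with a short sketch. It assumes files are available as byte strings and that the fingerprints of already-stored documents are held in a set; the function names are hypothetical. MD5 is used because the text names it, although it is no longer considered collision-resistant and a modern system might prefer a hash such as SHA-256.

```python
import hashlib


def fingerprint(payload: bytes) -> str:
    """Return the MD5 digest of a file's bytes as a hexadecimal string."""
    return hashlib.md5(payload).hexdigest()


def deduplicate(input_files, stored_fingerprints):
    """Keep only files whose fingerprint has not been seen before.

    A file is discarded if it matches a document already stored in the
    system or an earlier file in the same upload batch.
    """
    seen = set(stored_fingerprints)
    kept = []
    for name, payload in input_files:
        digest = fingerprint(payload)
        if digest in seen:
            continue  # duplicate: discard the file to be uploaded
        seen.add(digest)
        kept.append((name, payload))
    return kept
```

Seeding `seen` with the stored fingerprints lets a single pass cover both comparisons described above: against the rest of the batch and against documents already in the system.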
[38] Indexer 232 creates a file association table for each native document that maintains the associations between each native document 222, converted documents 224-230, and attachments, if any, to the native document. These attachments are commonly referred to as "children files." While the file association table may be stored in any of the nodes N1-Nn, other databases 16a-c may be used to maintain file association tables. Distributor 234 distributes native documents and converted documents substantially equally amongst the nodes of the computer system 10, after which time the documents may be searched.
[39] Referring back to Fig. 1, a three cluster arrangement is shown. In this example, about a third of the documents to be uploaded would be distributed to each node N1-Nn of the computer system 10. Each processor 18 interfacing with each node Nn executes a search daemon for searching files in each node. Therefore, when a search is initiated by server 20, multiple processors 18 execute the search in parallel. The search daemon scores each document based on search criteria specified. Results from each search daemon can be compared against results from other search daemons. For example, Table 1 provides an example of search results produced by each search daemon.
Table 1
[40] Server 20 receives search results from each processor 18 and merges the search results accordingly. Assuming that only the top five search results were requested, the search results may be compiled in the following manner.
Table 2
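The merging of per-node results into a single ranked list of the top results might look like the following sketch. The function name and the (score, document) tuple layout are assumptions made for illustration, and any scores or document names supplied to it are the caller's own rather than values from the tables.

```python
import heapq


def merge_results(per_node_results, limit=5):
    """Merge per-node lists of (score, doc_id) and keep the highest scores.

    per_node_results holds one list per search daemon; the merged list is
    returned in descending score order, truncated to `limit` entries.
    """
    all_hits = (hit for hits in per_node_results for hit in hits)
    return heapq.nlargest(limit, all_hits, key=lambda hit: hit[0])
```

For instance, calling `merge_results(results, limit=5)` on the lists returned by three nodes yields the five best-scoring documents regardless of which node stored them.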
[41] In more detail, Fig. 3 illustrates a flow diagram of the search process initiated by server 20. First, in Step 310, server 20 receives a search query from a user via a user interface 22 over the internet 24. In Step 312, server 20 initiates the parallel query tool, i.e., server 20 causes each processor 18 to execute respective search queries in accordance with the search criteria received by server 20. In Step 314, server 20 receives the search results from each processor 18 of each cluster, e.g., as shown in Table 1.
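Steps 312 and 314 can be sketched as a fan-out of one query to every node followed by collection of the per-node results. This is only an illustration: threads stand in for the separate processors 18, round-robin assignment is one assumed way of distributing documents substantially equally, and the term-count scoring is a placeholder for whatever scoring the search daemon actually applies.

```python
from concurrent.futures import ThreadPoolExecutor


def distribute(documents, node_count):
    """Assign documents round-robin so node sizes differ by at most one."""
    nodes = [[] for _ in range(node_count)]
    for index, document in enumerate(documents):
        nodes[index % node_count].append(document)
    return nodes


def search_node(node_documents, term):
    """Stand-in for a search daemon: score is the term's occurrence count."""
    hits = []
    for name, text in node_documents:
        score = text.lower().count(term.lower())
        if score:
            hits.append((score, name))
    return hits


def parallel_search(nodes, term):
    """Run the same query against every node concurrently (Steps 312-314)."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        return list(pool.map(lambda docs: search_node(docs, term), nodes))
```

`parallel_search` returns one result list per node, in node order, which is the shape of input expected by a subsequent merge step on the server.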
[42] Users accessing the computer system may have pre-defined permission levels, e.g., on a scale of 1 to 5, 1 being the lowest level and 5 being the highest. Also, document classifications may be assigned to each document on the same scale. Therefore, only documents that have a document classification equal to or less than the user's pre-defined permission level may be viewed by the user. This allows one to restrict access to certain documents, especially those that are highly confidential. Table 3 provides an example of search results identical to those of Table 1, but with document classifications for each document.
Table 3
[43] In Step 316, server 20 compares each document classification with the user's predefined permission level, and in Step 318 determines whether or not the user is permitted to view the document. If the user is restricted from viewing a respective document, the document is ignored (Step 320). Conversely, if the user is permitted to view the document, the search result is categorized in Step 322. Steps 316-322 are repeated until the document classification for each document has been compared against the user permission level.
[44] Assuming that a user has a permission level of 3, Table 4 lists search results compiled by server 20 in accordance with the comparison with document classifications. Comparison with Table 2, discussed above, reveals starkly different search results due to the pre-defined user permission level. The italicized search results shown in Table 3 identify the documents that would be ignored in Step 320 because of the user permission level.
Table 4
[45] Conversely, Table 2 provides an example of the search results that would be sent to a user with a permission level 5 in Step 324.
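The permission check of Steps 316-322 amounts to filtering each search result by comparing its document classification with the user's permission level. A minimal sketch follows, assuming each result is a dictionary with a hypothetical `classification` key on the 1 to 5 scale described above.

```python
def visible_results(results, permission_level):
    """Keep results whose classification does not exceed the permission level.

    Results with a higher classification are ignored (Step 320); the rest
    are retained in their original order (Step 322).
    """
    return [r for r in results if r["classification"] <= permission_level]
```

A user with permission level 5 therefore sees every result, while a user with permission level 3 sees only those classified 3 or lower.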
[46] As described in more detail below, in Step 326, a user may request to modify document attributes or display associated file types. In Step 328, if such a request is received, an attribute table is modified accordingly and/or the associated file type, e.g., a native document, may be sent to the user. The attribute table may be created by the file categorizer 216 of Fig. 2 when uploading native documents. In the alternative, the attribute table may be created when an attribute is first modified. Attribute tables may be stored in databases 16a-c or RAID arrays N1-Nn.
[47] Fig. 4 illustrates a webpage displayed on a user interface 22 once a user has logged onto the computer system 10 via the internet 24 and server 20. The webpage includes field 410, in which the user may enter search criteria for initiating a search. Also provided are links to an advanced search 412 and a comparison search 414 for different types of searches. Regardless of the page to which the user links, numerous tabs may always be displayed and may include a Search tab 416, My Files tab 418, Inbox 420, Outbox 422 and Case Summaries 424.
[48] Fig. 5 illustrates an example of a webpage displayed when the My Files tab 418 has been selected. As shown, both user-associated files and files categorized in public folders are displayed.
[49] Three pull-down menus are available, and permit various user actions on selected documents. Fig. 6a illustrates criteria specified in the "My Files" pull-down menu 610. Here, document(s) may be associated with public folders. Fig. 6b shows selections for the "Send copy to" pull-down menu 612. Here, various users are listed. By selecting another user, a link to the document will be sent to the other user's inbox for future viewing. Fig. 6c shows the attribute menu. Here, various attributes may be assigned to the selected documents.
[50] Fig. 7 illustrates a flow chart of a search from the end-user perspective. In Step 710, an end-user accesses the document management website, and downloads to a browser a webpage such as that shown in Fig. 4. In Step 712, the end-user enters search criteria in field 410, and in Step 714, the search criteria are sent by the end-user interface 22 to server 20. Upon executing the query, server 20 produces search results in accordance with Steps 310-324 of Fig. 3 described above. In Step 716, the search results are displayed to the end-user. As mentioned in connection with Figs. 6a-c, the end-user has various options for categorizing, forwarding, or assigning an attribute to each document produced from the search. The end-user may select one or more documents from the search results (Step 718), and categorize the selected documents from the pull-down menu illustrated in Fig. 6a. Also, the end-user may send selected documents to another end-user's inbox for future viewing, by selecting an end-user from the pull-down menu illustrated in Fig. 6b. Moreover, the end-user may assign one or more attributes to the selected documents from the pull-down menu illustrated in Fig. 6c. In this manner, the end-user need not select individual documents for each modification. End-user actions at least represented by Figs. 6a-6c are each generally referred to as "user defined classification."
[51] For example, Fig. 8 provides an example of a search for documents concerning "split and business plan," entered by an end-user in the search criteria field 410. This search would be implemented in accordance with Steps 710-714 of Fig. 7. Fig. 9 illustrates the search results displayed to the user in accordance with Step 716 of Fig. 7, and in accordance with Steps 310-324 of Fig. 3. Three links are displayed. Instead of selecting the documents individually, a user may check one or more of the documents, and categorize, send a copy to another user, and/or assign attributes to the one or more checked documents using the pull-down menus. This is a highly effective way to manage large sets of documents without the need to view each individual document.
[52] If more information is needed for any particular document, a user may link to a document by selecting an associated link. Figs. 10a-b illustrate a document entitled "Compete and Privacy.doc" selected from a search. When a user selects the document, the converted text, HTML, or XML file is displayed.
[53] Fig. 11 illustrates a flow chart for attributing a user defined classification. More particularly, the user may add a comment (Step 1110) to be displayed when the document is later viewed. Also, the user may designate the comment as either public or private, so that it may be viewed by all users associated with the respective account, or only by the user entering the comment, respectively (Step 1112). Also shown are the attributes already assigned to the document, 1010. In Step 1114, the user may modify already assigned attributes 1010 or designate new attributes 1012. The user may send a link to the selected document to a user's inbox using the "Send copy to" pull-down menu. Also, the user may categorize the selected document using the "My Files" pull-down menu.
[54] Also displayed are links 1014 to children files, i.e., files that were attached to the native document 1016, which the user may select. Yet another novel characteristic is the ability to retrieve the native document 1016, i.e., the document in its original format. The user need only click on the "View Native Format" button 1016, and at this time, the native format is downloaded to the user's computer. For security and integrity, the user may not upload the copy downloaded.
[55] The attribute table discussed above may be updated with user defined classifications. Subsequent searches and document retrieval will identify user defined classifications previously designated. As a result, large sets of documents may be searched and classified accordingly. In this manner, the need to repeatedly review each and every document, during a litigation, can be limited.
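One possible shape for an attribute table that records user defined classifications, including public and private comments, is sketched below. The class names, method names, and in-memory storage are all assumptions made for illustration; the text states only that attribute tables may be stored in databases 16a-c or in the RAID arrays.

```python
from dataclasses import dataclass, field


@dataclass
class Comment:
    author: str
    text: str
    public: bool = False  # private comments are shown only to their author


@dataclass
class AttributeRecord:
    attributes: set = field(default_factory=set)
    comments: list = field(default_factory=list)


class AttributeTable:
    """Hypothetical in-memory attribute table keyed by document identifier."""

    def __init__(self):
        self._records = {}

    def _record(self, doc_id):
        return self._records.setdefault(doc_id, AttributeRecord())

    def assign(self, doc_ids, attribute):
        """Assign one attribute to every selected document in a single action."""
        for doc_id in doc_ids:
            self._record(doc_id).attributes.add(attribute)

    def add_comment(self, doc_id, author, text, public=False):
        self._record(doc_id).comments.append(Comment(author, text, public))

    def view(self, doc_id, viewer):
        """Return the attributes and the comments this viewer may see."""
        record = self._record(doc_id)
        visible = [c.text for c in record.comments if c.public or c.author == viewer]
        return sorted(record.attributes), visible
```

Because the table persists between searches, a classification assigned once is returned by `view` whenever the document is retrieved again, which is the behaviour the paragraph above describes.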
[56] Although the present invention has been described and illustrated in detail, it is to be clearly understood that the same is by way of illustration and example only and is not to be taken by way of limitation, the scope of the present invention being limited only by the terms of the appended claims.
Claims
1. A method for managing a plurality of native documents to be uploaded to a document management computer system, the steps comprising: a) determining a file type for each native document of the plurality of native documents; b) creating a fingerprint for each native document; c) de-duplicating each native document in accordance with the fingerprint; d) extracting data from each native document; e) associating extracted data with a corresponding native document; and f) distributing the plurality of native documents and extracted data substantially equally amongst a plurality of nodes of the document management computer system.
2. The method for managing the plurality of native documents according to claim 1, further comprising the step of extracting native document(s) included in the plurality of documents from an archive file.
3. The method for managing the plurality of native documents according to claim 1, wherein the fingerprint for each native document is created using a MD5 checksum.
4. The method for managing the plurality of native documents according to claim 1, wherein step (c) further comprises comparing the fingerprint of each native document with a plurality of fingerprints comprised of the fingerprints for each native document to be uploaded.
5. The method for managing the plurality of native documents according to claim 1, wherein step (c) further comprises comparing the fingerprint of each native document with at least one fingerprint corresponding to a native document stored in the document management computer system.
6. The method for managing the plurality of native documents according to claim 4, further comprising discarding native documents that are determined to be the same in accordance with the comparison of fingerprints.
7. The method for managing the plurality of native documents according to claim 5, further comprising discarding native documents that are determined to be the same in accordance with the comparison of fingerprints.
8. The method for managing the plurality of native documents according to claim 1, wherein step (d) further comprises creating at least one data file corresponding to the extracted data for each native document.
9. The method for managing the plurality of native documents according to claim 1, wherein step (d) further comprises creating a plurality of data files corresponding to the extracted data for each native document.
10. The method for managing the plurality of native documents according to claim 9, wherein the plurality of data files includes files selected from a group consisting of a text file, a meta data file, an XML file and a HTML file.
11. The method for managing the plurality of native documents according to claim 10, wherein in step (e), a data table is created for at least one native document for defining an association with the plurality of data files.
12. The method for managing the plurality of native documents according to claim 1, wherein in step (e), a data table is created for at least one native document for defining an association with extracted data.
13. A program product, comprising executable code transportable by at least one machine readable medium, wherein execution of the code by at least one programmable computer causes the at least one programmable computer to perform a sequence of steps, comprising the steps recited in claim 1.
14. A method for searching a plurality of native documents stored in a document management computer system having a plurality of computer nodes storing the plurality of native documents, the steps comprising: a) defining search criteria for searching the plurality of native documents; b) executing in parallel searches in accordance with the search criteria for each of the plurality of nodes, wherein each computer node scores each search result in accordance with the search criteria; c) ranking the search results in accordance with the score determined in each computer node; and d) omitting certain documents represented by the search results in accordance with a user's predefined permission level; and e) displaying final search results to a user.
15. The method for searching a plurality of native documents according to claim 14, further comprising comparing the user's predefined permission level with a document classification for each native document represented by the search results.
16. The method for searching a plurality of native documents according to claim 15, further comprising determining whether or not a user is permitted to view each native document represented by the search results in accordance with the comparison of the user's predefined classification and the document classifications.
17. A program product, comprising executable code transportable by at least one machine readable medium, wherein execution of the code by at least one programmable computer causes the at least one programmable computer to perform a sequence of steps, comprising the steps recited by claim 14.
18. A method for managing attributes of at least one native document produced from a search of a plurality of native documents stored in a document management computer system, the steps comprising: a) defining search criteria for searching the plurality of native documents; b) executing a search in accordance with the defined search criteria; c) displaying search results; d) modifying document attributes of at least one document represented by the search results to create a user defined classification; and e) storing the user defined classification associated with the at least one document, wherein the user defined classification is maintained for future searches.
19. The method for managing attributes of at least one native document according to claim 18, wherein modifying document attributes includes adding a comment to be displayed when the at least one document is later viewed.
20. The method for managing attributes of at least one native document according to claim 19, further comprising designating the comment as public so as to be displayed to users in addition to the user who authored the comment when later viewing the document.
21. The method for managing attributes of at least one native document according to claim 19, further comprising designating the comment as private so as to be displayed only to the user who authored the comment when later viewing the document.
22. The method for managing attributes of at least one native document according to claim 18, further comprising selectively sending a link to at least one document of the search results to another user.
23. The method for managing attributes of at least one native document according to claim 18, wherein modifying document attributes includes selectively categorizing the at least one document represented by the search results.
24. The method for managing attributes of at least one native document according to claim 18, wherein modifying document attributes includes selectively sending a link to the at least one document represented by the search results to a user.
25. A method for searching a plurality of native documents stored in a document management computer system, the steps comprising: a) defining search criteria for searching the plurality of native documents; b) executing a search in accordance with the defined search criteria; c) displaying search results as links to data files representative of associated native documents; and d) selectively viewing a native document represented by at least one link of the search results displayed to the user.
26. The method for searching a plurality of native documents stored in a document management computer system according to claim 25, wherein the native document is downloaded to a user interface that sent a request to selectively view the native document.
27. A method for producing search results of a plurality of native documents stored in a computer system in accordance with a user-defined search query, comprising: a) providing at least one server in communication with the computer system for storing the plurality of native documents to be searched; b) receiving the user-defined search query; c) sending a search query to the computer system in accordance with the user-defined search query; d) based on results of step (c), receiving search results from the computer system corresponding to the user-defined search query; e) attributing at least one user defined classification to at least one document represented by the search results received in step (d), wherein the user defined classification is displayed when the at least one document is later viewed.
28. A method for producing search results of a plurality of native documents stored in a computer system in accordance with a user-defined search query comprising: a) providing a Website hosted by a server interfacing with the computer system and a user connected via a user interface over a communication network; b) under control of the user interface, displaying the search results of the plurality of native documents in accordance with the user-defined search query; and c) in response to at least one user-defined classification selected by the user, attributing the user-defined classification to at least one native document represented by the search results, wherein the user-defined attribute is displayed when the link representing the at least one native document is later viewed.
29. An electronic document management system comprising: a plurality of computer nodes for storing a plurality of native documents; and a computer in communication with the plurality of computer nodes for receiving a plurality of input files to be uploaded to the plurality of computer nodes, wherein the computer is configured to determine the type of native document for each of the plurality of input files, to assign a unique identification tag to each native document, and to eliminate duplicate native documents based on the unique identification tags, for producing a subset of input files to be uploaded to the plurality of computer nodes, wherein the subset of input files are distributed substantially equally amongst the plurality of computer nodes.
30. The electronic document management system according to claim 29, wherein the computer is further configured to extract data from each native document.
31. An electronic document management system according to claim 30, wherein the computer creates a text file corresponding to the extracted data.
32. An electronic document management system according to claim 29, wherein the computer creates a data file selected from a group consisting of a text file, a meta data file, a XML file, and a HTML file.
33. An electronic document management system according to claim 29, wherein the subset of input files and associated data extracted therefrom are distributed substantially equally amongst the plurality of computer nodes.
34. An electronic document management system comprising a PC type computer connected in a parallel cluster, said computer using an operating system that stores electronic documents in a hard disk drive throughout the cluster, said operating system defining a document identification tag where each document is identified by its file extension that is converted to ASCII text and given a unique identification number, each of a plurality of documents having at least one of either meta-data, text or attachments identified for retrieval that are indexed for web-based retrieval from the cluster database, said identification of the plurality of documents forming a cluster database that is web-searchable by use of a predetermined descriptive term.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US43850803P | 2003-01-08 | 2003-01-08 | |
US60/438,508 | 2003-01-08 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2004063863A2 (en) | 2004-07-29 |
WO2004063863A3 (en) | 2005-03-24 |
Family
ID=32713338
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US2004/000168 WO2004063863A2 (en) | 2003-01-08 | 2004-01-07 | Document management apparatus, system and method |
Country Status (2)
Country | Link |
---|---|
US (1) | US20040187075A1 (en) |
WO (1) | WO2004063863A2 (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2375347A3 (en) * | 2005-11-28 | 2012-12-19 | Commvault Systems, Inc. | Systems and methods for classifying and transferring information in a storage network |
US9639529B2 (en) | 2006-12-22 | 2017-05-02 | Commvault Systems, Inc. | Method and system for searching stored data |
US9996430B2 (en) | 2005-12-19 | 2018-06-12 | Commvault Systems, Inc. | Systems and methods of unified reconstruction in storage systems |
US10372675B2 (en) | 2011-03-31 | 2019-08-06 | Commvault Systems, Inc. | Creating secondary copies of data based on searches for content |
US10372672B2 (en) | 2012-06-08 | 2019-08-06 | Commvault Systems, Inc. | Auto summarization of content |
US10540516B2 (en) | 2016-10-13 | 2020-01-21 | Commvault Systems, Inc. | Data protection within an unsecured storage environment |
US10642886B2 (en) | 2018-02-14 | 2020-05-05 | Commvault Systems, Inc. | Targeted search of backup data using facial recognition |
US10708353B2 (en) | 2008-08-29 | 2020-07-07 | Commvault Systems, Inc. | Method and system for displaying similar email messages based on message contents |
US10783129B2 (en) | 2006-10-17 | 2020-09-22 | Commvault Systems, Inc. | Method and system for offline indexing of content and classifying stored data |
US10984041B2 (en) | 2017-05-11 | 2021-04-20 | Commvault Systems, Inc. | Natural language processing integrated with database and data storage management |
US11159469B2 (en) | 2018-09-12 | 2021-10-26 | Commvault Systems, Inc. | Using machine learning to modify presentation of mailbox objects |
US11442820B2 (en) | 2005-12-19 | 2022-09-13 | Commvault Systems, Inc. | Systems and methods of unified reconstruction in storage systems |
US11494417B2 (en) | 2020-08-07 | 2022-11-08 | Commvault Systems, Inc. | Automated email classification in an information management system |
Families Citing this family (57)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20070055582A1 (en) | 1996-11-12 | 2007-03-08 | Hahn-Carlson Dean W | Transaction processing with core and distributor processor implementations |
US8396811B1 (en) | 1999-02-26 | 2013-03-12 | Syncada Llc | Validation approach for auditing a vendor-based transaction |
US8392285B2 (en) | 1996-11-12 | 2013-03-05 | Syncada Llc | Multi-supplier transaction and payment programmed processing approach with at least one supplier |
US20080172314A1 (en) | 1996-11-12 | 2008-07-17 | Hahn-Carlson Dean W | Financial institution-based transaction processing system and approach |
US7761427B2 (en) * | 2003-04-11 | 2010-07-20 | Cricket Technologies, Llc | Method, system, and computer program product for processing and converting electronically-stored data for electronic discovery and support of litigation using a processor-based device located at a user-site |
US8954420B1 (en) | 2003-12-31 | 2015-02-10 | Google Inc. | Methods and systems for improving a search ranking using article information |
US20050149498A1 (en) * | 2003-12-31 | 2005-07-07 | Stephen Lawrence | Methods and systems for improving a search ranking using article information |
US20050177555A1 (en) * | 2004-02-11 | 2005-08-11 | Alpert Sherman R. | System and method for providing information on a set of search returned documents |
US8161053B1 (en) | 2004-03-31 | 2012-04-17 | Google Inc. | Methods and systems for eliminating duplicate events |
US7581227B1 (en) | 2004-03-31 | 2009-08-25 | Google Inc. | Systems and methods of synchronizing indexes |
US8346777B1 (en) | 2004-03-31 | 2013-01-01 | Google Inc. | Systems and methods for selectively storing event data |
US8631076B1 (en) | 2004-03-31 | 2014-01-14 | Google Inc. | Methods and systems for associating instant messenger events |
US8275839B2 (en) | 2004-03-31 | 2012-09-25 | Google Inc. | Methods and systems for processing email messages |
US20050223027A1 (en) * | 2004-03-31 | 2005-10-06 | Lawrence Stephen R | Methods and systems for structuring event data in a database for location and retrieval |
US7412708B1 (en) | 2004-03-31 | 2008-08-12 | Google Inc. | Methods and systems for capturing information |
US7941439B1 (en) | 2004-03-31 | 2011-05-10 | Google Inc. | Methods and systems for information capture |
US8386728B1 (en) | 2004-03-31 | 2013-02-26 | Google Inc. | Methods and systems for prioritizing a crawl |
US8099407B2 (en) | 2004-03-31 | 2012-01-17 | Google Inc. | Methods and systems for processing media files |
US7680888B1 (en) | 2004-03-31 | 2010-03-16 | Google Inc. | Methods and systems for processing instant messenger messages |
US7725508B2 (en) | 2004-03-31 | 2010-05-25 | Google Inc. | Methods and systems for information capture and retrieval |
US7333976B1 (en) | 2004-03-31 | 2008-02-19 | Google Inc. | Methods and systems for processing contact information |
US7254588B2 (en) * | 2004-04-26 | 2007-08-07 | Taiwan Semiconductor Manufacturing Company, Ltd. | Document management and access control by document's attributes for document query system |
CA2569338A1 (en) | 2004-06-09 | 2005-12-29 | U.S. Bancorp Licensing, Inc. | Financial institution-based transaction processing system and approach |
US7574386B2 (en) | 2004-06-09 | 2009-08-11 | U.S. Bank National Association | Transaction accounting auditing approach and system therefor |
US7925551B2 (en) * | 2004-06-09 | 2011-04-12 | Syncada Llc | Automated transaction processing system and approach |
US8762238B2 (en) | 2004-06-09 | 2014-06-24 | Syncada Llc | Recurring transaction processing system and approach |
CA2569346A1 (en) | 2004-06-09 | 2005-12-29 | U.S. Bancorp Licensing, Inc. | Order-resource fulfillment and management system and approach |
US20060041503A1 (en) * | 2004-08-21 | 2006-02-23 | Blair William R | Collaborative negotiation methods, systems, and apparatuses for extended commerce |
JP4421502B2 (en) * | 2005-03-25 | 2010-02-24 | 株式会社東芝 | Document management system |
CN100470544C (en) * | 2005-05-24 | 2009-03-18 | 国际商业机器公司 | Method, equipment and system for chaiming file |
US9262446B1 (en) | 2005-12-29 | 2016-02-16 | Google Inc. | Dynamically ranking entries in a personal data book |
US8712884B2 (en) | 2006-10-06 | 2014-04-29 | Syncada Llc | Transaction finance processing system and approach |
US7840537B2 (en) * | 2006-12-22 | 2010-11-23 | Commvault Systems, Inc. | System and method for storing redundant information |
US7680765B2 (en) * | 2006-12-27 | 2010-03-16 | Microsoft Corporation | Iterate-aggregate query parallelization |
US7962452B2 (en) * | 2007-12-28 | 2011-06-14 | International Business Machines Corporation | Data deduplication by separating data from meta data |
US8751337B2 (en) | 2008-01-25 | 2014-06-10 | Syncada Llc | Inventory-based payment processing system and approach |
US8577894B2 (en) * | 2008-01-25 | 2013-11-05 | Chacha Search, Inc | Method and system for access to restricted resources |
US20090240628A1 (en) * | 2008-03-20 | 2009-09-24 | Co-Exprise, Inc. | Method and System for Facilitating a Negotiation |
US8240554B2 (en) | 2008-03-28 | 2012-08-14 | Keycorp | System and method of financial instrument processing with duplicate item detection |
US20090300527A1 (en) * | 2008-06-02 | 2009-12-03 | Microsoft Corporation | User interface for bulk operations on documents |
US8832034B1 (en) | 2008-07-03 | 2014-09-09 | Riverbed Technology, Inc. | Space-efficient, revision-tolerant data de-duplication |
US8370309B1 (en) | 2008-07-03 | 2013-02-05 | Infineta Systems, Inc. | Revision-tolerant data de-duplication |
US8078593B1 (en) * | 2008-08-28 | 2011-12-13 | Infineta Systems, Inc. | Dictionary architecture and methodology for revision-tolerant data de-duplication |
US8620778B2 (en) * | 2009-01-20 | 2013-12-31 | Microsoft Corporation | Document vault and application platform |
US8229909B2 (en) * | 2009-03-31 | 2012-07-24 | Oracle International Corporation | Multi-dimensional algorithm for contextual search |
US8166261B1 (en) | 2009-03-31 | 2012-04-24 | Symantec Corporation | Systems and methods for seeding a fingerprint cache for data deduplication |
US8407186B1 (en) * | 2009-03-31 | 2013-03-26 | Symantec Corporation | Systems and methods for data-selection-specific data deduplication |
US8060715B2 (en) | 2009-03-31 | 2011-11-15 | Symantec Corporation | Systems and methods for controlling initialization of a fingerprint cache for data deduplication |
US8578120B2 (en) | 2009-05-22 | 2013-11-05 | Commvault Systems, Inc. | Block-level single instancing |
US8442983B2 (en) | 2009-12-31 | 2013-05-14 | Commvault Systems, Inc. | Asynchronous methods of data classification using change journals and other data structures |
US9933978B2 (en) | 2010-12-16 | 2018-04-03 | International Business Machines Corporation | Method and system for processing data |
US8332372B2 (en) * | 2010-12-16 | 2012-12-11 | International Business Machines Corporation | Method and system for processing data |
CN104462141B (en) * | 2013-09-24 | 2018-05-22 | 中国移动通信集团重庆有限公司 | Method, system and the storage engines device of a kind of data storage and inquiry |
US10324914B2 (en) | 2015-05-20 | 2019-06-18 | Commvault Systems, Inc. | Handling user queries against production and archive storage systems, such as for enterprise customers having large and/or numerous files |
US10360264B2 (en) | 2016-04-08 | 2019-07-23 | Vmware, Inc. | Access control for user accounts using a bidirectional search approach |
US10104087B2 (en) * | 2016-04-08 | 2018-10-16 | Vmware, Inc. | Access control for user accounts using a parallel search approach |
WO2021137689A1 (en) * | 2019-12-31 | 2021-07-08 | Mimos Berhad | System for library materials classification and a method thereof |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5745900A (en) * | 1996-08-09 | 1998-04-28 | Digital Equipment Corporation | Method for indexing duplicate database records using a full-record fingerprint |
US6233631B1 (en) * | 1998-12-07 | 2001-05-15 | Xerox Corporation | Upload/Download of Auditron information to PC or phone line |
US20010011350A1 (en) * | 1996-07-03 | 2001-08-02 | Mahboud Zabetian | Apparatus and method for electronic document certification and verification |
US20010025287A1 (en) * | 2000-03-16 | 2001-09-27 | Toshiaki Okabe | Document integrated management apparatus and method |
US6493721B1 (en) * | 1999-03-31 | 2002-12-10 | Verizon Laboratories Inc. | Techniques for performing incremental data updates |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5444840A (en) * | 1990-06-12 | 1995-08-22 | Froessl; Horst | Multiple image font processing |
US6070191A (en) * | 1997-10-17 | 2000-05-30 | Lucent Technologies Inc. | Data distribution techniques for load-balanced fault-tolerant web access |
2004
- 2004-01-07: US application US10/752,432, published as US20040187075A1 (en); status: not active, abandoned
- 2004-01-07: WO application PCT/US2004/000168, published as WO2004063863A2 (en); status: active, application filing
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2375347A3 (en) * | 2005-11-28 | 2012-12-19 | Commvault Systems, Inc. | Systems and methods for classifying and transferring information in a storage network |
US9606994B2 (en) | 2005-11-28 | 2017-03-28 | Commvault Systems, Inc. | Systems and methods for using metadata to enhance data identification operations |
US10198451B2 (en) | 2005-11-28 | 2019-02-05 | Commvault Systems, Inc. | Systems and methods for using metadata to enhance data identification operations |
US11256665B2 (en) | 2005-11-28 | 2022-02-22 | Commvault Systems, Inc. | Systems and methods for using metadata to enhance data identification operations |
US11442820B2 (en) | 2005-12-19 | 2022-09-13 | Commvault Systems, Inc. | Systems and methods of unified reconstruction in storage systems |
US9996430B2 (en) | 2005-12-19 | 2018-06-12 | Commvault Systems, Inc. | Systems and methods of unified reconstruction in storage systems |
US10783129B2 (en) | 2006-10-17 | 2020-09-22 | Commvault Systems, Inc. | Method and system for offline indexing of content and classifying stored data |
US9639529B2 (en) | 2006-12-22 | 2017-05-02 | Commvault Systems, Inc. | Method and system for searching stored data |
US11082489B2 (en) | 2008-08-29 | 2021-08-03 | Commvault Systems, Inc. | Method and system for displaying similar email messages based on message contents |
US10708353B2 (en) | 2008-08-29 | 2020-07-07 | Commvault Systems, Inc. | Method and system for displaying similar email messages based on message contents |
US11516289B2 (en) | 2008-08-29 | 2022-11-29 | Commvault Systems, Inc. | Method and system for displaying similar email messages based on message contents |
US11003626B2 (en) | 2011-03-31 | 2021-05-11 | Commvault Systems, Inc. | Creating secondary copies of data based on searches for content |
US10372675B2 (en) | 2011-03-31 | 2019-08-06 | Commvault Systems, Inc. | Creating secondary copies of data based on searches for content |
US10372672B2 (en) | 2012-06-08 | 2019-08-06 | Commvault Systems, Inc. | Auto summarization of content |
US11580066B2 (en) | 2012-06-08 | 2023-02-14 | Commvault Systems, Inc. | Auto summarization of content for use in new storage policies |
US10540516B2 (en) | 2016-10-13 | 2020-01-21 | Commvault Systems, Inc. | Data protection within an unsecured storage environment |
US11443061B2 (en) | 2016-10-13 | 2022-09-13 | Commvault Systems, Inc. | Data protection within an unsecured storage environment |
US10984041B2 (en) | 2017-05-11 | 2021-04-20 | Commvault Systems, Inc. | Natural language processing integrated with database and data storage management |
US10642886B2 (en) | 2018-02-14 | 2020-05-05 | Commvault Systems, Inc. | Targeted search of backup data using facial recognition |
US11159469B2 (en) | 2018-09-12 | 2021-10-26 | Commvault Systems, Inc. | Using machine learning to modify presentation of mailbox objects |
US11494417B2 (en) | 2020-08-07 | 2022-11-08 | Commvault Systems, Inc. | Automated email classification in an information management system |
Also Published As
Publication number | Publication date |
---|---|
WO2004063863A3 (en) | 2005-03-24 |
US20040187075A1 (en) | 2004-09-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US20040187075A1 (en) | Document management apparatus, system and method | |
US11615101B2 (en) | Anomaly detection in data ingested to a data intake and query system | |
US11620157B2 (en) | Data ingestion pipeline anomaly detection | |
US7949660B2 (en) | Method and apparatus for searching and resource discovery in a distributed enterprise system | |
US11657057B2 (en) | Revising catalog metadata based on parsing queries | |
US9563820B2 (en) | Presentation and organization of content | |
US11409756B1 (en) | Creating and communicating data analyses using data visualization pipelines | |
US7730113B1 (en) | Network-based system and method for accessing and processing emails and other electronic legal documents that may include duplicate information | |
US9298782B2 (en) | Combinators | |
US9063976B1 (en) | Dynamic tree determination for data processing | |
US11886455B1 (en) | Networked cloud service monitoring | |
WO2021222395A1 (en) | Dual textual/graphical programming interfaces for streaming data processing pipelines | |
US20030069803A1 (en) | Method of displaying content | |
US20140282901A1 (en) | Managing shared content with a content management system | |
US8762325B2 (en) | Processing of files for electronic content management | |
US11392578B1 (en) | Automatically generating metadata for a metadata catalog based on detected changes to the metadata catalog | |
US11675816B1 (en) | Grouping events into episodes using a streaming data processor | |
US11573955B1 (en) | Data-determinant query terms | |
US11450419B1 (en) | Medication security and healthcare privacy systems | |
US11676072B1 (en) | Interface for incorporating user feedback into training of clustering model | |
US11379670B1 (en) | Automatically populating responses using artificial intelligence | |
US11720824B1 (en) | Visualizing outliers from timestamped event data using machine learning-based models | |
US11789950B1 (en) | Dynamic storage and deferred analysis of data stream events | |
US20090019021A1 (en) | Method and apparatus for creating an index of network data for a set of messages | |
US9984108B2 (en) | Database joins using uncertain criteria |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2. Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2. Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | |
122 | EP: PCT application non-entry in European phase | |