US20040193628A1 - Information presentation apparatus with meta-information management function - Google Patents

Information presentation apparatus with meta-information management function

Publication number
US20040193628A1
Authority
US
United States
Prior art keywords
web
information
web page
page server
meta
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US10/819,150
Inventor
Shinichi Hiraiwa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujitsu Ltd
Priority to US10/819,150
Publication of US20040193628A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/907: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/95: Retrieval from the web
    • G06F16/951: Indexing; Web crawling techniques
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10: TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S: TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00: Data processing: database and file management or data structures
    • Y10S707/99941: Database schema or data structure
    • Y10S707/99942: Manipulating data structure, e.g. compression, compaction, compilation

Definitions

  • the present invention concerns a meta-information presentation apparatus equipped with an information management function which allows an information collection apparatus to efficiently collect information.
  • the WWW comprises a plurality of WWW servers which provide information, and clients (termed “browsers”) which are used to access the information.
  • a single WWW server typically manages a plurality of “web pages” joined together by links.
  • a user accesses (or “surfs”) information on the WWW using a browser by following the links to different web pages.
  • a “search engine” is often used to search information in the web pages on the WWW.
  • a search engine effects a search function by using an information collection apparatus, termed a “web robot”, which collects information provided by a WWW server and then prepares an index on the collected information.
  • a web robot collects web page information by accessing all of the web pages on the WWW (managed by numerous WWW servers) one page at a time by following the links in each page. Because WWW server information is updated daily, a web robot must periodically access each WWW server to gather the information required for a search. Heretofore, information has been collected by accessing all web pages regardless of whether the content of the web page has been updated. In other words, the web robot retrieves each and every page each time it is run regardless of whether it had retrieved the page before.
  • FIG. 1 is a high level block diagram of the present invention
  • FIG. 2 is a block diagram of a preferred embodiment of the present invention.
  • FIG. 3 is an example of a meta-information table for use with the preferred embodiment of the present invention.
  • FIG. 4 is another example of a meta-information table for use with the preferred embodiment of the present invention.
  • FIG. 5 is yet another example of a meta-information table for use with the preferred embodiment of the present invention.
  • FIG. 6 is an example of a collected documents table for use with the preferred embodiment of the present invention.
  • FIG. 7 is an example of an information collection apparatus table for use with the preferred embodiment of the present invention.
  • FIG. 8 is a flowchart of an information collection process in accordance with a preferred embodiment of the present invention.
  • FIG. 9 is a flowchart of another information collection process in accordance with a preferred embodiment of the present invention.
  • FIG. 1 is a high level block diagram of the present invention.
  • the present invention generally comprises an information presentation apparatus 1 connected to an information collection apparatus 2 and a client application 3 via a network 4 .
  • the information presentation apparatus 1 generally comprises: a meta-information management unit 1 a , a meta-information table 1 b which stores update information pertaining to documents managed by the information presentation apparatus 1 ; a document storage unit 1 d that stores the documents; and an information collection table 1 c .
  • the meta-information management unit 1 a manages document update information for documents stored in the document storage unit 1 d and references the meta-information table 1 b when an information collection request is made by the information collection apparatus 2 .
  • Update information pertaining to documents stored in the meta-information table 1 b may include: the time of document updating, version information, and serial numbers indicating an update sequence.
  • the meta-information table 1 b indicates which documents have been modified and at which time the documents were so modified.
  • the term “modify” refers to the creation, updating or deletion of a document.
  • the meta-information management unit 1 a generates and transmits a stored document list (including update information or, more simply, a list of documents updated since the previous request) in response to an information collection request from the information collection apparatus 2 .
  • The information collection table 1c registers the name of each information collection apparatus 2 which issues information collection requests. Using the names registered in the information collection table 1c, the meta-information management unit 1a may request each registered information collection apparatus 2 to issue an information collection request whenever there is any change in the meta-information table 1b. Because collection is then triggered by actual changes, information collection is carried out more efficiently.
  • the update information in the meta-information table 1 b allows information to be collected efficiently by the information collection apparatus 2 .
  • the information collection apparatus 2 no longer carries out an information collection process for information that has not been updated (i.e., information and documents already collected by the information collection apparatus 2 ) thereby relieving a large burden on the information presentation apparatus 1 .
  • a search engine based on the information collection apparatus 2 can also provide newer information to the user.
  • FIG. 2 is a block diagram of a preferred embodiment of the present invention.
  • the preferred system generally comprises an information presentation apparatus 11 (such as a WWW server), an information collection apparatus 12 (such as a web robot) and a client application 13 (such as a “web browser”).
  • the information presentation apparatus 11 , the information collection apparatus 12 , and the client application 13 are connected via a network 14 (such as the Internet or an Intranet).
  • the information presentation apparatus 11 generally comprises a document storage unit 21 (such as a hard disk, magneto-optical disk or CD-ROM), a meta-information management unit 22 (typically embodied in software), a meta-information storage unit 23 (such as a hard disk, magneto-optical disk or CD-ROM) having a meta-information table 23 a (typically formed in software), an information collection apparatus storage unit 24 (such as a hard disk, magneto-optical disk or CD-ROM) having an information collection table 24 a (typically formed in software), and a data transmission/reception unit 25 (typically comprising a network adaptor, modem and/or associated software).
  • the meta-information management unit 22 accesses a document saved in the document storage unit 21 and saves the document name along with document update information in the meta-information table 23 a of the meta-information storage unit 23 .
  • FIG. 3, FIG. 4, and FIG. 5 are examples of the meta-information table 23 a for use with the preferred embodiment of the present invention.
  • FIG. 3 shows an example in which the meta-information table 23 a is generated periodically and in which the update information comprises a date and time.
  • the update date/time of each document 1 , 2 , . . . is checked at each check date, and the update date/time of each document is written into the meta-information table 23 a.
  • FIG. 4 shows an example in which the meta-information table 23 a is generated periodically and in which update information comprises version information pertaining to individual documents.
  • the version information pertaining to each document 1 , 2 , . . . is checked at each check date, and written into the meta-information table 23 a.
  • FIG. 5 shows an example in which the meta-information table 23 a is modified when a document has been updated.
  • File names associated with the serial numbers and a differentiation (for example: updated/deleted/new) are written into the meta-information table 23 a in order of update date/time sequence (as indicated by the serial number).
  • the meta-information table 23 a forms a modification history of the documents on said document storage unit 21 .
  • The process by which the data transmission/reception unit 25 transmits data to and receives data from the information collection apparatus 12 and the client application 13 will be explained.
  • When the client application 13 makes an information acquisition request (such as for a web page) to the information presentation apparatus 11, the data transmission/reception unit 25 retrieves the relevant information from the document storage unit 21 and sends the retrieved information to the client application 13.
  • When the information collection apparatus 12 makes a collection request to the information presentation apparatus 11, it also specifies the time at which the previous collection was carried out (if the meta-information table 23a shown in FIG. 3 is used).
  • The data transmission/reception unit 25 receives the request, formats it, and passes it to the meta-information management unit 22.
  • the meta-information management unit 22 searches the meta-information table 23 a for documents modified since the specified time and generates a collected documents table.
  • the data transmission/reception unit 25 returns the created collected documents table to the information collection apparatus 12 .
  • FIG. 6 is an example of a collected documents table for use with the preferred embodiment of the present invention.
  • the collected documents table registers updated documents, deleted documents, and newly created documents.
  • FIG. 7 is an example of an optional information collection apparatus table 24 a for use with the preferred embodiment of the present invention.
  • The names of information collection apparatuses 12 may be registered in the information collection table 24a of the information collection apparatus storage unit 24.
  • When the meta-information table 23a has been modified, the information presentation apparatus 11 requests each registered information collection apparatus 12 to issue a collection request. This can further decrease network traffic, in that web robots access WWW servers only when informed that new information is present.
  • FIG. 8 is a flowchart of an information collection process in accordance with a preferred embodiment of the present invention.
  • This particular process utilizes the meta-information table 23 a shown in FIG. 3.
  • the process starts in step S 0 .
  • In step S1, the information collection apparatus 12 makes a collection request to the information presentation apparatus 11, specifying a time T when a previous collection was made. This request is issued periodically at set intervals or at the request of the information presentation apparatus 11, as discussed above.
  • When the data transmission/reception unit 25 receives the request, it transmits the request to the meta-information management unit 22.
  • In step S2, the meta-information management unit 22 checks whether information received prior to the time T remains in the meta-information table 23a (see FIG. 3). If such information does not remain, the process goes to step S3 and the meta-information management unit 22 acquires all documents on behalf of the information collection apparatus 12.
  • If information received prior to the time T remains in the meta-information table 23a, the process goes to Step S4 and “I” is set to equal 1. Next, in Step S5, the collected documents table (shown in FIG. 6) is generated.
  • In Step S6, assuming a number of documents N, a check is made as to whether I ≤ N (i.e., to determine whether all documents have been processed). If I ≤ N, the process goes to Step S7, in which the meta-information table 23a is referenced and a check is made as to whether document I has been modified since time T. If document I has not been modified since time T, the process goes to Step S9 and I is incremented to I+1. Thereafter, the process repeats from Step S6.
  • When I becomes greater than N in Step S6 (all documents have been processed), the process goes to Step S10 and the data transmission/reception unit 25 returns the completed collected documents table to the information collection apparatus 12. The process ends in Step S11. Thereafter, based on the collected documents table sent from the information presentation apparatus 11, the information collection apparatus 12 acquires only the required updated or new documents.
  • the flowchart in FIG. 8 pertains to a situation in which the meta-information table 23 a as shown in FIG. 3 was used.
  • a collected documents table can also be generated using a similar process when version information of individual documents, as shown in the meta-information table 23 a in FIG. 4 is used.
  • FIG. 9 is a flowchart of an information collection process in accordance with a preferred embodiment of the present invention.
  • This particular process utilizes the meta-information table 23 a shown in FIG. 5.
  • the process starts in Step S 100 .
  • The information collection apparatus 12 makes a collection request to the information presentation apparatus 11, specifying a serial number A for the previous collection.
  • When the data transmission/reception unit 25 receives the request, it transmits the request to the meta-information management unit 22.
  • In Step S102, the meta-information management unit 22 checks whether information received prior to the modification indicated by serial number A remains in the meta-information table 23a (see FIG. 5). If such information does not remain, the process goes to Step S103 and the meta-information management unit 22 responds by acquiring all documents on behalf of the information collection apparatus 12.
  • If, in Step S102, information received prior to the modification indicated by serial number A remains in the meta-information table 23a, the process goes to Step S104 and “I” is set to equal 1. Thereafter, in Step S105, the collected documents table (shown in FIG. 6) is generated.
  • In Step S106, assuming a number of documents N, a check is made as to whether I ≤ N, and if I ≤ N the process goes to Step S107.
  • When, in Step S106, I becomes greater than N, the process goes to Step S110 and the data transmission/reception unit 25 returns the generated collected documents table to the information collection apparatus 12. The process ends in Step S111. Thereafter, based on the collected documents table sent from the information presentation apparatus 11, the information collection apparatus 12 acquires only the documents which are new or updated.
  • When the collected documents table is returned in either of the foregoing processes (FIG. 8 or FIG. 9), it may be compressed, thereby decreasing transmission time to the information collection apparatus 12.
  • the up-to-date documents stored in the document storage unit 21 may be sent to the information collection apparatus 12 together with the collected documents table.
  • a meta-information management function is provided in an information presentation apparatus.
  • the meta-information enables an information collection apparatus to carry out an information collection process efficiently by indicating which documents have been modified since a previous request.
  • the burden on the information presentation apparatus as well as on a network is alleviated. Consequently, the number of information updates can be increased, allowing searches to be carried out with newer data.
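
The FIG. 8 process described above (a collection request carrying a time T, answered with a collected documents table) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the function name `build_collected_documents_table`, the representation of the meta-information table as a dictionary mapping document names to update times, and the `oldest_retained` parameter standing in for the check of whether information from before time T still remains. The text above does not spell out the step that registers a modified document, so appending the document name to the list is likewise assumed.

```python
from datetime import datetime
from typing import Dict, List, Optional


def build_collected_documents_table(
    meta_table: Dict[str, datetime],
    oldest_retained: datetime,
    since: datetime,
) -> Optional[List[str]]:
    """Return names of documents modified after `since`.

    Returns None when the table no longer covers `since`, in which case
    the caller is expected to fall back to collecting every document.
    """
    # Step S2 (assumed representation): does the table still hold
    # information from before the requested time T?
    if since < oldest_retained:
        return None  # Step S3: full collection is needed instead

    collected: List[str] = []  # Step S5: start an empty collected documents table

    # Steps S4, S6, S7, S9: visit each document in turn and compare
    # its recorded update time against time T.
    for name, modified_at in meta_table.items():
        if modified_at > since:
            collected.append(name)  # assumed registration step

    return collected  # Step S10: returned to the information collection apparatus
```

A collector that last ran on 10 January would receive only the documents whose recorded update time is later than that date, and would receive `None` if the server's table had already discarded records older than its request.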

Abstract

High-speed information collection is achieved using a meta-information management unit which manages update information pertaining to documents that are retrieved by web robots. An information presentation apparatus, an information collection apparatus including a web robot, and a client are connected via a network. The information presentation apparatus presents document information stored in a document storage unit to the information collection apparatus. The information presentation apparatus has a meta-information management unit, which generates update information pertaining to individual documents, and a meta-information table which records the update information. When the web robot makes a collection request to the information presentation apparatus, the meta-information management unit references the meta-information table, generates a list of updated documents from all of the stored documents and/or collection targets, and presents the list to the web robot, which subsequently retrieves only the updated documents.
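
The robot's side of this exchange can be sketched in Python as follows. This is a hedged illustration rather than the patented implementation: the function name `apply_collected_documents_table`, the `(name, status)` row format, the status strings `"updated"`, `"deleted"` and `"new"`, and the `fetch` callback are all assumptions chosen to mirror the three kinds of entries the collected documents table is said to register.

```python
from typing import Callable, Dict, Iterable, Tuple


def apply_collected_documents_table(
    index: Dict[str, str],
    table: Iterable[Tuple[str, str]],
    fetch: Callable[[str], str],
) -> int:
    """Update `index` in place from (name, status) rows.

    Returns the number of documents actually fetched, which is the
    quantity the scheme aims to keep small.
    """
    fetched = 0
    for name, status in table:
        if status == "deleted":
            # A removed document needs no download, only an index update.
            index.pop(name, None)
        elif status in ("updated", "new"):
            # Only listed documents are retrieved from the server.
            index[name] = fetch(name)
            fetched += 1
        else:
            raise ValueError(f"unknown status: {status!r}")
    return fetched
```

Documents that do not appear in the table are left untouched in the index, so the number of retrievals scales with the number of changes rather than with the number of stored documents.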

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • The present application claims priority under 35 USC §119 from Japanese Patent Publication No. 10-008416, the disclosure of which is incorporated herein by reference. The present application is a continuation of U.S. application Ser. No. 09/127,954, filed Aug. 3, 1998, now pending, the disclosure of which is incorporated herein by reference. [0001]
  • BACKGROUND OF THE INVENTION
  • The present invention concerns a meta-information presentation apparatus equipped with an information management function which allows an information collection apparatus to efficiently collect information. [0002]
  • In recent years, a decentralized hypertext system known as the World Wide Web (“WWW”) has become popular and has proliferated rapidly. The main, public portion of the WWW is carried by the increasingly popular Internet, but smaller private subsets may be formed in LANs and are called Intranets. The growth of the public WWW has been exponential, such that an enormous amount of information is now presented on the WWW. The WWW comprises a plurality of WWW servers which provide information, and clients (termed “browsers”) which are used to access the information. A single WWW server typically manages a plurality of “web pages” joined together by links. A user accesses (or “surfs”) information on the WWW using a browser by following the links to different web pages. [0003]
  • A “search engine” is often used to search information in the web pages on the WWW. A search engine effects a search function by using an information collection apparatus, termed a “web robot”, which collects information provided by a WWW server and then prepares an index on the collected information. [0004]
  • Typically, a web robot collects web page information by accessing all of the web pages on the WWW (managed by numerous WWW servers) one page at a time by following the links in each page. Because WWW server information is updated daily, a web robot must periodically access each WWW server to gather the information required for a search. Heretofore, information has been collected by accessing all web pages regardless of whether the content of the web page has been updated. In other words, the web robot retrieves each and every page each time it is run regardless of whether it had retrieved the page before. [0005]
  • When a web robot sequentially accesses all the web pages that a WWW server manages, it places a large burden on the WWW server by continuously connecting to and accessing WWW pages on that server. At the same time, the web robot collects a great quantity of information and therefore causes increased network traffic. Additionally, when the server stores a great number of WWW pages, an enormous amount of time is required to cycle through all of the web pages, causing a delay in updating the data used by the search engine. Thus, depending on the search engine, it may be impossible to search the most recent information. [0006]
  • SUMMARY OF THE INVENTION
  • It is an object of the present invention to alleviate the burden on information presentation apparatus and associated networks while affording information collection at a high speed by furnishing a meta-information management unit which manages update information on documents in an information presentation apparatus. [0007]
  • Additional objects and advantages of the invention will be set forth in part in the description which follows, and, in part, will be obvious from the description, or may be learned by practice of the invention. [0008]
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • These and other objects and advantages of the invention will become apparent and more readily appreciated from the following description of the preferred embodiments, taken in conjunction with the accompanying drawings of which: [0009]
  • FIG. 1 is a high level block diagram of the present invention; [0010]
  • FIG. 2 is a block diagram of a preferred embodiment of the present invention; [0011]
  • FIG. 3 is an example of a meta-information table for use with the preferred embodiment of the present invention; [0012]
  • FIG. 4 is another example of a meta-information table for use with the preferred embodiment of the present invention; [0013]
  • FIG. 5 is yet another example of a meta-information table for use with the preferred embodiment of the present invention; [0014]
  • FIG. 6 is an example of a collected documents table for use with the preferred embodiment of the present invention; [0015]
  • FIG. 7 is an example of an information collection apparatus table for use with the preferred embodiment of the present invention; [0016]
  • FIG. 8 is a flowchart of an information collection process in accordance with a preferred embodiment of the present invention; and [0017]
  • FIG. 9 is a flowchart of another information collection process in accordance with a preferred embodiment of the present invention. [0018]
  • DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Reference will now be made in detail to the present preferred embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like elements throughout. [0019]
  • FIG. 1 is a high level block diagram of the present invention. The present invention generally comprises an information presentation apparatus 1 connected to an information collection apparatus 2 and a client application 3 via a network 4. [0020]
  • The information presentation apparatus 1 generally comprises: a meta-information management unit 1a; a meta-information table 1b which stores update information pertaining to documents managed by the information presentation apparatus 1; a document storage unit 1d that stores the documents; and an information collection table 1c. The meta-information management unit 1a manages document update information for documents stored in the document storage unit 1d and references the meta-information table 1b when an information collection request is made by the information collection apparatus 2. [0021]
  • Update information pertaining to documents stored in the meta-information table 1b may include: the time of document updating, version information, and serial numbers indicating an update sequence. In general, the meta-information table 1b indicates which documents have been modified and at which time the documents were so modified. The term “modify” refers to the creation, updating or deletion of a document. The meta-information management unit 1a generates and transmits a stored document list (including update information or, more simply, a list of documents updated since the previous request) in response to an information collection request from the information collection apparatus 2. [0022]
  • The information collection table 1c registers the name of each information collection apparatus 2 which issues information collection requests. Using the names of information collection apparatus 2 registered in the information collection table 1c, the meta-information management unit 1a may request each information collection apparatus 2 to issue an information collection request when there is any change in the meta-information table 1b. In other words, the information collection table 1c allows information collection to be carried out more efficiently, because registered information collection apparatus 2 can be asked to collect information only when the meta-information table 1b has changed. [0023]
  • In the present invention, the update information in the meta-information table 1b allows information to be collected efficiently by the information collection apparatus 2. The information collection apparatus 2 no longer carries out an information collection process for information that has not been updated (i.e., information and documents already collected by the information collection apparatus 2), thereby relieving a large burden on the information presentation apparatus 1. Additionally, because the effectiveness of information collection is raised, a search engine based on the information collection apparatus 2 can also provide newer information to the user. [0024]
  • FIG. 2 is a block diagram of a preferred embodiment of the present invention. The preferred system generally comprises an information presentation apparatus 11 (such as a WWW server), an information collection apparatus 12 (such as a web robot) and a client application 13 (such as a “web browser”). The information presentation apparatus 11, the information collection apparatus 12, and the client application 13 are connected via a network 14 (such as the Internet or an Intranet). [0025]
  • The information presentation apparatus 11 generally comprises a document storage unit 21 (such as a hard disk, magneto-optical disk or CD-ROM), a meta-information management unit 22 (typically embodied in software), a meta-information storage unit 23 (such as a hard disk, magneto-optical disk or CD-ROM) having a meta-information table 23a (typically formed in software), an information collection apparatus storage unit 24 (such as a hard disk, magneto-optical disk or CD-ROM) having an information collection table 24a (typically formed in software), and a data transmission/reception unit 25 (typically comprising a network adaptor, modem and/or associated software). [0026]
  • During normal operation, periodically, or when a document has been updated, the meta-information management unit 22 accesses a document saved in the document storage unit 21 and saves the document name along with document update information in the meta-information table 23a of the meta-information storage unit 23. [0027]
  • FIG. 3, FIG. 4, and FIG. 5 are examples of the meta-information table 23a for use with the preferred embodiment of the present invention. [0028]
  • FIG. 3 shows an example in which the meta-information table 23a is generated periodically and in which the update information comprises a date and time. The update date/time of each document 1, 2, . . . is checked at each check date, and the update date/time of each document is written into the meta-information table 23a. [0029]
  • FIG. 4 shows an example in which the meta-information table 23a is generated periodically and in which update information comprises version information pertaining to individual documents. The version information pertaining to each document 1, 2, . . . is checked at each check date, and written into the meta-information table 23a. [0030]
  • FIG. 5 shows an example in which the meta-information table 23a is modified when a document has been updated. File names associated with the serial numbers and a differentiation (for example: updated/deleted/new) are written into the meta-information table 23a in order of update date/time sequence (as indicated by the serial number). Thus, in effect the meta-information table 23a forms a modification history of the documents on said document storage unit 21. [0031]
  • Referring once again to FIG. 2, the process by which the data transmission/reception unit 25 transmits data to and receives data from the information collection apparatus 12 and the client application 13 will be explained. When the client application 13 makes an information acquisition request (such as for a web page) to the information presentation apparatus 11, the data transmission/reception unit 25 retrieves the relevant information from the document storage unit 21 and sends the retrieved information to the client application 13. [0032]
  • On the other hand, when the information collection apparatus 12 makes a collection request to the information presentation apparatus 11, it also specifies the time at which the previous collection was carried out (if the meta-information table 23a shown in FIG. 3 is used). The data transmission/reception unit 25 receives the request, formats it, and passes it to the meta-information management unit 22. The meta-information management unit 22 searches the meta-information table 23a for documents modified since the specified time and generates a collected documents table. The data transmission/reception unit 25 returns the created collected documents table to the information collection apparatus 12. [0033]
  • [0034] FIG. 6 is an example of a collected documents table for use with the preferred embodiment of the present invention. The collected documents table registers updated documents, deleted documents, and newly created documents.
  • [0035] FIG. 7 is an example of an optional information collection apparatus table 24 a for use with the preferred embodiment of the present invention. The names of information collection apparatuses 12 may be registered in the information collection apparatus table 24 a of the information collection apparatus storage unit 24. The information presentation apparatus 11 requests each registered information collection apparatus 12 to issue a collection request when the meta-information table 23 a has been modified. This can further decrease network traffic, because Web robots access WWW servers only when informed that new information is present.
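The notification scheme of FIG. 7 resembles a simple observer registry. The sketch below is illustrative only: the class `CollectorRegistry`, its methods, the callback signature, and the message text are assumptions, and a real system would send the message over the network rather than call a local function.

```python
from typing import Callable, Dict, List


class CollectorRegistry:
    """Hypothetical stand-in for the information collection apparatus table."""

    def __init__(self) -> None:
        # Maps a collector name to a callable that delivers a message to it.
        self._collectors: Dict[str, Callable[[str], None]] = {}

    def register(self, name: str, notify: Callable[[str], None]) -> None:
        self._collectors[name] = notify

    def table_modified(self) -> List[str]:
        """Ask every registered collector to issue a collection request."""
        notified = []
        for name, notify in self._collectors.items():
            notify("meta-information modified; please issue a collection request")
            notified.append(name)
        return notified


received = []
registry = CollectorRegistry()
registry.register("robot-a", lambda msg: received.append(("robot-a", msg)))
registry.register("robot-b", lambda msg: received.append(("robot-b", msg)))
notified = registry.table_modified()
```

The collectors stay idle until `table_modified` is called, which is the traffic saving the paragraph above describes.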
  • [0036] FIG. 8 is a flowchart of an information collection process in accordance with a preferred embodiment of the present invention. This particular process utilizes the meta-information table 23 a shown in FIG. 3. The process starts in step S0. In step S1, the information collection apparatus 12 makes a collection request to the information presentation apparatus 11, specifying a time T when a previous collection was made. This request is issued periodically at set intervals or at the request of the information presentation apparatus 11, as discussed above. When the data transmission/reception unit 25 receives the request, it transmits the request to the meta-information management unit 22.
  • [0037] In step S2, the meta-information management unit 22 checks whether information received prior to the time T remains in the meta-information table 23 a (see FIG. 3). If such information does not remain, the process goes to step S3 and the meta-information management unit 22 acquires all documents on behalf of the information collection apparatus 12.
  • [0038] If information received prior to the time T remains in the meta-information table 23 a, the process goes to Step S4 and “I” is set to equal 1. Next, in Step S5, the collected documents table (shown in FIG. 6) is generated.
  • [0039] Thereafter, in Step S6, assuming a number of documents N, a check is made as to whether I≦N (i.e., to determine whether all documents have been processed). If I≦N, the process goes to Step S7, in which the meta-information table 23 a is referenced and a check is made as to whether document I has been modified since time T. If document I has not been modified since time T, the process goes to Step S9 and I is incremented to I+1. Thereafter, the process repeats from Step S6.
  • [0040] If, in Step S7, document I has been modified since time T, the process goes to Step S8 and a reference to document I is added to the collected documents table. Thereafter, the process goes to Step S9 (I=I+1) and returns to Step S6.
  • [0041] When I becomes greater than N in Step S6 (i.e., all documents have been processed), the process goes to Step S10 and the data transmission/reception unit 25 returns the completed collected documents table to the information collection apparatus 12. The process ends in Step S11. Thereafter, based on the collected documents table sent from the information presentation apparatus 11, the information collection apparatus 12 acquires only the required updated or new documents.
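The loop of FIG. 8 (Steps S2 through S10) can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `collect_since`, the dictionary of last-modified times, and the `retained_since` parameter (the earliest time the table still covers, used to model the check in Step S2) are hypothetical, as are the sample documents and dates.

```python
from datetime import datetime
from typing import Dict, List, Optional


def collect_since(
    last_modified: Dict[str, datetime],
    retained_since: datetime,
    t: datetime,
) -> Optional[List[str]]:
    """Return names of documents modified after t, or None for a full collection."""
    # S2/S3: if the table no longer covers the period before t, the caller
    # has to fall back to acquiring every document.
    if retained_since > t:
        return None
    names = list(last_modified)      # documents 1..N
    i = 1                            # S4: I = 1
    collected: List[str] = []        # S5: generate the collected documents table
    while i <= len(names):           # S6: I <= N ?
        name = names[i - 1]
        if last_modified[name] > t:  # S7: modified since time T ?
            collected.append(name)   # S8: add a reference to document I
        i += 1                       # S9: I = I + 1
    return collected                 # S10: return the completed table


table = {
    "a.html": datetime(1998, 1, 10, 9, 0),
    "b.html": datetime(1998, 1, 18, 14, 30),
    "c.html": datetime(1998, 1, 20, 8, 15),
}
result = collect_since(table, datetime(1998, 1, 1), datetime(1998, 1, 15))
```

The explicit counter mirrors the flowchart; an idiomatic version would simply iterate over the dictionary.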
  • [0042] The flowchart in FIG. 8 pertains to a situation in which the meta-information table 23 a as shown in FIG. 3 is used. However, a collected documents table can also be generated using a similar process when version information of individual documents, as shown in the meta-information table 23 a in FIG. 4, is used.
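The text does not spell out the version-based variant, so the following is only one possible reading: compare the versions recorded at the last check date on or before time T with the most recent ones. The function `changed_by_version`, the snapshot layout, and the use of ISO date strings are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Each snapshot is (check_date, {document: version}). Snapshots are assumed
# to be in chronological order; ISO date strings compare chronologically.
Snapshot = Tuple[str, Dict[str, int]]


def changed_by_version(snapshots: List[Snapshot], t: str) -> Dict[str, str]:
    """Compare the last snapshot taken on or before t with the newest one."""
    earlier = [versions for date, versions in snapshots if date <= t]
    if not earlier:
        raise LookupError("no snapshot on or before t; a full collection is needed")
    old, new = earlier[-1], snapshots[-1][1]
    status: Dict[str, str] = {}
    for name, version in new.items():
        if name not in old:
            status[name] = "new"
        elif old[name] != version:
            status[name] = "updated"
    for name in old:
        if name not in new:
            status[name] = "deleted"
    return status


snapshots = [
    ("1998-01-10", {"doc1": 1, "doc2": 1}),
    ("1998-01-17", {"doc1": 2, "doc2": 1}),
    ("1998-01-24", {"doc1": 2, "doc2": 2, "doc3": 1}),
]
status = changed_by_version(snapshots, "1998-01-18")
```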
  • [0043] FIG. 9 is a flowchart of an information collection process in accordance with a preferred embodiment of the present invention. This particular process utilizes the meta-information table 23 a shown in FIG. 5. The process starts in Step S100. In Step S101, the information collection apparatus 12 makes a collection request to the information presentation apparatus 11, specifying a serial number A for the previous collection. When the data transmission/reception unit 25 receives the request, it transmits the request to the meta-information management unit 22.
  • [0044] In Step S102, the meta-information management unit 22 checks whether information received prior to the modification indicated by serial number A remains in the meta-information table 23 a (see FIG. 5). If such information does not remain, the process goes to Step S103 and the meta-information management unit 22 responds by acquiring all documents on behalf of the information collection apparatus 12.
  • [0045] If, in Step S102, information received prior to the modification indicated by serial number A remains in the meta-information table 23 a, the process goes to Step S104 and “I” is set to equal 1. Thereafter, in Step S105, the collected documents table (shown in FIG. 6) is generated.
  • [0046] In Step S106, assuming a number of documents N, a check is made as to whether I≦N, and if I≦N the process goes to Step S107. In Step S107, the meta-information table 23 a is referenced and a check is made as to whether document I has been modified since the modification indicated by serial number A. If document I has not been modified since the modification indicated by serial number A, the process goes to Step S109 and I is incremented to I+1. The process then returns to Step S106. If, in Step S107, document I has been modified since the modification indicated by serial number A, the process goes to Step S108 and document I is added to the collected documents table. Thereafter, the process goes to Step S109 (I=I+1) and returns to Step S106.
  • [0047] When, in Step S106, I becomes greater than N, the process goes to Step S110 and the data transmission/reception unit 25 returns the generated collected documents table to the information collection apparatus 12. The process ends in Step S111. Thereafter, based on the collected documents table sent from the information presentation apparatus 11, the information collection apparatus 12 acquires only the documents which are new or updated.
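The serial-number variant of FIG. 9 can be sketched in a similar way. This is a simplified illustration rather than the flowchart itself: it walks the history entries instead of looping over documents 1..N, it lets the latest entry for a file win, and it models Step S102 by checking whether the oldest retained serial number still follows directly after A. The function name `collect_since_serial` and the sample history are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# (serial number, file name, differentiation), in ascending serial order.
Entry = Tuple[int, str, str]


def collect_since_serial(history: List[Entry], a: int) -> Optional[Dict[str, str]]:
    """Return {file name: latest differentiation} for entries recorded after a.

    None signals that entries following serial number a have been discarded,
    so the collector has to acquire every document (Step S103).
    """
    # S102: the oldest retained entry must be no later than serial a + 1.
    if history and a < history[0][0] - 1:
        return None
    collected: Dict[str, str] = {}
    for serial, name, kind in history:
        if serial > a:              # S107: modified since serial number A ?
            collected[name] = kind  # S108: add to the collected documents table
    return collected                # S110: return the generated table


history = [
    (3, "a.html", "updated"),
    (4, "b.html", "new"),
    (5, "a.html", "deleted"),
    (6, "c.html", "updated"),
]
result = collect_since_serial(history, 4)
```

A serial number avoids clock comparisons between the two apparatuses, since the collector only needs to echo back the last number it received.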
  • [0048] When the collected documents table is returned in either of the foregoing processes (FIG. 8 or FIG. 9), it may be compressed, thereby decreasing transmission time to the information collection apparatus 12. In addition, when the collected documents table is returned to the information collection apparatus 12, the up-to-date documents stored in the document storage unit 21 may be sent to the information collection apparatus 12 together with the collected documents table.
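The text does not name a compression scheme or a wire format. As one possible illustration, the sketch below serializes a collected documents table as JSON and compresses it with zlib from the Python standard library; both choices, the helper names `pack_table` and `unpack_table`, and the sample table are assumptions.

```python
import json
import zlib
from typing import Dict


def pack_table(collected: Dict[str, str]) -> bytes:
    """Serialize the table as JSON and compress it for transmission."""
    return zlib.compress(json.dumps(collected, sort_keys=True).encode("utf-8"))


def unpack_table(payload: bytes) -> Dict[str, str]:
    """Reverse pack_table on the receiving side."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))


# A table with many similar entries compresses well.
collected = {f"page{i:04d}.html": "updated" for i in range(500)}
payload = pack_table(collected)
raw_size = len(json.dumps(collected, sort_keys=True).encode("utf-8"))
```

Comparing `len(payload)` with `raw_size` shows how much smaller the transmitted table becomes for repetitive file names and status values.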
  • [0049] Although preferred embodiments of the present invention have been shown and described along with some possible variations, it will be appreciated by those skilled in the art that further changes and variations may be used without departing from the principles and spirit of the invention, the scope of which is defined in the claims and their equivalents.
  • [0050] Generally, in the present invention, a meta-information management function is provided in an information presentation apparatus. The meta-information enables an information collection apparatus to carry out an information collection process efficiently by indicating which documents have been modified since a previous request. The burden on the information presentation apparatus as well as on a network is alleviated. Consequently, the number of information updates can be increased, allowing searches to be carried out with newer data.

Claims (16)

What is claimed is:
1. A web page server comprising:
a document storage unit that stores web pages;
a meta-information table that stores information including a status of each web page stored in the document storage unit;
an information collection table that stores names of web robots that monitor the content of the web page server; and
management software that provides the following functions:
monitoring modification of the web pages in the document storage unit;
updating the meta-information table based on the monitoring function;
creating, in response to a collection request by the web robots, a list of web pages in the document storage that have been modified since a previous collection request so that the collection request only retrieves previously unretrieved documents, thereby reducing the load on the web page server; and
retrieving the names of web robots from the information collection table when the monitoring function detects modification of web pages and transmitting a message to each web robot indicating that the web robot should issue a collection request.
2. A web page server, as set forth in claim 1, wherein the monitoring function stores a date and time of modification in the meta-information table when a modification of a web page is detected.
3. A web page server, as set forth in claim 1, wherein the monitoring function stores a current version of each web page in the meta-information table.
4. A web page server, as set forth in claim 1, wherein the monitoring function stores a modification history of the web pages.
5. A web page server, as set forth in claim 4, wherein each entry in the modification history is provided with a serial number indicating an order in the modification history.
6. A web page server, as set forth in claim 1, wherein the list includes names of modified web pages and an indication of the status of each modified web page.
7. A web page server, as set forth in claim 1, wherein the monitoring function monitors for the update, creation or deletion of web pages.
8. An information system comprising:
a client that requests information including web pages;
an information collection apparatus that uses a web robot to retrieve web pages on a web page server, indexes the retrieved web pages, and provides a search facility for the client; and
a web page server comprising:
a document storage unit that stores web pages for access by the client and the web robot;
a meta-information table that stores information including a status of each web page stored in the document storage unit; and
managing software that provides the following functions:
monitoring modification of the web pages in the document storage unit;
updating the meta-information table based on the monitoring function; and
creating, in response to a collection request from the web robot, a list of web pages in the document storage that have been modified since a previous collection request so that the collection request only retrieves previously unretrieved documents, thereby reducing the load on the web page server.
9. A method of operating a web page server, comprising:
storing web pages;
storing status information of each stored web page;
storing names of web robots that monitor the content of the web page server;
monitoring modification of the stored web pages;
updating the status information based on the monitoring of the modification of the stored web pages;
creating, in response to a collection request by one of the web robots, a list of web pages in the document storage that have been modified since a previous collection request so that the collection request only retrieves previously unretrieved documents, thereby reducing the load on the web page server; and
retrieving the names of web robots when the monitoring function detects modification of web pages and transmitting a message to each web robot indicating that the web robot should issue a collection request.
10. The method of claim 9, further comprising:
storing a date and time of modification when a modification of a web page is detected.
11. The method of claim 9, further comprising:
storing a current version of each web page in the web page server.
12. The method of claim 9, further comprising:
storing a modification history of the web pages.
13. The method of claim 9, further comprising:
providing a serial number indicating an order in the modification history.
14. The method of claim 9, further comprising:
including names of modified web pages and an indication of the status of each modified web page in the created list.
15. The method of claim 9, further comprising:
monitoring update, creation or deletion of web pages.
16. A method of collecting information for a client from a web page server using a web robot, the method comprising:
storing web pages in the web page server for access by the client and the web robot;
storing information including a status of each web page stored in the web page server;
collecting information using a web robot to retrieve web pages on the web page server;
indexing the retrieved web pages to provide a search facility for the client;
providing an information request from the client to the web robot, the information request including a request for web pages;
monitoring modification of the web pages stored in the web page server;
maintaining modification information based on the monitoring of the modification of the stored web pages; and
creating, in response to a collection request from the web robot, a list of web pages stored in the web page server that have been modified since a previous collection request so that the collection request only retrieves previously unretrieved documents, thereby reducing the load on the web page server.
US10/819,150 1998-01-20 2004-04-07 Information presentation apparatus with meta-information management function Abandoned US20040193628A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US10/819,150 US20040193628A1 (en) 1998-01-20 2004-04-07 Information presentation apparatus with meta-information management function

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
JP10008416A JPH11203321A (en) 1998-01-20 1998-01-20 Information providing device equipped with meta information managing function
JP10-008416 1998-01-20
US09/127,954 US6959299B2 (en) 1998-01-20 1998-08-03 Information presentation apparatus with meta-information management function
US10/819,150 US20040193628A1 (en) 1998-01-20 2004-04-07 Information presentation apparatus with meta-information management function

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US09/127,954 Continuation US6959299B2 (en) 1998-01-20 1998-08-03 Information presentation apparatus with meta-information management function

Publications (1)

Publication Number Publication Date
US20040193628A1 true US20040193628A1 (en) 2004-09-30

Family

ID=11692538

Family Applications (2)

Application Number Title Priority Date Filing Date
US09/127,954 Expired - Fee Related US6959299B2 (en) 1998-01-20 1998-08-03 Information presentation apparatus with meta-information management function
US10/819,150 Abandoned US20040193628A1 (en) 1998-01-20 2004-04-07 Information presentation apparatus with meta-information management function

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US09/127,954 Expired - Fee Related US6959299B2 (en) 1998-01-20 1998-08-03 Information presentation apparatus with meta-information management function

Country Status (2)

Country Link
US (2) US6959299B2 (en)
JP (1) JPH11203321A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7593954B1 (en) * 2000-11-15 2009-09-22 Traction Software, Inc. System and method for cross-referencing, searching and displaying entries in a document publishing system
JP4323853B2 (en) * 2003-04-11 2009-09-02 キヤノン株式会社 Update notification apparatus and method, program, and storage medium
JP4871079B2 (en) * 2006-09-04 2012-02-08 シャープ株式会社 Content receiving apparatus and content receiving method
WO2013140486A1 (en) * 2012-03-19 2013-09-26 富士通株式会社 Information processing device, data output method, and program

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5793966A (en) * 1995-12-01 1998-08-11 Vermeer Technologies, Inc. Computer system and computer-implemented process for creation and maintenance of online services
US5893114A (en) * 1995-11-29 1999-04-06 Hitachi Ltd. Document information collection method and document information collection apparatus
US5897643A (en) * 1995-04-20 1999-04-27 Fuji Xerox Co., Ltd. System for maintaining a record of documents including document copies
US5933832A (en) * 1996-09-17 1999-08-03 Kabushiki Kaisha Toshiba Retrieval system for frequently updated data distributed on network
US5978828A (en) * 1997-06-13 1999-11-02 Intel Corporation URL bookmark update notification of page content or location changes
US6006217A (en) * 1997-11-07 1999-12-21 International Business Machines Corporation Technique for providing enhanced relevance information for documents retrieved in a multi database search
US6012083A (en) * 1996-09-24 2000-01-04 Ricoh Company Ltd. Method and apparatus for document processing using agents to process transactions created based on document content
US6055570A (en) * 1997-04-03 2000-04-25 Sun Microsystems, Inc. Subscribed update monitors
US6094649A (en) * 1997-12-22 2000-07-25 Partnet, Inc. Keyword searches of structured databases
US6161102A (en) * 1994-07-25 2000-12-12 Apple Computer, Inc. Method and apparatus for searching for information in a data processing system and for providing scheduled search reports in a summary format

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040133897A1 (en) * 2002-11-01 2004-07-08 Covely Frederick Henry Automated software robot generator
US7716632B2 (en) * 2002-11-01 2010-05-11 Vertafore, Inc. Automated software robot generator
US20070156923A1 (en) * 2005-12-29 2007-07-05 Webex Communications, Inc. Methods and apparatuses for tracking progress of an invited participant

Also Published As

Publication number Publication date
US20020023068A1 (en) 2002-02-21
JPH11203321A (en) 1999-07-30
US6959299B2 (en) 2005-10-25

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION