United States Patent [19]

Wong et al.

US006128613A

[11] Patent Number: 6,128,613
[45] Date of Patent: Oct. 3, 2000


U.S. PATENT DOCUMENTS

5,765,150 6/1998 Burrows 707/5
5,905,980 5/1999 Masuichi et al. 707/1
5,920,854 7/1999 Kirsch et al. 707/3

OTHER PUBLICATIONS

Unger, E.A. et al., "Entropy as a Measure of Database Information," IEEE, 1990, pp. 80-87.

Primary Examiner: Thomas G. Black
Assistant Examiner: William Trinh

Attorney, Agent, or Firm: Townsend and Townsend and Crew LLP; Kenneth R. Allen

[57] ABSTRACT

A computer-based method and system for establishing topic words to represent a document, the topic words being suitable for use in document retrieval. The method includes determining document keywords from the document; classifying each of the document keywords into one of a plurality of preestablished keyword classes; and selecting words as the topic words, each selected word from a different one of the preestablished keyword classes, so as to minimize a cost function on the proposed topic words. The cost function may be a metric of dissimilarity, such as cross-entropy, between a first distribution of likelihood of appearance by the plurality of document keywords in a typical document and a second distribution of likelihood of appearance by the plurality of document keywords in a typical document, the second distribution being approximated using the proposed topic words. The cost function can also serve as a basis for sorting documents by priority.
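The abstract does not say how the second distribution is built from the proposed topic words. The following Python sketch is therefore only an illustration under stated assumptions, not the patented method. It assumes a hypothetical table `cond[t][w]` giving P(w | t) for each candidate topic word t (for example, estimated from a corpus). It approximates the second distribution Q as the average of P(w | t) over the proposed topic words. It then scores each proposal by the cross-entropy $H(P, Q) = -\sum_w P(w)\log Q(w)$, where P is the first distribution. Because H(P) is fixed for a given document, minimizing H(P, Q) selects the same topic words as minimizing the Kullback-Leibler divergence D(P || Q). The function names and all data values are hypothetical.

```python
import math
from itertools import product

# Floor for Q(w) so that log() is never applied to zero (illustrative choice).
EPS = 1e-12


def approximate_distribution(topic_words, cond, vocab):
    """Approximate Q(w) as the mean of P(w | t) over the proposed topic words t.

    `cond[t][w]` is a hypothetical table of P(w | t); this averaging rule is an
    assumption made for the sketch, not the construction given in the patent.
    """
    n = len(topic_words)
    return {w: sum(cond[t].get(w, 0.0) for t in topic_words) / n for w in vocab}


def cross_entropy(p, q):
    """H(P, Q) = -sum_w P(w) log Q(w), with Q(w) floored at EPS."""
    return -sum(pw * math.log(max(q.get(w, 0.0), EPS))
                for w, pw in p.items() if pw > 0)


def select_topic_words(p, classes, cond):
    """Choose one word from each keyword class so that H(P, Q) is minimal.

    Exhaustive search over the Cartesian product of the classes: clear but
    exponential in the number of classes, so only suitable for small inputs.
    """
    vocab = list(p)
    best_words, best_cost = None, math.inf
    for candidate in product(*classes.values()):
        q = approximate_distribution(candidate, cond, vocab)
        cost = cross_entropy(p, q)
        if cost < best_cost:
            best_words, best_cost = candidate, cost
    return best_words, best_cost


# Toy data (invented for illustration): first distribution P over keywords,
# keyword classes, and the hypothetical conditional table P(w | t).
p = {"engine": 0.4, "car": 0.3, "price": 0.2, "dealer": 0.1}
classes = {"vehicle": ["car", "engine"], "commerce": ["price", "dealer"]}
cond = {
    "car":    {"car": 0.5, "engine": 0.3, "price": 0.1, "dealer": 0.1},
    "engine": {"engine": 0.7, "car": 0.2, "price": 0.05, "dealer": 0.05},
    "price":  {"price": 0.6, "dealer": 0.2, "car": 0.1, "engine": 0.1},
    "dealer": {"dealer": 0.5, "price": 0.3, "car": 0.1, "engine": 0.1},
}

words, cost = select_topic_words(p, classes, cond)
print(words, round(cost, 3))  # ('engine', 'price') 1.368
```

Exhaustive search was chosen here for readability. Its cost grows with the product of the class sizes, so a greedy or coordinate-wise search would be a natural substitute for realistic vocabularies. The minimized cost is a single number per document, so documents could then be sorted by it, as the abstract suggests. The abstract does not state which direction counts as higher priority.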

23 Claims, 6 Drawing Sheets
