CA2320386A1 - Similar document retrieval method using plural similarity calculation methods and recommended article notification service system using similar document retrieval method - Google Patents
- Publication number
- CA2320386A1
- Authority
- CA
- Canada
- Prior art keywords
- retrieval method
- document retrieval
- similar document
- similarity calculation
- calculation methods
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3346—Query execution using probabilistic model
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99936—Pattern matching access
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99944—Object-oriented database structure
- Y10S707/99945—Object-oriented database structure processing
Abstract
A similar document retrieval method capable of realizing an improved retrieval performance is disclosed.
In this similar document retrieval method for retrieving similar documents of a reference document from a plurality of retrieval target documents, similarities of each one of the plurality of retrieval target documents with respect to the reference document are calculated by using each one of two or more similarity calculation methods separately, and the similar documents of the reference document are retrieved by carrying out a discrimination analysis with respect to each one of a plurality of similarities calculated by using each one of the two or more similarity calculation methods separately.
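The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the two similarity measures (cosine on term counts and Jaccard on term sets), the discriminant threshold that maximizes between-class variance, and the rule that a document is retrieved if any one method places it in the similar group are all assumptions chosen for illustration. The function names `retrieve_similar` and `discriminant_threshold` are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization is enough for this illustration.
    return text.lower().split()


def cosine_similarity(a, b):
    # Cosine of the angle between term-count vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(
        sum(v * v for v in cb.values())
    )
    return dot / norm if norm else 0.0


def jaccard_similarity(a, b):
    # Shared terms divided by all distinct terms.
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def discriminant_threshold(scores):
    # Two-class split of the scores that maximizes between-class variance.
    # Returns the cut value, or None when all scores are equal.
    ordered = sorted(scores)
    n = len(ordered)
    best_cut, best_var = None, -1.0
    for i in range(1, n):
        if ordered[i] == ordered[i - 1]:
            continue  # no boundary between equal scores
        low, high = ordered[:i], ordered[i:]
        w0, w1 = len(low) / n, len(high) / n
        m0, m1 = sum(low) / len(low), sum(high) / len(high)
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var = var
            best_cut = (ordered[i - 1] + ordered[i]) / 2
    return best_cut


def retrieve_similar(reference, targets,
                     methods=(cosine_similarity, jaccard_similarity)):
    # Score every target with each method separately, split each score
    # list into two groups, and keep the union of the high-score groups
    # (the union rule is an assumption, not taken from the source).
    ref = tokenize(reference)
    docs = [tokenize(t) for t in targets]
    selected = set()
    for method in methods:
        scores = [method(ref, d) for d in docs]
        cut = discriminant_threshold(scores)
        if cut is None:
            continue
        selected.update(i for i, s in enumerate(scores) if s > cut)
    return sorted(selected)


targets = [
    "similar document retrieval system",
    "document retrieval method using similarity",
    "weather forecast for tomorrow",
    "stock market closes higher",
]
print(retrieve_similar("similar document retrieval method", targets))
```

Each method's scores are split independently, so a measure with a different numeric range does not need to be rescaled before the results are combined.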
Applications Claiming Priority (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP26952899 | 1999-09-22 | ||
JPP11-269528 | 1999-09-22 | ||
JPP2000-69478 | 2000-03-13 | ||
JP2000069478A JP2001160067A (en) | 1999-09-22 | 2000-03-13 | Method for retrieving similar document and recommended article communication service system using the method |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2320386A1 (en) | 2001-03-22 |
CA2320386C (en) | 2005-02-22 |
Family
ID=26548809
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002320386A (CA2320386C (en), Expired - Fee Related) | 1999-09-22 | 2000-09-21 | Similar document retrieval method using plural similarity calculation methods and recommended article notification service system using similar document retrieval method |
Country Status (4)
Country | Link |
---|---|
US (1) | US6301577B1 (en) |
EP (1) | EP1087302A3 (en) |
JP (1) | JP2001160067A (en) |
CA (1) | CA2320386C (en) |
Families Citing this family (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7137067B2 (en) * | 2000-03-17 | 2006-11-14 | Fujitsu Limited | Device and method for presenting news information |
US7877769B2 (en) | 2000-04-17 | 2011-01-25 | Lg Electronics Inc. | Information descriptor and extended information descriptor data structures for digital television signals |
US8782705B2 (en) | 2000-04-17 | 2014-07-15 | Lg Electronics Inc. | Information descriptor and extended information descriptor data structures for digital television signals |
US7035864B1 (en) | 2000-05-18 | 2006-04-25 | Endeca Technologies, Inc. | Hierarchical data-driven navigation system and method for information retrieval |
US7325201B2 (en) * | 2000-05-18 | 2008-01-29 | Endeca Technologies, Inc. | System and method for manipulating content in a hierarchical data-driven search and navigation system |
US20020083450A1 (en) * | 2000-12-01 | 2002-06-27 | Yakov Kamen | Method and system for content-based broadcasted program selection |
US7155668B2 (en) * | 2001-04-19 | 2006-12-26 | International Business Machines Corporation | Method and system for identifying relationships between text documents and structured variables pertaining to the text documents |
DE10160607A1 (en) * | 2001-12-10 | 2003-06-26 | Oce Printing Systems Gmbh | Production of printed document such as newspaper, from multiple files containing page data, by creating cluster file from associated input files and storing in memory before transmission to printer |
US7403990B2 (en) | 2002-05-08 | 2008-07-22 | Ricoh Company, Ltd. | Information distribution system |
US20040064449A1 (en) * | 2002-07-18 | 2004-04-01 | Ripley John R. | Remote scoring and aggregating similarity search engine for use with relational databases |
EP1678628A4 (en) * | 2003-10-10 | 2007-04-04 | Humanizing Technologies Inc | Clustering based personalized web experience |
US7428528B1 (en) | 2004-03-31 | 2008-09-23 | Endeca Technologies, Inc. | Integrated application for manipulating content in a hierarchical data-driven search and navigation system |
US7376643B2 (en) * | 2004-05-14 | 2008-05-20 | Microsoft Corporation | Method and system for determining similarity of objects based on heterogeneous relationships |
US20060167930A1 (en) * | 2004-10-08 | 2006-07-27 | George Witwer | Self-organized concept search and data storage method |
US20060142993A1 (en) * | 2004-12-28 | 2006-06-29 | Sony Corporation | System and method for utilizing distance measures to perform text classification |
US8019752B2 (en) | 2005-11-10 | 2011-09-13 | Endeca Technologies, Inc. | System and method for information retrieval from object collections with complex interrelationships |
EP1963957A4 (en) * | 2005-12-19 | 2009-05-06 | Strands Inc | User-to-user recommender |
US8676802B2 (en) | 2006-11-30 | 2014-03-18 | Oracle Otc Subsidiary Llc | Method and system for information retrieval with clustering |
US8027977B2 (en) * | 2007-06-20 | 2011-09-27 | Microsoft Corporation | Recommending content using discriminatively trained document similarity |
US7856434B2 (en) | 2007-11-12 | 2010-12-21 | Endeca Technologies, Inc. | System and method for filtering rules for manipulating search results in a hierarchical search and navigation system |
US8325362B2 (en) * | 2008-12-23 | 2012-12-04 | Microsoft Corporation | Choosing the next document |
JP5397198B2 (en) * | 2009-12-08 | 2014-01-22 | 日本電気株式会社 | Topic recommendation device, topic recommendation device method and program |
JP6075051B2 (en) * | 2012-12-14 | 2017-02-08 | 株式会社リコー | Server apparatus, electronic conference system, and program |
WO2014167880A1 (en) * | 2013-04-09 | 2014-10-16 | 株式会社日立国際電気 | Image retrieval device, image retrieval method, and recording medium |
US20170169032A1 (en) * | 2015-12-12 | 2017-06-15 | Hewlett-Packard Development Company, L.P. | Method and system of selecting and ordering content based on distance scores |
WO2020022536A1 (en) * | 2018-07-27 | 2020-01-30 | (주)브레인콜라 | Book recommendation method utilizing similarity between books |
CN109255018A (en) * | 2018-08-31 | 2019-01-22 | 沈文策 | A kind of method and apparatus identifying similar article |
JP7081454B2 (en) * | 2018-11-15 | 2022-06-07 | 日本電信電話株式会社 | Processing equipment, processing method, and processing program |
US11663843B2 (en) * | 2020-07-27 | 2023-05-30 | Coupa Software Incorporated | Automatic selection of templates for extraction of data from electronic documents |
US11941357B2 (en) | 2021-06-23 | 2024-03-26 | Optum Technology, Inc. | Machine learning techniques for word-based text similarity determinations |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2739950B2 (en) * | 1988-03-31 | 1998-04-15 | 株式会社東芝 | Pattern recognition device |
US5550928A (en) * | 1992-12-15 | 1996-08-27 | A.C. Nielsen Company | Audience measurement system and method |
JPH07105239A (en) * | 1993-09-30 | 1995-04-21 | Omron Corp | Data base managing method and data base retrieving method |
JP2937729B2 (en) * | 1993-12-21 | 1999-08-23 | 株式会社バーズ情報科学研究所 | Pattern recognition method and apparatus and dictionary creation method |
US5625748A (en) * | 1994-04-18 | 1997-04-29 | Bbn Corporation | Topic discriminator using posterior probability or confidence scores |
US6202058B1 (en) * | 1994-04-25 | 2001-03-13 | Apple Computer, Inc. | System for ranking the relevance of information objects accessed by computer users |
JP3416268B2 (en) * | 1994-06-30 | 2003-06-16 | キヤノン株式会社 | Image recognition apparatus and method |
US5907836A (en) * | 1995-07-31 | 1999-05-25 | Kabushiki Kaisha Toshiba | Information filtering apparatus for selecting predetermined article from plural articles to present selected article to user, and method therefore |
DK0932398T3 (en) * | 1996-06-28 | 2006-09-25 | Ortho Mcneil Pharm Inc | Use of topiramate or derivatives thereof for the manufacture of a medicament for the treatment of manic depressive bipolar disorders |
US5999893A (en) * | 1997-05-02 | 1999-12-07 | The United States Of America As Represented By The Secretary Of The Navy | Classification system and method using combined information testing |
US6128608A (en) * | 1998-05-01 | 2000-10-03 | Barnhill Technologies, Llc | Enhancing knowledge discovery using multiple support vector machines |
-
2000
- 2000-03-13 JP JP2000069478A patent/JP2001160067A/en active Pending
- 2000-09-21 CA CA002320386A patent/CA2320386C/en not_active Expired - Fee Related
- 2000-09-22 US US09/668,718 patent/US6301577B1/en not_active Expired - Fee Related
- 2000-09-22 EP EP00120168A patent/EP1087302A3/en not_active Withdrawn
Also Published As
Publication number | Publication date |
---|---|
EP1087302A3 (en) | 2005-08-24 |
JP2001160067A (en) | 2001-06-12 |
US6301577B1 (en) | 2001-10-09 |
EP1087302A2 (en) | 2001-03-28 |
CA2320386C (en) | 2005-02-22 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CA2320386A1 (en) | Similar document retrieval method using plural similarity calculation methods and recommended article notification service system using similar document retrieval method | |
EP1126388A3 (en) | System and Method for content-based multimedia retrieval | |
CA2392893A1 (en) | Similar document retrieving method and system | |
EP1168202A3 (en) | Apparatus for retrieving similar documents and apparatus for extracting relevant keywords | |
EP1072982A3 (en) | Method and system for similar word extraction and document retrieval | |
AU5587400A (en) | System and method for database retrieval, indexing and statistical analysis | |
WO2006041950A3 (en) | Classification-expanded indexing and retrieval of classified documents | |
WO2000031653A3 (en) | System for retrieving images using a database | |
EP1494137A3 (en) | Video retrieving system and method | |
WO2005036351A3 (en) | Systems and methods for search processing using superunits | |
WO2000057324A3 (en) | Funds transfer repair system | |
WO1999036863A3 (en) | System and method for selective retrieval of a video sequence | |
AU4020101A (en) | System and method for performing similarity searching | |
WO1998038561A3 (en) | A system and method of optimizing database queries in two or more dimensions | |
WO2003079234A3 (en) | Knowledge management using text classification | |
CA2329558A1 (en) | Methods and apparatus for similarity text search based on conceptual indexing | |
AU3832500A (en) | System and method for inputting, retrieving, organizing and analyzing data | |
MXPA01011691A (en) | Information management, retrieval and display system and associated method. | |
EP0851659A3 (en) | Information processing system and method therefor | |
WO2000075811A3 (en) | Method and system for text mining using multidimensional subspaces | |
EP0933727A3 (en) | Image information processing apparatus and its method | |
EP1503300A3 (en) | Vision-based document segmentation | |
DE60033118D1 (en) | System and method for content-based retrieval of images | |
EP0959420A3 (en) | Method of and apparatus for retrieving information and storage medium | |
FR2750519B1 (en) | METHOD FOR EXTRACTING DATABASE DOCUMENTS |
Legal Events
Code | Title | Date | Description |
---|---|---|---|
EEER | Examination request | ||
MKLA | Lapsed |