CA2595674A1 - Multiple index based information retrieval system - Google Patents
- Publication number
- CA2595674A1
- Authority
- CA
- Canada
- Prior art keywords
- phrase
- documents
- list
- document
- primary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/93—Document management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/313—Selection or weighting of terms for indexing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2457—Query processing with adaptation to user needs
- G06F16/24578—Query processing with adaptation to user needs using ranking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9535—Search customisation based on user profiles and personalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9538—Presentation of query results
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99935—Query augmenting and refining, e.g. inexact access
Abstract
An information retrieval system uses phrases to index, retrieve, organize and describe documents. Phrases are identified that predict the presence of other phrases in documents. Documents are then indexed according to their included phrases. The document index is partitioned into multiple indexes, including a primary index and a secondary index. The primary index stores phrase posting lists with relevance rank ordered documents. The secondary index stores excess documents from the posting lists in document order.
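The partitioning described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `Posting`, `partition_posting_list` and `max_primary` are hypothetical, the tie-break on document identifier is an assumption, and the scores are made-up values standing in for a relevance score.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    doc_id: int    # document identifier
    score: float   # relevance score (illustrative values only)


def partition_posting_list(postings, max_primary):
    """Split one phrase's posting list into a primary and a secondary portion."""
    # Rank every document that contains the phrase, highest score first.
    # Assumption: ties are broken by document identifier for determinism.
    ranked = sorted(postings, key=lambda p: (-p.score, p.doc_id))
    # Primary portion: the higher ranked documents, kept in rank order.
    primary = ranked[:max_primary]
    # Secondary portion: the excess documents, kept in numerical order of
    # their identifiers; only the identifiers are retained here.
    secondary = sorted(p.doc_id for p in ranked[max_primary:])
    return primary, secondary


postings = [Posting(17, 0.2), Posting(3, 0.9), Posting(42, 0.5),
            Posting(8, 0.1), Posting(25, 0.7)]
primary, secondary = partition_posting_list(postings, max_primary=2)
print([p.doc_id for p in primary])  # [3, 25]
print(secondary)                    # [8, 17, 42]
```

Keeping the secondary portion in identifier order, with little more than the identifiers themselves, is what lets it be stored compactly and scanned linearly.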
Claims (12)
1. A computer implemented method for indexing documents with respect to a phrase, wherein each document has a document identifier, the method comprising:
establishing a list of documents that contain the phrase;
ranking the documents in the list by a relevance score;
storing a first portion of the list comprising higher ranked documents in a primary index in rank order of the relevance scores; and storing a second portion of the list comprising lesser ranked documents in a secondary index in numerical order of the document identifiers.
2. The method of claim 1, wherein the relevance score comprises a page rank based type score.
3. The method of claim 1, further comprising storing for each document in the primary index relevance attributes of the document.
4. The method of claim 3, wherein the relevance attributes include at least one of the following: a total number of occurrences of the phrase in the document, a rank ordered list of anchor documents that also contain the phrase and that point to the document, a position of each phrase occurrence in the document, a set of one or more flags indicating a format of the occurrence or a portion of the document containing the occurrence.
5. The method of claim 3, wherein storing the second portion of the list in the secondary index comprises storing substantially only document identification information.
6. The method of claim 1, wherein storing the first portion of the list in a primary index comprises storing the first portion of the list on a physical storage device in rank order of the relevance scores.
7. The method of claim 1, wherein storing a second portion of the list in a secondary index comprises storing the second portion of the list on a physical storage device in numerical order of the document identifiers.
8. The method of claim 1, wherein the first portion of each list of documents includes a first section wherein each document listed in the first section includes a first plurality of relevance attributes, and a second section wherein each document listed in the second section comprises a second plurality of relevance attributes that are a subset of the first plurality of relevance attributes, and wherein the documents listed in the first section are ranked higher than the documents listed in the second section.
9. The method of claim 8, wherein the first portion of each list of documents includes a third section wherein each document listed in the third section includes a third plurality of relevance attributes that are a subset of the second plurality of relevance attributes, and wherein the documents listed in the second section are ranked higher than the documents listed in the third section.
10. The method of claim 8, wherein the first portion of each list contains n entries, wherein the second portion of the list contains m~n entries, wherein m~2, and the third portion of the list contains 1~n entries, wherein 1~4.
11. A method of providing an information retrieval system, the method comprising:
storing a primary index including primary phrase posting lists, each posting list associated with a phrase and including up to a maximum number of documents that contain the phrase, the documents rank ordered by respective relevance scores;
storing a secondary index including secondary phrase posting lists, each posting list associated with a primary phrase posting list in the primary index, and including documents that contain the phrase and which have relevance scores less than the relevance score of a lowest ranked document in the primary posting list for the phrase, the documents ordered by document identifier;
receiving a search query comprising at least one phrase;
responsive to the search query containing a first phrase having a primary posting list and a secondary posting list and a second phrase having only a primary posting list, intersecting the primary posting list of the first phrase with the primary posting list of the second phrase to obtain a first set of common documents, and intersecting the secondary posting list of the first phrase with the primary posting list of the second phrase to obtain a second set of common documents, and conjoining the first and second sets of common documents; and ranking the common documents.
12. An information retrieval system, comprising:
a primary index including primary phrase posting lists, each posting list associated with a phrase and including up to a maximum number of documents that contain the phrase, the documents rank ordered by respective relevance scores; and a secondary index including secondary phrase posting lists, each posting list associated with a primary phrase posting list in the primary index, and including documents that contain the phrase and which have relevance scores less than the relevance score of a lowest ranked document in the primary posting list for the phrase, the documents ordered by document identifier.
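The query step recited in claim 11 (intersecting posting lists across the primary and secondary indexes, conjoining the results, then ranking) can be sketched in Python as follows. This is an illustration under stated assumptions, not the claimed implementation: the function name `intersect_and_rank` is hypothetical, primary lists are modelled as `(doc_id, score)` pairs, the secondary list as bare identifiers, and the final ranking rule (sum of the scores that are available) is an assumption, since the claim does not specify one.

```python
def intersect_and_rank(primary_a, secondary_a, primary_b):
    """Combine the posting lists of two query phrases A and B.

    primary_a, primary_b: lists of (doc_id, score) pairs in rank order.
    secondary_a: list of doc_ids in ascending order (phrase A only;
    phrase B is assumed to have no secondary posting list).
    """
    scores_b = dict(primary_b)
    # First set: documents common to both primary posting lists.
    first = {doc_id: score + scores_b[doc_id]
             for doc_id, score in primary_a if doc_id in scores_b}
    # Second set: documents common to A's secondary list and B's primary list.
    # The secondary list carries no score, so only B's score is available.
    second = {doc_id: scores_b[doc_id]
              for doc_id in secondary_a if doc_id in scores_b}
    # Conjoin the two sets; they are disjoint because a document appears in
    # either the primary or the secondary list of phrase A, never both.
    combined = {**second, **first}
    # Rank the common documents (assumed rule: higher combined score first,
    # ties broken by document identifier).
    return sorted(combined, key=lambda d: (-combined[d], d))


primary_a = [(3, 0.9), (25, 0.7)]
secondary_a = [8, 17, 42]
primary_b = [(42, 0.8), (3, 0.6), (99, 0.4), (8, 0.3)]
print(intersect_and_rank(primary_a, secondary_a, primary_b))  # [3, 42, 8]
```

The sketch uses a dictionary lookup for brevity. Because the secondary list is ordered by document identifier, a real system could instead intersect it by a linear merge against a list sorted the same way.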
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/043,695 | 2005-01-25 | | |
US11/043,695 US7567959B2 (en) | 2004-07-26 | 2005-01-25 | Multiple index based information retrieval system |
PCT/US2006/002709 WO2006081325A2 (en) | 2005-01-25 | 2006-01-25 | Multiple index based information retrieval system |
Publications (2)
Publication Number | Publication Date |
---|---|
CA2595674A1 (en) | 2006-08-03 |
CA2595674C (en) | 2012-07-03 |
Family
ID=36741037
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA2595674A Expired - Fee Related CA2595674C (en) | 2005-01-25 | 2006-01-25 | Multiple index based information retrieval system |
Country Status (11)
Country | Link |
---|---|
US (5) | US7567959B2 (en) |
EP (1) | EP1844391B1 (en) |
JP (1) | JP4881322B2 (en) |
KR (1) | KR101273520B1 (en) |
CN (1) | CN101133388B (en) |
AU (2) | AU2006208079B2 (en) |
BR (1) | BRPI0614024B1 (en) |
CA (1) | CA2595674C (en) |
DK (1) | DK1844391T3 (en) |
NO (1) | NO338518B1 (en) |
WO (1) | WO2006081325A2 (en) |
Families Citing this family (142)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7266553B1 (en) * | 2002-07-01 | 2007-09-04 | Microsoft Corporation | Content data indexing |
US7580929B2 (en) * | 2004-07-26 | 2009-08-25 | Google Inc. | Phrase-based personalization of searches in an information retrieval system |
US7702618B1 (en) | 2004-07-26 | 2010-04-20 | Google Inc. | Information retrieval system for archiving multiple document versions |
US7584175B2 (en) | 2004-07-26 | 2009-09-01 | Google Inc. | Phrase-based generation of document descriptions |
US7536408B2 (en) | 2004-07-26 | 2009-05-19 | Google Inc. | Phrase-based indexing in an information retrieval system |
US7599914B2 (en) * | 2004-07-26 | 2009-10-06 | Google Inc. | Phrase-based searching in an information retrieval system |
US7711679B2 (en) * | 2004-07-26 | 2010-05-04 | Google Inc. | Phrase-based detection of duplicate documents in an information retrieval system |
US7567959B2 (en) * | 2004-07-26 | 2009-07-28 | Google Inc. | Multiple index based information retrieval system |
US7580921B2 (en) | 2004-07-26 | 2009-08-25 | Google Inc. | Phrase identification in an information retrieval system |
US7199571B2 (en) * | 2004-07-27 | 2007-04-03 | Optisense Network, Inc. | Probe apparatus for use in a separable connector, and systems including same |
US20060036598A1 (en) * | 2004-08-09 | 2006-02-16 | Jie Wu | Computerized method for ranking linked information items in distributed sources |
US7539661B2 (en) * | 2005-06-02 | 2009-05-26 | Delphi Technologies, Inc. | Table look-up method with adaptive hashing |
US7451135B2 (en) * | 2005-06-13 | 2008-11-11 | Inform Technologies, Llc | System and method for retrieving and displaying information relating to electronic documents available from an informational network |
US20070150721A1 (en) * | 2005-06-13 | 2007-06-28 | Inform Technologies, Llc | Disambiguation for Preprocessing Content to Determine Relationships |
JP4756953B2 (en) * | 2005-08-26 | 2011-08-24 | 富士通株式会社 | Information search apparatus and information search method |
US20070078889A1 (en) * | 2005-10-04 | 2007-04-05 | Hoskinson Ronald A | Method and system for automated knowledge extraction and organization |
US7676463B2 (en) * | 2005-11-15 | 2010-03-09 | Kroll Ontrack, Inc. | Information exploration systems and method |
US8126874B2 (en) * | 2006-05-09 | 2012-02-28 | Google Inc. | Systems and methods for generating statistics from search engine query logs |
JP4322887B2 (en) * | 2006-06-01 | 2009-09-02 | 株式会社東芝 | Thread ranking apparatus and method |
US20080033943A1 (en) * | 2006-08-07 | 2008-02-07 | Bea Systems, Inc. | Distributed index search |
US9015197B2 (en) | 2006-08-07 | 2015-04-21 | Oracle International Corporation | Dynamic repartitioning for changing a number of nodes or partitions in a distributed search system |
US20080071732A1 (en) * | 2006-09-18 | 2008-03-20 | Konstantin Koll | Master/slave index in computer systems |
US20080082554A1 (en) * | 2006-10-03 | 2008-04-03 | Paul Pedersen | Systems and methods for providing a dynamic document index |
US8301603B2 (en) * | 2006-10-06 | 2012-10-30 | Nec Corporation | Information document search system, method and program for partitioned indexes on a time series in association with a backup document storage |
US8005822B2 (en) | 2007-01-17 | 2011-08-23 | Google Inc. | Location in search queries |
US8326858B2 (en) * | 2007-01-17 | 2012-12-04 | Google Inc. | Synchronization of fixed and mobile data |
US7966309B2 (en) * | 2007-01-17 | 2011-06-21 | Google Inc. | Providing relevance-ordered categories of information |
US8966407B2 (en) | 2007-01-17 | 2015-02-24 | Google Inc. | Expandable homepage modules |
US7966321B2 (en) | 2007-01-17 | 2011-06-21 | Google Inc. | Presentation of local results |
US8280877B2 (en) * | 2007-02-22 | 2012-10-02 | Microsoft Corporation | Diverse topic phrase extraction |
US8086594B1 (en) | 2007-03-30 | 2011-12-27 | Google Inc. | Bifurcated document relevance scoring |
US7925655B1 (en) | 2007-03-30 | 2011-04-12 | Google Inc. | Query scheduling using hierarchical tiers of index servers |
US8166021B1 (en) | 2007-03-30 | 2012-04-24 | Google Inc. | Query phrasification |
US8166045B1 (en) | 2007-03-30 | 2012-04-24 | Google Inc. | Phrase extraction using subphrase scoring |
US7702614B1 (en) * | 2007-03-30 | 2010-04-20 | Google Inc. | Index updating using segment swapping |
US7693813B1 (en) | 2007-03-30 | 2010-04-06 | Google Inc. | Index server architecture using tiered and sharded phrase posting lists |
WO2008120030A1 (en) * | 2007-04-02 | 2008-10-09 | Sobha Renaissance Information | Latent metonymical analysis and indexing [lmai] |
US7809610B2 (en) * | 2007-04-09 | 2010-10-05 | Platformation, Inc. | Methods and apparatus for freshness and completeness of information |
US7809714B1 (en) | 2007-04-30 | 2010-10-05 | Lawrence Richard Smith | Process for enhancing queries for information retrieval |
US7814107B1 (en) | 2007-05-25 | 2010-10-12 | Amazon Technologies, Inc. | Generating similarity scores for matching non-identical data strings |
US7908279B1 (en) | 2007-05-25 | 2011-03-15 | Amazon Technologies, Inc. | Filtering invalid tokens from a document using high IDF token filtering |
US8046372B1 (en) | 2007-05-25 | 2011-10-25 | Amazon Technologies, Inc. | Duplicate entry detection system and method |
US7917516B2 (en) | 2007-06-08 | 2011-03-29 | Apple Inc. | Updating an inverted index |
EP2031508A1 (en) * | 2007-08-31 | 2009-03-04 | Ricoh Europe PLC | Network printing apparatus and method |
US8117223B2 (en) | 2007-09-07 | 2012-02-14 | Google Inc. | Integrating external related phrase information into a phrase-based indexing information retrieval system |
US8165985B2 (en) | 2007-10-12 | 2012-04-24 | Palo Alto Research Center Incorporated | System and method for performing discovery of digital information in a subject area |
US8073682B2 (en) * | 2007-10-12 | 2011-12-06 | Palo Alto Research Center Incorporated | System and method for prospecting digital information |
US8671104B2 (en) * | 2007-10-12 | 2014-03-11 | Palo Alto Research Center Incorporated | System and method for providing orientation into digital information |
US20090112843A1 (en) * | 2007-10-29 | 2009-04-30 | International Business Machines Corporation | System and method for providing differentiated service levels for search index |
US7895225B1 (en) * | 2007-12-06 | 2011-02-22 | Amazon Technologies, Inc. | Identifying potential duplicates of a document in a document corpus |
US8799264B2 (en) * | 2007-12-14 | 2014-08-05 | Microsoft Corporation | Method for improving search engine efficiency |
US9037560B2 (en) * | 2008-03-05 | 2015-05-19 | Chacha Search, Inc. | Method and system for triggering a search request |
US9081853B2 (en) * | 2008-04-03 | 2015-07-14 | Graham Holdings Company | Information display system based on user profile data with assisted and explicit profile modification |
CN101359331B (en) * | 2008-05-04 | 2014-03-19 | 索意互动(北京)信息技术有限公司 | Method and system for reordering search result |
US20090287684A1 (en) * | 2008-05-14 | 2009-11-19 | Bennett James D | Historical internet |
US8161036B2 (en) * | 2008-06-27 | 2012-04-17 | Microsoft Corporation | Index optimization for ranking using a linear model |
US8171031B2 (en) * | 2008-06-27 | 2012-05-01 | Microsoft Corporation | Index optimization for ranking using a linear model |
US8788476B2 (en) * | 2008-08-15 | 2014-07-22 | Chacha Search, Inc. | Method and system of triggering a search request |
US8010545B2 (en) * | 2008-08-28 | 2011-08-30 | Palo Alto Research Center Incorporated | System and method for providing a topic-directed search |
US20100057577A1 (en) * | 2008-08-28 | 2010-03-04 | Palo Alto Research Center Incorporated | System And Method For Providing Topic-Guided Broadening Of Advertising Targets In Social Indexing |
US8209616B2 (en) * | 2008-08-28 | 2012-06-26 | Palo Alto Research Center Incorporated | System and method for interfacing a web browser widget with social indexing |
US20100057536A1 (en) * | 2008-08-28 | 2010-03-04 | Palo Alto Research Center Incorporated | System And Method For Providing Community-Based Advertising Term Disambiguation |
JP5384884B2 (en) * | 2008-09-03 | 2014-01-08 | 日本電信電話株式会社 | Information retrieval apparatus and information retrieval program |
US8156130B2 (en) | 2008-10-17 | 2012-04-10 | Embarq Holdings Company Llc | System and method for collapsing search results |
US8874564B2 (en) * | 2008-10-17 | 2014-10-28 | Centurylink Intellectual Property Llc | System and method for communicating search results to one or more other parties |
US8326829B2 (en) * | 2008-10-17 | 2012-12-04 | Centurylink Intellectual Property Llc | System and method for displaying publication dates for search results |
US8549016B2 (en) * | 2008-11-14 | 2013-10-01 | Palo Alto Research Center Incorporated | System and method for providing robust topic identification in social indexes |
US8452781B2 (en) * | 2009-01-27 | 2013-05-28 | Palo Alto Research Center Incorporated | System and method for using banded topic relevance and time for article prioritization |
US8356044B2 (en) * | 2009-01-27 | 2013-01-15 | Palo Alto Research Center Incorporated | System and method for providing default hierarchical training for social indexing |
US8239397B2 (en) * | 2009-01-27 | 2012-08-07 | Palo Alto Research Center Incorporated | System and method for managing user attention by detecting hot and cold topics in social indexes |
US9245033B2 (en) | 2009-04-02 | 2016-01-26 | Graham Holdings Company | Channel sharing |
US10089391B2 (en) * | 2009-07-29 | 2018-10-02 | Herbminers Informatics Limited | Ontological information retrieval system |
GB2472250A (en) * | 2009-07-31 | 2011-02-02 | Stephen Timothy Morris | Method for determining document relevance |
US8205025B2 (en) * | 2009-08-12 | 2012-06-19 | Globalspec, Inc. | Efficient buffered reading with a plug-in for input buffer size determination |
US20110078131A1 (en) * | 2009-09-30 | 2011-03-31 | Microsoft Corporation | Experimental web search system |
US8838576B2 (en) * | 2009-10-12 | 2014-09-16 | Yahoo! Inc. | Posting list intersection parallelism in query processing |
US8756215B2 (en) * | 2009-12-02 | 2014-06-17 | International Business Machines Corporation | Indexing documents |
US20110258212A1 (en) * | 2010-04-14 | 2011-10-20 | Microsoft Corporation | Automatic query suggestion generation using sub-queries |
US9031944B2 (en) | 2010-04-30 | 2015-05-12 | Palo Alto Research Center Incorporated | System and method for providing multi-core and multi-level topical organization in social indexes |
US10216831B2 (en) * | 2010-05-19 | 2019-02-26 | Excalibur Ip, Llc | Search results summarized with tokens |
US8352474B2 (en) * | 2010-06-16 | 2013-01-08 | Fuji Xerox Co., Ltd. | System and method for retrieving information using a query based index |
US20120047172A1 (en) * | 2010-08-23 | 2012-02-23 | Google Inc. | Parallel document mining |
US8655648B2 (en) * | 2010-09-01 | 2014-02-18 | Microsoft Corporation | Identifying topically-related phrases in a browsing sequence |
US8738673B2 (en) | 2010-09-03 | 2014-05-27 | International Business Machines Corporation | Index partition maintenance over monotonically addressed document sequences |
JP5492814B2 (en) * | 2011-03-28 | 2014-05-14 | デジタルア−ツ株式会社 | SEARCH DEVICE, SEARCH SYSTEM, METHOD, AND PROGRAM |
US9201895B2 (en) | 2011-06-03 | 2015-12-01 | Apple Inc. | Management of downloads from a network-based digital data repository based on network performance |
US20120311080A1 (en) * | 2011-06-03 | 2012-12-06 | Thomas Alsina | Management of Downloads from a Network-Based Digital Data Repository |
US8595238B2 (en) | 2011-06-22 | 2013-11-26 | International Business Machines Corporation | Smart index creation and reconciliation in an interconnected network of systems |
US9152697B2 (en) * | 2011-07-13 | 2015-10-06 | International Business Machines Corporation | Real-time search of vertically partitioned, inverted indexes |
US20130024459A1 (en) * | 2011-07-20 | 2013-01-24 | Microsoft Corporation | Combining Full-Text Search and Queryable Fields in the Same Data Structure |
US8818971B1 (en) | 2012-01-30 | 2014-08-26 | Google Inc. | Processing bulk deletions in distributed databases |
US9892198B2 (en) | 2012-06-07 | 2018-02-13 | Oath Inc. | Page personalization performed by an edge server |
US8892422B1 (en) | 2012-07-09 | 2014-11-18 | Google Inc. | Phrase identification in a sequence of words |
US20140046976A1 (en) * | 2012-08-11 | 2014-02-13 | Guangsheng Zhang | Systems, methods, and user interface for effectively presenting information |
GB2505183A (en) * | 2012-08-21 | 2014-02-26 | Ibm | Discovering composite keys |
US10198776B2 (en) | 2012-09-21 | 2019-02-05 | Graham Holdings Company | System and method for delivering an open profile personalization system through social media based on profile data structures that contain interest nodes or channels |
US9721000B2 (en) * | 2012-12-20 | 2017-08-01 | Microsoft Technology Licensing, Llc | Generating and using a customized index |
US20140195961A1 (en) * | 2013-01-07 | 2014-07-10 | Apple Inc. | Dynamic Index |
US10387429B2 (en) * | 2013-02-08 | 2019-08-20 | Jive Software, Inc. | Fast ad-hoc filtering of time series analytics |
US9501506B1 (en) | 2013-03-15 | 2016-11-22 | Google Inc. | Indexing system |
US9256644B1 (en) * | 2013-03-15 | 2016-02-09 | Ca, Inc. | System for identifying and investigating shared and derived content |
US9575958B1 (en) * | 2013-05-02 | 2017-02-21 | Athena Ann Smyros | Differentiation testing |
US9483568B1 (en) | 2013-06-05 | 2016-11-01 | Google Inc. | Indexing system |
US9146980B1 (en) * | 2013-06-24 | 2015-09-29 | Google Inc. | Temporal content selection |
US20150019565A1 (en) * | 2013-07-11 | 2015-01-15 | Outside Intelligence Inc. | Method And System For Scoring Credibility Of Information Sources |
US9489411B2 (en) * | 2013-07-29 | 2016-11-08 | Sybase, Inc. | High performance index creation |
US9424345B1 (en) | 2013-09-25 | 2016-08-23 | Google Inc. | Contextual content distribution |
US9336258B2 (en) | 2013-10-25 | 2016-05-10 | International Business Machines Corporation | Reducing database locking contention using multi-version data record concurrency control |
US9450771B2 (en) * | 2013-11-20 | 2016-09-20 | Blab, Inc. | Determining information inter-relationships from distributed group discussions |
KR101592670B1 (en) * | 2014-02-17 | 2016-02-11 | 포항공과대학교 산학협력단 | Apparatus for searching data using index and method for using the apparatus |
CN103810300B (en) * | 2014-03-10 | 2017-08-01 | 北京国双科技有限公司 | The data query method and apparatus covered for non-index |
US9817855B2 (en) * | 2014-03-17 | 2017-11-14 | SynerScope B.V. | Method and system for determining a measure of overlap between data entries |
US10503761B2 (en) | 2014-07-14 | 2019-12-10 | International Business Machines Corporation | System for searching, recommending, and exploring documents through conceptual associations |
US10162882B2 (en) | 2014-07-14 | 2018-12-25 | International Business Machines Corporation | Automatically linking text to concepts in a knowledge base |
US10437869B2 (en) * | 2014-07-14 | 2019-10-08 | International Business Machines Corporation | Automatic new concept definition |
US9864741B2 (en) * | 2014-09-23 | 2018-01-09 | Prysm, Inc. | Automated collective term and phrase index |
US9785724B2 (en) | 2014-10-30 | 2017-10-10 | Microsoft Technology Licensing, Llc | Secondary queue for index process |
US10042928B1 (en) | 2014-12-03 | 2018-08-07 | The Government Of The United States As Represented By The Director, National Security Agency | System and method for automated reasoning with and searching of documents |
US10025783B2 (en) * | 2015-01-30 | 2018-07-17 | Microsoft Technology Licensing, Llc | Identifying similar documents using graphs |
CN104715063B (en) * | 2015-03-31 | 2018-11-02 | 百度在线网络技术(北京)有限公司 | search ordering method and device |
US10229143B2 (en) | 2015-06-23 | 2019-03-12 | Microsoft Technology Licensing, Llc | Storage and retrieval of data from a bit vector search index |
US11281639B2 (en) | 2015-06-23 | 2022-03-22 | Microsoft Technology Licensing, Llc | Match fix-up to remove matching documents |
US10242071B2 (en) | 2015-06-23 | 2019-03-26 | Microsoft Technology Licensing, Llc | Preliminary ranker for scoring matching documents |
US10733164B2 (en) | 2015-06-23 | 2020-08-04 | Microsoft Technology Licensing, Llc | Updating a bit vector search index |
US10467215B2 (en) | 2015-06-23 | 2019-11-05 | Microsoft Technology Licensing, Llc | Matching documents using a bit vector search index |
US10565198B2 (en) | 2015-06-23 | 2020-02-18 | Microsoft Technology Licensing, Llc | Bit vector search index using shards |
US11392568B2 (en) | 2015-06-23 | 2022-07-19 | Microsoft Technology Licensing, Llc | Reducing matching documents for a search query |
US11392582B2 (en) * | 2015-10-15 | 2022-07-19 | Sumo Logic, Inc. | Automatic partitioning |
CN107015992A (en) * | 2016-01-28 | 2017-08-04 | 珠海金山办公软件有限公司 | A kind of document display method and device |
US10885009B1 (en) * | 2016-06-14 | 2021-01-05 | Amazon Technologies, Inc. | Generating aggregate views for data indices |
US10810236B1 (en) * | 2016-10-21 | 2020-10-20 | Twitter, Inc. | Indexing data in information retrieval systems |
US10169331B2 (en) * | 2017-01-29 | 2019-01-01 | International Business Machines Corporation | Text mining for automatically determining semantic relatedness |
CN107357846B (en) * | 2017-06-26 | 2018-12-14 | 北京金堤科技有限公司 | The methods of exhibiting and device of relation map |
US11449484B2 (en) * | 2018-06-25 | 2022-09-20 | Ebay Inc. | Data indexing and searching using permutation indexes |
CN108897730B (en) * | 2018-06-29 | 2022-07-29 | 国信优易数据股份有限公司 | PDF text processing method and device |
CN109376121B (en) * | 2018-08-10 | 2021-07-02 | 南京华讯方舟通信设备有限公司 | File indexing system and method based on elastic search full-text retrieval |
CN109086456B (en) * | 2018-08-31 | 2020-11-03 | 中国联合网络通信集团有限公司 | Data indexing method and device |
US10902069B2 (en) | 2018-12-18 | 2021-01-26 | Runtime Collective Limited | Distributed indexing and aggregation |
CN112084435A (en) * | 2020-08-07 | 2020-12-15 | 北京三快在线科技有限公司 | Search ranking model training method and device and search ranking method and device |
US11442971B1 (en) * | 2021-05-26 | 2022-09-13 | Adobe Inc. | Selective database re-indexing |
WO2023059909A2 (en) * | 2021-10-08 | 2023-04-13 | Open Text Holdings, Inc. | System and method for efficient multi-stage querying of archived data |
US20230109804A1 (en) * | 2021-10-08 | 2023-04-13 | Open Text Holdings, Inc. | System and method for efficient multi-stage querying of archived data |
Family Cites Families (191)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS619753A (en) * | 1984-06-26 | 1986-01-17 | Hitachi Ltd | Automatic registering method of frequent phrase for document processor |
US4773039A (en) * | 1985-11-19 | 1988-09-20 | International Business Machines Corporation | Information processing system for compaction and replacement of phrases |
JPH02270067A (en) | 1987-04-16 | 1990-11-05 | Westinghouse Electric Corp <We> | Intelligent inquiry system |
US5321833A (en) * | 1990-08-29 | 1994-06-14 | Gte Laboratories Incorporated | Adaptive ranking system for information retrieval |
US5278980A (en) | 1991-08-16 | 1994-01-11 | Xerox Corporation | Iterative technique for phrase query formation and an information retrieval system employing same |
US5523946A (en) | 1992-02-11 | 1996-06-04 | Xerox Corporation | Compact encoding of multi-lingual translation dictionaries |
US5353401A (en) * | 1992-11-06 | 1994-10-04 | Ricoh Company, Ltd. | Automatic interface layout generator for database systems |
JPH0756933A (en) * | 1993-06-24 | 1995-03-03 | Xerox Corp | Method for retrieval of document |
US5692176A (en) * | 1993-11-22 | 1997-11-25 | Reed Elsevier Inc. | Associative text search and retrieval system |
US5734749A (en) | 1993-12-27 | 1998-03-31 | Nec Corporation | Character string input system for completing an input character string with an incomplete input indicative sign |
JPH07262217A (en) | 1994-03-24 | 1995-10-13 | Fuji Xerox Co Ltd | Text retrieval device |
US5715443A (en) | 1994-07-25 | 1998-02-03 | Apple Computer, Inc. | Method and apparatus for searching for information in a data processing system and for providing scheduled search reports in a summary format |
JP3669016B2 (en) | 1994-09-30 | 2005-07-06 | 株式会社日立製作所 | Document information classification device |
US5694593A (en) | 1994-10-05 | 1997-12-02 | Northeastern University | Distributed computer database system and method |
US5758257A (en) * | 1994-11-29 | 1998-05-26 | Herz; Frederick | System and method for scheduling broadcast of and access to video programs and other data using customer profiles |
US6460036B1 (en) | 1994-11-29 | 2002-10-01 | Pinpoint Incorporated | System and method for providing customized electronic newspapers and target advertisements |
JP2929963B2 (en) * | 1995-03-15 | 1999-08-03 | 松下電器産業株式会社 | Document search device, word index creation method, and document search method |
US5745602A (en) * | 1995-05-01 | 1998-04-28 | Xerox Corporation | Automatic method of selecting multi-word key phrases from a document |
US5659732A (en) | 1995-05-17 | 1997-08-19 | Infoseek Corporation | Document retrieval over networks wherein ranking and relevance scores are computed at the client for multiple database documents |
US5724571A (en) * | 1995-07-07 | 1998-03-03 | Sun Microsystems, Inc. | Method and apparatus for generating query responses in a computer-based document retrieval system |
JPH0934911A (en) | 1995-07-18 | 1997-02-07 | Fuji Xerox Co Ltd | Information retrieval device |
US5668987A (en) | 1995-08-31 | 1997-09-16 | Sybase, Inc. | Database system with subquery optimizer |
US6366933B1 (en) * | 1995-10-27 | 2002-04-02 | At&T Corp. | Method and apparatus for tracking and viewing changes on the web |
US5757917A (en) | 1995-11-01 | 1998-05-26 | First Virtual Holdings Incorporated | Computerized payment system for purchasing goods and services on the internet |
US6098034A (en) * | 1996-03-18 | 2000-08-01 | Expert Ease Development, Ltd. | Method for standardizing phrasing in a document |
US7051024B2 (en) * | 1999-04-08 | 2006-05-23 | Microsoft Corporation | Document summarizer for word processors |
US5924108A (en) * | 1996-03-29 | 1999-07-13 | Microsoft Corporation | Document summarizer for word processors |
US5721897A (en) | 1996-04-09 | 1998-02-24 | Rubinstein; Seymour I. | Browse by prompted keyword phrases with an improved user interface |
US5794233A (en) | 1996-04-09 | 1998-08-11 | Rubinstein; Seymour I. | Browse by prompted keyword phrases |
US5826261A (en) * | 1996-05-10 | 1998-10-20 | Spencer; Graham | System and method for querying multiple, distributed databases by selective sharing of local relative significance information for terms related to the query |
US5915249A (en) * | 1996-06-14 | 1999-06-22 | Excite, Inc. | System and method for accelerated query evaluation of very large full-text databases |
EP0822502A1 (en) * | 1996-07-31 | 1998-02-04 | BRITISH TELECOMMUNICATIONS public limited company | Data access system |
US5920854A (en) * | 1996-08-14 | 1999-07-06 | Infoseek Corporation | Real-time document collection search engine with phrase indexing |
US6085186A (en) | 1996-09-20 | 2000-07-04 | Netbot, Inc. | Method and system using information written in a wrapper description language to execute query on a network |
US20030093790A1 (en) * | 2000-03-28 | 2003-05-15 | Logan James D. | Audio and video program recording, editing and playback systems using metadata |
JP3584848B2 (en) | 1996-10-31 | 2004-11-04 | 富士ゼロックス株式会社 | Document processing device, item search device, and item search method |
JP3902825B2 (en) * | 1997-01-16 | 2007-04-11 | キヤノン株式会社 | Document search system and method |
US5960383A (en) | 1997-02-25 | 1999-09-28 | Digital Equipment Corporation | Extraction of key sections from texts using automatic indexing techniques |
US6539430B1 (en) * | 1997-03-25 | 2003-03-25 | Symantec Corporation | System and method for filtering data received by a computer system |
US6185550B1 (en) | 1997-06-13 | 2001-02-06 | Sun Microsystems, Inc. | Method and apparatus for classifying documents within a class hierarchy creating term vector, term file and relevance ranking |
US6470307B1 (en) * | 1997-06-23 | 2002-10-22 | National Research Council Of Canada | Method and apparatus for automatically identifying keywords within a document |
US5995962A (en) * | 1997-07-25 | 1999-11-30 | Claritech Corporation | Sort system for merging database entries |
US5983216A (en) * | 1997-09-12 | 1999-11-09 | Infoseek Corporation | Performing automated document collection and selection by providing a meta-index with meta-index values indentifying corresponding document collections |
US5845278A (en) | 1997-09-12 | 1998-12-01 | Infoseek Corporation | Method for automatically selecting collections to search in full text searches |
US6018733A (en) | 1997-09-12 | 2000-01-25 | Infoseek Corporation | Methods for iteratively and interactively performing collection selection in full text searches |
US5956722A (en) | 1997-09-23 | 1999-09-21 | At&T Corp. | Method for effective indexing of partially dynamic documents |
US6542888B2 (en) * | 1997-11-26 | 2003-04-01 | International Business Machines Corporation | Content filtering for electronic documents generated in multiple foreign languages |
JP4183311B2 (en) | 1997-12-22 | 2008-11-19 | 株式会社リコー | Document annotation method, annotation device, and recording medium |
US6185558B1 (en) | 1998-03-03 | 2001-02-06 | Amazon.Com, Inc. | Identifying the items most relevant to a current query based on items selected in connection with similar queries |
JP3664874B2 (en) * | 1998-03-28 | 2005-06-29 | 松下電器産業株式会社 | Document search device |
JPH11293535A (en) * | 1998-04-10 | 1999-10-26 | Mitsubishi Rayon Co Ltd | Manufacture of heat-fusible composite fiber |
US6638314B1 (en) | 1998-06-26 | 2003-10-28 | Microsoft Corporation | Method of web crawling utilizing crawl numbers |
US6363377B1 (en) | 1998-07-30 | 2002-03-26 | Sarnoff Corporation | Search data processor |
US6377949B1 (en) * | 1998-09-18 | 2002-04-23 | Tacit Knowledge Systems, Inc. | Method and apparatus for assigning a confidence level to a term within a user knowledge profile |
US6366911B1 (en) * | 1998-09-28 | 2002-04-02 | International Business Machines Corporation | Partitioning of sorted lists (containing duplicate entries) for multiprocessors sort and merge |
US6415283B1 (en) * | 1998-10-13 | 2002-07-02 | Oracle Corporation | Methods and apparatus for determining focal points of clusters in a tree structure |
US7058589B1 (en) * | 1998-12-17 | 2006-06-06 | Iex Corporation | Method and system for employee work scheduling |
US6862710B1 (en) * | 1999-03-23 | 2005-03-01 | Insightful Corporation | Internet navigation using soft hyperlinks |
JP4021583B2 (en) * | 1999-04-08 | 2007-12-12 | 富士通株式会社 | Information search apparatus, information search method, and recording medium storing program for realizing the method |
US6430539B1 (en) | 1999-05-06 | 2002-08-06 | Hnc Software | Predictive modeling of consumer financial behavior |
US6175830B1 (en) * | 1999-05-20 | 2001-01-16 | Evresearch, Ltd. | Information management, retrieval and display system and associated method |
US7089236B1 (en) * | 1999-06-24 | 2006-08-08 | Search 123.Com, Inc. | Search engine interface |
US6601026B2 (en) * | 1999-09-17 | 2003-07-29 | Discern Communications, Inc. | Information retrieval by natural language querying |
US6996775B1 (en) * | 1999-10-29 | 2006-02-07 | Verizon Laboratories Inc. | Hypervideo: information retrieval using time-related multimedia: |
US6751612B1 (en) | 1999-11-29 | 2004-06-15 | Xerox Corporation | User query generate search results that rank set of servers where ranking is based on comparing content on each server with user query, frequency at which content on each server is altered using web crawler in a search engine |
US6684183B1 (en) | 1999-12-06 | 2004-01-27 | Comverse Ltd. | Generic natural language service creation environment |
US6785671B1 (en) * | 1999-12-08 | 2004-08-31 | Amazon.Com, Inc. | System and method for locating web-based product offerings |
US6963867B2 (en) * | 1999-12-08 | 2005-11-08 | A9.Com, Inc. | Search query processing to provide category-ranked presentation of search results |
US6772150B1 (en) | 1999-12-10 | 2004-08-03 | Amazon.Com, Inc. | Search query refinement using related search phrases |
AU4517501A (en) | 1999-12-10 | 2001-06-18 | Amazon.Com, Inc. | Search query refinement using related search phrases |
CA2293064C (en) * | 1999-12-22 | 2004-05-04 | Ibm Canada Limited-Ibm Canada Limitee | Method and apparatus for analyzing data retrieval using index scanning |
US6981040B1 (en) * | 1999-12-28 | 2005-12-27 | Utopy, Inc. | Automatic, personalized online information and product services |
US6820237B1 (en) * | 2000-01-21 | 2004-11-16 | Amikanow! Corporation | Apparatus and method for context-based highlighting of an electronic document |
US6883135B1 (en) | 2000-01-28 | 2005-04-19 | Microsoft Corporation | Proxy server using a statistical model |
US6654739B1 (en) * | 2000-01-31 | 2003-11-25 | International Business Machines Corporation | Lightweight document clustering |
US6571240B1 (en) * | 2000-02-02 | 2003-05-27 | Chi Fai Ho | Information processing for searching categorizing information in a document based on a categorization hierarchy and extracted phrases |
JP2001303279A (en) * | 2000-02-17 | 2001-10-31 | Toyo Gurahoiru:Kk | Self-sacrificial metal corrosion preventive agent and metal corrosion preventive method |
US7137065B1 (en) * | 2000-02-24 | 2006-11-14 | International Business Machines Corporation | System and method for classifying electronically posted documents |
US20060143714A1 (en) | 2000-03-09 | 2006-06-29 | Pkware, Inc. | System and method for manipulating and managing computer archive files |
US6859800B1 (en) * | 2000-04-26 | 2005-02-22 | Global Information Research And Technologies Llc | System for fulfilling an information need |
EP1352338A2 (en) | 2000-05-11 | 2003-10-15 | University Of Southern California | Machine translation techniques |
US6691106B1 (en) * | 2000-05-23 | 2004-02-10 | Intel Corporation | Profile driven instant web portal |
US7096220B1 (en) | 2000-05-24 | 2006-08-22 | Reachforce, Inc. | Web-based customer prospects harvester system |
US20020042707A1 (en) | 2000-06-19 | 2002-04-11 | Gang Zhao | Grammar-packaged parsing |
US20020078090A1 (en) | 2000-06-30 | 2002-06-20 | Hwang Chung Hee | Ontological concept-based, user-centric text summarization |
EP1182577A1 (en) | 2000-08-18 | 2002-02-27 | SER Systeme AG Produkte und Anwendungen der Datenverarbeitung | Associative memory |
KR100426382B1 (en) | 2000-08-23 | 2004-04-08 | 학교법인 김포대학 | Method for re-adjusting ranking document based cluster depending on entropy information and Bayesian SOM(Self Organizing feature Map) |
US7017114B2 (en) | 2000-09-20 | 2006-03-21 | International Business Machines Corporation | Automatic correlation method for generating summaries for text documents |
US20020143524A1 (en) | 2000-09-29 | 2002-10-03 | Lingomotors, Inc. | Method and resulting system for integrating a query reformation module onto an information retrieval system |
US20020065857A1 (en) | 2000-10-04 | 2002-05-30 | Zbigniew Michalewicz | System and method for analysis and clustering of documents for search engine |
CA2322599A1 (en) | 2000-10-06 | 2002-04-06 | Ibm Canada Limited-Ibm Canada Limitee | System and method for workflow control of contractual activities |
JP2002132789A (en) | 2000-10-19 | 2002-05-10 | Hitachi Ltd | Document retrieving method |
US7130790B1 (en) | 2000-10-24 | 2006-10-31 | Global Translations, Inc. | System and method for closed caption data translation |
JP2002169834A (en) | 2000-11-20 | 2002-06-14 | Hewlett Packard Co <Hp> | Computer and method for making vector analysis of document |
US20020091671A1 (en) | 2000-11-23 | 2002-07-11 | Andreas Prokoph | Method and system for data retrieval in large collections of data |
KR20020045343A (en) | 2000-12-08 | 2002-06-19 | 오길록 | Method of information generation and retrieval system based on a standardized Representation format of sentences structures and meanings |
JP2002207760A (en) | 2001-01-10 | 2002-07-26 | Hitachi Ltd | Document retrieval method, executing device thereof, and storage medium with its processing program stored therein |
US6778980B1 (en) | 2001-02-22 | 2004-08-17 | Drugstore.Com | Techniques for improved searching of electronically stored information |
US6741984B2 (en) * | 2001-02-23 | 2004-05-25 | General Electric Company | Method, system and storage medium for arranging a database |
US6697793B2 (en) * | 2001-03-02 | 2004-02-24 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | System, method and apparatus for generating phrases from a database |
US6741981B2 (en) * | 2001-03-02 | 2004-05-25 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration (Nasa) | System, method and apparatus for conducting a phrase search |
US6721728B2 (en) * | 2001-03-02 | 2004-04-13 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | System, method and apparatus for discovering phrases in a database |
US6823333B2 (en) * | 2001-03-02 | 2004-11-23 | The United States Of America As Represented By The Administrator Of The National Aeronautics And Space Administration | System, method and apparatus for conducting a keyterm search |
US7194483B1 (en) | 2001-05-07 | 2007-03-20 | Intelligenxia, Inc. | Method, system, and computer program product for concept-based multi-dimensional analysis of unstructured information |
US7171619B1 (en) | 2001-07-05 | 2007-01-30 | Sun Microsystems, Inc. | Methods and apparatus for accessing document content |
US6769016B2 (en) | 2001-07-26 | 2004-07-27 | Networks Associates Technology, Inc. | Intelligent SPAM detection system using an updateable neural analysis engine |
US20030130993A1 (en) | 2001-08-08 | 2003-07-10 | Quiver, Inc. | Document categorization engine |
US20030031996A1 (en) | 2001-08-08 | 2003-02-13 | Adam Robinson | Method and system for evaluating documents |
US6778979B2 (en) | 2001-08-13 | 2004-08-17 | Xerox Corporation | System for automatically generating queries |
US6978274B1 (en) | 2001-08-31 | 2005-12-20 | Attenex Corporation | System and method for dynamically evaluating latent concepts in unstructured documents |
JP2003242176A (en) | 2001-12-13 | 2003-08-29 | Sony Corp | Information processing device and method, recording medium and program |
US7356527B2 (en) | 2001-12-19 | 2008-04-08 | International Business Machines Corporation | Lossy index compression |
US6741982B2 (en) * | 2001-12-19 | 2004-05-25 | Cognos Incorporated | System and method for retrieving data from a database system |
US7243092B2 (en) | 2001-12-28 | 2007-07-10 | Sap Ag | Taxonomy generation for electronic documents |
US7137062B2 (en) | 2001-12-28 | 2006-11-14 | International Business Machines Corporation | System and method for hierarchical segmentation with latent semantic indexing in scale space |
JP4108337B2 (en) * | 2002-01-10 | 2008-06-25 | 三菱電機株式会社 | Electronic filing system and search index creation method thereof |
US7139756B2 (en) * | 2002-01-22 | 2006-11-21 | International Business Machines Corporation | System and method for detecting duplicate and similar documents |
US7028045B2 (en) | 2002-01-25 | 2006-04-11 | International Business Machines Corporation | Compressing index files in information retrieval |
US7421660B2 (en) | 2003-02-04 | 2008-09-02 | Cataphora, Inc. | Method and apparatus to visually present discussions for data mining purposes |
JP4092933B2 (en) | 2002-03-20 | 2008-05-28 | 富士ゼロックス株式会社 | Document information retrieval apparatus and document information retrieval program |
US7743045B2 (en) | 2005-08-10 | 2010-06-22 | Google Inc. | Detecting spam related and biased contexts for programmable search engines |
US20030195937A1 (en) | 2002-04-16 | 2003-10-16 | Kontact Software Inc. | Intelligent message screening |
US6877001B2 (en) | 2002-04-25 | 2005-04-05 | Mitsubishi Electric Research Laboratories, Inc. | Method and system for retrieving documents with spoken queries |
NZ518744A (en) | 2002-05-03 | 2004-08-27 | Hyperbolex Ltd | Electronic document indexing using word use nodes, node objects and link objects |
US7085771B2 (en) | 2002-05-17 | 2006-08-01 | Verity, Inc | System and method for automatically discovering a hierarchy of concepts from a corpus of documents |
US7028026B1 (en) * | 2002-05-28 | 2006-04-11 | Ask Jeeves, Inc. | Relevancy-based database retrieval and display techniques |
JP4452012B2 (en) * | 2002-07-04 | 2010-04-21 | ヒューレット・パッカード・カンパニー | Document uniqueness evaluation method |
JP2004046438A (en) | 2002-07-10 | 2004-02-12 | Nippon Telegr & Teleph Corp <Ntt> | Text retrieval method and device, text retrieval program and storage medium storing text retrieval program |
US7379978B2 (en) | 2002-07-19 | 2008-05-27 | Fiserv Incorporated | Electronic item management and archival system and method of operating the same |
US20040034633A1 (en) | 2002-08-05 | 2004-02-19 | Rickard John Terrell | Data search system and method using mutual subsethood measures |
US7151864B2 (en) | 2002-09-18 | 2006-12-19 | Hewlett-Packard Development Company, L.P. | Information research initiated from a scanned image media |
US7158983B2 (en) | 2002-09-23 | 2007-01-02 | Battelle Memorial Institute | Text analysis technique |
US20040064442A1 (en) | 2002-09-27 | 2004-04-01 | Popovitch Steven Gregory | Incremental search engine |
US6886010B2 (en) | 2002-09-30 | 2005-04-26 | The United States Of America As Represented By The Secretary Of The Navy | Method for data and text mining and literature-based discovery |
JP2004139150A (en) | 2002-10-15 | 2004-05-13 | Ricoh Co Ltd | Document search system, program, and storage medium |
US7970832B2 (en) | 2002-11-20 | 2011-06-28 | Return Path, Inc. | Electronic message delivery with estimation approaches and complaint, bond, and statistics panels |
JP2004192546A (en) * | 2002-12-13 | 2004-07-08 | Nippon Telegr & Teleph Corp <Ntt> | Information retrieval method, device, program, and recording medium |
US20040133560A1 (en) | 2003-01-07 | 2004-07-08 | Simske Steven J. | Methods and systems for organizing electronic documents |
US7725544B2 (en) | 2003-01-24 | 2010-05-25 | Aol Inc. | Group based spam classification |
GB2399427A (en) * | 2003-03-12 | 2004-09-15 | Canon Kk | Apparatus for and method of summarising text |
US7945567B2 (en) * | 2003-03-17 | 2011-05-17 | Hewlett-Packard Development Company, L.P. | Storing and/or retrieving a document within a knowledge base or document repository |
US6947930B2 (en) | 2003-03-21 | 2005-09-20 | Overture Services, Inc. | Systems and methods for interactive search query refinement |
US7051023B2 (en) | 2003-04-04 | 2006-05-23 | Yahoo! Inc. | Systems and methods for generating concept units from search queries |
US7149748B1 (en) | 2003-05-06 | 2006-12-12 | Sap Ag | Expanded inverted index |
US7146361B2 (en) * | 2003-05-30 | 2006-12-05 | International Business Machines Corporation | System, method and computer program product for performing unstructured information management and automatic text analysis, including a search operator functioning as a Weighted AND (WAND) |
JP2007528520A (en) | 2003-05-31 | 2007-10-11 | エヌエイチエヌ コーポレーション | Method and system for managing websites registered with search engines |
US7272853B2 (en) | 2003-06-04 | 2007-09-18 | Microsoft Corporation | Origination/destination features and lists for spam prevention |
US7051014B2 (en) * | 2003-06-18 | 2006-05-23 | Microsoft Corporation | Utilizing information redundancy to improve text searches |
US7162473B2 (en) | 2003-06-26 | 2007-01-09 | Microsoft Corporation | Method and system for usage analyzer that determines user accessed sources, indexes data subsets, and associated metadata, processing implicit queries based on potential interest to users |
US8707312B1 (en) | 2003-07-03 | 2014-04-22 | Google Inc. | Document reuse in a search engine crawler |
US7254580B1 (en) * | 2003-07-31 | 2007-08-07 | Google Inc. | System and method for selectively searching partitions of a database |
JP2005056233A (en) | 2003-08-06 | 2005-03-03 | Nec Corp | Mobile communication device, and method and program for operating it for reception of email |
US20050043940A1 (en) | 2003-08-20 | 2005-02-24 | Marvin Elder | Preparing a data source for a natural language query |
US20050060295A1 (en) | 2003-09-12 | 2005-03-17 | Sensory Networks, Inc. | Statistical classification of high-speed network data through content inspection |
US7346839B2 (en) | 2003-09-30 | 2008-03-18 | Google Inc. | Information retrieval based on historical data |
US20050071310A1 (en) | 2003-09-30 | 2005-03-31 | Nadav Eiron | System, method, and computer program product for identifying multi-page documents in hypertext collections |
US20050071328A1 (en) | 2003-09-30 | 2005-03-31 | Lawrence Stephen R. | Personalization of web search |
US7257564B2 (en) | 2003-10-03 | 2007-08-14 | Tumbleweed Communications Corp. | Dynamic message filtering |
US7240064B2 (en) * | 2003-11-10 | 2007-07-03 | Overture Services, Inc. | Search engine with hierarchically stored indices |
US20050154723A1 (en) * | 2003-12-29 | 2005-07-14 | Ping Liang | Advanced search, file system, and intelligent assistant agent |
US7206389B1 (en) * | 2004-01-07 | 2007-04-17 | Nuance Communications, Inc. | Method and apparatus for generating a speech-recognition-based call-routing system |
US20060294124A1 (en) | 2004-01-12 | 2006-12-28 | Junghoo Cho | Unbiased page ranking |
US7310632B2 (en) | 2004-02-12 | 2007-12-18 | Microsoft Corporation | Decision-theoretic web-crawling and predicting web-page change |
US20050198559A1 (en) | 2004-03-08 | 2005-09-08 | Kabushiki Kaisha Toshiba | Document information management system, document information management program, and document information management method |
US20050216564A1 (en) | 2004-03-11 | 2005-09-29 | Myers Gregory K | Method and apparatus for analysis of electronic communications containing imagery |
US20050256848A1 (en) * | 2004-05-13 | 2005-11-17 | International Business Machines Corporation | System and method for user rank search |
WO2006002076A2 (en) | 2004-06-15 | 2006-01-05 | Tekelec | Methods, systems, and computer program products for content-based screening of messaging service messages |
JP2006026844A (en) * | 2004-07-20 | 2006-02-02 | Fujitsu Ltd | Polishing pad, polishing device provided with it and sticking device |
US7580929B2 (en) | 2004-07-26 | 2009-08-25 | Google Inc. | Phrase-based personalization of searches in an information retrieval system |
US7702618B1 (en) | 2004-07-26 | 2010-04-20 | Google Inc. | Information retrieval system for archiving multiple document versions |
US7580921B2 (en) | 2004-07-26 | 2009-08-25 | Google Inc. | Phrase identification in an information retrieval system |
US7426507B1 (en) | 2004-07-26 | 2008-09-16 | Google, Inc. | Automatic taxonomy generation in search results using phrases |
US7536408B2 (en) | 2004-07-26 | 2009-05-19 | Google Inc. | Phrase-based indexing in an information retrieval system |
US7584175B2 (en) | 2004-07-26 | 2009-09-01 | Google Inc. | Phrase-based generation of document descriptions |
US7711679B2 (en) * | 2004-07-26 | 2010-05-04 | Google Inc. | Phrase-based detection of duplicate documents in an information retrieval system |
US7567959B2 (en) | 2004-07-26 | 2009-07-28 | Google Inc. | Multiple index based information retrieval system |
US7599914B2 (en) | 2004-07-26 | 2009-10-06 | Google Inc. | Phrase-based searching in an information retrieval system |
US7395260B2 (en) | 2004-08-04 | 2008-07-01 | International Business Machines Corporation | Method for providing graphical representations of search results in multiple related histograms |
US8407239B2 (en) * | 2004-08-13 | 2013-03-26 | Google Inc. | Multi-stage query processing system and method for use with tokenspace repository |
US8504565B2 (en) | 2004-09-09 | 2013-08-06 | William M. Pitts | Full text search capabilities integrated into distributed file systems— incrementally indexing files |
US20060200464A1 (en) | 2005-03-03 | 2006-09-07 | Microsoft Corporation | Method and system for generating a document summary |
WO2006113597A2 (en) | 2005-04-14 | 2006-10-26 | The Regents Of The University Of California | Method for information retrieval |
US7552230B2 (en) | 2005-06-15 | 2009-06-23 | International Business Machines Corporation | Method and apparatus for reducing spam on peer-to-peer networks |
US20080005064A1 (en) | 2005-06-28 | 2008-01-03 | Yahoo! Inc. | Apparatus and method for content annotation and conditional annotation retrieval in a search context |
US7512596B2 (en) | 2005-08-01 | 2009-03-31 | Business Objects Americas | Processor for fast phrase searching |
US7454449B2 (en) * | 2005-12-20 | 2008-11-18 | International Business Machines Corporation | Method for reorganizing a set of database partitions |
JP2007262217A (en) | 2006-03-28 | 2007-10-11 | Toray Ind Inc | Polyphenylene sulfide resin composition and molded article composed thereof |
WO2007123919A2 (en) | 2006-04-18 | 2007-11-01 | Gemini Design Technology, Inc. | Method for ranking webpages via circuit simulation |
US8166045B1 (en) | 2007-03-30 | 2012-04-24 | Google Inc. | Phrase extraction using subphrase scoring |
JP2008305730A (en) | 2007-06-11 | 2008-12-18 | Fuji Electric Holdings Co Ltd | Manufacturing method for multicolor light-emitting device |
US8117223B2 (en) | 2007-09-07 | 2012-02-14 | Google Inc. | Integrating external related phrase information into a phrase-based indexing information retrieval system |
2005
- 2005-01-25 US US11/043,695 patent/US7567959B2/en not_active Expired - Fee Related

2006
- 2006-01-25 WO PCT/US2006/002709 patent/WO2006081325A2/en active Application Filing
- 2006-01-25 CA CA2595674A patent/CA2595674C/en not_active Expired - Fee Related
- 2006-01-25 DK DK06719537.0T patent/DK1844391T3/en active
- 2006-01-25 EP EP06719537A patent/EP1844391B1/en active Active
- 2006-01-25 AU AU2006208079A patent/AU2006208079B2/en not_active Ceased
- 2006-01-25 KR KR1020077018720A patent/KR101273520B1/en active IP Right Grant
- 2006-01-25 BR BRPI0614024-6A patent/BRPI0614024B1/en active IP Right Grant
- 2006-01-25 CN CN200680007173XA patent/CN101133388B/en active Active
- 2006-01-25 JP JP2007552403A patent/JP4881322B2/en active Active

2007
- 2007-08-24 NO NO20074329A patent/NO338518B1/en not_active IP Right Cessation

2009
- 2009-07-20 US US12/506,088 patent/US8560550B2/en active Active

2010
- 2010-02-09 AU AU2010200478A patent/AU2010200478B2/en active Active

2013
- 2013-03-13 US US13/801,108 patent/US9361331B2/en active Active

2016
- 2016-06-03 US US15/172,717 patent/US9817825B2/en active Active

2017
- 2017-11-10 US US15/809,356 patent/US10671676B2/en active Active
Also Published As
Publication number | Publication date |
---|---|
US20060106792A1 (en) | 2006-05-18 |
BRPI0614024A2 (en) | 2012-12-25 |
EP1844391A2 (en) | 2007-10-17 |
CN101133388A (en) | 2008-02-27 |
US9361331B2 (en) | 2016-06-07 |
KR20070094972A (en) | 2007-09-27 |
US7567959B2 (en) | 2009-07-28 |
AU2006208079A1 (en) | 2006-08-03 |
US20160283474A1 (en) | 2016-09-29 |
NO20074329L (en) | 2007-10-23 |
AU2010200478B2 (en) | 2012-10-04 |
WO2006081325A2 (en) | 2006-08-03 |
DK1844391T3 (en) | 2013-01-28 |
CA2595674C (en) | 2012-07-03 |
CN101133388B (en) | 2011-07-06 |
AU2010200478A1 (en) | 2010-03-04 |
JP4881322B2 (en) | 2012-02-22 |
US8560550B2 (en) | 2013-10-15 |
US9817825B2 (en) | 2017-11-14 |
US20180101528A1 (en) | 2018-04-12 |
KR101273520B1 (en) | 2013-06-14 |
US10671676B2 (en) | 2020-06-02 |
NO338518B1 (en) | 2016-08-29 |
EP1844391A4 (en) | 2010-05-19 |
US20100030773A1 (en) | 2010-02-04 |
WO2006081325A3 (en) | 2007-08-09 |
EP1844391B1 (en) | 2012-10-17 |
JP2008529138A (en) | 2008-07-31 |
US20140095511A1 (en) | 2014-04-03 |
AU2006208079B2 (en) | 2009-11-26 |
BRPI0614024B1 (en) | 2018-02-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2595674A1 (en) | Multiple index based information retrieval system | |
Wilkinson | Effective retrieval of structured documents | |
Sun | Short text classification using very few words | |
US8832057B2 (en) | Results returned for list-seeking queries | |
RU2011130218A (en) | SYSTEM AND METHOD OF DATA AGREEMENT FROM MANY WEBSITES | |
US20140365499A1 (en) | System and Method for Determining Concepts in a Content Item Using Context | |
CA2617538A1 (en) | Processor for fast phrase searching | |
KR20140093762A (en) | Method, apparatus, and computer storage medium for automatically adding tags to document | |
CN111026710A (en) | Data set retrieval method and system | |
CN105843960B (en) | Indexing method and system based on semantic tree | |
CN102663030B (en) | Double-hash table association method for inquiring interval durability top-k | |
Elsas et al. | Retrieval and feedback models for blog distillation | |
Yu et al. | Collective POI querying based on multiple keywords and user preference | |
US20210064641A1 (en) | Methods for indexing and retrieving text | |
CN105868406A (en) | Multi-database based patent retrieval system | |
Kamps et al. | Using anchor text, spam filtering and wikipedia for web search and entity ranking | |
Suchomel et al. | Improving synoptic querying for source retrieval | |
Puppin et al. | The query-vector document model | |
Yokomoto et al. | Utilizing Wikipedia in categorizing topic related blogs into facets | |
JP5633552B2 (en) | Document search method, document search device, and recording medium recording document search program | |
Martinovic et al. | Vector model improvement using suffix trees | |
Zheng et al. | University of Delaware at Diversity Task of Web Track 2010. | |
Chuang et al. | Improving the effectiveness of POI search by associated information summarization | |
Tollari et al. | Consortium AVEIR at ImageCLEFphoto 2008: on the Fusion of Runs. | |
Hsu et al. | National Taiwan University at Terabyte Track of TREC 2005. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| EEER | Examination request | |
| MKLA | Lapsed | Effective date: 2022-01-25 |