US7158961B1 - Methods and apparatus for estimating similarity - Google Patents
- Publication number
- US7158961B1 (application US10/029,883)
- Authority
- US
- United States
- Prior art keywords
- vector
- computer
- implemented method
- generating
- hashing
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Lifetime, expires
Classifications
- G06F16/35: Clustering; Classification
- G06F16/3347: Query execution using vector based model
- G06F16/951: Indexing; Web crawling techniques
- Y10S707/99932: Access augmentation or optimizing
- Y10S707/99943: Generating database or data structure, e.g. via user interface
Abstract
Description
Another possibility is to choose coordinates to be either +1 or −1 with equal probability. A hash function could be used to map input vector coordinates to hashing vectors directly. For example, a 64-bit hash value obtained from a hash function could be mapped to a 64-dimensional hashing vector by choosing the ith coordinate of the hashing vector to be +1 or −1 based on whether the ith bit in the hash value is 1 or 0.
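The bit-to-coordinate mapping described above can be sketched in Python. This is a minimal illustration, not the patent's implementation: the function names `hash64` and `hashing_vector`, the use of BLAKE2b as the 64-bit hash function, and the convention that bit i is counted from the least significant end are all assumptions made for the example.

```python
import hashlib
from typing import List


def hash64(feature: str) -> int:
    """Return a 64-bit hash of a feature string (BLAKE2b is an assumed choice)."""
    digest = hashlib.blake2b(feature.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hashing_vector(hash_value: int, bits: int = 64) -> List[int]:
    """Map a hash value to a vector of +1/-1 coordinates.

    Coordinate i is +1 if bit i of hash_value is 1, and -1 if it is 0.
    Bit 0 is taken to be the least significant bit (an assumption).
    """
    return [1 if (hash_value >> i) & 1 else -1 for i in range(bits)]


# Example: derive the 64-dimensional hashing vector for one input coordinate.
vector = hashing_vector(hash64("example-term"))
```

Because the vector is derived from the hash value, the same input coordinate always yields the same hashing vector, so the vectors do not need to be stored.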
The result vector is the weighted sum $\vec{R} = \sum_i c_i \vec{v}_i$, where $\vec{R}$ is the result vector, $c_i$ is the weight value for the ith coordinate (a scalar), $\vec{v}_i$ is the predetermined hashing vector for the ith coordinate in the object vector, and the sum is taken over all the possible coordinates in the object vector.
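As a rough illustration of this weighted sum, the sketch below accumulates each coordinate's hashing vector scaled by its weight. The function name `result_vector`, the dictionary-based representation of the object vector, and the toy 3-dimensional hashing vectors are assumptions chosen for the example, not details taken from the patent.

```python
from typing import Dict, List


def result_vector(weights: Dict[str, float],
                  vectors: Dict[str, List[int]]) -> List[float]:
    """Compute the sum over coordinates i of c_i * v_i.

    weights maps each coordinate of the object vector to its weight c_i;
    vectors maps the same coordinates to their hashing vectors v_i,
    which must all have the same dimension.
    """
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for coord, c in weights.items():
        v = vectors[coord]
        for k in range(dim):
            result[k] += c * v[k]
    return result


# Toy example with two coordinates and 3-dimensional hashing vectors.
weights = {"a": 2.0, "b": 1.0}
vectors = {"a": [1, -1, 1], "b": [-1, -1, 1]}
r = result_vector(weights, vectors)  # [1.0, -3.0, 3.0]
```

Coordinates with zero weight contribute nothing to the sum, so in practice only the nonzero coordinates of the object vector need to be visited.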
Claims (24)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10/029,883 US7158961B1 (en) | 2001-12-31 | 2001-12-31 | Methods and apparatus for estimating similarity |
Publications (1)
Publication Number | Publication Date |
---|---|
US7158961B1 (en) | 2007-01-02 |
Family
ID=37592388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/029,883 Expired - Lifetime US7158961B1 (en) | 2001-12-31 | 2001-12-31 | Methods and apparatus for estimating similarity |
Country Status (1)
Country | Link |
---|---|
US (1) | US7158961B1 (en) |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: GOOGLE, INC., CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: CHARIKA, MOSES SAMSON; REEL/FRAME: 013144/0265. Effective date: 20020731 |
STCF | Information on status: patent grant | Free format text: PATENTED CASE |
FPAY | Fee payment | Year of fee payment: 4 |
FPAY | Fee payment | Year of fee payment: 8 |
AS | Assignment | Owner name: GOOGLE LLC, CALIFORNIA. Free format text: CHANGE OF NAME; ASSIGNOR: GOOGLE INC.; REEL/FRAME: 044695/0115. Effective date: 20170929 |
MAFP | Maintenance fee payment | Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553). Year of fee payment: 12 |