WO2004079631A3 - Method and arrangement for searching for strings - Google Patents


Info

Publication number
WO2004079631A3
Authority
WO
WIPO (PCT)
Prior art keywords
strings
string
database
query
exact
Application number
PCT/IB2004/050148
Other languages
French (fr)
Other versions
WO2004079631A2 (en)
Inventor
Sebastian Egner
Johannes H M Korst
Vuuren Marcel Van
Original Assignee
Koninkl Philips Electronics Nv
Pauws Steffen C
Sebastian Egner
Johannes H M Korst
Vuuren Marcel Van
Priority date
2003-03-03
Filing date
2004-02-25
Publication date
2004-10-28
Application filed by Koninkl Philips Electronics Nv, Pauws Steffen C, Sebastian Egner, Johannes H M Korst, Vuuren Marcel Van
Priority to US10/547,328 (US7756847B2)
Priority to JP2006506641A (JP4538449B2)
Priority to EP04714404A (EP1602039A2)
Publication of WO2004079631A2
Publication of WO2004079631A3


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques

Abstract

This invention relates to methods of searching a database (80), comprising many long strings or one long string, for a final number of result strings (30-33) that partially or exactly match a query string (34). The method includes the steps of: partitioning the query string into a first number of input query strings (35, 36, 37); determining, for each string in said first number of input query strings, a second number of neighboring strings (38-41, 42-45, 44-49, respectively), wherein each string in said second number of neighboring strings has a predetermined first number of errors; searching the database, based on a search method, for a third number of exact matches (50-61, 70-74) for each string in said second number of neighboring strings; concatenating said exact matches found in the database into a fourth number of intermediate strings (29, 30, 32, 33, 34), wherein the exact matches (50-61, 70-74) comprised in each intermediate string occur in succession to one another in said database; and determining the final number of result strings (30-33) based on said fourth number of intermediate strings, wherein each string in the final number of result strings has at most a predetermined second number of errors compared to said query string (34). This enables finding a perfect match, or a partial match containing only minor errors with respect to said query string, and enables fast searching in large databases with relatively low use of processing power.
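The steps in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the abstract does not specify the error model, the partitioning rule, or the exact-match search method, so this sketch assumes substitution-only (Hamming) errors over a known alphabet, contiguous query pieces of near-equal length, and a plain `str.find` scan of one long database string in place of an index. The function names (`partition`, `neighbors`, `exact_positions`, `search`) and the parameters `e1` and `e2` (standing in for the predetermined first and second numbers of errors) are hypothetical.

```python
from itertools import combinations, product


def partition(query: str, k: int) -> list[str]:
    """Step 1: split the query into k contiguous pieces of near-equal length."""
    n = len(query)
    bounds = [(i * n) // k for i in range(k + 1)]
    return [query[bounds[i]:bounds[i + 1]] for i in range(k)]


def neighbors(piece: str, alphabet: str, max_err: int) -> dict[str, int]:
    """Step 2 (assumed Hamming model): every string within max_err
    substitutions of piece, mapped to its exact number of substitutions."""
    result = {piece: 0}
    for d in range(1, max_err + 1):
        for positions in combinations(range(len(piece)), d):
            # At each chosen position, try every symbol except the original one.
            choices = [[c for c in alphabet if c != piece[p]] for p in positions]
            for subst in product(*choices):
                chars = list(piece)
                for p, c in zip(positions, subst):
                    chars[p] = c
                result["".join(chars)] = d
    return result


def exact_positions(text: str, pattern: str) -> list[int]:
    """Step 3: all (possibly overlapping) start positions of an exact match."""
    hits, start = [], text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def search(db: str, query: str, k: int, alphabet: str,
           e1: int, e2: int) -> list[tuple[int, str, int]]:
    """Steps 4 and 5: chain piece matches that follow one another in db and
    keep chains with at most e2 errors. Returns (start, match, errors)."""
    pieces = partition(query, k)
    offsets = [sum(len(p) for p in pieces[:i]) for i in range(k)]
    # hits[i] maps a database position to the errors of piece i's neighbor there.
    hits = []
    for piece in pieces:
        found = {}
        for nb, err in neighbors(piece, alphabet, e1).items():
            for pos in exact_positions(db, nb):
                found[pos] = err
        hits.append(found)
    results = []
    for start, total in hits[0].items():
        for i in range(1, k):
            # Substitutions keep lengths fixed, so piece i must start here.
            err = hits[i].get(start + offsets[i])
            if err is None:
                break
            total += err
        else:  # every piece matched in succession
            if total <= e2:
                results.append((start, db[start:start + len(query)], total))
    return sorted(results)
```

For example, `search("AACCGGTTACGTACGTAA", "ACGTACGA", 2, "ACGT", 1, 1)` returns `[(8, "ACGTACGT", 1)]`: the piece `ACGT` matches exactly at position 8, and the neighbor `ACGT` of the piece `ACGA` (one substitution) matches directly after it at position 12. Because this sketch requires every piece to match within `e1` errors, it can miss results whose errors are concentrated in a single piece; setting `e1` equal to `e2` avoids that, at the cost of much larger neighborhoods.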
PCT/IB2004/050148 2003-03-03 2004-02-25 Method and arrangement for searching for strings WO2004079631A2 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
US10/547,328 US7756847B2 (en) 2003-03-03 2004-02-25 Method and arrangement for searching for strings
JP2006506641A JP4538449B2 (en) 2003-03-03 2004-02-25 String search method and equipment
EP04714404A EP1602039A2 (en) 2003-03-03 2004-02-25 Method and arrangement for searching for strings

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP03100517 2003-03-03
EP03100517.6 2003-03-03

Publications (2)

Publication Number Publication Date
WO2004079631A2 (en) 2004-09-16
WO2004079631A3 (en) 2004-10-28

Family

ID=32946911

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IB2004/050148 WO2004079631A2 (en) 2003-03-03 2004-02-25 Method and arrangement for searching for strings

Country Status (6)

Country Link
US (1) US7756847B2 (en)
EP (1) EP1602039A2 (en)
JP (1) JP4538449B2 (en)
KR (1) KR101068678B1 (en)
CN (1) CN100557606C (en)
WO (1) WO2004079631A2 (en)

Families Citing this family (71)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6876309B1 (en) * 1994-11-21 2005-04-05 Espeed, Inc. Bond trading system
US8588729B2 (en) * 1994-11-21 2013-11-19 Bgc Partners, Inc. Method for retrieving data stored in a database
KR20040027259A (en) 2002-09-26 2004-04-01 엘지전자 주식회사 Method for managing a defect area on optical disc write once
US7233550B2 (en) 2002-09-30 2007-06-19 Lg Electronics Inc. Write-once optical disc, and method and apparatus for recording management information on write-once optical disc
KR20040028469A (en) 2002-09-30 2004-04-03 엘지전자 주식회사 Method for managing a defect area on optical disc write once
JP4914610B2 (en) 2002-12-11 2012-04-11 エルジー エレクトロニクス インコーポレイティド Overwrite management method and management information recording method for write-once optical disc
US7672204B2 (en) 2003-01-27 2010-03-02 Lg Electronics Inc. Optical disc, method and apparatus for managing a defective area on an optical disc
TWI314315B (en) 2003-01-27 2009-09-01 Lg Electronics Inc Optical disc of write once type, method, and apparatus for managing defect information on the optical disc
US20040160799A1 (en) 2003-02-17 2004-08-19 Park Yong Cheol Write-once optical disc, and method and apparatus for allocating spare area on write-once optical disc
US7499383B2 (en) 2003-02-21 2009-03-03 Lg Electronics Inc. Write-once optical disc and method for managing spare area thereof
TWI335587B (en) 2003-02-21 2011-01-01 Lg Electronics Inc Write-once optical recording medium and defect management information management method thereof
US7675828B2 (en) 2003-02-25 2010-03-09 Lg Electronics Inc. Recording medium having data structure for managing at least a data area of the recording medium and recording and reproducing methods and apparatuses
AU2003282449A1 (en) 2003-03-04 2004-09-28 Lg Electronics Inc. Method for recording on optical recording medium and apparatus using the same
TWI328805B (en) 2003-03-13 2010-08-11 Lg Electronics Inc Write-once recording medium and defective area management method and apparatus for write-once recording medium
MXPA05012044A (en) 2003-05-09 2006-02-03 Lg Electronics Inc Write once optical disc, and method and apparatus for recovering disc management information from the write once optical disc.
JP4846566B2 (en) 2003-05-09 2011-12-28 エルジー エレクトロニクス インコーポレイティド Optical disk that can be recorded only once, and method and apparatus for restoring management information from an optical disk that can be recorded only once
KR20050009031A (en) 2003-07-15 2005-01-24 엘지전자 주식회사 Method for recording management information on optical disc write once
DK1652174T3 (en) 2003-08-05 2010-06-07 Lg Electronics Inc Disposable optical disc and method and apparatus for recording / reproducing control information on / from the optical disc
US7313065B2 (en) 2003-08-05 2007-12-25 Lg Electronics Inc. Write-once optical disc, and method and apparatus for recording/reproducing management information on/from optical disc
RU2361295C2 (en) 2003-09-08 2009-07-10 ЭлДжи ЭЛЕКТРОНИКС ИНК. Write-once optical disk and method of recording management data on it
CA2537895A1 (en) 2003-09-08 2005-03-17 Lg Electronics Inc. Write-once optical disc, and method and apparatus for recording management information thereon
CA2537889C (en) 2003-09-08 2013-10-22 Lg Electronics Inc. Write-once optical disc, and method and apparatus for recording management information on the write-once optical disc
KR100964685B1 (en) 2003-10-20 2010-06-21 엘지전자 주식회사 Method and apparatus for recording and reproducing data on/from optical disc write once
US7283999B1 (en) * 2003-12-19 2007-10-16 Ncr Corp. Similarity string filtering
KR101024916B1 (en) 2004-03-19 2011-03-31 엘지전자 주식회사 Method for writing data in high density optical write once disc and Apparatus for the same
KR101049117B1 (en) 2004-06-08 2011-07-14 엘지전자 주식회사 Method and apparatus for recording management information on optical write once disc
KR101014727B1 (en) 2004-06-23 2011-02-16 엘지전자 주식회사 Method and Apparatus for managing a overwrite in Optical write once disc
KR101012378B1 (en) 2004-08-16 2011-02-09 엘지전자 주식회사 Method and Apparatus for recording / reproducing in Optical storage
JP5144265B2 (en) 2004-09-14 2013-02-13 エルジー エレクトロニクス インコーポレイティド Recording medium and recording / reproducing method and apparatus for recording medium
US7747635B1 (en) * 2004-12-21 2010-06-29 Oracle America, Inc. Automatically generating efficient string matching code
US7338819B2 (en) * 2005-06-30 2008-03-04 Broadcom Corporation System and method for matching chip and package terminals
KR101227485B1 (en) 2005-11-25 2013-01-29 엘지전자 주식회사 Recording mdium, Method and Apparatus for recording defect management information on the recording medium
US7509339B2 (en) * 2006-01-03 2009-03-24 International Business Machines Corporation System and method of implementing personalized alerts utilizing a user registry in instant messenger
US7406479B2 (en) * 2006-02-10 2008-07-29 Microsoft Corporation Primitive operator for similarity joins in data cleaning
US7395270B2 (en) * 2006-06-26 2008-07-01 International Business Machines Corporation Classification-based method and apparatus for string selectivity estimation
US7945627B1 (en) 2006-09-28 2011-05-17 Bitdefender IPR Management Ltd. Layout-based electronic communication filtering systems and methods
US8283546B2 (en) * 2007-03-28 2012-10-09 Van Os Jan L Melody encoding and searching system
US7962530B1 (en) * 2007-04-27 2011-06-14 Michael Joseph Kolta Method for locating information in a musical database using a fragment of a melody
US8572184B1 (en) 2007-10-04 2013-10-29 Bitdefender IPR Management Ltd. Systems and methods for dynamically integrating heterogeneous anti-spam filters
US8010614B1 (en) 2007-11-01 2011-08-30 Bitdefender IPR Management Ltd. Systems and methods for generating signatures for electronic communication classification
WO2009094649A1 (en) * 2008-01-24 2009-07-30 Sra International, Inc. System and method for variant string matching
US20090234852A1 (en) * 2008-03-17 2009-09-17 Microsoft Corporation Sub-linear approximate string match
US8126913B2 (en) * 2008-05-08 2012-02-28 International Business Machines Corporation Method to identify exact, non-exact and further non-exact matches to part numbers in an enterprise database
US9569528B2 (en) 2008-10-03 2017-02-14 Ab Initio Technology Llc Detection of confidential information
US8504580B2 (en) * 2009-03-03 2013-08-06 Ilya Geller Systems and methods for creating an artificial intelligence
US8516013B2 (en) 2009-03-03 2013-08-20 Ilya Geller Systems and methods for subtext searching data using synonym-enriched predicative phrases and substituted pronouns
US8447789B2 (en) * 2009-09-15 2013-05-21 Ilya Geller Systems and methods for creating structured data
US20100274755A1 (en) * 2009-04-28 2010-10-28 Stewart Richard Alan Binary software binary image analysis
US20100325136A1 (en) * 2009-06-23 2010-12-23 Microsoft Corporation Error tolerant autocompletion
CN104484381B (en) * 2010-02-26 2018-05-22 电子湾有限公司 For searching for the method and system of multiple strings
WO2011137368A2 (en) 2010-04-30 2011-11-03 Life Technologies Corporation Systems and methods for analyzing nucleic acid sequences
KR101638594B1 (en) 2010-05-26 2016-07-20 삼성전자주식회사 Method and apparatus for searching DNA sequence
US9268903B2 (en) 2010-07-06 2016-02-23 Life Technologies Corporation Systems and methods for sequence data alignment quality assessment
CN102479191B (en) 2010-11-22 2014-03-26 阿里巴巴集团控股有限公司 Method and device for providing multi-granularity word segmentation result
CN103425691B (en) 2012-05-22 2016-12-14 阿里巴巴集团控股有限公司 A kind of searching method and system
KR101452638B1 (en) 2013-06-21 2014-10-22 서울대학교산학협력단 Method and apparatus for recommending contents
US9934217B2 (en) 2013-07-26 2018-04-03 Facebook, Inc. Index for electronic string of symbols
US9400845B2 (en) * 2013-09-03 2016-07-26 Ferrandino & Son Inc. Providing intelligent service provider searching and statistics on service providers
CN104008119B (en) * 2013-12-30 2017-09-26 西南交通大学 A kind of one-to-many mixed characters string fusion comparison method
CN105653567A (en) * 2014-12-04 2016-06-08 南京理工大学常熟研究院有限公司 Method for quickly looking for feature character strings in text sequential data
US9953065B2 (en) * 2015-02-13 2018-04-24 International Business Machines Corporation Method for processing a database query
CN104750846B (en) * 2015-04-10 2017-12-08 浪潮集团有限公司 A kind of substring lookup method and device
KR101910491B1 (en) * 2016-12-07 2018-10-22 전북대학교 산학협력단 A method and apparatus for efficient string similarity search based on generating inverted list of variable length grams
CN110892401A (en) 2017-03-19 2020-03-17 奥菲克-艾什科洛研究与发展有限公司 System and method for generating filters for k mismatched searches
US10528556B1 (en) * 2017-12-31 2020-01-07 Allscripts Software, Llc Database methodology for searching encrypted data records
US10747819B2 (en) 2018-04-20 2020-08-18 International Business Machines Corporation Rapid partial substring matching
US10169451B1 (en) 2018-04-20 2019-01-01 International Business Machines Corporation Rapid character substring searching
US10732972B2 (en) 2018-08-23 2020-08-04 International Business Machines Corporation Non-overlapping substring detection within a data element string
US10782968B2 (en) 2018-08-23 2020-09-22 International Business Machines Corporation Rapid substring detection within a data element string
US11042371B2 (en) 2019-09-11 2021-06-22 International Business Machines Corporation Plausability-driven fault detection in result logic and condition codes for fast exact substring match
US10996951B2 (en) 2019-09-11 2021-05-04 International Business Machines Corporation Plausibility-driven fault detection in string termination logic for fast exact substring match

Family Cites Families (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2880199B2 (en) * 1989-10-18 1999-04-05 株式会社日立製作所 Symbol string search method and search device
US6370479B1 (en) * 1992-02-06 2002-04-09 Fujitsu Limited Method and apparatus for extracting and evaluating mutually similar portions in one-dimensional sequences in molecules and/or three-dimensional structures of molecules
GB9220404D0 (en) * 1992-08-20 1992-11-11 Nat Security Agency Method of identifying,retrieving and sorting documents
US5852821A (en) * 1993-04-16 1998-12-22 Sybase, Inc. High-speed data base query method and apparatus
US5553272A (en) * 1994-09-30 1996-09-03 The University Of South Florida VLSI circuit structure for determining the edit distance between strings
DE69422406T2 (en) * 1994-10-28 2000-05-04 Hewlett Packard Co Method for performing data chain comparison
US5778361A (en) * 1995-09-29 1998-07-07 Microsoft Corporation Method and system for fast indexing and searching of text in compound-word languages
US5963957A (en) * 1997-04-28 1999-10-05 Philips Electronics North America Corporation Bibliographic music data base with normalized musical themes
US6026398A (en) * 1997-10-16 2000-02-15 Imarket, Incorporated System and methods for searching and matching databases
JP3622503B2 (en) * 1998-05-29 2005-02-23 株式会社日立製作所 Feature character string extraction method and apparatus, similar document search method and apparatus using the same, storage medium storing feature character string extraction program, and storage medium storing similar document search program
US6144958A (en) * 1998-07-15 2000-11-07 Amazon.Com, Inc. System and method for correcting spelling errors in search queries
US6556984B1 (en) * 1999-01-19 2003-04-29 International Business Machines Corporation Hierarchical string matching using multi-path dynamic programming
DE10028624B4 (en) * 1999-06-09 2007-07-05 Ricoh Co., Ltd. Method and device for document procurement
US6757675B2 (en) * 2000-07-24 2004-06-29 The Regents Of The University Of California Method and apparatus for indexing document content and content comparison with World Wide Web search service
US6654734B1 (en) * 2000-08-30 2003-11-25 International Business Machines Corporation System and method for query processing and optimization for XML repositories
JP2002189747A (en) * 2000-12-19 2002-07-05 Hitachi Ltd Retrieving method for document information
US6775666B1 (en) * 2001-05-29 2004-08-10 Microsoft Corporation Method and system for searching index databases
US6681222B2 (en) * 2001-07-16 2004-01-20 Quip Incorporated Unified database and text retrieval system
US7152056B2 (en) * 2002-04-19 2006-12-19 Dow Jones Reuters Business Interactive, Llc Apparatus and method for generating data useful in indexing and searching
US7010522B1 (en) * 2002-06-17 2006-03-07 At&T Corp. Method of performing approximate substring indexing
US7734565B2 (en) * 2003-01-18 2010-06-08 Yahoo! Inc. Query string matching method and apparatus
US7313554B2 (en) * 2003-09-29 2007-12-25 International Business Machines Corporation System and method for indexing queries, rules and subscriptions
US7299126B2 (en) * 2003-11-03 2007-11-20 International Business Machines Corporation System and method for evaluating moving queries over moving objects
US7283999B1 (en) * 2003-12-19 2007-10-16 Ncr Corp. Similarity string filtering
US7346625B2 (en) * 2004-11-05 2008-03-18 International Business Machines Corporation Methods and apparatus for performing structural joins for answering containment queries

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LUIS GRAVANO AND OTHERS: "Using q-grams in a DBMS for Approximate String Processing", IEEE DATA ENGINEERING BULLETIN, vol. 24, no. 4, 2001, pages 28 - 34, XP002291636, Retrieved from the Internet <URL:http://citeseer.ist.psu.edu/cache/papers/cs/27618/http:zSzzSzwww1.cs.columbia.eduzSz~pirotzSzpublicationszSzdeb-dec2001.pdf/gravano01using.pdf> [retrieved on 20040806] *
MYERS E. W.: "A Sublinear Algorithm for approximate keyword searching", ALGORITHMICA, vol. 12, no. 4-5, October 1994 (1994-10-01), GERMANY, pages 345 - 374, XP008033755 *

Also Published As

Publication number Publication date
JP4538449B2 (en) 2010-09-08
KR20060002792A (en) 2006-01-09
WO2004079631A2 (en) 2004-09-16
CN100557606C (en) 2009-11-04
US7756847B2 (en) 2010-07-13
CN1761958A (en) 2006-04-19
KR101068678B1 (en) 2011-09-30
EP1602039A2 (en) 2005-12-07
US20060179052A1 (en) 2006-08-10
JP2006519445A (en) 2006-08-24

Similar Documents

Publication Publication Date Title
WO2004079631A3 (en) Method and arrangement for searching for strings
WO2002077873A3 (en) System, method and apparatus for conducting a phrase search
US6947920B2 (en) Method and system for response time optimization of data query rankings and retrieval
EP1320041A3 (en) Searching profile information
WO2005066847A3 (en) Systems and methods for improving search quality
WO2004017158A3 (en) System, method and apparatus for conducting a keyterm search
WO2004066090A3 (en) Query string matching method and apparatus
US20070239752A1 (en) Fuzzy alphanumeric search apparatus and method
US9507881B2 (en) Search device
WO2004077272A3 (en) System and method for software reuse
WO2002057883A3 (en) Efficient searching techniques
DE60045393D1 (en) SEARCH IN MUSIC DATABASE
DE69602444D1 (en) SYSTEM AND METHOD FOR NARROWING THE SCOPE OF SEARCH IN A LEXICON
SE0004043D0 (en) Method and apparatus for document indexing and searching
CA2429338A1 (en) Method and apparatus for categorizing and presenting documents of a distributed database
CA2373568A1 (en) Method of searching similar document, system for performing the same and program for processing the same
US6691103B1 (en) Method for searching a database, search engine system for searching a database, and method of providing a key table for use by a search engine for a database
EP1622042A4 (en) Database device, database search device, and method thereof
WO2004079505A3 (en) Matching queries to partitioned document path segments
WO2003083720A3 (en) Database searching method and system
KR20050054538A (en) Method of high-speed pattern storing and matching
WO2002033571A3 (en) Method of operating a plurality of electronic databases
WO2003085552A3 (en) Comparison of source files
EP1808781A2 (en) Evaluation of name prefix and suffix during a search
US20050086209A1 (en) Conceptual article collector

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase

Ref document number: 2004714404

Country of ref document: EP

ENP Entry into the national phase

Ref document number: 2006179052

Country of ref document: US

Kind code of ref document: A1

WWE WIPO information: entry into national phase

Ref document number: 10547328

Country of ref document: US

WWE WIPO information: entry into national phase

Ref document number: 2006506641

Country of ref document: JP

Ref document number: 2111/CHENP/2005

Country of ref document: IN

WWE WIPO information: entry into national phase

Ref document number: 1020057016418

Country of ref document: KR

Ref document number: 20048058740

Country of ref document: CN

WWP WIPO information: published in national office

Ref document number: 2004714404

Country of ref document: EP

WWP WIPO information: published in national office

Ref document number: 1020057016418

Country of ref document: KR

WWP WIPO information: published in national office

Ref document number: 10547328

Country of ref document: US