WO2004084099A3 - Corpus clustering, confidence refinement, and ranking for geographic text search and information retrieval - Google Patents

Info

Publication number
WO2004084099A3
WO2004084099A3 PCT/US2004/008309 US2004008309W WO2004084099A3 WO 2004084099 A3 WO2004084099 A3 WO 2004084099A3 US 2004008309 W US2004008309 W US 2004008309W WO 2004084099 A3 WO2004084099 A3 WO 2004084099A3
Authority
WO
WIPO (PCT)
Prior art keywords
toponym
place
pair
toponyms
confidence
Prior art date
Application number
PCT/US2004/008309
Other languages
French (fr)
Other versions
WO2004084099A2 (en)
Inventor
John R Frank
Original Assignee
Metacarta Inc
John R Frank
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Metacarta Inc, John R Frank
Priority to CA002519236A (CA2519236A1)
Priority to EP04757619A (EP1604309A2)
Priority to AU2004220880A (AU2004220880B2)
Priority to JP2006507322A (JP2006521621A)
Publication of WO2004084099A2
Publication of WO2004084099A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/289 Object oriented databases

Abstract

A computer-implemented method for processing a plurality of toponyms, the method involving: in a large corpus, identifying geo-textual correlations among readings of the toponyms within the plurality of toponyms; and for each toponym selected from the plurality of toponyms, using the identified geo-textual correlations to generate a value for a confidence that the selected toponym refers to a corresponding geographic location.

Also a method of generating information useful for ranking a document that includes a plurality of toponyms for which there is a corresponding plurality of (toponym, place) pairs, there being associated with each (toponym, place) pair of said plurality of (toponym, place) pairs a corresponding value for a confidence that the toponym of that (toponym, place) pair refers to the place of that (toponym, place) pair. This further method includes, for a selected (toponym, place) pair of the plurality of (toponym, place) pairs, (1) determining if another toponym is present within the document that has an associated place that is geographically related to the place of the selected (toponym, place) pair; and (2) if a toponym is identified within the document that has an associated place that is geographically related to the place of the selected (toponym, place) pair, boosting the value of the confidence for the selected (toponym, place) pair.
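The confidence-boosting step of the second method can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the patent: the `Candidate` record, the use of great-circle distance within `radius_km` as the test for "geographically related", and the `boost` rule that moves a confidence part of the way toward 1.0.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass
class Candidate:
    """One (toponym, place) pair with its current confidence (hypothetical record)."""
    toponym: str
    place: str
    lat: float
    lon: float
    confidence: float


def haversine_km(a: Candidate, b: Candidate) -> float:
    """Great-circle distance between two candidate places, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (a.lat, a.lon, b.lat, b.lon))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def boost_confidences(candidates, radius_km=100.0, boost=0.5):
    """Return new candidates; a pair is boosted when a *different* toponym in the
    same document has a place within radius_km (assumed meaning of "related")."""
    boosted = []
    for c in candidates:
        related = any(
            o.toponym != c.toponym and haversine_km(c, o) <= radius_km
            for o in candidates
        )
        # Assumed boost rule: close part of the gap to 1.0, so values stay in [0, 1].
        conf = c.confidence + boost * (1.0 - c.confidence) if related else c.confidence
        boosted.append(Candidate(c.toponym, c.place, c.lat, c.lon, conf))
    return boosted


if __name__ == "__main__":
    # A document mentioning "Cambridge" and "Boston"; "Cambridge" has two readings.
    doc = [
        Candidate("Cambridge", "Cambridge, MA, USA", 42.3736, -71.1097, 0.4),
        Candidate("Cambridge", "Cambridge, England", 52.2053, 0.1218, 0.5),
        Candidate("Boston", "Boston, MA, USA", 42.3601, -71.0589, 0.8),
    ]
    for c in boost_confidences(doc):
        print(f"{c.toponym} -> {c.place}: {c.confidence:.2f}")
```

In this sketch the Massachusetts reading of "Cambridge" is boosted because "Boston" appears in the same document and resolves to a nearby place, while the English reading is left unchanged. Other definitions of geographic relatedness, such as containment in the same administrative region, would fit the abstract's wording equally well.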

Priority Applications (4)

Application Number Priority Date Filing Date Title
CA002519236A CA2519236A1 (en) 2003-03-18 2004-03-18 Corpus clustering, confidence refinement, and ranking for geographic text search and information retrieval
EP04757619A EP1604309A2 (en) 2003-03-18 2004-03-18 Corpus clustering, confidence refinement, and ranking for geographic text search and information retrieval
AU2004220880A AU2004220880B2 (en) 2003-03-18 2004-03-18 Corpus clustering, confidence refinement, and ranking for geographic text search and information retrieval
JP2006507322A JP2006521621A (en) 2003-03-18 2004-03-18 Material grouping, confidence improvement, and ranking for geographic text and information retrieval

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US45562703P 2003-03-18 2003-03-18
US60/455,627 2003-03-18

Publications (2)

Publication Number Publication Date
WO2004084099A2 (en) 2004-09-30
WO2004084099A3 (en) 2005-04-21

Family

ID=33030034

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/US2004/008309 WO2004084099A2 (en) 2003-03-18 2004-03-18 Corpus clustering, confidence refinement, and ranking for geographic text search and information retrieval

Country Status (6)

Country Link
US (1) US8037078B2 (en)
EP (1) EP1604309A2 (en)
JP (1) JP2006521621A (en)
AU (1) AU2004220880B2 (en)
CA (1) CA2519236A1 (en)
WO (1) WO2004084099A2 (en)

Families Citing this family (48)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1269357A4 (en) 2000-02-22 2005-10-12 Metacarta Inc Spatially coding and displaying information
JP2005301786A (en) * 2004-04-14 2005-10-27 Internatl Business Mach Corp <Ibm> Evaluating apparatus, cluster generating apparatus, program, recording medium, evaluation method, and cluster generation method
GB0414623D0 (en) * 2004-06-30 2004-08-04 Ibm Method and system for determining the focus of a document
US7716162B2 (en) 2004-12-30 2010-05-11 Google Inc. Classification of ambiguous geographic references
US20060149800A1 (en) * 2004-12-30 2006-07-06 Daniel Egnor Authoritative document identification
US7831438B2 (en) * 2004-12-30 2010-11-09 Google Inc. Local item extraction
US7610545B2 (en) * 2005-06-06 2009-10-27 Bea Systems, Inc. Annotations for tracking provenance
US8200676B2 (en) * 2005-06-28 2012-06-12 Nokia Corporation User interface for geographic search
US9026511B1 (en) 2005-06-29 2015-05-05 Google Inc. Call connection via document browsing
US8977953B1 (en) * 2006-01-27 2015-03-10 Linguastat, Inc. Customizing information by combining pair of annotations from at least two different documents
US9411896B2 (en) 2006-02-10 2016-08-09 Nokia Technologies Oy Systems and methods for spatial thumbnails and companion maps for media objects
US20080010273A1 (en) 2006-06-12 2008-01-10 Metacarta, Inc. Systems and methods for hierarchical organization and presentation of geographic search results
US9721157B2 (en) 2006-08-04 2017-08-01 Nokia Technologies Oy Systems and methods for obtaining and using information from map images
US9286404B2 (en) 2006-06-28 2016-03-15 Nokia Technologies Oy Methods of systems using geographic meta-metadata in information retrieval and document displays
US7747630B2 (en) * 2006-09-28 2010-06-29 Amazon Technologies, Inc. Assessing author authority and blog influence
US8484199B1 (en) 2006-12-12 2013-07-09 Google Inc. Ranking of geographic information
US7783644B1 (en) * 2006-12-13 2010-08-24 Google Inc. Query-independent entity importance in books
US8671341B1 (en) 2007-01-05 2014-03-11 Linguastat, Inc. Systems and methods for identifying claims associated with electronic text
US20080208847A1 (en) * 2007-02-26 2008-08-28 Fabian Moerchen Relevance ranking for document retrieval
US8024454B2 (en) * 2007-03-28 2011-09-20 Yahoo! Inc. System and method for associating a geographic location with an internet protocol address
US8621064B2 (en) * 2007-03-28 2013-12-31 Yahoo! Inc. System and method for associating a geographic location with an Internet protocol address
WO2008129339A1 (en) * 2007-04-18 2008-10-30 Mitsco - Seekport Fz-Llc Method for location identification in web pages and location-based ranking of internet search results
US8015196B2 (en) * 2007-06-18 2011-09-06 Geographic Services, Inc. Geographic feature name search system
US7889888B2 (en) * 2007-06-27 2011-02-15 Raytheon Company System and method for grouping and visualizing data
US20090005970A1 (en) * 2007-06-27 2009-01-01 Raytheon Company System and Method for Displaying Geographical Information
US20090006323A1 (en) * 2007-06-27 2009-01-01 Raytheon Company System and Method for Analyzing Intelligence Information
US8352393B2 (en) * 2007-08-03 2013-01-08 Alcatel Lucent Method and system for evaluating tests used in operating system fingerprinting
US7987195B1 (en) 2008-04-08 2011-07-26 Google Inc. Dynamic determination of location-identifying search phrases
US8463774B1 (en) * 2008-07-15 2013-06-11 Google Inc. Universal scores for location search queries
CN101661461B (en) * 2008-08-29 2016-01-13 阿里巴巴集团控股有限公司 Determine the method for core geographic information in document, system
US20100250562A1 (en) * 2009-03-24 2010-09-30 Mireo d.o.o. Recognition of addresses from the body of arbitrary text
US9068849B2 (en) * 2009-05-04 2015-06-30 Tomtom North America, Inc. Method and system for reducing shape points in a geographic data information system
US20110047136A1 (en) * 2009-06-03 2011-02-24 Michael Hans Dehn Method For One-Click Exclusion Of Undesired Search Engine Query Results Without Clustering Analysis
WO2011003232A1 (en) 2009-07-07 2011-01-13 Google Inc. Query parsing for map search
US10068178B2 (en) * 2010-01-13 2018-09-04 Oath, Inc. Methods and system for associating locations with annotations
US20110202886A1 (en) * 2010-02-13 2011-08-18 Vinay Deolalikar System and method for displaying documents
US8417709B2 (en) * 2010-05-27 2013-04-09 International Business Machines Corporation Automatic refinement of information extraction rules
US8515973B1 (en) * 2011-02-08 2013-08-20 Google Inc. Identifying geographic features from query prefixes
EP2707818B1 (en) 2011-05-10 2015-08-05 deCarta Inc. Systems and methods for performing search and retrieval of electronic documents using a big index
WO2013082507A1 (en) * 2011-11-30 2013-06-06 Decarta Systems and methods for performing geo-search and retrieval of electronic point-of-interest records using a big index
US9152698B1 (en) * 2012-01-03 2015-10-06 Google Inc. Substitute term identification based on over-represented terms identification
US9141672B1 (en) 2012-01-25 2015-09-22 Google Inc. Click or skip evaluation of query term optionalization rule
US9146966B1 (en) 2012-10-04 2015-09-29 Google Inc. Click or skip evaluation of proximity rules
US20140250113A1 (en) * 2013-03-04 2014-09-04 International Business Machines Corporation Geographic relevance within a soft copy document or media object
US9753945B2 (en) 2013-03-13 2017-09-05 Google Inc. Systems, methods, and computer-readable media for interpreting geographical search queries
EP3143526A4 (en) 2014-05-12 2017-10-04 Diffeo, Inc. Entity-centric knowledge discovery
WO2017011483A1 (en) * 2015-07-12 2017-01-19 Aravind Musuluri System and method for ranking documents
CN107180045B (en) * 2016-03-10 2020-10-16 中国科学院地理科学与资源研究所 Method for extracting geographic entity relation contained in internet text

Citations (2)

Publication number Priority date Publication date Assignee Title
WO2001063479A1 (en) * 2000-02-22 2001-08-30 Metacarta, Inc. Spatially coding and displaying information
US20020031269A1 (en) * 2000-09-08 2002-03-14 Nec Corporation System, method and program for discriminating named entity

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US6240425B1 (en) * 1997-12-31 2001-05-29 John Naughton Geographic search engine having independent selections of direction and distance from a relocatable hub
US6377949B1 (en) 1998-09-18 2002-04-23 Tacit Knowledge Systems, Inc. Method and apparatus for assigning a confidence level to a term within a user knowledge profile
US6701307B2 (en) * 1998-10-28 2004-03-02 Microsoft Corporation Method and apparatus of expanding web searching capabilities
JP2000268033A (en) * 1999-03-12 2000-09-29 Nippon Telegr & Teleph Corp <Ntt> Method and device for giving tag to information string and recording medium recorded with the method
US7024407B2 (en) 2000-08-24 2006-04-04 Content Analyst Company, Llc Word sense disambiguation
US7599911B2 (en) * 2002-08-05 2009-10-06 Yahoo! Inc. Method and apparatus for search ranking using human input and automated ranking

Non-Patent Citations (1)

Title
SHAW W M JR: "Term-relevance computations and perfect retrieval performance", INFORMATION PROCESSING & MANAGEMENT, ELSEVIER, BARKING, GB, vol. 31, no. 4, 1 July 1995 (1995-07-01), pages 491 - 498, XP004062656, ISSN: 0306-4573 *

Also Published As

Publication number Publication date
US8037078B2 (en) 2011-10-11
CA2519236A1 (en) 2004-09-30
AU2004220880A1 (en) 2004-09-30
US20040236730A1 (en) 2004-11-25
JP2006521621A (en) 2006-09-21
WO2004084099A2 (en) 2004-09-30
AU2004220880B2 (en) 2010-09-23
EP1604309A2 (en) 2005-12-14

Similar Documents

Publication Publication Date Title
WO2004084099A3 (en) Corpus clustering, confidence refinement, and ranking for geographic text search and information retrieval
NZ578672A (en) Information-retrieval systems, methods, and software with concept-based searching and ranking
WO2004086192A3 (en) Systems and methods for interactive search query refinement
Saxonhouse Women in the history of political thought: Ancient Greece to Machiavelli
WO2004072757A3 (en) Text and attribute searches of data stores that include business object
EP1457898A3 (en) Data search system and method
WO2004090755A3 (en) System and method for providing preferred language ordering of search results
EP1033662A3 (en) Natural language search method and apparatus
TW200508916A (en) System and method to acquire information from a database
CN106202294B (en) Related news computing method and device based on keyword and topic model fusion
WO2005091825A3 (en) Keyword recommendation for internet search engines
WO2005033967A3 (en) Systems and methods for searching using queries written in a different character-set and/or language from the target pages
JP2005085285A5 (en)
EP1367509A3 (en) Method and apparatus for categorizing and presenting documents of a distributed database
CN101820592A (en) Method and device for mobile search
CN102893280A (en) Data search device, data search method and program
TW200512602A (en) Method and system of fuzzy searching
TW200519645A (en) Creating taxonomies and training data in multiple languages
Prokić et al. Combining regular sound correspondences and geographic spread
EP1465082A3 (en) Method of determining database search path
Szmrecsanyi Analyzing aggregated linguistic data
CN107818091B (en) Document processing method and device
CN105095270B (en) Retrieve device and search method
Dalli System for spatio-temporal analysis of online news and blogs
CN109766415B (en) Book directory positioning method and system

Legal Events

Date Code Title Description
AK Designated states
    Kind code of ref document: A2
    Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW
AL Designated countries for regional patents
    Kind code of ref document: A2
    Designated state(s): BW GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG
121 EP: the EPO has been informed by WIPO that EP was designated in this application
WWE WIPO information: entry into national phase
    Ref document number: 2519236
    Country of ref document: CA
WWE WIPO information: entry into national phase
    Ref document number: 2004220880
    Country of ref document: AU
    Ref document number: 2006507322
    Country of ref document: JP
WWE WIPO information: entry into national phase
    Ref document number: 2004757619
    Country of ref document: EP
ENP Entry into the national phase
    Ref document number: 2004220880
    Country of ref document: AU
    Date of ref document: 20040318
    Kind code of ref document: A
WWP WIPO information: published in national office
    Ref document number: 2004220880
    Country of ref document: AU
WWP WIPO information: published in national office
    Ref document number: 2004757619
    Country of ref document: EP