US20050065959A1 - Systems and methods for clustering search results - Google Patents


Info

Publication number
US20050065959A1
Authority
US
United States
Prior art keywords
documents
clusters
interest
geographical
area
Legal status
Granted
Application number
US10/664,929
Other versions
US8346770B2
Inventor
Adam Smith
Xianping Ge
Elizabeth Hamon
Abhishek Parmar
Current Assignee
Google LLC
Original Assignee
Google LLC
Priority to US10/664,929 (US8346770B2)
Application filed by Google LLC
Assigned to GOOGLE INC. (assignment of assignors interest). Assignors: GE, XIANPING; HAMON, ELIZABETH; PARMAR, ABHISHEK; SMITH, ADAM
Priority to PCT/US2004/030983 (WO2005031614A1)
Priority to EP04784728A (EP1665101A1)
Priority to KR1020067005695A (KR100814667B1)
Publication of US20050065959A1
Priority to NO20061794A (NO337806B1)
Publication of US8346770B2
Application granted
Assigned to GOOGLE LLC (change of name). Assignor: GOOGLE INC.
Legal status: Active
Adjusted expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Definitions

  • The present invention relates generally to information retrieval systems and, more particularly, to systems and methods for clustering search results by address and/or telephone number.
  • Search engines attempt to return hyperlinks to web documents in which a user is interested.
  • Search engines base their determination of the user's interest on search terms (called a search query) entered by the user.
  • The goal of the search engine is to provide links to high-quality, relevant results to the user based on the search query.
  • The search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web documents. Web documents that contain the user's search terms are “hits” and are returned to the user.
  • Some web documents may be of particular interest to users that reside in certain geographical areas. For example, web documents associated with local businesses or organizations may be of most relevance to individuals located in the geographical area of the local businesses/organizations.
  • When a user desires information regarding a type of business (e.g., a restaurant, a hardware store, a pharmacy, etc.) within a certain geographical area, the user may provide one or more keywords associated with the business type and the geographical area to a search engine.
  • The search engine returns search results that include web documents associated with the business type.
  • These search results, however, typically will not include web documents associated with businesses or organizations outside the geographical area identified by the user, even if these businesses or organizations are located in an area geographically close (or next) to the geographical area identified by the user.
  • The search results also typically include more than one, and oftentimes many, web documents associated with the same business location. The user may therefore need to peruse many web documents that are irrelevant to the business of interest before locating all of the web documents associated with that business.
  • Systems and methods consistent with the principles of the invention cluster web documents based, at least in part, on addresses (or telephone numbers) included in the web documents.
  • A method for clustering by address may include receiving a search query, identifying a geographical area of interest based, at least in part, on the search query, and identifying documents that include addresses located within the geographical area of interest.
  • The method may also include grouping the identified documents into clusters based, at least in part, on the addresses located within the geographical area of interest and presenting the clusters.
  • A system for forming search results may include a processor and a memory configured to store information that associates documents with addresses included in the documents.
  • The processor is configured to receive a search query, determine a geographical area of interest based, at least in part, on the search query, and identify documents that include addresses located within the geographical area of interest based, at least in part, on the information stored in the memory.
  • The processor is also configured to group the identified documents into clusters based, at least in part, on the addresses located within the geographical area of interest and provide the clusters as the search results.
  • A method for forming search results may include receiving a search query that includes at least one portion of a telephone number and identifying documents that include telephone numbers that match the at least one portion of the telephone number. The method may also include grouping the identified documents into clusters based on the telephone numbers included in the identified documents and presenting the clusters as the search results.
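  • The telephone-number variant can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the queried "portion" is the leading digits (area code and/or prefix) and that numbers are compared after stripping non-digits; `normalize_phone`, `cluster_by_phone`, and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def normalize_phone(raw: str) -> str:
    """Keep only the digits of a phone number (formatting is assumed to vary)."""
    return re.sub(r"\D", "", raw)


def cluster_by_phone(query_digits, doc_phones):
    """Group documents whose phone numbers begin with the queried digits.

    doc_phones: hypothetical mapping of document id -> list of raw phone strings.
    Returns: normalized full phone number -> sorted list of document ids.
    """
    prefix = normalize_phone(query_digits)
    clusters = defaultdict(set)
    for doc_id, phones in doc_phones.items():
        for raw in phones:
            number = normalize_phone(raw)
            # Assumption: "matching a portion" means a leading-digit (prefix) match.
            if number.startswith(prefix):
                clusters[number].add(doc_id)
    return {number: sorted(docs) for number, docs in clusters.items()}


docs = {
    "doc_a": ["(650) 555-0101"],
    "doc_b": ["650-555-0101", "415-555-0199"],
    "doc_c": ["650.555.0142"],
}
print(cluster_by_phone("650 555", docs))
# {'6505550101': ['doc_a', 'doc_b'], '6505550142': ['doc_c']}
```

  • Each distinct matching number becomes one cluster, so doc_a and doc_b, which list the same number in different formats, end up together.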
  • A system for forming search results may include means for receiving a search query, means for identifying a geographical location, means for determining a geographical center of the geographical location, and means for identifying locations within a certain distance of the geographical center as a geographical area of interest.
  • The system may also include means for identifying documents that include addresses located within the geographical area of interest and means for determining relevant ones of the identified documents, as relevant documents, based, at least in part, on the search query.
  • The relevant documents may form the search results.
  • FIG. 1 is a diagram of an exemplary network in which systems and methods consistent with the principles of the invention may be implemented.
  • FIG. 2 is an exemplary diagram of a client and/or server of FIG. 1 in an implementation consistent with the principles of the invention.
  • FIG. 3 is a diagram of an exemplary computer-readable medium that may be used by a server of FIG. 1 according to an implementation consistent with the principles of the invention.
  • FIGS. 4A and 4B are flowcharts of exemplary processing for clustering search results by address according to an implementation consistent with the principles of the invention.
  • FIG. 5 is a functional block diagram of a portion of a server according to an exemplary implementation consistent with the principles of the invention.
  • FIG. 6A is a diagram of an exemplary result list according to an implementation consistent with the principles of the invention.
  • FIG. 6B is a diagram of an exemplary result list according to another implementation consistent with the principles of the invention.
  • Systems and methods consistent with the principles of the invention may cluster search results by address (or telephone number) so that the results are meaningful to users looking for information associated with particular geographic locations.
  • The search results may also be more meaningful to these users because they may include information associated with other geographic locations that are geographically close (or next) to the locations in which the users are interested.
  • FIG. 1 is an exemplary diagram of a network 100 in which systems and methods consistent with the principles of the invention may be implemented.
  • Network 100 may include multiple clients 110 connected to multiple servers 120-140 via a network 150.
  • Network 150 may include a local area network (LAN), a wide area network (WAN), a telephone network, such as the Public Switched Telephone Network (PSTN), an intranet, the Internet, a memory device, another type of network, or a combination of networks.
  • Clients 110 may include client entities.
  • An entity may be defined as a device, such as a wireless telephone, a personal computer, a personal digital assistant (PDA), a laptop, or another type of computation or communication device, a thread or process running on one of these devices, and/or an object executable by one of these devices.
  • Servers 120-140 may include server entities that gather, process, search, and/or maintain documents in a manner consistent with the principles of the invention.
  • Clients 110 and servers 120-140 may connect to network 150 via wired, wireless, and/or optical connections.
  • Server 120 may optionally include a search engine 125 usable by clients 110.
  • Server 120 may crawl documents (e.g., web pages) and store information associated with these documents in a repository of crawled documents.
  • Servers 130 and 140 may store or maintain documents that may be crawled by server 120.
  • Although servers 120-140 are shown as separate entities, one or more of servers 120-140 may perform one or more of the functions of another one or more of servers 120-140. For example, two or more of servers 120-140 may be implemented as a single server, or one of servers 120-140 may be implemented as multiple computing devices.
  • FIG. 2 is an exemplary diagram of a client or server entity (hereinafter called “client/server entity”), which may correspond to one or more of clients 110 and servers 120-140, according to an implementation consistent with the principles of the invention.
  • The client/server entity may include a bus 210, a processor 220, a main memory 230, a read only memory (ROM) 240, a storage device 250, one or more input devices 260, one or more output devices 270, and a communication interface 280.
  • Bus 210 may include one or more conductors that permit communication among the components of the client/server entity.
  • Processor 220 may include any type of conventional processor or microprocessor that interprets and executes instructions.
  • Main memory 230 may include a random access memory (RAM) or another type of dynamic storage device that stores information and instructions for execution by processor 220.
  • ROM 240 may include a conventional ROM device or another type of static storage device that stores static information and instructions for use by processor 220.
  • Storage device 250 may include a magnetic and/or optical recording medium and its corresponding drive.
  • Input device(s) 260 may include one or more conventional mechanisms that permit an operator to input information to the client/server entity, such as a keyboard, a mouse, a pen, voice recognition and/or biometric mechanisms, etc.
  • Output device(s) 270 may include one or more conventional mechanisms that output information to the operator, including a display, a printer, a speaker, etc.
  • Communication interface 280 may include any transceiver-like mechanism that enables the client/server entity to communicate with other devices and/or systems.
  • For example, communication interface 280 may include mechanisms for communicating with another device or system via a network, such as network 150.
  • The client/server entity may perform certain searching-related operations.
  • The client/server entity may perform these operations in response to processor 220 executing software instructions contained in a computer-readable medium, such as memory 230.
  • A computer-readable medium may be defined as one or more physical or logical memory devices and/or carrier waves.
  • The software instructions may be read into memory 230 from another computer-readable medium, such as data storage device 250, or from another device via communication interface 280.
  • The software instructions contained in memory 230 cause processor 220 to perform processes that will be described later.
  • Alternatively, hardwired circuitry may be used in place of or in combination with software instructions to implement processes consistent with the principles of the invention.
  • Thus, implementations consistent with the principles of the invention are not limited to any specific combination of hardware circuitry and software.
  • FIG. 3 is a diagram of an exemplary computer-readable medium that may be associated with a server, such as server 120 in FIG. 1, according to an implementation consistent with the principles of the invention.
  • The contents of the computer-readable medium may physically reside in one or more memory devices accessible by server 120.
  • The computer-readable medium may include a database 300 of entries corresponding to documents with associated addresses (e.g., postal addresses).
  • Server 120 may analyze a repository of crawled documents to locate documents that contain one or more addresses. Server 120 may then identify and extract the addresses from the documents using a technique such as the one described in U.S. patent application Ser. No. ______, entitled “ADDRESS GEOCODING,” filed concurrently herewith, and incorporated herein by reference. In another implementation, the addresses could be manually extracted from the documents.
  • Alternatively, an address associated with a document may be inferred from other information sources.
  • For example, the geographical locations of people accessing the document may be used to infer the address of the document.
  • The geographical locations of the people accessing the document may be determined based on their IP addresses. If most of the people accessing a document are in the same town, it can be inferred that the document has an address associated with that town.
  • A business name included in the document may also be used to infer the address of the document. From the business name, an address may be determined using, for example, yellow pages data.
  • The geographical location of the server hosting the document may also be used to infer the address of the document.
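  • The "most of the people accessing a document are in the same town" rule can be sketched as a majority vote. This assumes the visitors' towns have already been resolved from their IP addresses (that lookup is not shown) and reads "most" as a strict majority; the 0.5 threshold and the `infer_town` name are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def infer_town(visitor_towns: Iterable[str], min_share: float = 0.5) -> Optional[str]:
    """Return the town shared by more than `min_share` of visitors, else None.

    visitor_towns: towns already resolved from visitors' IP addresses
    (the IP-to-town lookup is assumed and not shown here).
    """
    counts = Counter(visitor_towns)
    total = sum(counts.values())
    if total == 0:
        return None  # no access data, so nothing can be inferred
    town, hits = counts.most_common(1)[0]
    return town if hits / total > min_share else None


print(infer_town(["Palo Alto", "Palo Alto", "Menlo Park", "Palo Alto"]))  # Palo Alto
print(infer_town(["Palo Alto", "Menlo Park"]))                            # None
```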
  • Each of the entries in database 300 may include a document identifier field 310 and an address field 320, which may be separately searchable.
  • In other implementations, the entries in database 300 may include more fields, such as additional address fields, and/or different fields, such as telephone number fields and/or fields for latitude and longitude coordinates corresponding to the information in address field 320.
  • Document identifier field 310 may include information that uniquely identifies documents.
  • In one implementation, document identifier field 310 includes a uniform resource locator (URL) associated with a document.
  • Address field 320 may include information regarding an address associated with the corresponding document. Note that the same address may be associated with more than one document. For example, as shown in FIG. 3, address_K is associated with document doc_1 and document doc_2. In other words, there may be between zero and hundreds of documents that have the same associated address.
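  • Because fields 310 and 320 may be searched separately, one way to model database 300 is as (document identifier, address) rows with an index in each direction. This is an in-memory sketch only; apart from doc_1, doc_2, and address_K from FIG. 3, the rows and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of database 300: (document identifier field 310, address field 320).
# One document may carry several addresses, and one address may map to several documents.
entries = [
    ("doc_1", "address_K"),
    ("doc_2", "address_K"),
    ("doc_2", "address_M"),
    ("doc_3", "address_N"),
]


def index_by_address(rows):
    """Address -> documents: lets the server find documents for in-area addresses."""
    index = defaultdict(list)
    for doc_id, address in rows:
        index[address].append(doc_id)
    return dict(index)


def index_by_document(rows):
    """Document -> addresses: the other separately searchable direction."""
    index = defaultdict(list)
    for doc_id, address in rows:
        index[doc_id].append(address)
    return dict(index)


print(index_by_address(entries)["address_K"])   # ['doc_1', 'doc_2']
print(index_by_document(entries)["doc_2"])      # ['address_K', 'address_M']
```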
  • FIGS. 4A and 4B are flowcharts of exemplary processing for clustering search results by address according to an implementation consistent with the principles of the invention. Processing may begin with server 120 receiving a search query from a user (act 410) (FIG. 4A). For example, a user may use conventional web browser software on client 110 to access search engine 125 of server 120. The user may then enter the search query via a graphical user interface provided by server 120.
  • The search query may take different forms.
  • For example, the search query may include one or more keywords relating to a business or organization in which the user is interested and, possibly, one or more geographical identifiers relating to a location at which the business or organization is located.
  • The keyword(s) may include term(s) associated with the business or organization in which the user is interested. For example, if the user is looking for a pharmacy, the user may include the term “pharmacy” as a keyword. Likewise, if the user is looking for restaurants that serve pizza, the user may include the term “pizza” as a keyword.
  • The geographical identifier(s) may include location-specific information that approximately identifies the location of the business or organization in which the user is interested.
  • The geographical identifier(s) may include information such as an entire or partial address or an entire or partial telephone number associated with a business or organization of interest.
  • For example, the user might specify address-specific data, such as the state, city, zip code, street name, or some combination of this information.
  • Alternatively, the user might specify telephone-specific data, such as the area code, prefix, or some combination of this information.
  • Both the address-specific data and the telephone-specific data include information by which server 120 may determine a geographic location.
  • The geographic location may be as broad as a state, city, zip code, or area code or as specific as a street address or an area code and prefix.
  • Server 120 may determine a geographic center of the geographic location (act 420). For example, if the user specified “Palo Alto,” then server 120 may identify the geographic center of Palo Alto. Likewise, if the user specified the zip code 22030, then server 120 may identify the geographic center of the region covered by that zip code. Server 120 may express the geographic center in terms of its latitude and longitude coordinates.
  • Alternatively, server 120 may identify a relevant geographic center based on information other than that explicitly provided by the user. For example, the user's IP address or past browsing history may be used to estimate a geographic center. Alternatively, the user may register a “home” location with server 120.
  • Server 120 may then identify an area that covers locations within a certain distance of the geographic center as an area of interest (act 430). For example, server 120, in effect, may draw a circle with a certain radius around the geographic center and identify the area within the circle as the area of interest.
  • The radius may be a predetermined radius, such as 5 miles or 10 miles.
  • Alternatively, server 120 may determine the radius based, at least in part, on the specificity of the geographical identifier(s). For example, server 120 may provide a smaller radius when the geographical identifier(s) correspond to a specific address (e.g., a street address) and a larger radius when the geographical identifier(s) correspond to a very broad address (e.g., a state).
  • Server 120 may also permit the user to define the radius. This may be a dynamic feature. For example, if the user is unhappy with the search results (e.g., the search results provide too many or too few results), the user may be permitted to increase or decrease the radius.
  • The size of the radius may also be set dynamically based on the type of the keyword(s) provided by the user. For example, the radius may be set at 5 miles for a restaurant search and 20 miles for a car dealership search.
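  • Acts 420 and 430 with straight-line distance can be sketched using the standard haversine great-circle formula. Only the 5-mile (restaurant) and 20-mile (car dealership) values come from the text; the specificity radii, the default of 10 miles, the precedence order (user override, then keyword type, then specificity), and the approximate city coordinates are all hypothetical. Driving distance, which the text also allows, is not modeled here.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Approximate coordinates, for illustration only.
PALO_ALTO = (37.4419, -122.1430)
MOUNTAIN_VIEW = (37.3861, -122.0839)
SAN_FRANCISCO = (37.7749, -122.4194)

# Hypothetical radius tables; only the keyword values come from the text.
RADIUS_BY_KEYWORD = {"restaurant": 5.0, "car dealership": 20.0}
RADIUS_BY_SPECIFICITY = {"street_address": 2.0, "city": 10.0, "state": 100.0}


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance in miles between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def choose_radius(keyword_type=None, specificity=None, user_radius=None, default=10.0):
    """Pick the radius: user override first, then keyword type, then specificity."""
    if user_radius is not None:
        return user_radius
    if keyword_type in RADIUS_BY_KEYWORD:
        return RADIUS_BY_KEYWORD[keyword_type]
    return RADIUS_BY_SPECIFICITY.get(specificity, default)


def in_area_of_interest(center, point, radius_miles):
    """True if `point` lies within the circle of `radius_miles` around `center`."""
    return haversine_miles(*center, *point) <= radius_miles


print(in_area_of_interest(PALO_ALTO, MOUNTAIN_VIEW, choose_radius()))  # True (about 5 miles)
print(in_area_of_interest(PALO_ALTO, SAN_FRANCISCO, choose_radius()))  # False (about 27 miles)
```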
  • Alternatively, server 120 may use driving distance to identify the area of interest.
  • Server 120 may also use distance (either straight-line distance or driving distance) to a driving route to identify the area of interest.
  • For example, server 120 may specify the area of interest as “along Highway 101 when driving from Mountain View to San Francisco.” Server 120 might use yet other ways to identify the area of interest.
  • Server 120 may identify documents that are associated with one or more addresses located within the area of interest as potential “hits” (act 440). For example, server 120 may use a database that matches documents from the repository of crawled documents to their associated addresses, such as database 300 (FIG. 3), to identify documents that are associated with one or more addresses located within the area of interest. To facilitate the document identification, server 120 may search database 300 for addresses that fall within the area of interest and then identify the documents associated with these addresses.
  • Server 120 may then identify documents, of the potential hits, that include the one or more keywords provided by the user, as relevant results (act 450). For example, server 120 may analyze the words within the documents and determine whether these words match the one or more keywords. Documents that have words that match the one or more keywords may be classified as relevant results.
  • In another implementation, the order of acts 440 and 450 may be reversed.
  • In this case, server 120 may first determine the documents matching the one or more keywords and then determine which of these documents are associated with an address within the area of interest.
  • In yet another implementation, acts 440 and 450 may be performed concurrently.
  • In this case, server 120 may determine the intersection of the two separately identified groups of documents to identify the documents that both match the keyword(s) and are associated with an address within the area of interest.
  • In each case, a set of documents may be identified as relevant results.
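  • The sequential and concurrent orderings of acts 440 and 450 can be compared with plain set operations. This sketch assumes the in-area addresses have already been determined (for example, by a distance check against the area of interest), that each document's words are pre-tokenized into a set, and that a document qualifies if it contains at least one keyword; all names and sample data are hypothetical.

```python
def potential_hits(doc_addresses, addresses_in_area):
    """Act 440: documents associated with at least one address in the area of interest."""
    return {doc for doc, addrs in doc_addresses.items() if addrs & addresses_in_area}


def keyword_matches(doc_words, keywords):
    """Act 450: documents whose words include at least one query keyword."""
    return {doc for doc, words in doc_words.items() if words & keywords}


def relevant_sequential(doc_addresses, doc_words, addresses_in_area, keywords):
    """Act 440 first, then act 450 applied only to the potential hits."""
    hits = potential_hits(doc_addresses, addresses_in_area)
    return keyword_matches({d: doc_words.get(d, set()) for d in hits}, keywords)


def relevant_concurrent(doc_addresses, doc_words, addresses_in_area, keywords):
    """Acts 440 and 450 computed independently, then intersected."""
    return (potential_hits(doc_addresses, addresses_in_area)
            & keyword_matches(doc_words, keywords))


doc_addresses = {"doc_1": {"a1", "a9"}, "doc_2": {"a2"}, "doc_3": {"a3"}}
doc_words = {"doc_1": {"pizza", "menu"}, "doc_2": {"hardware"}, "doc_3": {"pizza"}}
area = {"a1", "a2"}
print(relevant_sequential(doc_addresses, doc_words, area, {"pizza"}))  # {'doc_1'}
print(relevant_concurrent(doc_addresses, doc_words, area, {"pizza"}))  # {'doc_1'}
```

  • Both orderings yield the same set; the choice mainly affects how much work each step does.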
  • Server 120 may score the relevant results (act 460) (FIG. 4B). Server 120 may use different factors in scoring the relevant results. For example, server 120 may consider distance and/or relevancy when determining the score for a document. Distance may refer to the distance that the address of a document is from the geographic center. Documents associated with addresses closer to the geographic center may be given a higher score than documents associated with addresses further from the geographic center. Relevancy may refer to the number of the keywords that the document contains and/or how prominently the one or more keywords are presented in the document. Documents containing all of the one or more keywords may be given a higher score than documents containing fewer than all of the one or more keywords. Documents containing the one or more keywords in a more prominent location, such as in a title, may be given a higher score than documents containing the one or more keywords in a less prominent location, such as in fine print.
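  • One hypothetical way to combine the two scoring factors is a weighted linear blend. In this sketch, closeness falls linearly from 1 at the geographic center to 0 at the edge of the area of interest, relevancy is keyword coverage plus a bonus for a keyword in the title, and the 0.5 weight and 0.25 bonus are arbitrary assumptions rather than values from the text.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    distance_miles: float   # distance of the document's address from the geographic center
    keywords_found: int     # how many of the query keywords the document contains
    keyword_in_title: bool  # True if a keyword appears in a prominent place, e.g. the title


def score(doc: Doc, num_keywords: int, radius_miles: float,
          distance_weight: float = 0.5, title_bonus: float = 0.25) -> float:
    """Hypothetical score in [0, 1]: weighted blend of closeness and relevancy."""
    # Closeness: 1.0 at the geographic center, falling linearly to 0.0 at the radius.
    closeness = max(0.0, 1.0 - doc.distance_miles / radius_miles)
    # Relevancy: keyword coverage plus a prominence bonus, rescaled back to [0, 1].
    coverage = doc.keywords_found / num_keywords
    relevancy = (coverage + (title_bonus if doc.keyword_in_title else 0.0)) / (1.0 + title_bonus)
    return distance_weight * closeness + (1.0 - distance_weight) * relevancy


doc_1 = Doc("doc_1", distance_miles=1.0, keywords_found=1, keyword_in_title=True)
doc_25 = Doc("doc_25", distance_miles=9.5, keywords_found=1, keyword_in_title=False)
print(round(score(doc_1, 1, 10.0), 3), round(score(doc_25, 1, 10.0), 3))  # 0.95 0.425
```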
  • Server 120 may cluster documents, of the relevant results, associated with the same address (act 470). To facilitate this, server 120 may optionally sort the documents based on their scores. Server 120 may consider an address associated with a first one of the documents (e.g., a highest scoring document) and determine whether there are any other documents that are associated with this same address. Server 120 may then cluster these documents together, as being associated with the same address. Server 120 may then consider another address associated with the first document, if there is one that is also located within the area of interest, or an address associated with a second one of the documents (e.g., a next highest scoring document) and determine whether there are any other documents that are associated with this same address. Server 120 may then cluster these documents together.
  • Server 120 may continue until all of the documents have been included in at least one cluster, even if the cluster is a cluster of one (which would occur when the document is associated with an address that is not associated with any other document). Server 120 may sort the documents within each of the clusters based on their scores, if they are not already in order from an earlier sorting (described above).
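  • The clustering loop of act 470 can be sketched compactly by walking the documents from highest to lowest score and appending each one to the cluster of every in-area address it contains. This produces the same groups as the address-by-address procedure described above, keeps each cluster sorted by score, lets a document appear in several clusters, and yields clusters of one for unshared addresses. The input shapes and names are hypothetical.

```python
from collections import defaultdict


def cluster_by_address(scored_docs, doc_addresses, addresses_in_area):
    """Group relevant documents by shared in-area address.

    scored_docs: list of (doc_id, score) pairs.
    doc_addresses: doc_id -> list of addresses found in that document.
    Returns: address -> doc_ids in descending score order.
    """
    ordered = sorted(scored_docs, key=lambda pair: pair[1], reverse=True)
    clusters = defaultdict(list)
    for doc_id, _score in ordered:  # highest-scoring document first
        for address in doc_addresses.get(doc_id, []):
            # Skip addresses outside the area of interest and repeated listings.
            if address in addresses_in_area and doc_id not in clusters[address]:
                clusters[address].append(doc_id)
    return dict(clusters)


scored = [("doc_12", 0.5), ("doc_1", 0.95), ("doc_25", 0.4), ("doc_3", 0.7)]
doc_addresses = {
    "doc_1": ["addr_A", "addr_B", "addr_X"],  # addr_X lies outside the area of interest
    "doc_3": ["addr_A"],
    "doc_12": ["addr_A"],
    "doc_25": ["addr_B"],
}
area = {"addr_A", "addr_B"}
print(cluster_by_address(scored, doc_addresses, area))
# {'addr_A': ['doc_1', 'doc_3', 'doc_12'], 'addr_B': ['doc_1', 'doc_25']}
```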
  • Server 120 may rank the clusters to form a result list (act 480). Server 120 may use different factors in ranking the clusters. For example, server 120 may consider distance and relevancy when ranking the clusters. Distance for a cluster may refer to the distance that the address associated with the cluster is from the geographic center. Clusters with addresses closer to the geographic center may be ranked higher than clusters with addresses further from the geographic center.
  • Relevancy for a cluster may refer to the number of the keywords that the documents in the cluster contain and/or how prominently the one or more keywords are presented in the documents.
  • For example, server 120 may consider a predetermined number (e.g., one, three, all, etc.) of the highest-scoring documents in the cluster.
  • Clusters with document(s) containing all of the one or more keywords may be ranked higher than clusters with document(s) containing fewer than all of the one or more keywords.
  • Similarly, clusters with document(s) containing the one or more keywords in a more prominent location, such as in a title, may be ranked higher than clusters with documents containing the one or more keywords in a less prominent location, such as in fine print.
  • Server 120 may give more weight to either distance or relevancy based, at least in part, on factors such as the specificity of the geographical identifier(s). For example, if the geographical identifier(s) are broad (e.g., the geographical identifier(s) correspond to a large geographical area, such as a state or large city), then server 120 may give relevancy more weight. If the geographical identifier(s) are narrow (e.g., the geographical identifier(s) correspond to a small geographical area, such as a small town, an exact address, or a nearly exact address), then server 120 may give distance more weight.
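  • The effect of shifting weight between distance and relevancy can be shown with two clusters whose order flips as the query becomes broader. The specificity labels, the weights, and the per-cluster closeness and relevancy values (both assumed to be pre-normalized to [0, 1]) are hypothetical; the text only says that narrow identifiers favor distance and broad ones favor relevancy.

```python
# Hypothetical weights: narrow identifiers favor distance, broad ones favor relevancy.
DISTANCE_WEIGHT = {"exact_address": 0.8, "small_town": 0.7, "large_city": 0.3, "state": 0.2}


def rank_clusters(clusters, specificity):
    """Order clusters by a weighted blend of closeness and relevancy (both in [0, 1])."""
    w = DISTANCE_WEIGHT.get(specificity, 0.5)

    def blended(cluster):
        return w * cluster["closeness"] + (1 - w) * cluster["relevancy"]

    return [c["name"] for c in sorted(clusters, key=blended, reverse=True)]


clusters = [
    {"name": "near_but_weak", "closeness": 0.9, "relevancy": 0.4},
    {"name": "far_but_strong", "closeness": 0.3, "relevancy": 0.95},
]
print(rank_clusters(clusters, "exact_address"))  # ['near_but_weak', 'far_but_strong']
print(rank_clusters(clusters, "state"))          # ['far_but_strong', 'near_but_weak']
```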
  • Alternatively, server 120 may rank the clusters based on the scores of the documents they contain.
  • For example, server 120 may consider a predetermined number (e.g., one, three, all, etc.) of the highest-scoring documents in the cluster.
  • Server 120 may add the scores of these documents together or use another technique, such as an averaging technique, to determine the cluster rank.
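  • Score-based cluster ranking can be sketched as follows: take a cluster's `top_n` highest document scores and either add or average them. Passing `top_n=None` uses all documents. The names and sample scores are hypothetical; the sample shows that the choice of `top_n` can change the order.

```python
from statistics import mean


def cluster_rank(doc_scores, top_n=3, combine="sum"):
    """Rank value of one cluster from its `top_n` highest document scores."""
    top = sorted(doc_scores, reverse=True)[:top_n]  # top_n=None keeps every score
    if not top:
        return 0.0
    return sum(top) if combine == "sum" else mean(top)


def order_clusters(clusters, top_n=3, combine="sum"):
    """clusters: cluster id -> document scores. Returns cluster ids, best first."""
    return sorted(clusters, key=lambda cid: cluster_rank(clusters[cid], top_n, combine),
                  reverse=True)


clusters = {"addr_A": [0.9, 0.2, 0.1], "addr_B": [0.6, 0.55, 0.5]}
print(order_clusters(clusters))           # ['addr_B', 'addr_A']
print(order_clusters(clusters, top_n=1))  # ['addr_A', 'addr_B']
```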
  • Server 120 may then sort and present the result list to the user (act 490). For example, server 120 may sort the clusters based on their rank. Server 120 may then create a result output for each cluster, which may be presented to the user.
  • In one implementation, a result output for a cluster may include the title (which may contain a hypertext link that will direct the user, when selected, to the actual document) and a snippet (i.e., a text excerpt) from the highest-scoring document in the cluster.
  • The result output may also include titles (e.g., hypertext links) of one or more of the next-highest-scoring documents, possibly also with snippets or the URLs associated with these documents.
  • The result output may further include a “See More” option that, when selected by the user, may display titles, snippets, and/or URLs of additional ones of the remaining documents in the cluster.
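  • A plain-text rendering of one cluster's result output might look like the sketch below: the top document's title, URL, and snippet, then titles of a few next-highest-scoring documents, and a "See More" line if any documents remain. The dict keys, the indentation, and the `max_titles` cutoff are hypothetical presentation choices.

```python
def result_output(cluster_docs, max_titles=3):
    """Format one cluster as text lines.

    cluster_docs: list of dicts with hypothetical keys "title", "url", "snippet",
    already sorted from highest to lowest score.
    """
    if not cluster_docs:
        return []
    top, rest = cluster_docs[0], cluster_docs[1:]
    lines = [f"{top['title']} <{top['url']}>", f"    {top['snippet']}"]
    for doc in rest[:max_titles]:  # next-highest-scoring documents: titles only
        lines.append(f"  - {doc['title']} <{doc['url']}>")
    if len(rest) > max_titles:  # remaining documents are hidden behind "See More"
        lines.append(f"  See More ({len(rest) - max_titles} more)")
    return lines


docs = [{"title": f"Doc {i}", "url": f"http://example.com/{i}", "snippet": f"snippet {i}"}
        for i in range(5)]
print("\n".join(result_output(docs, max_titles=2)))
```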
  • Server 120 may present the result outputs along with a map that illustrates locations corresponding to the addresses associated with the result outputs.
  • In another implementation, the result output for a cluster may include a business or organization name associated with the cluster, possibly along with the address associated with the cluster.
  • Server 120 may analyze the words of the documents in the cluster to determine a business or organization corresponding to the address associated with the cluster.
  • The result output, in this implementation, may also include a title, snippet, and/or URL for a predetermined number (e.g., four) of the highest-scoring documents in the cluster.
  • The result output, in this implementation, may further include a “See More” option that, when selected by the user, may display titles, snippets, and/or URLs of additional ones of the remaining documents in the cluster.
  • Server 120 may present the result outputs along with a map that illustrates locations corresponding to the addresses associated with the result outputs.
  • A good result may not necessarily include the word(s) making up the geographical identifier(s). In other words, a good result may be associated with a location different from the location of the geographical identifier(s) (though still within the area of interest). Conversely, a document may be excluded from the result list if it is not associated with an address located within the area of interest or if it does not contain any of the keyword(s) provided by the user.
  • As an example, assume that a user desires to find restaurants that serve pizza in the Palo Alto area.
  • The user may access a server using conventional web browser software.
  • Assume that the user provides the following search query: “pizza Palo Alto.”
  • The server may recognize the search query as a search for a business or organization in a certain area based, at least in part, on the presence of one or more keywords (“pizza”) and one or more geographical identifiers (“Palo Alto”).
  • FIG. 5 is a functional block diagram of a portion of the server according to this exemplary implementation consistent with the principles of the invention.
  • the server may determine the geographic center of Palo Alto, possibly in terms of its latitude and longitude coordinates.
  • the server may then identify the area of interest. Assume that the radius is set at 10 miles.
  • the server may determine the area of interest to include the area within a circle centered on the geographic center of Palo Alto with a 10 mile radius.
  • the server may analyze documents from the repository of crawled documents to identify documents that are associated with one or more addresses located within the area of interest as potential hits. Assume that the potential hits include 50 documents: documents 1 through 50. Some of these documents may be associated with more than one address located within the area of interest. For example, document 1 is associated with four addresses, three of which are located within the area of interest. The shaded block indicates an address that is not located within the area of interest. Similarly, document 2 is associated with three addresses, all of which are located within the area of interest. Document 50 is associated with three addresses, two of which are located within the area of interest.
  • the server may then identify which of documents 1 through 50 include the keyword “pizza.” For example, the server may analyze the words within the documents and determine whether any of these words match the keyword “pizza.” Documents that include the word “pizza” may be classified as relevant results. Assume that only documents 1 through 25 include the word “pizza” and, thus, make up the relevant results.
  • the server may score the relevant results based, for example, on distance and/or relevancy. Assume that document 1 contains an address that is closest to the geographic center of Palo Alto and includes the word “pizza” in a prominent place, like its title. The server may then score document 1 higher than the rest of the documents. Assume further that document 25 contains a single address that is furthest from the geographic center and includes the word “pizza” in very small print. The server may score document 25 lower than the rest of the documents.
  • the server may cluster documents 1 through 25 based on the addresses they contain.
  • the server may sort the documents based on their scores and consider an address associated with one of the documents (e.g., document 1) to determine whether there are any other documents that are associated with this same address. Assume that documents 3 and 12 are associated with the same address.
  • the server may cluster documents 1, 3, and 12 as being associated with the same address.
  • the server may then consider another address, such as another address associated with document 1 or another document.
  • the server may then determine whether there are any other documents that are associated with this address. Assume that there are several documents that are associated with the address of which document 25 is one.
  • the server may then cluster documents 1, . . . , 25 as being associated with the same address.
  • the server may continue this process until no additional clusters can be formed. There should be one cluster formed for each distinct address contained in one or more of documents 1 through 25. Assume that there are 10 distinct addresses and, thus, 10 clusters formed. As shown in FIG. 5, some of the clusters may include the same documents. For example, both clusters 1 and 2 include document 1.
  • FIG. 6A is a diagram of an exemplary result list according to an implementation consistent with the principles of the invention.
  • the result list contains two result outputs 610 and 620, corresponding to two clusters.
  • Result output 610 refers to four documents 612, 614, 616, and 618 in the cluster.
  • Document 612 may correspond to the highest-scoring document in the cluster.
  • the server may include the title and a snippet.
  • Documents 614-618 may correspond to lesser-scoring documents.
  • the server may include the title and/or the URL associated with these documents.
  • FIG. 6B is a diagram of an exemplary result list according to another implementation consistent with the principles of the invention.
  • the result list contains two result outputs 650 and 660, corresponding to two clusters.
  • Result output 650 includes a business name and, possibly, the address associated with the cluster 652 and refers to three documents 654, 656, and 658 in the cluster.
  • Documents 654-658 may be ordered by their scores.
  • the server may include the title and/or the URL associated with these documents.
  • Systems and methods consistent with the principles of the invention cluster search results based on locations (or telephone numbers) of interest to users.
  • the users might provide data associated with a business or organization and, possibly, a location of the business or organization.
  • the users might provide the location data as broadly or narrowly as they desire. They may also dynamically broaden or narrow the location data to obtain more or fewer results.
  • clustering has been described thus far as grouping documents based on the addresses with which they are associated.
  • clustering may be performed to group documents based on the telephone numbers with which they are associated. For example, a user might provide a partial telephone number in the search query. The server may identify documents that are associated with the partial telephone number and match any keyword(s) also included in the search query. The server may then cluster the documents based on the telephone numbers with which they are associated and present the clusters as search results to the user.
  • the previously-described acts may be used to target, and possibly cluster, advertisements to users.
  • the keyword(s) and geographical identifier(s) may be used to determine interests and locations of the users.
  • the server may use these interests and locations to identify advertisements to present to the users along with the search results.
  • the server might present the user with advertisements regarding other restaurants (maybe ones not serving pizza) in the Palo Alto area (or within the area of interest).
  • advertisements may be clustered in a manner similar to that described above.
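The telephone-number variant of clustering mentioned above can be sketched in a few lines. This is a minimal illustration, not the patented method: it assumes that a "partial telephone number" means the leading digits (for example, area code plus prefix), that each document has already been mapped to the telephone numbers it contains, and that keyword matching happens elsewhere. The helper names `digits` and `cluster_by_phone` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List


def digits(phone: str) -> str:
    """Strip punctuation and spaces so '(650) 555-0101' becomes '6505550101'."""
    return re.sub(r"\D", "", phone)


def cluster_by_phone(doc_phones: Dict[str, List[str]], partial: str) -> Dict[str, List[str]]:
    """Cluster documents whose telephone numbers begin with the queried digits.

    Assumption: the partial number is a leading-digit prefix; keyword
    filtering is assumed to have been applied to doc_phones beforehand.
    """
    prefix = digits(partial)
    clusters: Dict[str, List[str]] = defaultdict(list)
    for doc, phones in doc_phones.items():
        for phone in phones:
            number = digits(phone)
            # One cluster per distinct full number; avoid listing a doc twice.
            if number.startswith(prefix) and doc not in clusters[number]:
                clusters[number].append(doc)
    return dict(clusters)
```

Normalizing to bare digits lets differently formatted copies of the same number, such as "(650) 555-0101" and "650-555-0101", fall into one cluster.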

Abstract

A system forms search results clustered by address or telephone number. When clustering by address, the system may receive a search query and identify a geographical area of interest based, at least in part, on the search query. The system may identify documents that are associated with addresses located within the geographical area of interest, group the identified documents into clusters based, at least in part, on the addresses located within the geographical area of interest, and present the clusters as the search results. When clustering by telephone number, the system may receive a search query that includes at least one portion of a telephone number and identify documents that are associated with telephone numbers that match the at least one portion of the telephone number. The system may group the identified documents into clusters based on the telephone numbers included in the identified documents and present the clusters as the search results.

Description

    BACKGROUND OF THE INVENTION
  • 1. Field of the Invention
  • The present invention relates generally to information retrieval systems and, more particularly, to systems and methods for clustering search results by address and/or telephone number.
  • 2. Description of Related Art
  • The World Wide Web (“web”) contains a vast amount of information. Locating a desired portion of the information, however, can be challenging. This problem is compounded because the amount of information on the web and the number of new users inexperienced at web searching are growing rapidly. Search engines attempt to return hyperlinks to web documents in which a user is interested. Generally, search engines base their determination of the user's interest on search terms (called a search query) entered by the user. The goal of the search engine is to provide links to high quality, relevant results to the user based on the search query. Typically, the search engine accomplishes this by matching the terms in the search query to a corpus of pre-stored web documents. Web documents that contain the user's search terms are “hits” and are returned to the user.
  • Some web documents may be of particular interest to users that reside in certain geographical areas. For example, web documents associated with local businesses or organizations may be of most relevance to individuals located in the geographical area of the local businesses/organizations.
  • When a user desires information regarding a type of business (e.g., a restaurant, a hardware store, a pharmacy, etc.) within a certain geographical area, the user may provide one or more keywords associated with the business type and the geographical area to a search engine. The search engine returns search results that include web documents associated with the business type.
  • One problem with these search results is that the search results typically will not include web documents associated with businesses or organizations outside the geographical area identified by the user, even if these businesses or organizations are located in an area geographically close (or next) to the geographical area identified by the user. Another problem with these search results is that the search results typically include more than one, and oftentimes many, web documents associated with the same business location, possibly requiring the user to peruse many web documents in the search results that are irrelevant to the business of interest before locating all of the web documents associated with the business of interest.
  • As a result, there is a need for systems and methods for organizing search results in a manner that is meaningful to users, given that there is a finite number of unique locations in the world and that anywhere from zero to hundreds of web documents may describe each location.
  • SUMMARY OF THE INVENTION
  • Systems and methods, consistent with the principles of the invention, cluster web documents based at least in part on addresses (or telephone numbers) included in the web documents.
  • In accordance with one aspect consistent with the principles of the invention, a method for clustering by address is provided. The method may include receiving a search query, identifying a geographical area of interest based, at least in part, on the search query, and identifying documents that include addresses located within the geographical area of interest. The method may also include grouping the identified documents into clusters based, at least in part, on the addresses located within the geographical area of interest and presenting the clusters.
  • According to another aspect, a system for forming search results is provided. The system may include a processor and a memory configured to store information that associates documents to addresses included in the documents. The processor is configured to receive a search query, determine a geographical area of interest based, at least in part, on the search query, and identify documents that include addresses located within the geographical area of interest based, at least in part, on the information stored in the memory. The processor is also configured to group the identified documents into clusters based, at least in part, on the addresses located within the geographical area of interest and provide the clusters as the search results.
  • According to yet another aspect, a method for forming search results is provided. The method may include receiving a search query that includes at least one portion of a telephone number and identifying documents that include telephone numbers that match the at least one portion of the telephone number. The method may also include grouping the identified documents into clusters based on the telephone numbers included in the identified documents and presenting the clusters as the search results.
  • According to a further aspect, a system for forming search results is provided. The system may include means for receiving a search query, means for identifying a geographical location, means for determining a geographical center of the geographical location, and means for identifying locations within a certain distance of the geographical center as a geographical area of interest. The system may also include means for identifying documents that include addresses located within the geographical area of interest and means for determining relevant ones of the identified documents, as relevant documents, based, at least in part, on the search query. The relevant documents may form the search results.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate an embodiment of the invention and, together with the description, explain the invention. In the drawings,
  • FIG. 1 is a diagram of an exemplary network in which systems and methods consistent with the principles of the invention may be implemented;
  • FIG. 2 is an exemplary diagram of a client and/or server of FIG. 1 in an implementation consistent with the principles of the invention;
  • FIG. 3 is a diagram of an exemplary computer-readable medium that may be used by a server of FIG. 1 according to an implementation consistent with the principles of the invention;
  • FIGS. 4A and 4B are flowcharts of exemplary processing for clustering search results by address according to an implementation consistent with the principles of the invention;
  • FIG. 5 is a functional block diagram of a portion of a server according to this exemplary implementation consistent with the principles of the invention;
  • FIG. 6A is a diagram of an exemplary result list according to an implementation consistent with the principles of the invention; and
  • FIG. 6B is a diagram of an exemplary result list according to another implementation consistent with the principles of the invention.
  • DETAILED DESCRIPTION
  • The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings may identify the same or similar elements. Also, the following detailed description does not limit the invention.
  • Systems and methods consistent with the principles of the invention may provide search results that are clustered by address (or telephone number) to provide search results that are meaningful to users looking for information associated with particular geographic locations. The search results may also be more meaningful to the users because they may include information associated with other geographic locations that are geographically close (or next) to the geographic locations in which the users are interested.
  • Exemplary Network Configuration
  • FIG. 1 is an exemplary diagram of a network 100 in which systems and methods consistent with the principles of the invention may be implemented. Network 100 may include multiple clients 110 connected to multiple servers 120-140 via a network 150. Network 150 may include a local area network (LAN), a wide area network (WAN), a telephone network, such as the Public Switched Telephone Network (PSTN), an intranet, the Internet, a memory device, another type of network, or a combination of networks. Two clients 110 and three servers 120-140 have been illustrated as connected to network 150 for simplicity. In practice, there may be more or fewer clients and servers. Also, in some instances, a client may perform the functions of a server and a server may perform the functions of a client.
  • Clients 110 may include client entities. An entity may be defined as a device, such as a wireless telephone, a personal computer, a personal digital assistant (PDA), a laptop, or another type of computation or communication device, a thread or process running on one of these devices, and/or an object executable by one of these devices. Servers 120-140 may include server entities that gather, process, search, and/or maintain documents in a manner consistent with the principles of the invention. Clients 110 and servers 120-140 may connect to network 150 via wired, wireless, and/or optical connections.
  • In an implementation consistent with the principles of the invention, server 120 may optionally include a search engine 125 usable by clients 110. Server 120 may crawl documents (e.g., web pages) and store information associated with these documents in a repository of crawled documents. Servers 130 and 140 may store or maintain documents that may be crawled by server 120. While servers 120-140 are shown as separate entities, it may be possible for one or more of servers 120-140 to perform one or more of the functions of another one or more of servers 120-140. It may be possible that two or more of servers 120-140 are implemented as a single server or that one of servers 120-140 is implemented as multiple computing devices.
  • Exemplary Client/Server Architecture
  • FIG. 2 is an exemplary diagram of a client or server entity (hereinafter called “client/server entity”), which may correspond to one or more of clients 110 and servers 120-140, according to an implementation consistent with the principles of the invention. The client/server entity may include a bus 210, a processor 220, a main memory 230, a read only memory (ROM) 240, a storage device 250, one or more input devices 260, one or more output devices 270, and a communication interface 280. Bus 210 may include one or more conductors that permit communication among the components of the client/server entity.
  • Processor 220 may include any type of conventional processor or microprocessor that interprets and executes instructions. Main memory 230 may include a random access memory (RAM) or another type of dynamic storage device that stores information and instructions for execution by processor 220. ROM 240 may include a conventional ROM device or another type of static storage device that stores static information and instructions for use by processor 220. Storage device 250 may include a magnetic and/or optical recording medium and its corresponding drive.
  • Input device(s) 260 may include one or more conventional mechanisms that permit an operator to input information to the client/server entity, such as a keyboard, a mouse, a pen, voice recognition and/or biometric mechanisms, etc. Output device(s) 270 may include one or more conventional mechanisms that output information to the operator, including a display, a printer, a speaker, etc. Communication interface 280 may include any transceiver-like mechanism that enables the client/server entity to communicate with other devices and/or systems. For example, communication interface 280 may include mechanisms for communicating with another device or system via a network, such as network 150.
  • As will be described in detail below, the client/server entity, consistent with the principles of the invention, performs certain searching-related operations. The client/server entity may perform these operations in response to processor 220 executing software instructions contained in a computer-readable medium, such as memory 230. A computer-readable medium may be defined as one or more physical or logical memory devices and/or carrier waves.
  • The software instructions may be read into memory 230 from another computer-readable medium, such as data storage device 250, or from another device via communication interface 280. The software instructions contained in memory 230 cause processor 220 to perform processes that will be described later. Alternatively, hardwired circuitry may be used in place of or in combination with software instructions to implement processes consistent with the principles of the invention. Thus, implementations consistent with the principles of the invention are not limited to any specific combination of hardware circuitry and software.
  • Exemplary Computer-Readable Medium
  • FIG. 3 is a diagram of an exemplary computer-readable medium that may be associated with a server, such as server 120 in FIG. 1, according to an implementation consistent with the principles of the invention. The contents of the computer-readable medium may physically reside in one or more memory devices accessible by server 120.
  • The computer-readable medium may include a database 300 of entries corresponding to documents with associated addresses (e.g., postal addresses). For example, server 120 may analyze a repository of crawled documents to locate documents that contain one or more addresses. Server 120 may then identify and extract the addresses from the documents using a technique, such as the one described in U.S. patent application, Ser. No. ______, entitled “ADDRESS GEOCODING,” filed concurrently herewith, and incorporated herein by reference. In another implementation, the addresses could be manually extracted from the documents.
  • In yet another implementation, an address associated with a document may be inferred from other information sources. For example, the geographical locations of people accessing the document may be used to infer the address of the document. The geographical locations of the people accessing the document may be determined based on their IP addresses. If most of the people accessing a document are in the same town, it can be inferred that the document has an address associated with the town. A business name included in the document may also be used to infer the address of the document. From the business name, an address may be determined using, for example, yellow page data. The geographical location of the server hosting the document may also be used to infer the address of the document. These and other techniques for inferring an address in a document are described in U.S. patent application, Ser. No. ______, entitled “DETERMINING GEOGRAPHICAL RELEVANCE OF WEB DOCUMENTS,” filed concurrently herewith, and incorporated herein by reference. In any event, server 120 may use these addresses to populate database 300.
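The access-based inference described above ("if most of the people accessing a document are in the same town") amounts to a majority vote. The sketch below is only an illustration under stated assumptions: each access has already been resolved to a town by some separate IP-geolocation step (not shown), and "most" is read as a share strictly above a hypothetical `min_share` threshold.

```python
from collections import Counter
from typing import List, Optional


def infer_town(access_towns: List[str], min_share: float = 0.5) -> Optional[str]:
    """Guess a document's town from the towns of the people accessing it.

    access_towns holds one town per access, assumed to come from a separate
    (hypothetical) IP-geolocation step. Returns None when no single town
    accounts for more than min_share of the accesses.
    """
    if not access_towns:
        return None
    town, count = Counter(access_towns).most_common(1)[0]
    # "Most of the people" is interpreted as a share strictly above min_share.
    return town if count / len(access_towns) > min_share else None


print(infer_town(["Palo Alto", "Palo Alto", "Menlo Park"]))  # Palo Alto
print(infer_town(["Palo Alto", "Menlo Park"]))               # None
```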
  • Each of the entries in database 300 may include a document identifier field 310 and an address field 320, which may be separately searchable. In other implementations consistent with the principles of the invention, the entries in database 300 may include more fields, such as additional address fields, and/or different fields, such as telephone number fields and/or fields for latitude and longitude coordinates corresponding to the information in address field 320.
  • Document identifier field 310 may include information that uniquely identifies documents. In one implementation, document identifier field 310 includes a uniform resource locator (URL) associated with a document. Address field 320 may include information regarding an address associated with the corresponding document. It may be beneficial to note that the same address may be associated with more than one document. For example, as shown in FIG. 3, address_K is associated with document doc 1 and document doc 2. In other words, there may be between zero and hundreds of documents that have the same associated address.
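Because one address may belong to many documents, it is convenient to invert database 300 so that each address lists its documents. The following is a minimal in-memory sketch; the `Entry` type and `documents_by_address` helper are illustrative names, not part of the described system, and a production index would live in a real datastore.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    """One row of database 300 (field names are illustrative)."""
    doc_id: str   # document identifier field 310, e.g. a URL
    address: str  # address field 320


def documents_by_address(entries: List[Entry]) -> Dict[str, List[str]]:
    """Invert the table so each address lists every document associated with it."""
    index: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        index[entry.address].append(entry.doc_id)
    return dict(index)


rows = [Entry("doc1", "address_K"), Entry("doc2", "address_K"), Entry("doc3", "address_J")]
print(documents_by_address(rows))
# {'address_K': ['doc1', 'doc2'], 'address_J': ['doc3']}
```

This mirrors the FIG. 3 example, in which address_K is associated with both doc 1 and doc 2.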
  • Exemplary Processing
  • FIGS. 4A and 4B are flowcharts of exemplary processing for clustering search results by address according to an implementation consistent with the principles of the invention. Processing may begin with server 120 receiving a search query from a user (act 410) (FIG. 4A). For example, a user may use conventional web browser software on client 110 to access search engine 125 of server 120. The user may then enter the search query via a graphical user interface provided by server 120.
  • The search query may take different forms. For example, the search query may include one or more keywords relating to a business or organization in which the user is interested and, possibly, one or more geographical identifiers relating to a location at which the business or organization is located. The keyword(s) may include term(s) associated with the business or organization in which the user is interested. For example, if the user is looking for a pharmacy, the user may include the term “pharmacy” as a keyword. Likewise, if the user is looking for restaurants that serve pizza, the user may include the term “pizza” as a keyword.
  • The geographical identifier(s) may include location-specific information that approximately identifies the location of the business or organization in which the user is interested. The geographical identifier(s) may include information, such as an entire or partial address or an entire or partial telephone number associated with a business or organization of interest. For example, the user might specify address-specific data, such as the state, city, zip code, street name, or some combination of this information. Alternatively, the user might specify telephone-specific data, such as the area code, prefix, or some combination of this information.
  • Both the address-specific data and the telephone-specific data include information by which server 120 may determine a geographic location. The geographic location may be as broad as a state, city, zip code, or area code or as specific as a street address or area code and prefix. Server 120 may determine a geographic center of the geographic location (act 420). For example, if the user specified “Palo Alto,” then server 120 may identify the geographic center of Palo Alto. Likewise, if the user specified the zip code 22030, then server 120 may identify the geographic center of the region covered by that zip code. Server 120 may express the geographic center in terms of its latitude and longitude coordinates.
  • In other implementations, server 120 may identify a relevant geographic center based on information other than that explicitly provided by the user. For example, the user's IP address or past browsing history may be used to estimate a geographic center. Alternatively, the user may register a “home” location of the user with server 120.
  • Server 120 may then identify an area that covers locations within a certain distance of the geographic center as an area of interest (act 430). For example, server 120, in effect, may draw a circle with a certain radius around the geographic center and identify the area within the circle as the area of interest. The radius may be a predetermined radius, such as 5 miles or 10 miles. In another implementation, server 120 may determine the radius based, at least in part, on the specificity of the geographical identifier(s). For example, server 120 may provide a smaller radius when the geographical identifier(s) correspond to a specific address (e.g., a street address) and a larger radius when the geographical identifier(s) correspond to a very broad address (e.g., a state). In yet another implementation, server 120 may permit the user to define the radius. This may be a dynamic feature. For example, if the user is unhappy with the search results (e.g., the search results provide too many or too few results), the user may be permitted to either increase or decrease the radius. In a further implementation, the size of the radius may be dynamically set based on the type of the keyword(s) provided by the user. For example, the radius may be set at 5 miles for a restaurant search and 20 miles for a car dealership search.
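The circle test in act 430 can be illustrated with a great-circle (haversine) distance, which is one common way to measure straight-line distance between latitude/longitude points; the text does not say which distance formula is used. The 5-, 10-, and 20-mile values come from the examples above, while the category keys, the precedence of a user-supplied radius, and the function names are assumptions for this sketch.

```python
import math
from typing import Optional, Tuple

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius

# Example radii from the text; the category keys themselves are hypothetical.
RADIUS_BY_CATEGORY = {"restaurant": 5.0, "car dealership": 20.0}
DEFAULT_RADIUS_MILES = 10.0


def haversine_miles(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in miles between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def in_area_of_interest(center: Tuple[float, float], point: Tuple[float, float],
                        category: Optional[str] = None,
                        user_radius: Optional[float] = None) -> bool:
    """True if point lies within the circle drawn around the geographic center."""
    if user_radius is not None:      # assumed: a user-adjusted radius takes precedence
        radius = user_radius
    else:
        radius = RADIUS_BY_CATEGORY.get(category, DEFAULT_RADIUS_MILES)
    return haversine_miles(center, point) <= radius
```

A point 0.05 degrees of latitude from the center (about 3.5 miles) falls inside the 5-mile restaurant circle but outside a user-narrowed 2-mile circle. Driving-distance or route-based areas would replace `haversine_miles` with a routing service.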
  • Instead of using a radius, server 120 may use driving distance to identify the area of interest. Alternatively, server 120 may use distance (either straight line distance or driving distance) to a driving route to identify the area of interest. For example, server 120 may specify the area of interest as “along Highway 101 when driving from Mountain View to San Francisco.” Server 120 might use yet other ways to identify the area of interest.
  • Server 120 may identify documents that are associated with one or more addresses located within the area of interest as potential “hits” (act 440). For example, server 120 may use a database that matches documents from the repository of crawled documents to their associated addresses, such as database 300 (FIG. 3), to identify documents that are associated with one or more addresses located within the area of interest. To facilitate the document identification, server 120 may search database 300 for addresses that fall within the area of interest and then identify the documents associated with these addresses.
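Act 440 can be pictured as a scan over database 300 rows that carry the optional latitude/longitude fields, keeping each document's in-area addresses. In this sketch the area test is passed in as a callable so that a radius, driving-distance, or route-based check could be plugged in; `GeoEntry` and `potential_hits` are hypothetical names, and a real system would likely use a spatial index rather than a linear scan.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple


class GeoEntry(NamedTuple):
    """A database 300 row extended with the optional latitude/longitude fields."""
    doc_id: str
    address: str
    lat: float
    lon: float


def potential_hits(entries: Iterable[GeoEntry],
                   within: Callable[[float, float], bool]) -> Dict[str, List[str]]:
    """Map each document to those of its addresses that lie inside the area of interest.

    `within` is any area test, e.g. a radius or driving-distance check.
    """
    hits: Dict[str, List[str]] = {}
    for e in entries:
        if within(e.lat, e.lon):
            hits.setdefault(e.doc_id, []).append(e.address)
    return hits
```

Keeping the addresses (not just the document identifiers) matters later, because clustering in act 470 groups documents by these in-area addresses.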
  • Server 120 may then identify documents, of the potential hits, that include the one or more keywords provided by the user, as relevant results (act 450). For example, server 120 may analyze the words within the documents and determine whether these words match the one or more keywords. Documents that have words that match the one or more keywords may be classified as relevant results.
  • In another implementation consistent with the principles of the invention, acts 440 and 450 may be reversed. For example, server 120 may determine documents matching the one or more keywords and then determine which of these documents are associated with an address within the area of interest. In yet another implementation, acts 440 and 450 may be performed concurrently. In this case, server 120 may determine the intersection of the two separately identified groups of documents to identify the documents that both match the one or more keywords and are associated with an address within the area of interest. In any event, a set of documents may be identified as relevant results.
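Since the relevant results are the intersection of two document sets, acts 440 and 450 can be computed in either order or in parallel. The sketch below assumes single-word keywords and whole-word, case-insensitive matching, with a document qualifying if it contains at least one keyword; these matching rules and the helper names are illustrative assumptions.

```python
import re
from typing import Dict, Iterable, Set


def words(text: str) -> Set[str]:
    """Lowercased word tokens of a document's text."""
    return set(re.findall(r"\w+", text.lower()))


def keyword_matches(texts: Dict[str, str], keywords: Iterable[str]) -> Set[str]:
    """Documents containing at least one keyword (whole-word, case-insensitive)."""
    wanted = {k.lower() for k in keywords}
    return {doc for doc, text in texts.items() if words(text) & wanted}


def relevant_results(area_hits: Set[str], texts: Dict[str, str],
                     keywords: Iterable[str]) -> Set[str]:
    # Set intersection is commutative, so acts 440 and 450 may run in either
    # order or concurrently and still yield the same relevant results.
    return area_hits & keyword_matches(texts, keywords)
```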
  • Server 120 may score the relevant results (act 460) (FIG. 4B). Server 120 may use different factors in scoring the relevant results. For example, server 120 may consider distance and/or relevancy when determining the score for a document. Distance may refer to the distance that the address of a document is from the geographic center. Documents associated with addresses closer to the geographic center may be given a higher score than documents associated with addresses further from the geographic center. Relevancy may refer to the number of the keywords that the document contains and/or how prominently the one or more keywords are presented in the document. Documents containing all of the one or more keywords may be given a higher score than documents containing fewer than all of the one or more keywords. Documents containing the one or more keywords in a more prominent location, such as in a title, may be given a higher score than documents containing the one or more keywords in a less prominent location, such as in fine print.
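The description names the scoring factors (distance, keyword coverage, keyword prominence) but not a formula. One simple way to combine them is a weighted sum of "closeness" and relevancy, sketched below; the 1.0/0.5 title-versus-body weights and the equal default blend are hypothetical choices, not values from the source.

```python
import re
from typing import List


def relevancy(title: str, body: str, keywords: List[str]) -> float:
    """Fraction of keywords present, counting a title hit more than a body hit."""
    if not keywords:
        return 0.0
    title_words = set(re.findall(r"\w+", title.lower()))
    body_words = set(re.findall(r"\w+", body.lower()))
    total = 0.0
    for kw in keywords:
        kw = kw.lower()
        if kw in title_words:
            total += 1.0   # prominent placement
        elif kw in body_words:
            total += 0.5   # less prominent placement (hypothetical weight)
    return total / len(keywords)


def document_score(distance: float, radius: float, rel: float,
                   w_dist: float = 0.5, w_rel: float = 0.5) -> float:
    """Blend closeness to the geographic center with keyword relevancy."""
    closeness = max(0.0, 1.0 - distance / radius)  # 1 at the center, 0 at the edge
    return w_dist * closeness + w_rel * rel
```

With these assumed weights, a document 1 mile from the center with "pizza" in its title scores 0.95, while one 9 miles away mentioning "pizza" only in its body scores 0.3, matching the ordering of documents 1 and 25 in the Palo Alto example.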
  • Server 120 may cluster documents, of the relevant results, associated with the same address (act 470). To facilitate this, server 120 may optionally sort the documents based on their scores. Server 120 may consider an address associated with a first one of the documents (e.g., a highest scoring document) and determine whether there are any other documents that are associated with this same address. Server 120 may then cluster these documents together, as being associated with the same address. Server 120 may then consider another address associated with the first document, if there is one that is also located within the area of interest, or an address associated with a second one of the documents (e.g., a next highest scoring document) and determine whether there are any other documents that are associated with this same address. Server 120 may then cluster these documents together. Server 120 may continue until all of the documents have been included in at least one cluster, even if the cluster is a cluster of one (which would occur when the document is associated with an address that is not associated with any other document). Server 120 may sort the documents within each of the clusters based on their scores, if they are not already in order from an earlier sorting (described above).
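The walk described for act 470 (take the best-scoring document, form a cluster for each of its in-area addresses, then move on to the next document) can be sketched directly. This version assumes the per-document in-area addresses and scores are already available; because it filters the score-sorted list, each cluster comes out already ordered by score, and a document with several in-area addresses lands in several clusters, as with document 1 in FIG. 5. The function name is illustrative, and the nested scan is quadratic, which is acceptable for a sketch but not tuned for large result sets.

```python
from typing import Dict, List


def cluster_by_address(doc_addresses: Dict[str, List[str]],
                       scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Group relevant documents by shared address, mirroring the walk for act 470.

    doc_addresses maps each relevant document to its in-area addresses.
    Returns address -> documents, each list ordered best score first.
    """
    ordered = sorted(doc_addresses, key=lambda d: scores[d], reverse=True)
    clusters: Dict[str, List[str]] = {}
    for doc in ordered:                      # highest-scoring document first
        for addr in doc_addresses[doc]:
            if addr in clusters:
                continue                     # cluster for this address already formed
            # Collect every document sharing this address, keeping score order.
            clusters[addr] = [d for d in ordered if addr in doc_addresses[d]]
    return clusters
```

A document whose address is shared by no other document still yields a cluster of one, so every relevant document appears in at least one cluster.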
  • Server 120 may rank the clusters to form a result list (act 480). Server 120 may use different factors in ranking the clusters. For example, server 120 may consider distance and relevancy when ranking the clusters. Distance for a cluster may refer to the distance that the address associated with the cluster is from the geographic center. Clusters with addresses closer to the geographic center may be ranked higher than clusters with addresses further from the geographic center.
  • Relevancy for a cluster may refer to the number of the keywords that the documents in the cluster contain and/or how prominently the one or more keywords are presented in the documents. When considering the documents in a cluster, server 120 may consider a predetermined number (e.g., one, three, all, etc.) of the highest scoring documents in the cluster. Clusters with document(s) containing all of the one or more keywords may be ranked higher than clusters with document(s) containing fewer than all of the one or more keywords. Further, clusters with document(s) containing the one or more keywords in a more prominent location, such as in a title, may be ranked higher than clusters with documents containing the one or more keywords in a less prominent location, such as in fine print.
  • Server 120 may give more weight to either distance or relevancy based at least in part, for example, on the specificity of the geographical identifier(s). For example, if the geographical identifier(s) are broad (e.g., the geographical identifier(s) correspond to a large geographical area, such as a state or large city), then server 120 may give relevancy more weight. If the geographical identifier(s) are narrow (e.g., the geographical identifier(s) correspond to a small geographical area, such as a small town, an exact address, or a nearly-exact address), then server 120 may give distance more weight.
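  • One way to combine the distance and relevancy factors is a weighted sum, with the balance shifted by the specificity of the geographical identifier(s). The sketch below assumes a linear blend. The `DISTANCE_WEIGHT` table, the 0.8/0.2 split between keyword coverage and title placement, and all names are illustrative choices rather than values from the description.

```python
from dataclasses import dataclass


@dataclass
class ClusterFeatures:
    address: str
    distance_miles: float   # distance of the cluster's address from the geographic center
    keywords_matched: int   # query keywords found in the cluster's top-scoring document(s)
    keyword_in_title: bool  # whether a keyword appears in a prominent place such as a title


# Hypothetical share of the final rank given to distance, by how specific the
# geographical identifier is; relevancy gets the remainder.
DISTANCE_WEIGHT = {"state": 0.2, "large_city": 0.3, "small_town": 0.7, "address": 0.8}


def rank_clusters(clusters, num_keywords, radius_miles, specificity):
    w_distance = DISTANCE_WEIGHT[specificity]
    w_relevancy = 1.0 - w_distance

    def combined(c: ClusterFeatures) -> float:
        # 1.0 at the geographic center, falling to 0.0 at the edge of the area of interest.
        proximity = max(0.0, 1.0 - c.distance_miles / radius_miles)
        # Fraction of keywords present, plus a bonus for prominent placement.
        relevancy = 0.8 * (c.keywords_matched / num_keywords)
        if c.keyword_in_title:
            relevancy += 0.2
        return w_distance * proximity + w_relevancy * relevancy

    return sorted(clusters, key=combined, reverse=True)


near = ClusterFeatures("near", distance_miles=1.0, keywords_matched=1, keyword_in_title=False)
far = ClusterFeatures("far", distance_miles=8.0, keywords_matched=2, keyword_in_title=True)
print([c.address for c in rank_clusters([near, far], 2, 10.0, "address")])
print([c.address for c in rank_clusters([near, far], 2, 10.0, "state")])
```

With a narrow identifier such as an exact address, the nearby cluster wins. With a broad one such as a state, the more relevant but more distant cluster moves to the top.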
  • Instead of the above-described ranking scheme, server 120 may rank the clusters based on the scores of the documents they contain. When determining the rank of a cluster, server 120 may consider a predetermined number (e.g., one, three, all, etc.) of the highest scoring documents in the cluster. Server 120 may add the scores of these documents together or use another technique, such as an averaging technique, to determine the cluster rank.
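  • The alternative, score-based cluster ranking can be sketched as below. `top_n` plays the role of the predetermined number of highest-scoring documents. `combine` chooses between adding and averaging their scores. Both names are hypothetical.

```python
from statistics import mean


def cluster_rank(scores, top_n=3, combine="sum"):
    """Rank value of a cluster, computed from its top_n highest document scores."""
    top = sorted(scores, reverse=True)[:top_n]
    if combine == "sum":
        return sum(top)
    if combine == "mean":
        return mean(top)
    raise ValueError(f"unknown combine method: {combine!r}")


def rank_by_document_scores(clusters, top_n=3, combine="sum"):
    """Order cluster names by rank value, best first."""
    return sorted(clusters, key=lambda name: cluster_rank(clusters[name], top_n, combine), reverse=True)


clusters = {"A": [0.9, 0.2, 0.1, 0.1], "B": [0.6, 0.6, 0.5]}
print(rank_by_document_scores(clusters, top_n=3))
print(rank_by_document_scores(clusters, top_n=1))
```

Summing favors clusters with several strong documents. A small `top_n` lets a single very strong document carry its cluster.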
  • Server 120 may then sort and present the result list to the user (act 490). For example, server 120 may sort the clusters based on their rank. Server 120 may then create a result output for each cluster, which may be presented to the user. A result output for a cluster may include the title (which may contain a hypertext link that will direct the user, when selected, to the actual document) and a snippet (i.e., a text excerpt) from the highest-scoring document in the cluster. The result output may also include titles (e.g., hypertext links) of one or more other next-highest-scoring documents, possibly also with a snippet or the URLs associated with these documents. The result output may further include a “See More” option that, when selected by the user, may display titles, snippets, and/or URLs of additional ones of the remaining documents in the cluster. Server 120 may present the result outputs along with a map that illustrates locations corresponding to the addresses associated with the result outputs.
  • In another implementation, the result output for a cluster may include a business or organization name associated with the cluster, possibly along with the address associated with the cluster. Server 120 may analyze the words of the documents in the cluster to determine a business or organization corresponding to the address associated with the cluster. The result output, in this implementation, may also include a title, snippet, and/or URL for a predetermined number (e.g., four) of the highest-scoring documents in the cluster. The result output, in this implementation, may further include a “See More” option that, when selected by the user, may display titles, snippets, and/or URLs of additional ones of the remaining documents in the cluster. Server 120 may present the result outputs along with a map that illustrates locations corresponding to the addresses associated with the result outputs.
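  • A result output for one cluster might be assembled as plain text lines, as in the sketch below. The `ResultDoc` type, the line layout, and the choice to show a snippet only for the highest-scoring document are illustrative assumptions. `shown` stands in for the predetermined number of documents (e.g., four). Passing `business_name` gives the variant that labels the cluster with a business or organization name.

```python
from dataclasses import dataclass


@dataclass
class ResultDoc:
    title: str
    url: str
    snippet: str
    score: float


def result_output(docs, shown=4, business_name=None, address=None):
    """Return the text lines of one cluster's result output."""
    ordered = sorted(docs, key=lambda d: d.score, reverse=True)
    lines = []
    if business_name:  # variant that labels the cluster with a business name
        lines.append(f"{business_name}, {address}" if address else business_name)
    top, others = ordered[0], ordered[1:shown]
    lines.append(f"{top.title} <{top.url}>")
    lines.append(f"    {top.snippet}")
    for d in others:
        lines.append(f"  {d.title} <{d.url}>")
    hidden = len(ordered) - shown
    if hidden > 0:  # remaining documents sit behind a "See More" option
        lines.append(f"  See More ({hidden} more)")
    return lines


docs = [ResultDoc(f"Page {i}", f"http://example.com/{i}", f"Snippet {i}", 1.0 - i / 10) for i in range(6)]
for line in result_output(docs, business_name="Pizza Place", address="100 Main St"):
    print(line)
```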
  • It may be beneficial to note that a good result need not include the word(s) making up the geographical identifier(s). In other words, a good result may be associated with a location different from the location of the geographical identifier(s), though still within the area of interest. It may also be beneficial to note that a document may be excluded from the result list if it is not associated with an address located within the area of interest, or if it does not contain any of the keyword(s) provided by the user.
  • EXAMPLE
  • Assume that a user desires to find restaurants that serve pizza in the Palo Alto area. The user may access a server using conventional web browser software. Assume that the user provides the following search query: “pizza Palo Alto.” The server may recognize the search query as a search for a business or organization in a certain area based at least in part on the presence of one or more keywords (“pizza”) and one or more geographical identifiers (“Palo Alto”).
  • FIG. 5 is a functional block diagram of a portion of the server according to this exemplary implementation consistent with the principles of the invention. The server may determine the geographic center of Palo Alto, possibly in terms of its latitude and longitude coordinates. The server may then identify the area of interest. Assume that the radius is set at 10 miles. The server may determine the area of interest to include the area within a circle centered on the geographic center of Palo Alto with a 10-mile radius.
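  • The circular area of interest can be tested with the haversine great-circle distance. In this sketch the Palo Alto center coordinates are approximate and for illustration only. The spherical-Earth radius of 3958.8 miles is an assumption. `AreaOfInterest` and `haversine_miles` are hypothetical names.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth, assuming a spherical model


def haversine_miles(a, b):
    """Great-circle distance in miles between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


@dataclass(frozen=True)
class AreaOfInterest:
    center: tuple[float, float]  # geographic center as (latitude, longitude)
    radius_miles: float

    def contains(self, point):
        return haversine_miles(self.center, point) <= self.radius_miles


# Approximate coordinates, for illustration only.
palo_alto = AreaOfInterest(center=(37.4419, -122.1430), radius_miles=10.0)
print(palo_alto.contains((37.3861, -122.0839)))  # Mountain View, about 5 miles away
print(palo_alto.contains((37.7749, -122.4194)))  # San Francisco, more than 20 miles away
```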
  • The server may analyze documents from the repository of crawled documents to identify documents that are associated with one or more addresses located within the area of interest as potential hits. Assume that the potential hits include 50 documents: documents 1 through 50. Some of these documents may be associated with more than one address located within the area of interest. For example, document 1 is associated with four addresses, three of which are located within the area of interest. The shaded block indicates an address that is not located within the area of interest. Similarly, document 2 is associated with three addresses, all of which are located within the area of interest. Document 50 is associated with three addresses, two of which are located within the area of interest.
  • The server may then identify which of documents 1 through 50 include the keyword “pizza.” For example, the server may analyze the words within the documents and determine whether any of these words match the keyword “pizza.” Documents that include the word “pizza” may be classified as relevant results. Assume that only documents 1 through 25 include the word “pizza” and, thus, make up the relevant results.
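  • The two filtering passes in this example, first by location and then by keyword, might look like the following. The sketch assumes addresses have already been geocoded to (latitude, longitude) points, and it takes the point-in-area test as a parameter. It treats a document as relevant when it contains at least one keyword as a whole word. These choices and all names are illustrative.

```python
import re
from typing import Callable

Point = tuple[float, float]  # geocoded (latitude, longitude) of an address


def potential_hits(doc_points: dict[int, list[Point]],
                   in_area: Callable[[Point], bool]) -> dict[int, list[Point]]:
    """Keep documents with at least one address in the area, and only those addresses."""
    hits = {}
    for doc_id, points in doc_points.items():
        inside = [p for p in points if in_area(p)]
        if inside:
            hits[doc_id] = inside
    return hits


def relevant_results(hits: dict[int, list[Point]], doc_text: dict[int, str],
                     keywords: list[str]) -> list[int]:
    """Return ids of potential hits whose text contains at least one keyword."""
    wanted = {k.lower() for k in keywords}
    relevant = []
    for doc_id in hits:
        words = set(re.findall(r"[a-z0-9]+", doc_text[doc_id].lower()))
        if wanted & words:
            relevant.append(doc_id)
    return relevant


def in_box(p: Point) -> bool:
    # Toy stand-in for a real point-in-area test.
    return p[0] < 10.0


doc_points = {1: [(1.0, 1.0), (2.0, 2.0), (50.0, 50.0)], 2: [(3.0, 3.0)], 3: [(60.0, 60.0)]}
texts = {1: "Best pizza in town", 2: "Sushi bar", 3: "Pizza"}
hits = potential_hits(doc_points, in_box)
print(hits)
print(relevant_results(hits, texts, ["pizza"]))
```

Document 3 mentions pizza but is dropped in the first pass because none of its addresses lie in the area.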
  • The server may score the relevant results based, for example, on distance and/or relevancy. Assume that document 1 contains an address that is closest to the geographic center of Palo Alto and includes the word “pizza” in a prominent place, like its title. The server may then score document 1 higher than the rest of the documents. Assume further that document 25 contains a single address that is furthest from the geographic center and includes the word “pizza” in very small print. The server may score document 25 lower than the rest of the documents.
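  • A document score that blends keyword prominence with distance might be computed as below. The field names, the `FIELD_WEIGHT` values, the use of the document's closest in-area address, and the multiplicative blend are all assumptions made for illustration.

```python
import re

# Hypothetical prominence weights: a keyword in the title counts far more than one in fine print.
FIELD_WEIGHT = {"title": 3.0, "body": 1.0, "fine_print": 0.25}


def document_score(fields, keywords, address_distances_miles, radius_miles):
    """Score a relevant document by keyword prominence and nearness to the center.

    fields maps a field name to its text; address_distances_miles lists the
    distances of the document's in-area addresses from the geographic center.
    """
    tokens = {name: set(re.findall(r"[a-z0-9]+", text.lower())) for name, text in fields.items()}
    relevancy = 0.0
    for kw in keywords:
        # Credit each keyword once, at the most prominent field where it appears.
        relevancy += max((FIELD_WEIGHT.get(name, 1.0) for name, words in tokens.items()
                          if kw.lower() in words), default=0.0)
    # Use the document's closest address: 1.0 at the center, 0.0 at the edge.
    proximity = 1.0 - min(address_distances_miles) / radius_miles
    # Distance scales relevancy between 50% and 100% (an assumed blend).
    return relevancy * (0.5 + 0.5 * proximity)


s1 = document_score({"title": "Pizza Palace", "body": "Wood-fired pizza and salads"},
                    ["pizza"], [0.5, 3.0], 10.0)
s25 = document_score({"body": "Family diner", "fine_print": "Pizza available on request"},
                     ["pizza"], [9.5], 10.0)
print(s1, s25)
```

With these assumed weights, a document like document 1 (keyword in the title, address near the center) outscores one like document 25 (keyword only in fine print, address near the edge).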
  • The server may cluster documents 1 through 25 based on the addresses they contain. The server may sort the documents based on their scores and consider an address associated with one of the documents (e.g., document 1) to determine whether there are any other documents that are associated with this same address. Assume that documents 3 and 12 are associated with the same address. The server may cluster documents 1, 3, and 12 as being associated with the same address.
  • The server may then consider another address, such as another address associated with document 1 or with another document, and determine whether any other documents are associated with that address. Assume that several documents, document 25 among them, are associated with this address. The server may then cluster documents 1, . . . , 25 as being associated with the same address.
  • The server may continue this process until no additional clusters can be formed. There should be one cluster formed for each distinct address contained in one or more of documents 1 through 25. Assume that there are 10 distinct addresses and, thus, 10 clusters formed. As shown in FIG. 5, some of the clusters may include the same documents. For example, both clusters 1 and 2 include document 1.
  • The server may then rank and sort the clusters to form a result list and present the result list to the user. FIG. 6A is a diagram of an exemplary result list according to an implementation consistent with the principles of the invention. As shown in FIG. 6A, the result list contains two result outputs 610 and 620, corresponding to two clusters. Result output 610 refers to four documents 612, 614, 616, and 618 in the cluster. Document 612 may correspond to the highest-scoring document in the cluster. For document 612, the server may include the title and a snippet. Documents 614-618 may correspond to lesser-scoring documents. As shown in FIG. 6A, the server may include the title and/or the URL associated with these documents.
  • FIG. 6B is a diagram of an exemplary result list according to another implementation consistent with the principles of the invention. As shown in FIG. 6B, the result list contains two result outputs 650 and 660, corresponding to two clusters. Result output 650 includes a business name and, possibly, the address associated with the cluster 652 and refers to three documents 654, 656, and 658 in the cluster. Documents 654-658 may be ordered by their scores. As shown in FIG. 6B, the server may include the title and/or the URL associated with these documents.
  • CONCLUSION
  • Systems and methods consistent with the principles of the invention cluster search results based on locations (or telephone numbers) of interest to users. The users might provide data associated with a business or organization and, possibly, a location of the business or organization. The users might provide the location data as broadly or narrowly as they desire. They may also dynamically broaden or narrow the location data to obtain more or fewer results.
  • The foregoing description of preferred embodiments of the present invention provides illustration and description, but is not intended to be exhaustive or to limit the invention to the precise form disclosed. Modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. For example, while series of acts have been described with regard to FIGS. 4A and 4B, the order of the acts may be modified in other implementations consistent with the principles of the invention. Also, non-dependent acts may be performed in parallel.
  • Also, clustering has been described thus far as grouping documents based on the addresses with which they are associated. In other implementations consistent with the principles of the invention, clustering may be performed to group documents based on the telephone numbers with which they are associated. For example, a user might provide a partial telephone number in the search query. The server may identify documents that are associated with the partial telephone number and match any keyword(s) also included in the search query. The server may then cluster the documents based on the telephone numbers with which they are associated and present the clusters as search results to the user.
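  • Clustering by telephone number can be sketched much like clustering by address. This sketch assumes numbers are written without a country code. It treats the partial number as a leading-digit prefix, such as an area code or an area code plus prefix. It omits the keyword filter for brevity. `cluster_by_phone` and `digits_only` are hypothetical names.

```python
import re
from collections import defaultdict


def digits_only(phone: str) -> str:
    """Strip punctuation and spaces so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", phone)


def cluster_by_phone(doc_phones: dict[int, list[str]], partial: str) -> dict[str, list[int]]:
    """Cluster documents by full telephone number, keeping numbers that start with `partial`."""
    prefix = digits_only(partial)
    clusters: dict[str, list[int]] = defaultdict(list)
    for doc_id, phones in doc_phones.items():
        # Normalize formatting and drop repeated numbers within one document.
        for number in dict.fromkeys(digits_only(p) for p in phones):
            if number.startswith(prefix):
                clusters[number].append(doc_id)
    return dict(clusters)


doc_phones = {
    1: ["(650) 555-1234"],
    2: ["650-555-1234", "650.321.0000"],
    3: ["408-555-1234"],
}
print(cluster_by_phone(doc_phones, "650"))
print(cluster_by_phone(doc_phones, "(650) 555"))
```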
  • Further, while the preceding description focused on generating search results that are meaningful to a user, it is not so limited. For example, the previously-described acts may be used to target, and possibly cluster, advertisements to users. The keyword(s) and geographical identifier(s) may be used to determine interests and locations of the users. The server may use these interests and locations to identify advertisements to present to the users along with the search results. In the example in which a user is searching for restaurants that serve pizza in the Palo Alto area, the server might present the user with advertisements regarding other restaurants (maybe ones not serving pizza) in the Palo Alto area (or within the area of interest). These, or other, advertisements may be clustered in a manner similar to that described above.

Claims (33)

1. A method for clustering by address, comprising:
receiving a search query that includes one or more keywords;
obtaining one or more geographical identifiers;
identifying an area of interest based, at least in part, on the one or more geographical identifiers;
identifying documents that are associated with addresses located within the area of interest;
determining ones of the identified documents that match the one or more keywords as relevant documents;
grouping the relevant documents into clusters based, at least in part, on the addresses located within the area of interest; and
presenting the clusters.
2. The method of claim 1, wherein the geographical identifiers are received as part of the search query.
3. The method of claim 1, wherein the geographical identifiers are inferred independent of the search query.
4. The method of claim 1, wherein the one or more keywords relate to a business or organization.
5. The method of claim 4, wherein the one or more geographical identifiers include location-specific information that approximately identifies a location of the business or organization.
6. The method of claim 1, wherein the one or more geographical identifiers include at least one of a partial address, a partial telephone number, an entire address, and an entire telephone number.
7. The method of claim 1, wherein the identifying an area of interest includes:
determining a geographic location based, at least in part, on the one or more geographical identifiers,
determining a geographic center of the geographic location, and
identifying locations within a certain distance of the geographic center as the area of interest.
8. The method of claim 7, wherein the identifying locations includes:
determining a radius, and
identifying the area of interest as a circle centered on the geographic center with the determined radius.
9. The method of claim 8, wherein the radius is one of a predetermined radius and a radius set based on a specificity of the one or more geographical identifiers.
10. The method of claim 8, wherein the radius is a user-configurable radius.
11. The method of claim 8, wherein the radius is dynamically set based, at least in part, on the one or more keywords.
12. The method of claim 1, wherein the identifying documents includes:
accessing a database that associates documents from a repository of crawled documents to addresses associated with the documents.
13. The method of claim 1, further comprising:
scoring the relevant documents based on at least one of a distance factor and a relevancy factor.
14. The method of claim 13, wherein the distance factor for one of the relevant documents refers to a distance that an address associated with the one of the relevant documents is from a geographic center of the area of interest.
15. The method of claim 13, wherein the relevancy factor for one of the relevant documents refers to at least one of a number of the one or more keywords present in the one of the relevant documents and how prominently the one or more keywords appear in the one of the relevant documents.
16. The method of claim 1, wherein the grouping the relevant documents into clusters includes:
forming a separate one of the clusters for each of the addresses located within the area of interest.
17. The method of claim 1, wherein the grouping the relevant documents into clusters includes:
identifying a first one of the addresses associated with a first one of the relevant documents,
determining one or more second ones of the relevant documents that are also associated with the first address, and
grouping the first relevant document and the one or more second relevant documents into a cluster.
18. The method of claim 1, wherein the grouping the relevant documents into clusters includes:
placing each of the relevant documents into at least one cluster.
19. The method of claim 1, wherein the grouping the relevant documents into clusters includes:
placing at least one of the relevant documents into a plurality of the clusters.
20. The method of claim 1, wherein the presenting the clusters includes:
generating scores for the relevant documents within each of the clusters, and
sorting the relevant documents within each of the clusters based, at least in part, on the scores.
21. The method of claim 1, wherein the presenting the clusters includes:
ranking the clusters based on at least one of a distance factor and a relevancy factor, and
sorting the clusters based, at least in part, on the ranking.
22. The method of claim 21, wherein the distance factor for one of the clusters refers to a distance that an address associated with the one cluster is from a geographic center of the area of interest.
23. The method of claim 22, wherein the relevancy factor for one of the clusters refers to at least one of a number of the one or more keywords present in at least one of the relevant documents in the one cluster and how prominently the one or more keywords appear in at least one of the relevant documents in the one cluster.
24. The method of claim 21, wherein the presenting the clusters further includes:
weighting the distance factor and the relevancy factor differently based, at least in part, on the search query.
25. The method of claim 1, wherein the presenting the clusters includes:
forming a result output for each of the clusters, the result output including at least one of a title and a snippet for one of the relevant documents in the cluster and a title for another one or more of the relevant documents in the cluster.
26. The method of claim 1, wherein the presenting the clusters includes:
forming a result output for each of the clusters, the result output including a name of a business or organization and a title for one or more of the relevant documents in the cluster.
27. A system for forming search results, comprising:
means for receiving a search query;
means for identifying a geographical location;
means for determining a geographical center of the geographical location;
means for identifying locations within a certain distance of the geographical center as a geographical area of interest;
means for identifying documents that are associated with addresses located within the geographical area of interest; and
means for determining relevant ones of the identified documents, as relevant documents, based, at least in part, on the search query, the relevant documents forming the search results.
28. A system for forming search results, comprising:
a memory configured to store information that matches documents to addresses associated with the documents; and
a processor connected to the memory and configured to:
receive a search query,
determine a geographical area of interest based, at least in part, on the search query,
identify documents that are associated with addresses located within the geographical area of interest based, at least in part, on the information stored in the memory,
group the identified documents into clusters based, at least in part, on the addresses located within the geographical area of interest, and
provide the clusters as the search results.
29. A method for clustering by address, comprising:
receiving a search query;
identifying a geographical area of interest based, at least in part, on the search query;
identifying documents that are associated with addresses located within the geographical area of interest;
grouping the identified documents into clusters based, at least in part, on the addresses located within the geographical area of interest; and
presenting the clusters.
30. A method for forming search results, comprising:
receiving a search query that includes at least one portion of a telephone number;
identifying a geographical area of interest based, at least in part, on the at least one portion of the telephone number;
identifying documents that are associated with addresses located within the geographical area of interest;
grouping the identified documents into clusters based, at least in part, on the addresses located within the geographical area of interest; and
presenting the clusters as the search results.
31. The method of claim 30, wherein the at least one portion of the telephone number includes at least one of an area code and a prefix associated with the telephone number.
32. A method for forming search results, comprising:
receiving a search query that includes one or more keywords and at least one portion of a telephone number;
identifying documents that are associated with telephone numbers that match the at least one portion of the telephone number;
determining ones of the identified documents that match the one or more keywords as relevant documents;
grouping the relevant documents into clusters based on the telephone numbers included in the relevant documents; and
presenting the clusters as the search results.
33. A method for forming search results, comprising:
receiving a search query that includes at least one portion of a telephone number;
identifying documents that are associated with telephone numbers that match the at least one portion of the telephone number;
grouping the identified documents into clusters based on the telephone numbers included in the identified documents; and
presenting the clusters as the search results.
US10/664,929 2003-09-22 2003-09-22 Systems and methods for clustering search results Active 2028-09-22 US8346770B2 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US10/664,929 US8346770B2 (en) 2003-09-22 2003-09-22 Systems and methods for clustering search results
PCT/US2004/030983 WO2005031614A1 (en) 2003-09-22 2004-09-20 Systems and methods for clustering search results
EP04784728A EP1665101A1 (en) 2003-09-22 2004-09-20 Systems and methods for clustering search results
KR1020067005695A KR100814667B1 (en) 2003-09-22 2004-09-20 Systems and methods for clustering search results
NO20061794A NO337806B1 (en) 2003-09-22 2006-04-24 Systems and methods for grouping search results

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
US10/664,929 US8346770B2 (en) 2003-09-22 2003-09-22 Systems and methods for clustering search results

Publications (2)

Publication Number Publication Date
US20050065959A1 true US20050065959A1 (en) 2005-03-24
US8346770B2 US8346770B2 (en) 2013-01-01

Family

ID=34312824

Family Applications (1)

Application Number Title Priority Date Filing Date
US10/664,929 Active 2028-09-22 US8346770B2 (en) 2003-09-22 2003-09-22 Systems and methods for clustering search results

Country Status (5)

Country Link
US (1) US8346770B2 (en)
EP (1) EP1665101A1 (en)
KR (1) KR100814667B1 (en)
NO (1) NO337806B1 (en)
WO (1) WO2005031614A1 (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11270A (en) * 1854-07-11 Improvement in the construction of inkstands
US78035A (en) * 1868-05-19 H a e v b t w e b st e e
US87505A (en) * 1869-03-02 Improved cotton-press
US6101496A (en) * 1998-06-08 2000-08-08 Mapinfo Corporation Ordered information geocoding method and apparatus
US20020042789A1 (en) * 2000-10-04 2002-04-11 Zbigniew Michalewicz Internet search engine with interactive search criteria construction
US20030061211A1 (en) * 2000-06-30 2003-03-27 Shultz Troy L. GIS based search engine
US20030217052A1 (en) * 2000-08-24 2003-11-20 Celebros Ltd. Search engine method and apparatus

Family Cites Families (8)

Publication number Priority date Publication date Assignee Title
US5875446A (en) 1997-02-24 1999-02-23 International Business Machines Corporation System and method for hierarchically grouping and ranking a set of objects in a query context based on one or more relationships
US6701307B2 (en) 1998-10-28 2004-03-02 Microsoft Corporation Method and apparatus of expanding web searching capabilities
GB2356948A (en) 1999-11-30 2001-06-06 Saeed Mohamed Moghul Search system
EP3367268A1 (en) 2000-02-22 2018-08-29 Nokia Technologies Oy Spatially coding and displaying information
AUPQ599700A0 (en) 2000-03-03 2000-03-23 Super Internet Site System Pty Ltd On-line geographical directory
US20040230461A1 (en) 2000-03-30 2004-11-18 Talib Iqbal A. Methods and systems for enabling efficient retrieval of data from data collections
KR20020046494A (en) 2000-12-14 2002-06-21 박은수 Commercial dealings method using the local unit retrieval system
US6868396B2 (en) 2000-12-29 2005-03-15 Nortel Networks Limited Method and apparatus for monitoring internet based sales transactions by local vendors

Cited By (171)

Publication number Priority date Publication date Assignee Title
US20080059899A1 (en) * 2003-10-14 2008-03-06 Microsoft Corporation System and process for presenting search results in a histogram/cluster format
US7698657B2 (en) * 2003-10-14 2010-04-13 Microsoft Corporation System and process for presenting search results in a histogram/cluster format
US20100199205A1 (en) * 2003-10-14 2010-08-05 Microsoft Corporation System and process for presenting search results in a histogram/cluster format
US8214764B2 (en) * 2003-10-14 2012-07-03 Microsoft Corporation System and process for presenting search results in a histogram/cluster format
US8676790B1 (en) * 2003-12-05 2014-03-18 Google Inc. Methods and systems for improving search rankings using advertising data
US7302645B1 (en) 2003-12-10 2007-11-27 Google Inc. Methods and systems for identifying manipulated articles
US20110004399A1 (en) * 2003-12-19 2011-01-06 Smartt Brian E Geocoding Locations Near A Specified City
US10713285B2 (en) * 2003-12-19 2020-07-14 Uber Technologies, Inc. Geocoding locations near a specified city
US9453739B2 (en) * 2003-12-19 2016-09-27 Uber Technologies, Inc. Geocoding locations near a specified city
US20160364411A1 (en) * 2003-12-19 2016-12-15 Uber Technologies, Inc. Geocoding Locations Near A Specified City
US20110123066A9 (en) * 2004-07-09 2011-05-26 Ching-Chien Chen Precisely locating features on geospatial imagery
US20110007941A1 (en) * 2004-07-09 2011-01-13 Ching-Chien Chen Precisely locating features on geospatial imagery
US8953887B2 (en) 2004-07-09 2015-02-10 Terrago Technologies, Inc. Processing time-based geospatial data
US8675995B2 (en) 2004-07-09 2014-03-18 Terrago Technologies, Inc. Precisely locating features on geospatial imagery
US20060074902A1 (en) * 2004-09-30 2006-04-06 Microsoft Corporation Forming intent-based clusters and employing same by search
US7657519B2 (en) * 2004-09-30 2010-02-02 Microsoft Corporation Forming intent-based clusters and employing same by search
US20110047151A1 (en) * 2004-12-30 2011-02-24 Google Inc. Local item extraction
US8433704B2 (en) * 2004-12-30 2013-04-30 Google Inc. Local item extraction
US7574530B2 (en) * 2005-03-10 2009-08-11 Microsoft Corporation Method and system for web resource location classification and detection
US8073789B2 (en) 2005-03-10 2011-12-06 Microsoft Corporation Method and system for web resource location classification and detection
US20060206624A1 (en) * 2005-03-10 2006-09-14 Microsoft Corporation Method and system for web resource location classification and detection
US10341808B2 (en) 2005-04-04 2019-07-02 X One, Inc. Location sharing for commercial and proprietary content applications
US9167558B2 (en) 2005-04-04 2015-10-20 X One, Inc. Methods and systems for sharing position data between subscribers involving multiple wireless providers
US10165059B2 (en) 2005-04-04 2018-12-25 X One, Inc. Methods, systems and apparatuses for the formation and tracking of location sharing groups
US10149092B1 (en) 2005-04-04 2018-12-04 X One, Inc. Location sharing service between GPS-enabled wireless devices, with shared target location exchange
US9967704B1 (en) 2005-04-04 2018-05-08 X One, Inc. Location sharing group map management
US9955298B1 (en) 2005-04-04 2018-04-24 X One, Inc. Methods, systems and apparatuses for the formation and tracking of location sharing groups
US10299071B2 (en) 2005-04-04 2019-05-21 X One, Inc. Server-implemented methods and systems for sharing location amongst web-enabled cell phones
US9942705B1 (en) 2005-04-04 2018-04-10 X One, Inc. Location sharing group for services provision
US9883360B1 (en) 2005-04-04 2018-01-30 X One, Inc. Rendez vous management using mobile phones or other mobile devices
US10313826B2 (en) 2005-04-04 2019-06-04 X One, Inc. Location sharing and map support in connection with services request
US9854394B1 (en) 2005-04-04 2017-12-26 X One, Inc. Ad hoc location sharing group between first and second cellular wireless devices
US10341809B2 (en) 2005-04-04 2019-07-02 X One, Inc. Location sharing with facilitated meeting point definition
US8712441B2 (en) 2005-04-04 2014-04-29 Xone, Inc. Methods and systems for temporarily sharing position data between mobile-device users
US9854402B1 (en) 2005-04-04 2017-12-26 X One, Inc. Formation of wireless device location sharing group
US8750898B2 (en) 2005-04-04 2014-06-10 X One, Inc. Methods and systems for annotating target locations
US8798647B1 (en) 2005-04-04 2014-08-05 X One, Inc. Tracking proximity of services provider to services consumer
US8798593B2 (en) 2005-04-04 2014-08-05 X One, Inc. Location sharing and tracking using mobile phones or other wireless devices
US10750311B2 (en) 2005-04-04 2020-08-18 X One, Inc. Application-based tracking and mapping function in connection with vehicle-based services provision
US9749790B1 (en) 2005-04-04 2017-08-29 X One, Inc. Rendez vous management using mobile phones or other mobile devices
US9736618B1 (en) 2005-04-04 2017-08-15 X One, Inc. Techniques for sharing relative position between mobile devices
US9654921B1 (en) 2005-04-04 2017-05-16 X One, Inc. Techniques for sharing position data between first and second devices
US8538458B2 (en) 2005-04-04 2013-09-17 X One, Inc. Location sharing and tracking using mobile phones or other wireless devices
US9615204B1 (en) 2005-04-04 2017-04-04 X One, Inc. Techniques for communication within closed groups of mobile devices
US9584960B1 (en) 2005-04-04 2017-02-28 X One, Inc. Rendez vous management using mobile phones or other mobile devices
US10750309B2 (en) 2005-04-04 2020-08-18 X One, Inc. Ad hoc location sharing group establishment for wireless devices with designated meeting point
US10750310B2 (en) 2005-04-04 2020-08-18 X One, Inc. Temporary location sharing group with event based termination
US8798645B2 (en) 2005-04-04 2014-08-05 X One, Inc. Methods and systems for sharing position data and tracing paths between mobile-device users
US9467832B2 (en) 2005-04-04 2016-10-11 X One, Inc. Methods and systems for temporarily sharing position data between mobile-device users
US8831635B2 (en) 2005-04-04 2014-09-09 X One, Inc. Methods and apparatuses for transmission of an alert to multiple devices
US10791414B2 (en) 2005-04-04 2020-09-29 X One, Inc. Location sharing for commercial and proprietary content applications
US10856099B2 (en) 2005-04-04 2020-12-01 X One, Inc. Application-based two-way tracking and mapping function with selected individuals
US9253616B1 (en) 2005-04-04 2016-02-02 X One, Inc. Apparatus and method for obtaining content on a cellular wireless device based on proximity
US9185522B1 (en) 2005-04-04 2015-11-10 X One, Inc. Apparatus and method to transmit content to a cellular wireless device based on proximity to other wireless devices
US11356799B2 (en) 2005-04-04 2022-06-07 X One, Inc. Fleet location sharing application in association with services provision
US10200811B1 (en) 2005-04-04 2019-02-05 X One, Inc. Map presentation on cellular device showing positions of multiple other wireless device users
US9031581B1 (en) 2005-04-04 2015-05-12 X One, Inc. Apparatus and method for obtaining content on a cellular wireless device based on proximity to other wireless devices
US11778415B2 (en) 2005-04-04 2023-10-03 Xone, Inc. Location sharing application in association with services provision
US8385964B2 (en) 2005-04-04 2013-02-26 Xone, Inc. Methods and apparatuses for geospatial-based sharing of information by multiple devices
US8046371B2 (en) * 2005-05-27 2011-10-25 Google Inc. Scoring local search results based on location prominence
US20060271518A1 (en) * 2005-05-27 2006-11-30 Microsoft Corporation Search query dominant location detection
JP4790014B2 (en) * 2005-05-27 2011-10-12 グーグル インコーポレイテッド Scoring local search results based on location saliency
CN101223526A (en) * 2005-05-27 2008-07-16 谷歌公司 Scoring local search results based on location prominence
US20110022604A1 (en) * 2005-05-27 2011-01-27 Google Inc. Scoring local search results based on location prominence
US7424472B2 (en) * 2005-05-27 2008-09-09 Microsoft Corporation Search query dominant location detection
JP2008542883A (en) * 2005-05-27 2008-11-27 グーグル インコーポレイテッド Scoring local search results based on location saliency
US7822751B2 (en) * 2005-05-27 2010-10-26 Google Inc. Scoring local search results based on location prominence
US20060271531A1 (en) * 2005-05-27 2006-11-30 O'clair Brian Scoring local search results based on location prominence
US7933395B1 (en) 2005-06-27 2011-04-26 Google Inc. Virtual tour of user-defined paths in a geographic information system
US7746343B1 (en) 2005-06-27 2010-06-29 Google Inc. Streaming and interactive visualization of filled polygon data in a geographic information system
US7933929B1 (en) 2005-06-27 2011-04-26 Google Inc. Network link for providing dynamic data layer in a geographic information system
US9471625B2 (en) 2005-06-27 2016-10-18 Google Inc. Dynamic view-based data layer in a geographic information system
US8350849B1 (en) 2005-06-27 2013-01-08 Google Inc. Dynamic view-based data layer in a geographic information system
US10198521B2 (en) * 2005-06-27 2019-02-05 Google Llc Processing ambiguous search requests in a geographic information system
US10990638B2 (en) 2005-06-27 2021-04-27 Google Llc Processing ambiguous search requests in a geographic information system
US10496724B2 (en) 2005-06-27 2019-12-03 Google Llc Intelligent distributed geographic information system
US10795958B2 (en) 2005-06-27 2020-10-06 Google Llc Intelligent distributed geographic information system
US10592537B2 (en) 2005-10-12 2020-03-17 Google Llc Entity display priority in a distributed geographic information system
US9870409B2 (en) 2005-10-12 2018-01-16 Google Llc Entity display priority in a distributed geographic information system
US8290942B2 (en) 2005-10-12 2012-10-16 Google Inc. Entity display priority in a distributed geographic information system
US20070143345A1 (en) * 2005-10-12 2007-06-21 Jones Michael T Entity display priority in a distributed geographic information system
US9715530B2 (en) 2005-10-12 2017-07-25 Google Inc. Entity display priority in a distributed geographic information system
US9785648B2 (en) 2005-10-12 2017-10-10 Google Inc. Entity display priority in a distributed geographic information system
US7933897B2 (en) 2005-10-12 2011-04-26 Google Inc. Entity display priority in a distributed geographic information system
US8965884B2 (en) 2005-10-12 2015-02-24 Google Inc. Entity display priority in a distributed geographic information system
US11288292B2 (en) 2005-10-12 2022-03-29 Google Llc Entity display priority in a distributed geographic information system
US20070156671A1 (en) * 2005-12-30 2007-07-05 Yip Kai K K Category search for structured documents
US10120883B2 (en) 2006-01-23 2018-11-06 Microsoft Technology Licensing, Llc User interface for viewing clusters of images
US9396214B2 (en) * 2006-01-23 2016-07-19 Microsoft Technology Licensing, Llc User interface for viewing clusters of images
US20100088647A1 (en) * 2006-01-23 2010-04-08 Microsoft Corporation User interface for viewing clusters of images
US8122013B1 (en) 2006-01-27 2012-02-21 Google Inc. Title based local search ranking
US20070233864A1 (en) * 2006-03-28 2007-10-04 Microsoft Corporation Detecting Serving Area of a Web Resource
US7606875B2 (en) 2006-03-28 2009-10-20 Microsoft Corporation Detecting serving area of a web resource
US9141704B2 (en) 2006-06-28 2015-09-22 Microsoft Technology Licensing, Llc Data management in social networks
US8874592B2 (en) * 2006-06-28 2014-10-28 Microsoft Corporation Search guided by location and context
US20080005071A1 (en) * 2006-06-28 2008-01-03 Microsoft Corporation Search guided by location and context
US10592569B2 (en) 2006-06-28 2020-03-17 Microsoft Technology Licensing, Llc Search guided by location and context
US9536004B2 (en) 2006-06-28 2017-01-03 Microsoft Technology Licensing, Llc Search guided by location and context
US20080005073A1 (en) * 2006-06-28 2008-01-03 Microsoft Corporation Data management in social networks
US9396269B2 (en) 2006-06-28 2016-07-19 Microsoft Technology Licensing, Llc Search engine that identifies and uses social networks in communications, retrieval, and electronic commerce
US20080052413A1 (en) * 2006-08-28 2008-02-28 Microsoft Corporation Serving locally relevant advertisements
US8666821B2 (en) 2006-08-28 2014-03-04 Microsoft Corporation Selecting advertisements based on serving area and map area
US7650431B2 (en) 2006-08-28 2010-01-19 Microsoft Corporation Serving locally relevant advertisements
EP2122860A4 (en) * 2007-01-17 2010-12-01 Google Inc Location in search queries
US20080189177A1 (en) * 2007-02-02 2008-08-07 Anderton Jared M Systems and methods for providing advertisements
US20080208847A1 (en) * 2007-02-26 2008-08-28 Fabian Moerchen Relevance ranking for document retrieval
WO2009017926A3 (en) * 2007-07-31 2009-03-19 Microsoft Corp Generalized location identification
WO2009017926A2 (en) * 2007-07-31 2009-02-05 Microsoft Corporation Generalized location identification
US9384291B2 (en) 2007-11-26 2016-07-05 Urban Mapping, Inc. Generating geographical keywords for geotargeting search engine-offered advertisements
WO2009070501A1 (en) * 2007-11-26 2009-06-04 Urban Mapping, Inc. Generating geographical keywords for geotargeting search engine-offered advertisements
US20090138445A1 (en) * 2007-11-26 2009-05-28 Urban Mapping, Inc. Generating geographical keywords for geotargeting search engine-offered advertisements
US8825683B2 (en) 2007-11-26 2014-09-02 Urban Mapping, Inc. Generating geographical keywords for geotargeting search engine-offered advertisements
US8051083B2 (en) * 2008-04-16 2011-11-01 Microsoft Corporation Forum web page clustering based on repetitive regions
US20090265363A1 (en) * 2008-04-16 2009-10-22 Microsoft Corporation Forum web page clustering based on repetitive regions
US20090265388A1 (en) * 2008-04-22 2009-10-22 Microsoft Corporation Discovering co-located queries in geographic search logs
US9092454B2 (en) 2008-04-22 2015-07-28 Microsoft Technology Licensing, Llc Discovering co-located queries in geographic search logs
US8670617B2 (en) 2008-05-14 2014-03-11 Terrago Technologies, Inc. Systems and methods for linking content to individual image features
US20090285487A1 (en) * 2008-05-14 2009-11-19 Geosemble Technologies Inc. Systems and methods for linking content to individual image features
US8423536B2 (en) * 2008-08-05 2013-04-16 Yellowpages.Com Llc Systems and methods to sort information related to entities having different locations
US20100036807A1 (en) * 2008-08-05 2010-02-11 Yellowpages.Com Llc Systems and Methods to Sort Information Related to Entities Having Different Locations
US8676789B2 (en) 2008-08-05 2014-03-18 Yellowpages.Com Llc Systems and methods to sort information related to entities having different locations
US8631007B1 (en) 2008-12-09 2014-01-14 Google Inc. Disambiguating keywords and other query terms used to select sponsored content
US20110078101A1 (en) * 2009-09-25 2011-03-31 International Business Machines Corporation Recommending one or more existing notes related to a current analytic activity of a user
US20110078160A1 (en) * 2009-09-25 2011-03-31 International Business Machines Corporation Recommending one or more concepts related to a current analytic activity of a user
US20110119265A1 (en) * 2009-11-16 2011-05-19 Cyrus Shahabi Dynamically linking relevant documents to regions of interest
US8635228B2 (en) * 2009-11-16 2014-01-21 Terrago Technologies, Inc. Dynamically linking relevant documents to regions of interest
US8756231B2 (en) * 2010-01-28 2014-06-17 International Business Machines Corporation Search using proximity for clustering information
US20110184932A1 (en) * 2010-01-28 2011-07-28 International Business Machines Corporation Search using proximity for clustering information
US8600875B2 (en) 2010-04-12 2013-12-03 Visa International Service Association Authentication process using search technology
WO2011130290A3 (en) * 2010-04-12 2012-02-02 Visa International Service Association Authentication process using search technology
WO2011130290A2 (en) * 2010-04-12 2011-10-20 Visa International Service Association Authentication process using search technology
US8463772B1 (en) 2010-05-13 2013-06-11 Google Inc. Varied-importance proximity values
US11461336B2 (en) 2010-08-17 2022-10-04 Google Llc Selecting between global and location-specific search results
US10037357B1 (en) * 2010-08-17 2018-07-31 Google Llc Selecting between global and location-specific search results
US8666973B2 (en) * 2011-02-23 2014-03-04 Novell, Inc. Structured relevance—a mechanism to reveal how data is related
US20120215769A1 (en) * 2011-02-23 2012-08-23 Novell, Inc. Structured relevance - a mechanism to reveal how data is related
US9275104B2 (en) 2011-02-23 2016-03-01 Novell, Inc. Structured relevance—a mechanism to reveal how data is related
US9418077B2 (en) * 2012-05-03 2016-08-16 Salesforce.Com, Inc. System and method for geo-location data type searching in an on demand environment
US9110959B2 (en) * 2012-05-03 2015-08-18 Salesforce.Com, Inc. System and method for geo-location data type searching in an on demand environment
US20140032533A1 (en) * 2012-05-03 2014-01-30 Salesforce.Com, Inc. System and method for geo-location data type searching in an on demand environment
US20150310039A1 (en) * 2012-05-03 2015-10-29 Salesforce.Com, Inc. System and method for geo-location data type searching in an on demand environment
US9390103B2 (en) * 2012-05-15 2016-07-12 Alibaba Group Holding Limited Information searching method and system based on geographic location
US20160357766A1 (en) * 2012-05-15 2016-12-08 Alibaba Group Holding Limited Information searching method and system based on geographic location
US20130311511A1 (en) * 2012-05-15 2013-11-21 Alibaba Group Holding Limited Information searching method and system based on geographic location
US8929526B2 (en) 2012-06-08 2015-01-06 International Business Machines Corporation Methods for retrieving content in a unified communications environment
US8855281B2 (en) 2012-06-08 2014-10-07 International Business Machines Corporation Systems for retrieving content in a unified communications environment
US20150169789A1 (en) * 2012-08-10 2015-06-18 Google Inc. Providing local data with search results
US9418156B2 (en) * 2012-08-10 2016-08-16 Google Inc. Providing local data with search results
US9462015B2 (en) * 2012-10-31 2016-10-04 Virtualbeam, Inc. Distributed association engine
US20140207959A1 (en) * 2012-10-31 2014-07-24 Virtualbeam, Inc. Distributed association engine
US9047368B1 (en) * 2013-02-19 2015-06-02 Symantec Corporation Self-organizing user-centric document vault
US20140258331A1 (en) * 2013-03-07 2014-09-11 Ricoh Co., Ltd. Form Aggregation Based on Marks in Graphic Form Fields
US9483522B2 (en) * 2013-03-07 2016-11-01 Ricoh Company, Ltd. Form aggregation based on marks in graphic form fields
US10599738B1 (en) 2013-04-09 2020-03-24 Google Llc Real-time generation of an improved graphical user interface for overlapping electronic content
US11347821B2 (en) 2013-04-09 2022-05-31 Google Llc Real-time generation of an improved graphical user interface for overlapping electronic content
US11238114B2 (en) 2013-10-22 2022-02-01 Steven Michael VITTORIO Educational content search and results
US11222084B2 (en) 2013-10-22 2022-01-11 Steven Michael VITTORIO Content search and results
US11586680B2 (en) * 2014-03-31 2023-02-21 International Business Machines Corporation Fast and accurate geomapping
US20180137125A1 (en) * 2015-04-17 2018-05-17 Steven Michael VITTORIO Content Search and Results
US11704323B2 (en) * 2015-04-17 2023-07-18 Steven Michael VITTORIO Content search and results
US11250008B2 (en) * 2015-04-17 2022-02-15 Steven Michael VITTORIO Content search and results
US20220121671A1 (en) * 2015-04-17 2022-04-21 Steven Michael VITTORIO Content search and results
US20160321346A1 (en) * 2015-05-01 2016-11-03 Kevin A. Li Clustering Search Results
US20170024456A1 (en) * 2015-07-24 2017-01-26 Samsung Sds Co., Ltd. Method and apparatus for providing documents reflecting user pattern
WO2017034518A1 (en) * 2015-08-21 2017-03-02 Hewlett-Packard Development Company, L.P. Identifying documents
US20170099342A1 (en) * 2015-10-04 2017-04-06 Anthony Ko-Ping Chien Dynamically Served Content
CN107305577A (en) * 2016-04-25 2017-10-31 北京京东尚科信息技术有限公司 Correct-distribute address date processing method and system based on K-means
US20190005051A1 (en) * 2017-06-29 2019-01-03 Microsoft Technology Licensing, Llc Clustering search results in an enterprise search system
US10747800B2 (en) * 2017-06-29 2020-08-18 Microsoft Technology Licensing, Llc Clustering search results in an enterprise search system
US20200380056A1 (en) * 2019-06-03 2020-12-03 Overwatch Systems, Ltd. Integrating platform for managing gis data and images
US11803603B2 (en) * 2019-06-03 2023-10-31 Overwatch Systems, Ltd. Integrating platform for managing GIS data and images

Also Published As

Publication number Publication date
KR100814667B1 (en) 2008-03-18
EP1665101A1 (en) 2006-06-07
NO20061794L (en) 2006-04-24
WO2005031614A1 (en) 2005-04-07
KR20060095979A (en) 2006-09-05
NO337806B1 (en) 2016-06-27
US8346770B2 (en) 2013-01-01

Similar Documents

Publication Publication Date Title
US8346770B2 (en) Systems and methods for clustering search results
US8972371B2 (en) Search engine and indexing technique
US8108383B2 (en) Enhanced search results
US7483881B2 (en) Determining unambiguous geographic references
US8046371B2 (en) Scoring local search results based on location prominence
US9189496B2 (en) Indexing documents according to geographical relevance
US7346604B1 (en) Method for ranking hypertext search results by analysis of hyperlinks from expert documents and keyword scope
US10691765B1 (en) Personalized search results
US20080010252A1 (en) Bookmarks and ranking
US8977630B1 (en) Personalizing search results
US20090222440A1 (en) Search engine for carrying out a location-dependent search
US7257766B1 (en) Site finding
CA2548948A1 (en) Assigning geographic location identifiers to web pages
US8595225B1 (en) Systems and methods for correlating document topicality and popularity
Asadi et al. Using local popularity of web resources for geo-ranking of search engine results

Legal Events

Date Code Title Description
AS Assignment

Owner name: GOOGLE INC., CALIFORNIA

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SMITH, ADAM;GE, XIANPING;HAMON, ELIZABETH;AND OTHERS;REEL/FRAME:015793/0924

Effective date: 20030922

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

AS Assignment

Owner name: GOOGLE LLC, CALIFORNIA

Free format text: CHANGE OF NAME;ASSIGNOR:GOOGLE INC.;REEL/FRAME:044101/0405

Effective date: 20170929

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 8