US20080077588A1 - Identifying and measuring related queries - Google Patents
Identifying and measuring related queries
- Publication number
- US20080077588A1 (US 11/948,374)
- Authority
- US
- United States
- Prior art keywords
- query
- queries
- chinese
- search
- keywords
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
- Y10S707/99934—Query formulation, input preparation, or translation
Definitions
- Online advertising may be an important source of revenue for enterprises engaged in electronic commerce.
- a number of different kinds of web page based online advertisements are currently in use, along with various associated distribution requirements, advertising metrics, and pricing mechanisms.
- Processes associated with technologies such as Hypertext Markup Language (HTML) and Hypertext Transfer Protocol (HTTP) enable a web page to be configured to contain a location for inclusion of an advertisement.
- a page may not only be a web page, but any other electronically created page or document.
- An advertisement can be selected for display each time the page is requested, for example, by a browser or server application.
- Online advertising may be linked to online searching.
- Online searching is a common way for consumers to locate information, goods, or services on the Internet.
- a consumer may use an online search engine to type in a query to search for other pages or web sites with information related to that query.
- the search may be referred to as a sponsored search.
- Sponsored searching may require advertisers to bid for search keywords.
- the search keywords are associated with the search query for displaying advertisements with the search results. It may be difficult to identify which keyword(s) a search query is related to. In particular, users may enter search queries that are misspelled or that are in a different language.
- FIG. 1 is a block diagram of an exemplary network system
- FIG. 2 is a block diagram of a language analyzer
- FIG. 3 is a block diagram of exemplary conversion forms
- FIG. 4 is a block diagram of exemplary comparisons of queries
- FIG. 5 is a flow diagram for identifying related queries.
- FIG. 6 is a block diagram of a general computer system for use with the disclosed embodiments.
- the embodiments described below include a system and method for identifying and measuring related queries.
- the embodiments relate to identifying similar Chinese queries.
- a user query may be compared with known search keywords or other search queries.
- the search keywords may be used by advertisers for sponsored searching.
- the user query may be a non-native language query, such as a Chinese related query in an English language website or a query in a Chinese website.
- the user query is converted into a different form before comparing with other converted queries or the search keywords.
- the embodiments are described in terms of a Chinese related query, but other languages or query platforms may be used.
- a similarity score based on various features may be used for comparing the queries. Based on the similarity score or other comparison features, the original user query may be substituted by other queries or be associated with one or more search keywords.
- the associated search keywords may be used for selecting the advertisements that are displayed with the search results for that search query.
- related queries may be identified from a reformulation of the original query.
- the reformulation may be based on stored query logs and used to compare the original query with stored queries.
- various features, including language-specific features, may be used to measure query similarity.
- the original query may be replaced with a stored query or search keyword for identifying the relevant advertisements to display.
- a user's query may be misspelled and the system may identify a related query that is correctly spelled that replaces the initial user query.
- Chinese related queries may be identified and measured due to an increased interest in Chinese search and advertising markets.
- FIG. 1 provides a simplified view of a network system 100 in which the present embodiments may be implemented. Not all of the depicted components may be required, however, and some embodiments of the invention may include additional components not shown in the figure. Variations in the arrangement and type of the components may be made without departing from the spirit or scope of the claims as set forth herein. Additional, different or fewer components may be provided.
- FIG. 1 is a block diagram illustrating an embodiment of an exemplary network system 100 for language analysis and comparison.
- system 100 includes a language analyzer 104 that may receive and convert a user's search query for comparison with other queries or search keywords.
- a user device 106 is coupled with a search engine 102 through the network 109 .
- the search engine 102 is coupled with a search log database 112 , and both are coupled with the language analyzer 104 .
- the search log database 112 is coupled with a data source 113 and a unit dictionary 116 .
- An ad server 103 may be coupled with the search engine 102 and/or coupled with the language analyzer 104 .
- the phrase “coupled with” is defined to mean directly connected to or indirectly connected through one or more intermediate components. Such intermediate components may include both hardware and software based components.
- the user device 106 may be a computing device for a user to connect to a network 109 , such as the Internet. Examples of a user device include but are not limited to a personal computer, personal digital assistant (“PDA”), cellular phone, or other electronic device.
- the user device 106 may be configured to access other data/information in addition to web pages over the network 109 with a web browser, such as INTERNET EXPLORER® (sold by Microsoft Corp., Redmond, Wash.).
- the user device 106 may enable a user to view pages over the network 109 , such as the Internet.
- the user device 106 may be the user device described below with respect to FIG. 6 .
- the user device 106 may be configured to allow a user to interact with the search engine 102 , the ad server 103 , or other components of the system 100 .
- the user device 106 may receive and display a site or page provided by the search engine 102 .
- the user device 106 may include a keyboard, keypad or a cursor control device, such as a mouse, or a joystick, touch screen display, remote control or any other device operative to allow a user to interact with the page(s) provided by the search engine 102 and/or the ad server 103 .
- the search engine 102 is coupled with the user device 106 through the network 109 , as well as being coupled with the language analyzer 104 , the ad server 103 and/or the search log database 112 .
- the search engine 102 is a web server.
- the search engine 102 may provide a site or a page over a network, such as the network 109 or the Internet.
- a site or page may refer to a web page or a series of related web pages which may be received or viewed over a network.
- the site or page is not limited to a web page, and may include any information accessible over a network that may be displayed at the user device 106 .
- a site may refer to a series of pages which are linked by a site map.
- the web site of www.yahoo.com may include thousands of pages, which are included at yahoo.com.
- a page will be described as a web page, a web site, or any other site/page accessible over a network.
- a user of the user device 106 may access a page provided by the search engine 102 over the network 109 .
- the page provided by the search engine 102 may be a search page that receives a search query from the user device 106 and provides search results that are based on the received search query.
- the search engine 102 may include an interface, such as a web page, e.g., the web page which may be accessed on the World Wide Web at yahoo.com, which is used to search for pages which are accessible via the network 109 .
- the user device 106, autonomously or at the direction of the user, may input a search query (also referred to as a user query, original query, search term or a search keyword) for the search engine 102 .
- a single search query may include multiple words or phrases.
- the search engine 102 may perform a search for the search query and display the results of the search on the user device 106 .
- the results of a search may include a listing of related pages or sites that is provided by the search engine 102 in response to receiving the search query.
- the ad server 103 is coupled with the search engine 102 and/or the language analyzer 104 .
- the ad server 103 may be configured to provide advertisements to the search engine 102 .
- the search engine 102 and the ad server 103 may be a common component and/or the search engine 102 may select and provide advertisements.
- the ad server 103 may include or be coupled with an advertisement database that includes advertisements that are available to be displayed by the search engine 102 for sponsored searching.
- the advertisements may be associated with one or more search keywords.
- the search keywords may be purchased or bid on by advertisers. Accordingly, when that search keyword is searched for, the advertiser who purchased or placed the highest bid is selected and their advertisement is displayed.
- the ad server 103 may include or be coupled with a database, such as an advertisement database, that stores search keywords and the respective price or bid for each keyword from advertisers that is referenced for each search query.
- a search query is received and compared with known search keywords or other search queries when the ad server 103 selects and provides the advertisement to the search engine 102 .
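The keyword bidding described above can be illustrated with a minimal sketch. The bid table, advertiser names, and the `select_advertiser` helper are hypothetical and only show the "highest bid wins" selection for a matched search keyword; a real ad server would likely weigh additional factors.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical bid table: search keyword -> list of (advertiser, bid).
BIDS: Dict[str, List[Tuple[str, float]]] = {
    "art museums": [("advertiser_a", 0.40), ("advertiser_b", 0.65)],
    "law enforcement": [("advertiser_c", 0.25)],
}


def select_advertiser(keyword: str,
                      bids: Dict[str, List[Tuple[str, float]]] = BIDS) -> Optional[str]:
    """Return the advertiser with the highest bid for the keyword, or None."""
    offers = bids.get(keyword)
    if not offers:
        # No advertiser bid on this keyword, so no sponsored ad is selected.
        return None
    advertiser, _bid = max(offers, key=lambda offer: offer[1])
    return advertiser
```

With this table, `select_advertiser("art museums")` picks `advertiser_b`, and an unknown keyword yields `None`, which is where mapping a user query to a known keyword becomes useful.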
- the search log database 112 includes records or logs of at least a subset of the search queries entered in the search engine 102 over a period of time and may also be referred to as a search query log, search term database, keyword database or query database.
- the search log database 112 may store the search keywords that are used by the ad server 103 in selecting an advertisement for a particular search query.
- the search log database 112 may include search queries from any number of users over any period of time.
- the search log database 112 may include records or logs of a subset of the queries or requests for data entered at the search engine 102 over a period of time.
- the search log database 112 may also store associations between search queries from the search engine 102 . For example, a search query may be associated with a search keyword or other search queries after a conversion and comparison by the language analyzer 104 as discussed below.
- the search log database 112 may also be coupled with a data source 113 .
- the data source 113 may be an internal source of data, an external source of search data, or a combination of the two.
- An external data source may include search results from other search engines or other sources.
- a search engine other than search engine 102 may be an external data source and provide search logs to the search log database 112 .
- An internal data source may include search data or other data from the search engine 102 .
- Other data may include other searching or web browsing tendencies identified by the search engine 102 .
- the search log database 112 may also be coupled with a unit dictionary 116 .
- the unit dictionary 116 may be a database of user queries or search keywords that are coupled with one another as units. Units may also be referred to as concepts or topics and are sequences of one or more words that appear in search queries.
- the search query “New York City law enforcement” may include two units, e.g. “New York City” may be one unit and “law enforcement” may be another unit.
- a unit is a phrase of common words that identify a single concept.
- the search query “Chicago art museums” may include two units, e.g. “Chicago” and “art museums.” The “Chicago” unit is a single word, and “art museums” is a two-word unit.
- Units identify common groups of keywords to maximize the efficiency and relevance of search results.
- the unit dictionary 116 may include Chinese related queries, as well as Chinese related units that include Chinese characters. Categorization of search queries into units is discussed in commonly owned U.S. Pat. No. 7,051,023 issued May 23, 2006, entitled “SYSTEMS AND METHODS FOR GENERATING CONCEPT UNITS FROM SEARCH QUERIES,” which is hereby incorporated by reference.
- the unit dictionary 116 and the categorization of search queries into units may be used to compare and analyze search queries received by the search engine 102 .
- a search query may be broken into units that are compared with units from other queries or search keywords.
- past search queries and search keywords are stored in the search log database 112 as units that may be used in an analysis by the language analyzer 104 .
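As a rough illustration of splitting a query into units, the sketch below uses a greedy longest-match against a small in-memory dictionary. The greedy strategy, the `UNIT_DICTIONARY` contents, and the `split_into_units` name are assumptions made for this example; the unit generation method itself is described in the patent incorporated by reference and may differ.

```python
from typing import List, Set

# Hypothetical unit dictionary, lower-cased for matching.
UNIT_DICTIONARY: Set[str] = {
    "new york city", "law enforcement", "chicago", "art museums",
}


def split_into_units(query: str, dictionary: Set[str] = UNIT_DICTIONARY) -> List[str]:
    """Greedily split a query into the longest units found in the dictionary."""
    words = query.lower().split()
    units: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest span starting at word i first, then shrink it.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            # A single unknown word falls back to being its own unit.
            if candidate in dictionary or j == i + 1:
                units.append(candidate)
                i = j
                break
    return units
```

Applied to the two example queries, this produces "new york city" plus "law enforcement", and "chicago" plus "art museums".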
- the ad server 103 , the search engine 102 and/or the search log database 112 may be coupled with the language analyzer 104 .
- the language analyzer 104 receives a user query from the user device 106 and matches or identifies other queries or search keywords.
- the user query may be converted to a different form for comparing various features of the user query with search keywords as discussed with respect to FIG. 2 .
- the language analyzer 104 may be a computing device as described below with respect to FIG. 6 .
- the language analyzer 104 includes a processor 105 , memory 107 , software 108 and an interface 110 .
- the language analyzer 104 may be a separate component from the search engine 102 and the ad server 103 .
- any of the language analyzer 104 , search engine 102 , and the ad server 103 may be combined as a single component.
- the interface 110 may communicate with any of the search engine 102 , search log database 112 , and ad server 103 .
- the interface 110 may include a user interface configured to allow a user to interact with any of the components of the language analyzer 104 . For example, a user may be able to modify the conversion form or comparison features that are used by the language analyzer 104 .
- the processor 105 in the language analyzer 104 may include a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP) or other type of processing device.
- the processor 105 may be a component in any one of a variety of systems.
- the processor 105 may be part of a standard personal computer or a workstation.
- the processor 105 may be one or more general processors, digital signal processors, application specific integrated circuits, field programmable gate arrays, servers, networks, digital circuits, analog circuits, combinations thereof, or other now known or later developed devices for analyzing and processing data.
- the processor 105 may operate in conjunction with a software program, such as code generated manually (i.e., programmed).
- the processor 105 may be coupled with a memory 107 , or the memory 107 may be a separate component.
- the interface 110 and/or the software 108 may be stored in the memory 107 .
- the memory 107 may include, but is not limited to, computer readable storage media such as various types of volatile and non-volatile storage media, including random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like.
- the memory 107 includes a random access memory for the processor 105 .
- the memory 107 is separate from the processor 105 , such as a cache memory of a processor, the system memory, or other memory.
- the memory 107 may be an external storage device or database for storing recorded image data. Examples include a hard drive, compact disc (“CD”), digital video disc (“DVD”), memory card, memory stick, floppy disc, universal serial bus (“USB”) memory device, or any other device operative to store image data.
- the memory 107 is operable to store instructions executable by the processor 105 .
- the functions, acts or tasks illustrated in the figures or described herein may be performed by the programmed processor executing the instructions stored in the memory 107 .
- the functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro-code and the like, operating alone or in combination.
- processing strategies may include multiprocessing, multitasking, parallel processing and the like.
- the processor 105 is configured to execute the software 108 .
- the software 108 may include instructions for analyzing and converting search queries and comparing features with other queries or search keywords.
- the interface 110 may be a user input device or a display.
- the interface 110 may include a keyboard, keypad or a cursor control device, such as a mouse, or a joystick, touch screen display, remote control or any other device operative to interact with the language analyzer 104 .
- the interface 110 may include a display coupled with the processor 105 and configured to display an output from the processor 105 .
- the display may be a liquid crystal display (LCD), an organic light emitting diode (OLED), a flat panel display, a solid state display, a cathode ray tube (CRT), a projector, a printer or other now known or later developed display device for outputting determined information.
- the display may act as an interface for the user to see the functioning of the processor 105 , or as an interface with the software 108 for providing input parameters.
- the interface 110 may allow a user to interact with the language analyzer 104 to establish a conversion of a user query and the features that are compared in matching a query with a search keyword.
- any of the components in system 100 may be coupled with one another through a network.
- the language analyzer 104 may be coupled with the search engine 102 , search log database 112 , or ad server 103 via a network.
- Any of the components in system 100 may include communication ports configured to connect with a network.
- the present disclosure contemplates a computer-readable medium that includes instructions or receives and executes instructions responsive to a propagated signal, so that a device connected to a network can communicate voice, video, audio, images or any other data over a network.
- the instructions may be transmitted or received over the network via a communication port or may be a separate component.
- the communication port may be created in software or may be a physical connection in hardware.
- the communication port may be configured to connect with a network, external media, display, or any other components in system 100 , or combinations thereof.
- the connection with the network may be a physical connection, such as a wired Ethernet connection or may be established wirelessly as discussed below.
- the connections with other components of the system 100 may be physical connections or may be established wirelessly.
- the network or networks that may connect any of the components in the system 100 to enable communication of data between the devices may include wired networks, wireless networks, or combinations thereof.
- the wireless network may be a cellular telephone network, a network operating according to a standardized protocol such as IEEE 802.11, 802.16, 802.20, published by the Institute of Electrical and Electronics Engineers, Inc., or a WiMax network.
- the network(s) may be a public network, such as the Internet, a private network, such as an intranet, or combinations thereof, and may utilize a variety of networking protocols now available or later developed including, but not limited to TCP/IP based networking protocols.
- the network(s) may include one or more of a local area network (LAN), a wide area network (WAN), a direct connection such as through a Universal Serial Bus (USB) port, and the like, and may include the set of interconnected networks that make up the Internet.
- the network(s) may include any communication method or employ any form of machine-readable media for communicating information from one device to another.
- the ad server 103 or the search engine 102 may provide pages to the user device 106 over a network, such as the network 109 .
- the network or networks described above, including the network 109 , may be the network discussed below with respect to FIG. 6 .
- the ad server 103 , the search engine 102 , the search log database 112 , the language analyzer 104 , the unit dictionary 116 and/or the user device 106 may represent computing devices of various kinds, such as the components described with respect to FIG. 6 .
- Such computing devices may generally include any device that is configured to perform computation and that is capable of sending and receiving data communications by way of one or more wired and/or wireless communication interfaces.
- Such devices may be configured to communicate in accordance with any of a variety of network protocols, as discussed above.
- the user device 106 may be configured to execute a browser application that employs HTTP to request information, such as a web page, from the search engine 102 or ad server 103 .
- FIG. 2 illustrates an embodiment of a language analyzer.
- the language analyzer 104 may convert a search query into a different form for comparing its features with other queries or search keywords that are used for selecting matching advertisements to be displayed on a search results page.
- the language analyzer 104 may include a receiver 202 , a converter 204 , a comparator 206 , and a calculator 208 .
- the language analyzer 104 or any of its components may represent computing devices of various kinds, such as the components described with respect to FIG. 6 .
- the receiver 202 may receive a user query from the search engine 102 , which may receive the user query from the user device 106 .
- the receiver 202 may also receive search keywords from the ad server 103 .
- the search keywords may be matched with advertisements, such that when a user inputs the search keyword in a search engine, the search results page includes the matched advertisement.
- the language analyzer 104 may match user queries with search keywords for selecting advertisements to be displayed on the search query results page.
- the converter 204 is coupled with the receiver 202 .
- the converter 204 receives the user query or other search keywords and converts them into a different form for comparison.
- the user query may be a Chinese related query and the converter 204 may convert the Chinese related query into a different form to aid comparison.
- a Chinese related query may include any Chinese characters, including Roman characters that represent a Chinese character or phrase.
- Chinese related queries may also include queries that originate from or are received by a Chinese search engine and may be simplified Chinese and/or traditional Chinese.
- FIG. 3 illustrates exemplary conversion forms.
- the converter 204 may utilize any of the conversion forms 302 to convert a Chinese related query.
- the converter 204 may convert a search query into any of the conversion forms 302 to compare the query with other converted queries or converted search terms.
- the conversion may include a transformation of the query by adding, deleting, and/or substituting characters or words in the queries.
- the conversion or transformation may result in a common format or common form that may be used for comparing the queries.
- the conversion forms 302 shown in FIG. 3 are merely exemplary. In alternate embodiments, there may be additional conversion forms 302 that are not illustrated or described.
- the conversion may receive a Chinese related query and convert each element or selected elements of the query into an array that represents the converted form of the Chinese related query.
- a first conversion form is a conversion into Chinese soundex 304 .
- the Chinese characters are converted into pinyin without tone, while the roman letters remain.
- the query is then converted into a Chinese soundex-like representation. First, the first letter of the string is retained. Second, all occurrences of a, e, h, i, o, and u are removed unless they are the first letter.
- characters may be replaced, such as replacing “zh” with “z,” “ch” with “c,” “sh” with “s,” “ng” with “n,” “rd” with “d,” “rl” with “l,” “rn” with “n,” “rs” with “s,” and/or “rt” with “t.”
- Fifth, if two or more letters are adjacent, then the first letter remains and the others are omitted. Sixth, the spaces are removed. Seventh, all remaining characters are returned.
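The soundex-like steps can be sketched as follows for a query that has already been converted to pinyin without tone. Several points are assumptions, because the text does not spell out every step or their exact order: the whole query is treated as one string, the digraph replacements are applied before the vowel and "h" removal, and "adjacent letters" is read as runs of identical adjacent letters. The function name `chinese_soundex` is hypothetical.

```python
import re

# Digraph replacements listed in the text.
REPLACEMENTS = [
    ("zh", "z"), ("ch", "c"), ("sh", "s"), ("ng", "n"), ("rd", "d"),
    ("rl", "l"), ("rn", "n"), ("rs", "s"), ("rt", "t"),
]
# Letters removed everywhere except in the first position.
DROPPED = set("aehiou")


def chinese_soundex(pinyin_query: str) -> str:
    """Return a soundex-like code for a toneless pinyin/roman query."""
    text = pinyin_query.lower().strip()
    if not text:
        return ""
    # Assumption: replacements run before the removal step.
    for old, new in REPLACEMENTS:
        text = text.replace(old, new)
    # Retain the first letter; drop a, e, h, i, o, u from the rest.
    first, rest = text[0], text[1:]
    code = first + "".join(ch for ch in rest if ch not in DROPPED)
    # Assumption: collapse runs of identical adjacent characters.
    code = re.sub(r"(.)\1+", r"\1", code)
    # Remove spaces and return everything that remains.
    return code.replace(" ", "")
```

Under these assumptions, "zhong guo", "zong guo", and "zhongguo" all map to the same code, which is the kind of tolerance to spelling variants that the conversion is meant to provide.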
- a second conversion form is converting the Chinese characters into the keyboard input form zhuyin (Bopomofo) 306 .
- Each element in the array is either all zhuyin characters for one corresponding Chinese character or a roman character originally in the query without transformation.
- a third conversion form is a similar zhuyin (Bopomofo) conversion 308 , except each element in the array is either one zhuyin character or a roman character originally in the query without transformation.
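The difference between the two zhuyin forms is only the granularity of the array elements. The sketch below uses a tiny hypothetical lookup table (`ZHUYIN`) with two characters; a real converter would need a full character-to-zhuyin dictionary. Treating the tone mark as its own element in the per-symbol form is an assumption.

```python
from typing import List

# Hypothetical two-entry lookup table: Chinese character -> zhuyin spelling.
ZHUYIN = {
    "中": "ㄓㄨㄥ",
    "文": "ㄨㄣˊ",
}


def to_zhuyin_syllables(query: str) -> List[str]:
    """One element per Chinese character (all its zhuyin) or per other character."""
    return [ZHUYIN.get(ch, ch) for ch in query]


def to_zhuyin_symbols(query: str) -> List[str]:
    """One element per zhuyin symbol; other characters are kept unchanged."""
    elements: List[str] = []
    for ch in query:
        # Extending with a string adds each of its symbols as a separate element.
        elements.extend(ZHUYIN.get(ch, ch))
    return elements
```

For the query "中文a", the first form yields three elements and the second yields seven, so a comparison on the second form can give partial credit to syllables that share some symbols.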
- a fourth conversion form is converting Chinese characters into radicals 310 .
- Each element in the array is either the radical for a Chinese character or the roman character originally in the query without transformation.
- a radical 310 may be the semantic root (i.e., portion bearing the meaning) of a Chinese character.
- a radical may be part of a Chinese character and/or the semantic component of this Chinese character. For example, in the character pronounced as jie with a meaning of “sister”, the left part (pronounced nǚ in Mandarin Chinese) is the semantic component.
- Chinese characters may have one or more radicals.
- the radicals may be used for Chinese Hanzi.
- a dictionary may be used to match a Chinese character with its radical(s). When a Chinese character has multiple radicals, the most meaningful radical (which may be identified in a dictionary) may be considered for comparison.
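A dictionary-based radical conversion might look like the sketch below. The `RADICALS` table is a hypothetical three-entry stand-in for a real dictionary, and it stores only one radical per character, mirroring the idea of keeping the most meaningful radical.

```python
from typing import List

# Hypothetical dictionary: Chinese character -> its (most meaningful) radical.
RADICALS = {
    "姐": "女",
    "妈": "女",
    "河": "氵",
}


def to_radicals(query: str) -> List[str]:
    """Replace each known Chinese character with its radical; keep other characters."""
    return [RADICALS.get(ch, ch) for ch in query]
```

Two different characters that share a radical, such as the first two entries, become identical elements in this form, so a comparison on radicals can register a semantic relationship that a character-level comparison would miss.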
- a fifth conversion form is converting Chinese characters into pinyin without tone 412 .
- Each element in the array is either the complete pinyin without tone for one corresponding Chinese character or a roman character originally in the query without any transformation.
- a sixth conversion form is converting Chinese characters into pinyin without tone 414 in which each element in the array is either one pinyin character or a roman character originally in the query without transformation.
- Pinyin may be a Standard Mandarin Romanization system. In pinyin, the pin refers to a “spelling” and the yin refers to a “sound.” There may be a pinyin corresponding to each Chinese Character. One pinyin may include more than two roman characters.
- each pinyin may be a unit for similarity comparison.
- each character within pinyin may be a unit for comparison.
- a seventh conversion form is converting Chinese characters into pinyin with tone 416 . Each element in the array is either the complete pinyin and its tone for one corresponding Chinese character or a roman character originally in the query without transformation.
- An eighth conversion form is converting Chinese characters into pinyin with tone 418 in which each element in the array is either one pinyin character, its tone, or a roman character originally in the query without transformation.
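The four pinyin forms differ along two axes: whether tone is kept and whether an element is a whole syllable or a single character. A minimal sketch, assuming a hypothetical two-entry `PINYIN` table and numeric tone notation (the text does not say how tone is encoded):

```python
from typing import List

# Hypothetical lookup table: Chinese character -> (toneless pinyin, tone number).
PINYIN = {
    "中": ("zhong", "1"),
    "国": ("guo", "2"),
}


def to_pinyin(query: str, with_tone: bool, per_character: bool) -> List[str]:
    """Convert a query into one of the four pinyin array forms."""
    elements: List[str] = []
    for ch in query:
        entry = PINYIN.get(ch)
        if entry is None:
            # Roman (or unknown) characters are kept without transformation.
            elements.append(ch)
            continue
        syllable, tone = entry
        if per_character:
            # One element per pinyin letter, optionally followed by the tone.
            elements.extend(syllable)
            if with_tone:
                elements.append(tone)
        else:
            # One element per Chinese character: the complete pinyin.
            elements.append(syllable + tone if with_tone else syllable)
    return elements
```

For example, `to_pinyin("中国x", with_tone=False, per_character=False)` gives `["zhong", "guo", "x"]`, while enabling both flags gives one letter per element with each syllable's tone number following its letters.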
- a ninth conversion form is converting queries into two character-based arrays 420 . In particular, if a character is Chinese, its three-byte UTF-8 encoding is one element in the array. In other words, each Chinese character is represented in three bytes. If a character is roman, then the roman character itself is an element.
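In a language whose strings are sequences of Unicode code points, the character-based array falls out of ordinary iteration, and the three-byte grouping is visible only when the text is encoded. The helper names below are hypothetical, and the sketch is illustrative only.

```python
from typing import List


def to_character_array(query: str) -> List[str]:
    """One element per character, whether Chinese or roman."""
    return list(query)


def utf8_width(element: str) -> int:
    """Number of bytes the element occupies when encoded as UTF-8."""
    return len(element.encode("utf-8"))
```

For the query "中a", the array has two elements whose UTF-8 widths are 3 and 1. Characters in the basic CJK Unified Ideographs block encode to three bytes; rarer characters outside the Basic Multilingual Plane take four, so a byte-oriented implementation that assumes three bytes per Chinese character would need to handle that case.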
- a tenth conversion form is the removal of Chinese characters 422 .
- the roman characters are left in the query and the Chinese characters are removed.
- an eleventh conversion form removes the roman characters 424 , and keeps the Chinese characters in the query.
- a twelfth conversion form includes leaving the query as inputted 426 . In other words, the twelfth conversion is no conversion 426 .
- the receiver 202 receives two queries that are to be compared to determine the similarity between those queries.
- the queries are converted into at least one of the conversion forms by the converter 204 .
- both queries are converted into the twelve exemplary conversion forms 302 and the queries are compared in all twelve converted forms.
- certain conversion forms are selected for converting the queries and the queries are compared for each of those converted forms.
- the queries may be compared by the comparator 206 .
- the comparator 206 may be configured to perform comparison of a user's search query with other queries or with search keywords that are used by the ad server 103 for displaying relevant advertisements that are linked to particular search keywords.
- the comparator 206 determines the similarity between two queries.
- the queries are first converted into a similar form or similar forms by the converter 204 and each of those forms is compared by the comparator 206 .
- the queries are converted into the twelve forms illustrated in FIG. 3 and the comparator 206 makes twelve comparisons between the queries for each of the twelve conversions of each query. In alternative embodiments, there may be more or fewer conversion forms that are compared by the comparator 206 .
- a user query may be compared with a candidate set of queries to determine which of the candidate set is most similar to the user query.
- the candidate set may be made up of search keywords which are compared with the user query to determine which search keyword is most similar.
- the candidate set of queries or keywords for comparison may be chosen based on an initial analysis of the user query compared with the search log database 112 .
- when the user query is received the candidate set is identified and each member of the candidate set is compared with the user query to determine which is most similar.
- a similarity score may be calculated for each member of the candidate set that represents a similarity with the user query.
- the member of the candidate set with the closest similarity score may be most similar to the user query.
- the candidate set may include one query or include all queries, such as those stored in the search log database 112 .
- FIG. 4 illustrates exemplary comparisons of queries.
- the comparator 206 may utilize comparison features 402 when comparing queries.
- the comparison features 402 shown in FIG. 4 are merely exemplary. In alternate embodiments, there may be additional comparison features 402 that are not illustrated or described.
- the comparison may involve comparing various forms of converted Chinese related queries.
- the comparator 206 may compare an array of elements that is generated by the converter 204 as a converted form of a Chinese related query.
- the comparator 206 may compare queries as described in the commonly owned U.S. application entitled, “SYSTEM AND METHOD FOR IDENTIFYING RELATED QUERIES FOR LANGUAGES WITH MULTIPLE WRITING SYSTEMS,” U.S. Pat. Pub. No. 2007/0203894, filed Feb. 28, 2006, the disclosure of which is hereby incorporated by reference.
- a first comparison feature may be an edit distance 404 between two queries.
- the edit distance may be a measure of the difference between two character strings, such as queries.
- the edit distance may be a minimum number of edit operations required to transform the first query into the second query.
- the edit operation may include inserting a character into a string, deleting a character from a string, or replacing a character by another character.
- weights may be assigned for different edit operations. For example, a higher weight may be placed on replacing the character s by the character p, than on replacing it by the character a.
- the edit distance may be the Levenshtein distance or the Damerau-Levenshtein distance when a transposition of characters counts as a single edit operation.
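- A minimal Python sketch of such an edit distance is given below. It uses the standard dynamic-programming formulation of the Levenshtein distance with an optional substitution-cost function to illustrate weighted edit operations. The function name, parameter names, and the particular weights are hypothetical, and transpositions (the Damerau-Levenshtein variant) are not handled.

```python
from typing import Callable, Sequence


def edit_distance(
    a: Sequence[str],
    b: Sequence[str],
    sub_cost: Callable[[str, str], float] = lambda x, y: 1.0,
    indel_cost: float = 1.0,
) -> float:
    # prev[j] holds the cost of turning the processed prefix of a into b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, y in enumerate(b, start=1):
            cost = 0.0 if x == y else sub_cost(x, y)
            curr.append(min(
                prev[j] + indel_cost,      # delete x
                curr[j - 1] + indel_cost,  # insert y
                prev[j - 1] + cost,        # replace x by y (or keep it)
            ))
        prev = curr
    return prev[-1]


def example_weights(x: str, y: str) -> float:
    # Hypothetical weights: replacing "s" by "p" costs more than other replacements.
    return 1.5 if (x, y) == ("s", "p") else 0.5


print(edit_distance("kitten", "sitting"))         # unit costs
print(edit_distance("s", "p", example_weights))   # heavier replacement
print(edit_distance("s", "a", example_weights))   # lighter replacement
```

Because the function only indexes and iterates its inputs, it accepts either strings or the element arrays produced by a conversion form.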
- a second comparison feature may be an edit distance without a domain 406 .
- the domain may be a web domain, such as “.com” that is removed. The removal of the domain may be helpful because a user querying “yahoo.com” and “yahoo.net” is likely making the same query.
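- The domain removal can be sketched in Python as follows. The list of suffixes and the function name are assumptions for illustration; the disclosure names only “.com” and “.net” as examples.

```python
# Hypothetical list of domain suffixes to strip before comparison.
DOMAIN_SUFFIXES = (".com", ".net", ".org")


def strip_domain(query: str) -> str:
    # Remove one trailing domain suffix, matched case-insensitively.
    lowered = query.lower()
    for suffix in DOMAIN_SUFFIXES:
        if lowered.endswith(suffix):
            return query[: -len(suffix)]
    return query


# With the domain removed, the two example queries become identical,
# so any edit distance between them is zero.
print(strip_domain("yahoo.com"), strip_domain("yahoo.net"))
```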
- a third comparison feature may be a character level prefix overlap 408 .
- the character level prefix overlap 408 may be a measure of the characters/words that are the same at the beginning of the queries. For example, “auto cleaners” and “auto cleaning” have a prefix overlap of “auto clean.” The prefix overlap may indicate increased similarity.
- a fourth comparison feature may be a character level suffix overlap 410 .
- the character level suffix overlap 410 measures the similarity between queries at the end of the query. For example, “auto insurance agent” and “home insurance agent” share a suffix overlap of “insurance agent.” Similar to the prefix overlap, the suffix overlap may indicate increased similarity.
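- The prefix and suffix overlaps can be sketched in Python as shown below. The function names are hypothetical. Note that a strictly character-level suffix overlap of the example queries also includes the space preceding “insurance”.

```python
def prefix_overlap(a, b):
    # Count leading elements that agree, then return that shared prefix.
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def suffix_overlap(a, b):
    # A shared suffix is a shared prefix of the reversed sequences.
    return prefix_overlap(a[::-1], b[::-1])[::-1]


print(repr(prefix_overlap("auto cleaners", "auto cleaning")))
print(repr(suffix_overlap("auto insurance agent", "home insurance agent")))
# Passing lists of words instead of strings gives word-level overlaps.
print(suffix_overlap("auto insurance agent".split(), "home insurance agent".split()))
```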
- a fifth comparison feature may be a minimum edit distance 412 over all the conversion forms.
- a sixth comparison feature may be a maximum edit distance 414 over all the conversion forms. Given twelve conversion forms and twelve edit distances for each conversion, the minimum edit distance 412 and the maximum edit distance 414 may be identified. In one embodiment, the minimum and maximum may be removed as outliers. Alternatively, the minimum or maximum may be weighted higher when computing a similarity score.
- a seventh comparison feature may be a minimum edit distance without a domain 416 and an eighth comparison feature may be a maximum edit distance without a domain 418 . As discussed above, the domains in a query may not be valuable in terms of determining what the user is searching for, so the domains are removed before comparison.
- Additional comparison features may be a word level edit distance 420 , a word level prefix overlap 422 , or a word level suffix overlap 424 .
- the word level comparisons are similar to the character level comparisons, except entire words are compared rather than individual characters.
- a length difference 426 between two queries may also be used for comparing.
- the comparator 206 may be coupled with a calculator 208 that may calculate a similarity score.
- the similarity score may be a measure of the similarity between the queries.
- the similarity score may be calculated based on individual comparisons of different conversion forms of two queries with each individual comparison being assigned a weighted value.
- the multiple conversion forms described with respect to FIG. 3 may each result in a separate comparison between two queries. Accordingly, using the twelve conversion forms 302 , there may be twelve different edit distances or similarity scores, one for each conversion. Those twelve converted forms may be compared and multiplied by a weight for each form to get an overall similarity score between the queries. Alternatively, a subset of the twelve conversion forms or additional conversion forms not described may be utilized to convert Chinese related queries into different forms for comparison.
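- One way to combine per-form comparisons into a single score is a weighted sum, sketched below in Python. The feature names, values, and weights are invented for illustration and are not taken from Table A.

```python
from typing import Dict


def similarity_score(feature_values: Dict[str, float], weights: Dict[str, float]) -> float:
    # Multiply each comparison result by its weight and add the products.
    return sum(weights[name] * value for name, value in feature_values.items())


# Hypothetical edit distances for two conversion forms of one query pair.
distances = {"no_conversion": 3.0, "pinyin_without_tone": 1.0}
# Hypothetical weights; in practice these would be fitted or tuned.
weights = {"no_conversion": 0.25, "pinyin_without_tone": 0.5}

print(similarity_score(distances, weights))
```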
- the equation presented in Table A may be used to calculate a similarity score indicating the strength of similarity between a query pair.
- the query pair may include a given query q and a comparison query MODS(q), either of which may be written according to one or more Chinese writing systems.
- MODS(q) may represent a converted query.
- both q and MODS(q) may be converted to the same form for comparison, or MODS(q) is converted into a form for comparison with q.
- MODS(q) may represent a related query that is identified as a potential substitute for the user query q.
- When MODS(q) has a good similarity score with the user's original query q, MODS(q) may be used as a search keyword for fetching advertisements.
- MODS(q) may also be referred to as a rewritten query. Both user original query q and MODS(q) may be converted to the same form for comparison.
- the equation in Table A makes use of a subset of the conversion forms 302 and the comparison features 402 that are discussed above. In alternative embodiments, different conversion forms or comparison features may be utilized to generate a similarity score.
- the equation illustrated in Table A is merely exemplary and may be modified so as to provide for the calculation of a similarity score for multiple writing systems.
- a formula may be optimized based on the source of the query, because queries from Taiwan may be different from queries from Hong Kong. Accordingly, the conversions, comparisons, and weights may be modified for different types of queries.
- q represents a given query written according to one or more Chinese writing systems and MODS(q) represents a query selected from a candidate set of potential queries related to query q.
- query q may be referred to as query q 1 and MODS(q) may be referred to as query q 2 or q′.
- the initial number before each feature is a weight that may be used to emphasize or deemphasize features.
- Pq 12 min may be a function for calculating the query substitution probability of query q 1 following query q 2 in a log of user query sessions, such as from the search log database 112 .
- the search log database 112 may identify the order of the one or more queries submitted by the user, for example, to provide an indication of how the user refined a query, how the user rewrote a query, how the user utilized one or more alternate writing systems of a language with multiple writing systems to express a query q, etc.
- If queries q 1 and q 2 follow one another in the search log database 112 , it may be an indication that they are similar because q 2 may be a refinement of q 1 .
- the pq 12 min function calculates a query substitution probability of a given query q 1 following a given query q 2 , and may also be used to calculate a unit substitution of a unit u following a given unit u′.
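- $\mathrm{pq12min} = \min_i \, \mathrm{prob}(U_i \rightarrow U_i')$, where $U_i$ is a unit in q 1 and $U_i'$ is the corresponding unit in q 2 .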
- pq 12 min may be the normalized probability of q 2 as q 1 's substitution.
- a normalized probability is computed for each unit in q 1 being substituted by the corresponding unit in q 2 , and the minimum of these probabilities is taken as pq 12 min.
- Levroman is a comparison using the roman characters of a query, such as with conversion form 322 , which removes Chinese characters. For each query all non-roman characters may be removed, but spaces are left in the query. The roman character parts are changed into arrays. Each roman character is an element in the array, including any spaces. The Levenshtein distance is measured between the two arrays. In the case that neither q 1 nor q 2 has roman characters, levroman is set to 0. In the case that one of q 1 or q 2 has roman characters but the other does not have roman characters, levroman is set to 1.
- the first query does not include a space before map, but the second query includes a space before map.
- the queries are converted into arrays.
- q 1 is represented as the array:
- q 2 is represented as the array:
- Agreechar may relate to character agreement without removing a space regardless of the order of characters. Agreechar may be similar to wordr discussed below, except it is for the character level rather than the word level.
- q 1 and q 2 have 7 unique characters in total, including “m”, “a”, “p” and a space.
- Query q 1 and q 2 share 5 unique characters, including “m”, “a” and “p”. Therefore, agreechar is 0.714 (calculated by 5/7) for this query pair.
- Wordr is similar to agreechar except it matches words rather than characters.
- the queries are separated into words, segments, or units as described above.
- the percentage of unique words not in common is determined for wordr.
- Dlevpynchar utilizes the complete pinyin without tone 312 conversion form.
- the first query q 1 and second query q 2 first have a common domain removed and each roman character (including spaces) is kept, while each Chinese character is converted into pinyin without tone.
- the queries are then transformed into arrays. Each roman character is an element in the array and each Chinese character's pinyin without tone is an element in the array.
- the Levenshtein distance is then measured. In the example described above, query q 1 ends with “map” and query q 2 ends with “ map”; there is no space before “map” in query q 1 , but there is a space in query q 2 .
- the first query q 1 is converted into an array:
- the second query q 2 is converted into an array:
- Q 1 bidded is 1 if q 1 is bidded and q 1 bidded is 0 if q 1 is not bidded.
- If q 1 is a user query and q 1 is bidded, it may mean that an advertiser chooses q 1 as a keyword for the advertisements they want to show. This bidding process may also identify a cost they would like to pay if web searchers click the ads fetched by the keyword.
- a query identifying system may identify a related query (e.g. MODS(q)) to substitute for the user query.
- Q 2 hasroman is 1 if q 2 contains any roman characters, but not including any spaces.
- Q 2 hasroman is 0 if q 2 does not contain any roman characters.
- the queries that are analyzed may be from a Chinese search engine or from a search engine that receives Chinese related queries.
- a Chinese search engine may receive queries with roman characters due to the usage of roman characters in Chinese and the popularity of roman character based languages such as English.
- the Chinese characters and roman characters may be processed differently. For example, a Chinese character may be converted into Pinyin for a similarity comparison, while Roman characters are not converted into Pinyin. Accordingly, a similarity score computation may be adjusted based on the presence of Roman characters.
- Pq 21 max may be a function for calculating the query substitution probability of query q 1 following query q 2 in a log of user query sessions, such as from the search log database 112 .
- $\mathrm{pq21max} = \max_i \, \mathrm{prob}(U_i \rightarrow U_i')$, taken over the unit pairs in the query pair.
- the normalized probability may be calculated according to the above equation for each unit pair in the query pair and the maximum is used as pq 21 max.
- Lengthdiffn is the length difference in characters between q 1 and q 2 , which is normalized by their maximum length in characters.
- Entropy 21 min is an uncertainty that may be associated with a similarity between q 1 and q 2 .
- $$\mathrm{entropy21min} = \sum_i \frac{\mathrm{freq}(q_1 \rightarrow q_2^i)}{\mathrm{freq}(q_2^i)} \log\left(\frac{\mathrm{freq}(q_1 \rightarrow q_2^i)}{\mathrm{freq}(q_2^i)}\right),$$ where i ranges over the possible q 1 query substitutions with q 2 .
- $$\mathrm{entropy21min} = \min_j \sum_i \frac{\mathrm{freq}(q_{1j} \rightarrow q_{2j}^i)}{\mathrm{freq}(q_{2j}^i)} \log\left(\frac{\mathrm{freq}(q_{1j} \rightarrow q_{2j}^i)}{\mathrm{freq}(q_{2j}^i)}\right),$$ where j ranges over the unit substitutions between q 1 and q 2 , and i ranges over the possible unit substitutions of q 1j .
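- The entropy feature can be sketched in Python as a sum of ratio-times-log-of-ratio terms, with the minimum taken across unit substitutions. The function names, the input layout (pairs of substitution frequency and target frequency), the use of the natural logarithm, and the frequency counts are all assumptions for illustration. No leading minus sign is applied because none is recoverable from the expressions above.

```python
import math
from typing import Iterable, List, Tuple

# Each pair is (freq of q1 -> q2_i, freq of q2_i), taken from a search log.
Counts = Tuple[int, int]


def entropy_sum(counts: Iterable[Counts]) -> float:
    total = 0.0
    for freq_substitution, freq_target in counts:
        ratio = freq_substitution / freq_target
        if ratio > 0:  # a zero ratio contributes nothing
            total += ratio * math.log(ratio)
    return total


def entropy21min(per_unit_counts: List[List[Counts]]) -> float:
    # Unit-level variant: evaluate the sum for each unit substitution j
    # and keep the minimum.
    return min(entropy_sum(counts) for counts in per_unit_counts)


# Hypothetical counts for two unit substitutions.
print(entropy21min([[(1, 2), (1, 2)], [(1, 1)]]))
```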
- LengthsubstminGT 3 utilizes a substitution of characters.
- lengthsubstminGT 3 is 1 if the minimum length of q 1 and q 2 is less than 3 in characters. Otherwise, lengthsubstminGT 3 is 0.
- lengthsubstminGT 3 is 1 if the minimum length of any of the substitution units in characters is greater than 3. Otherwise, lengthsubstminGT 3 is 0.
- Query suggestion may refer to a generation of related queries based on an original user query. The user query may be broken into units as described above. A related unit may be found for each unit and combined to form a related query.
- For example, when a user enters a query for “New York hotel,” it may be split into two units “New York” and “hotel.” “New York” may be rewritten to a related query “Manhattan” and “hotel” may be rewritten to “motel.” Accordingly, “Manhattan motel” may be a related candidate query for an original user query of “New York hotel.”
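- The unit-by-unit rewriting in the “New York hotel” example can be sketched in Python as follows. The function name and the table of related units are hypothetical, and the query is assumed to be already segmented into units.

```python
from itertools import product
from typing import Dict, List


def suggest_queries(units: List[str], related: Dict[str, List[str]]) -> List[str]:
    # For each unit, allow the unit itself or any of its related units.
    options = [[unit] + related.get(unit, []) for unit in units]
    original = " ".join(units)
    candidates = (" ".join(combo) for combo in product(*options))
    # The unchanged original query is not a suggestion.
    return [candidate for candidate in candidates if candidate != original]


# Hypothetical related-unit table.
related_units = {"New York": ["Manhattan"], "hotel": ["motel"]}
print(suggest_queries(["New York", "hotel"], related_units))
```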
- the equation in Table A and the corresponding features that are used to calculate a similarity score in the calculator 208 are exemplary.
- a different equation, different weights and different features may be utilized to compute a similarity score.
- the edit distance may be computed for each of the conversion forms 302 and averaged to become the similarity score.
- weights may be added to each converted form, or additional comparison features 402 may be used.
- the equation that is used to determine the similarity score is analyzed by comparing with a human or editorial control set.
- the editorial control set may include a human review of the similarity scores for pairs of queries to determine an accuracy of the equation used for calculating the similarity score.
- the human review may be used to optimize the equation that calculates the similarity score.
- Human editors may label query pairs with a relevance score.
- the relevance score may be used as a training label for the similarity score calculation, such as for the weights used in the equation in Table A.
- the editorial score may be a response variable and/or a dependent variable.
- the model may be fitted using linear regression.
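- As a simplified illustration of fitting against editorial labels, the Python sketch below performs an ordinary least squares fit with a single feature. The feature values and editorial scores are invented for the example, and an equation with many features, such as the one in Table A, would call for a multivariate fit instead.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    # Closed-form least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: an edit distance per query pair (feature)
# and the relevance score assigned by a human editor (response variable).
edit_distances = [0.0, 1.0, 2.0, 4.0]
editorial_scores = [4.0, 3.5, 3.0, 2.0]

weight, bias = fit_line(edit_distances, editorial_scores)
print(weight, bias)
```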
- FIG. 5 is an illustration for identifying related queries.
- a user query is received.
- the user query may be Chinese-related and include at least one Chinese character.
- the user query may be received by a search engine 102 .
- the user query may be compared with a selected candidate set of queries or search keywords in block 504 .
- the candidate set may be selected from the search log database 112 .
- the candidate set may be chosen based on an initial comparison of similarity with the user query.
- the user query and/or the candidate set of queries may be converted into a different form or format for comparison, such as the conversion forms 302 .
- the user query and a member of the candidate set are compared in block 508 .
- a similarity score is calculated to measure a similarity between the user query and the member of the candidate set.
- the similarity score may be based on utilizing any of the comparison features 402 for comparing a converted form of the user query with a converted form of the member.
- another comparison at block 508 occurs for another member from the candidate set and continues until all members of the candidate set have been compared and have a similarity score.
- the similarity scores for the members of the candidate set may be reviewed to identify the member of the candidate set with the closest similarity score to the user query.
- the identification of a similar member, such as a similar search keyword, may be used to identify which advertisements to display for sponsored searching.
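- The overall flow of scoring each candidate and selecting the closest one can be sketched in Python as follows. The scoring function here is the `ratio()` of `difflib.SequenceMatcher` from the Python standard library, used purely as a stand-in for the similarity score described above. The function names and the candidate keywords are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def score(user_query: str, candidate: str) -> float:
    # Stand-in similarity in [0, 1]; higher means more similar.
    return SequenceMatcher(None, user_query, candidate).ratio()


def rank_candidates(user_query: str, candidates: List[str]) -> List[Tuple[float, str]]:
    # Score every member of the candidate set, then sort best-first.
    scored = [(score(user_query, candidate), candidate) for candidate in candidates]
    return sorted(scored, reverse=True)


# Hypothetical candidate set of search keywords.
keywords = ["home insurance", "auto cleaning"]
ranking = rank_candidates("auto cleaners", keywords)
best_score, best_keyword = ranking[0]
print(best_keyword)
```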
- an illustrative embodiment of a general computer system is shown and is designated 600 .
- the user device 106 , ad server 103 , the search engine 102 , the search log database 112 , the data source 113 , the unit dictionary 116 , and/or the language analyzer 104 may be a computer or computing devices, such as the computer system 600 or any of its components.
- the computer system 600 can include a set of instructions that can be executed to cause the computer system 600 to perform any one or more of the methods or computer based functions disclosed herein.
- the computer system 600 may operate as a standalone device or may be connected, e.g., using a network, to other computer systems or peripheral devices.
- the computer system may operate in the capacity of a server or as a client user computer in a server-client user network environment, or as a peer computer system in a peer-to-peer (or distributed) network environment.
- the computer system 600 can also be implemented as or incorporated into various devices, such as a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile device, a palmtop computer, a laptop computer, a desktop computer, a communications device, a wireless telephone, a land-line telephone, a control system, a camera, a scanner, a facsimile machine, a printer, a pager, a personal trusted device, a web appliance, a network router, switch or bridge, or any other machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine.
- the computer system 600 can be implemented using electronic devices that provide voice, video or data communication.
- the term “system” shall also be taken to include any collection of systems or sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions.
- the computer system 600 may include a processor 602 , e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both.
- the processor 602 may be a component in a variety of systems.
- the processor 602 may be part of a standard personal computer or a workstation.
- the processor 602 may be one or more general processors, digital signal processors, application specific integrated circuits, field programmable gate arrays, servers, networks, digital circuits, analog circuits, combinations thereof, or other now known or later developed devices for analyzing and processing data.
- the processor 602 may implement a software program, such as code generated manually (i.e., programmed).
- the computer system 600 may include a memory 604 that can communicate via a bus 608 .
- the memory 604 may be a main memory, a static memory, or a dynamic memory.
- the memory 604 may include, but is not limited to computer readable storage media such as various types of volatile and non-volatile storage media, including but not limited to random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like.
- the memory 604 includes a cache or random access memory for the processor 602 .
- the memory 604 is separate from the processor 602 , such as a cache memory of a processor, the system memory, or other memory.
- the memory 604 may be an external storage device or database for storing data. Examples include a hard drive, compact disc (“CD”), digital video disc (“DVD”), memory card, memory stick, floppy disc, universal serial bus (“USB”) memory device, or any other device operative to store data.
- the memory 604 is operable to store instructions executable by the processor 602 .
- the functions, acts or tasks illustrated in the figures or described herein may be performed by the programmed processor 602 executing the instructions stored in the memory 604 .
- processing strategies may include multiprocessing, multitasking, parallel processing and the like.
- the computer system 600 may further include a display unit 614 , such as a liquid crystal display (LCD), an organic light emitting diode (OLED), a flat panel display, a solid state display, a cathode ray tube (CRT), a projector, a printer or other now known or later developed display device for outputting determined information.
- the display 614 may act as an interface for the user to see the functioning of the processor 602 , or specifically as an interface with the software stored in the memory 604 or in the drive unit 606 .
- the computer system 600 may include an input device 616 configured to allow a user to interact with any of the components of system 600 .
- the input device 616 may be a number pad, a keyboard, or a cursor control device, such as a mouse, or a joystick, touch screen display, remote control or any other device operative to interact with the system 600 .
- the computer system 600 may also include a disk or optical drive unit 606 .
- the disk drive unit 606 may include a computer-readable medium 610 in which one or more sets of instructions 612 , e.g. software, can be embedded. Further, the instructions 612 may embody one or more of the methods or logic as described herein. In a particular embodiment, the instructions 612 may reside completely, or at least partially, within the memory 604 and/or within the processor 602 during execution by the computer system 600 .
- the memory 604 and the processor 602 also may include computer-readable media as discussed above.
- the present disclosure contemplates a computer-readable medium that includes instructions 612 or receives and executes instructions 612 responsive to a propagated signal, so that a device connected to a network 620 can communicate voice, video, audio, images or any other data over the network 620 .
- the instructions 612 may be transmitted or received over the network 620 via a communication port 618 .
- the communication port 618 may be a part of the processor 602 or may be a separate component.
- the communication port 618 may be created in software or may be a physical connection in hardware.
- the communication port 618 is configured to connect with a network 620 , external media, the display 614 , or any other components in system 600 , or combinations thereof.
- the connection with the network 620 may be a physical connection, such as a wired Ethernet connection or may be established wirelessly as discussed below.
- the additional connections with other components of the system 600 may be physical connections or may be established wirelessly.
- the network 620 may include wired networks, wireless networks, or combinations thereof.
- the wireless network may be a cellular telephone network, an 802.11, 802.16, 802.20, or WiMax network.
- the network 620 may be a public network, such as the Internet, a private network, such as an intranet, or combinations thereof, and may utilize a variety of networking protocols now available or later developed including, but not limited to TCP/IP based networking protocols.
- While the computer-readable medium is shown to be a single medium, the term “computer-readable medium” includes a single medium or multiple media, such as a centralized or distributed database, and/or associated caches and servers that store one or more sets of instructions.
- the term “computer-readable medium” shall also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor or that cause a computer system to perform any one or more of the methods or operations disclosed herein.
- the computer-readable medium can include a solid-state memory such as a memory card or other package that houses one or more non-volatile read-only memories. Further, the computer-readable medium can be a random access memory or other volatile re-writable memory. Additionally, the computer-readable medium can include a magneto-optical or optical medium, such as a disk or tapes or other storage device to capture carrier wave signals such as a signal communicated over a transmission medium. A digital file attachment to an e-mail or other self-contained information archive or set of archives may be considered a distribution medium that is a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a computer-readable medium or a distribution medium and other equivalents and successor media, in which data or instructions may be stored.
- dedicated hardware implementations such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement one or more of the methods described herein.
- Applications that may include the apparatus and systems of various embodiments can broadly include a variety of electronic and computer systems.
- One or more embodiments described herein may implement functions using two or more specific interconnected hardware modules or devices with related control and data signals that can be communicated between and through the modules, or as portions of an application-specific integrated circuit. Accordingly, the present system encompasses software, firmware, and hardware implementations.
- the methods described herein may be implemented by software programs executable by a computer system.
- implementations can include distributed processing, component/object distributed processing, and parallel processing.
- virtual computer system processing can be constructed to implement one or more of the methods or functionality as described herein.
- One or more embodiments of the disclosure may be referred to herein, individually and/or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any particular invention or inventive concept.
- Although specific embodiments have been illustrated and described herein, it should be appreciated that any subsequent arrangement designed to achieve the same or similar purpose may be substituted for the specific embodiments shown.
- This disclosure is intended to cover any and all subsequent adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the description.
Abstract
Description
- This application is a continuation-in-part application to U.S. patent application Ser. No. 11/363,315 (U.S. Pat. Pub. No. 2007/0203894), entitled “SYSTEM AND METHOD FOR IDENTIFYING RELATED QUERIES FOR LANGUAGES WITH MULTIPLE WRITING SYSTEMS,” filed Feb. 28, 2006, the disclosure of which is hereby incorporated by reference.
- Online advertising may be an important source of revenue for enterprises engaged in electronic commerce. A number of different kinds of web page based online advertisements are currently in use, along with various associated distribution requirements, advertising metrics, and pricing mechanisms. Processes associated with technologies such as Hypertext Markup Language (HTML) and Hypertext Transfer Protocol (HTTP) enable a web page to be configured to contain a location for inclusion of an advertisement. A page may not only be a web page, but any other electronically created page or document. An advertisement can be selected for display each time the page is requested, for example, by a browser or server application.
- Online advertising may be linked to online searching. Online searching is a common way for consumers to locate information, goods, or services on the Internet. A consumer may use an online search engine to type in a query to search for other pages or web sites with information related to that query. When the advertising that is shown on the search engine page is related to the query, the search may be referred to as a sponsored search. Sponsored searching may require advertisers to bid for search keywords. The search keywords are associated with the search query for displaying advertisements with the search results. It may be difficult to identify which keyword(s) that a search query is related to. In particular, users may enter search queries that are misspelled or that are in a different language.
- The system and method may be better understood with reference to the following drawings and description. Non-limiting and non-exhaustive embodiments are described with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon illustrating the principles of the invention. In the drawings, like referenced numerals designate corresponding parts throughout the different views.
- FIG. 1 is a block diagram of an exemplary network system;
- FIG. 2 is a block diagram of a language analyzer;
- FIG. 3 is a block diagram of exemplary conversion forms;
- FIG. 4 is a block diagram of exemplary comparisons of queries;
- FIG. 5 is a flow diagram for identifying related queries; and
- FIG. 6 is a block diagram of a general computer system for use with the disclosed embodiments.
- By way of introduction, the embodiments described below include a system and method for identifying and measuring related queries. The embodiments relate to identifying similar Chinese queries. A user query may be compared with known search keywords or other search queries. The search keywords may be used by advertisers for sponsored searching. The user query may be a non-native language query, such as a Chinese related query in an English language website or a query in a Chinese website. The user query is converted into a different form before comparing with other converted queries or the search keywords. For explanation purposes, the embodiments are described in terms of a Chinese related query, but other languages or query platforms may be used. A similarity score based on various features may be used for comparing the queries. Based on the similarity score or other comparison features, the original user query may be substituted by other queries or be associated with one or more search keywords. The associated search keywords may be used for selecting the advertisements that are displayed with the search results for that search query.
- Alternatively, related queries may be identified from a reformulation of the original query. The reformulation may be based on stored query logs and used to compare the original query with stored queries. As part of the comparison, various features, including language-specific features, may be used to measure query similarity. Based on the query similarity, the original query may be replaced by a stored query or search keyword for identifying the relevant advertisements to display. For example, a user's query may be misspelled, and the system may identify a correctly spelled related query that replaces the initial user query. Chinese related queries may be identified and measured due to an increased interest in Chinese search and advertising markets.
- Other systems, methods, features and advantages will be, or will become, apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the invention, and be protected by the following claims. Nothing in this section should be taken as a limitation on those claims. Further aspects and advantages are discussed below in conjunction with the embodiments.
-
FIG. 1 provides a simplified view of a network system 100 in which the present embodiments may be implemented. Not all of the depicted components may be required, however, and some embodiments of the invention may include additional components not shown in the figure. Variations in the arrangement and type of the components may be made without departing from the spirit or scope of the claims as set forth herein. Additional, different or fewer components may be provided. -
FIG. 1 is a block diagram illustrating an embodiment of an exemplary network system 100 for language analysis and comparison. In particular, system 100 includes a language analyzer 104 that may receive and convert a user's search query for comparison with other queries or search keywords. A user device 106 is coupled with a search engine 102 through the network 109. The search engine 102 is coupled with a search log database 112, and both are coupled with the language analyzer 104. The search log database 112 is coupled with a data source 113 and a unit dictionary 116. An ad server 103 may be coupled with the search engine 102 and/or coupled with the language analyzer 104. Herein, the phrase “coupled with” is defined to mean directly connected to or indirectly connected through one or more intermediate components. Such intermediate components may include both hardware and software based components. Variations in the arrangement and type of the components may be made without departing from the spirit or scope of the claims as set forth herein. - The
user device 106 may be a computing device for a user to connect to a network 109, such as the Internet. Examples of a user device include but are not limited to a personal computer, personal digital assistant (“PDA”), cellular phone, or other electronic device. The user device 106 may be configured to access other data/information in addition to web pages over the network 109 with a web browser, such as INTERNET EXPLORER® (sold by Microsoft Corp., Redmond, Wash.). The user device 106 may enable a user to view pages over the network 109, such as the Internet. The user device 106 may be the user device described below with respect to FIG. 6. - The
user device 106 may be configured to allow a user to interact with the search engine 102, the ad server 103, or other components of the system 100. In one embodiment, the user device 106 may receive and display a site or page provided by the search engine 102. The user device 106 may include a keyboard, keypad or a cursor control device, such as a mouse, or a joystick, touch screen display, remote control or any other device operative to allow a user to interact with the page(s) provided by the search engine 102 and/or the ad server 103. - The
search engine 102 is coupled with the user device 106 through the network 109, as well as being coupled with the language analyzer 104, the ad server 103 and/or the search log database 112. In one embodiment, the search engine 102 is a web server. The search engine 102 may provide a site or a page over a network, such as the network 109 or the Internet. A site or page may refer to a web page or a series of related web pages which may be received or viewed over a network. The site or page is not limited to a web page, and may include any information accessible over a network that may be displayed at the user device 106. In one embodiment, a site may refer to a series of pages which are linked by a site map. For example, the web site of www.yahoo.com (operated by Yahoo! Inc., in Sunnyvale, Calif.) may include thousands of pages, which are included at yahoo.com. Hereinafter, a page will be described as a web page, a web site, or any other site/page accessible over a network. A user of the user device 106 may access a page provided by the search engine 102 over the network 109. As described below, the page provided by the search engine 102 may be a search page that receives a search query from the user device 106 and provides search results that are based on the received search query. - The
search engine 102 may include an interface, such as a web page, e.g., the web page which may be accessed on the World Wide Web at yahoo.com, which is used to search for pages which are accessible via the network 109. The user device 106, autonomously or at the direction of the user, may input a search query (also referred to as a user query, original query, search term or a search keyword) for the search engine 102. A single search query may include multiple words or phrases. The search engine 102 may perform a search for the search query and display the results of the search on the user device 106. The results of a search may include a listing of related pages or sites that is provided by the search engine 102 in response to receiving the search query. - The
ad server 103 is coupled with the search engine 102 and/or the language analyzer 104. The ad server 103 may be configured to provide advertisements to the search engine 102. In an alternate embodiment, the search engine 102 and the ad server 103 may be a common component and/or the search engine 102 may select and provide advertisements. The ad server 103 may include or be coupled with an advertisement database that includes advertisements that are available to be displayed by the search engine 102 for sponsored searching. In addition, the advertisements may be associated with one or more search keywords. The search keywords may be purchased or bid on by advertisers. Accordingly, when a search keyword is searched for, the advertiser who purchased that keyword or placed the highest bid on it is selected and their advertisement is displayed. The ad server 103 may include or be coupled with a database, such as an advertisement database, that stores search keywords and the respective price or bid for each keyword from advertisers that is referenced for each search query. In one embodiment, a search query is received and compared with known search keywords or other search queries when the ad server 103 selects and provides the advertisement to the search engine 102. - The
search log database 112 includes records or logs of at least a subset of the search queries entered in the search engine 102 over a period of time and may also be referred to as a search query log, search term database, keyword database or query database. In one embodiment, the search log database 112 may store the search keywords that are used by the ad server 103 in selecting an advertisement for a particular search query. The search log database 112 may include search queries from any number of users over any period of time. Alternatively, the search log database 112 may include records or logs of a subset of the queries or requests for data entered at the search engine 102 over a period of time. The search log database 112 may also store associations between search queries from the search engine 102. For example, a search query may be associated with a search keyword or other search queries after a conversion and comparison by the language analyzer 104 as discussed below. - The
search log database 112 may also be coupled with a data source 113. The data source 113 may be an internal source of data, an external source of search data, or a combination of the two. An external data source may include search results from other search engines or other sources. For example, a search engine other than search engine 102 may be an external data source and provide search logs to the search log database 112. An internal data source may include search data or other data from the search engine 102. Other data may include other searching or web browsing tendencies identified by the search engine 102. - The
search log database 112 may also be coupled with a unit dictionary 116. The unit dictionary 116 may be a database of user queries or search keywords that are coupled with one another as units. Units may also be referred to as concepts or topics and are sequences of one or more words that appear in search queries. For example, the search query “New York City law enforcement” may include two units, e.g. “New York City” may be one unit and “law enforcement” may be another unit. A unit is a phrase of common words that identify a single concept. As another example, the search query “Chicago art museums” may include two units, e.g. “Chicago” and “art museums.” The “Chicago” unit is a single word, and “art museums” is a two-word unit. Units identify common groups of keywords to maximize the efficiency and relevance of search results. The unit dictionary 116 may include Chinese related queries, as well as Chinese related units that include Chinese characters. Categorization of search queries into units is discussed in commonly owned U.S. Pat. No. 7,051,023 issued May 23, 2006, entitled “SYSTEMS AND METHODS FOR GENERATING CONCEPT UNITS FROM SEARCH QUERIES,” which is hereby incorporated by reference. - The
unit dictionary 116 and the categorization of search queries into units may be used to compare and analyze search queries received by the search engine 102. A search query may be broken into units that are compared with units from other queries or search keywords. In one embodiment, past search queries and search keywords are stored in the search log database 112 as units that may be used in an analysis by the language analyzer 104. - In one embodiment, the
ad server 103, the search engine 102 and/or the search log database 112 may be coupled with the language analyzer 104. The language analyzer 104 receives a user query from the user device 106 and matches or identifies other queries or search keywords. The user query may be converted to a different form for comparing various features of the user query with search keywords as discussed with respect to FIG. 2. - The
language analyzer 104 may be a computing device as described below with respect to FIG. 6. In one embodiment, the language analyzer 104 includes a processor 105, memory 107, software 108 and an interface 110. The language analyzer 104 may be a separate component from the search engine 102 and the ad server 103. In an alternative embodiment, any of the language analyzer 104, search engine 102, and the ad server 103 may be combined as a single component. The interface 110 may communicate with any of the search engine 102, search log database 112, and ad server 103. In one embodiment, the interface 110 may include a user interface configured to allow a user to interact with any of the components of the language analyzer 104. For example, a user may be able to modify the conversion form or comparison features that are used by the language analyzer 104. - The
processor 105 in the language analyzer 104 may include a central processing unit (CPU), a graphics processing unit (GPU), a digital signal processor (DSP) or other type of processing device. The processor 105 may be a component in any one of a variety of systems. For example, the processor 105 may be part of a standard personal computer or a workstation. The processor 105 may be one or more general processors, digital signal processors, application specific integrated circuits, field programmable gate arrays, servers, networks, digital circuits, analog circuits, combinations thereof, or other now known or later developed devices for analyzing and processing data. The processor 105 may operate in conjunction with a software program, such as code generated manually (i.e., programmed). - The
processor 105 may be coupled with a memory 107, or the memory 107 may be a separate component. The interface 110 and/or the software 108 may be stored in the memory 107. The memory 107 may include, but is not limited to, computer readable storage media such as various types of volatile and non-volatile storage media, including but not limited to random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like. In one embodiment, the memory 107 includes a random access memory for the processor 105. In alternative embodiments, the memory 107 is separate from the processor 105, such as a cache memory of a processor, the system memory, or other memory. The memory 107 may be an external storage device or database for storing recorded image data. Examples include a hard drive, compact disc (“CD”), digital video disc (“DVD”), memory card, memory stick, floppy disc, universal serial bus (“USB”) memory device, or any other device operative to store image data. The memory 107 is operable to store instructions executable by the processor 105. - The functions, acts or tasks illustrated in the figures or described herein may be performed by the programmed processor executing the instructions stored in the
memory 107. The functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro-code and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing and the like. The processor 105 is configured to execute the software 108. The software 108 may include instructions for analyzing and converting search queries and comparing features with other queries or search keywords. - The
interface 110 may be a user input device or a display. The interface 110 may include a keyboard, keypad or a cursor control device, such as a mouse, or a joystick, touch screen display, remote control or any other device operative to interact with the language analyzer 104. The interface 110 may include a display coupled with the processor 105 and configured to display an output from the processor 105. The display may be a liquid crystal display (LCD), an organic light emitting diode (OLED), a flat panel display, a solid state display, a cathode ray tube (CRT), a projector, a printer or other now known or later developed display device for outputting determined information. The display may act as an interface for the user to see the functioning of the processor 105, or as an interface with the software 108 for providing input parameters. In particular, the interface 110 may allow a user to interact with the language analyzer 104 to establish a conversion of a user query and the features that are compared in matching a query with a search keyword. - Any of the components in
system 100 may be coupled with one another through a network. For example, the language analyzer 104 may be coupled with the search engine 102, search log database 112, or ad server 103 via a network. Any of the components in system 100 may include communication ports configured to connect with a network. The present disclosure contemplates a computer-readable medium that includes instructions or receives and executes instructions responsive to a propagated signal, so that a device connected to a network can communicate voice, video, audio, images or any other data over a network. The instructions may be transmitted or received over the network via a communication port or may be a separate component. The communication port may be created in software or may be a physical connection in hardware. The communication port may be configured to connect with a network, external media, display, or any other components in system 100, or combinations thereof. The connection with the network may be a physical connection, such as a wired Ethernet connection or may be established wirelessly as discussed below. Likewise, the connections with other components of the system 100 may be physical connections or may be established wirelessly. - The network or networks that may connect any of the components in the
system 100 to enable communication of data between the devices may include wired networks, wireless networks, or combinations thereof. The wireless network may be a cellular telephone network, a network operating according to a standardized protocol such as IEEE 802.11, 802.16, 802.20, published by the Institute of Electrical and Electronics Engineers, Inc., or a WiMax network. Further, the network(s) may be a public network, such as the Internet, a private network, such as an intranet, or combinations thereof, and may utilize a variety of networking protocols now available or later developed including, but not limited to TCP/IP based networking protocols. The network(s) may include one or more of a local area network (LAN), a wide area network (WAN), a direct connection such as through a Universal Serial Bus (USB) port, and the like, and may include the set of interconnected networks that make up the Internet. The network(s) may include any communication method or employ any form of machine-readable media for communicating information from one device to another. For example, the ad server 103 or the search engine 102 may provide pages to the user device 106 over a network, such as the network 109. The network or networks described above, including the network 109, may be the network discussed below with respect to FIG. 6. - The
ad server 103, the search engine 102, the search log database 112, the language analyzer 104, the unit dictionary 116 and/or the user device 106 may represent computing devices of various kinds, such as the components described with respect to FIG. 6. Such computing devices may generally include any device that is configured to perform computation and that is capable of sending and receiving data communications by way of one or more wired and/or wireless communication interfaces. Such devices may be configured to communicate in accordance with any of a variety of network protocols, as discussed above. For example, the user device 106 may be configured to execute a browser application that employs HTTP to request information, such as a web page, from the search engine 102 or ad server 103. The present disclosure contemplates a computer-readable medium that includes instructions or receives and executes instructions responsive to a propagated signal, so that any device connected to a network can communicate voice, video, audio, images or any other data over a network. -
FIG. 2 illustrates an embodiment of a language analyzer. As described with respect to FIG. 1, the language analyzer 104 may convert a search query into a different form for comparing its features with other queries or search keywords that are used for selecting matching advertisements to be displayed on a search results page. The language analyzer 104 may include a receiver 202, a converter 204, a comparator 206, and a calculator 208. As shown, the language analyzer 104 or any of its components may represent computing devices of various kinds, such as the components described with respect to FIG. 6. - The
receiver 202 may receive a user query from the search engine 102, which may receive the user query from the user device 106. The receiver 202 may also receive search keywords from the ad server 103. The search keywords may be matched with advertisements, such that when a user inputs the search keyword in a search engine, the search results page includes the matched advertisement. Accordingly, the language analyzer 104 may match user queries with search keywords for selecting advertisements to be displayed on the search query results page. - The
converter 204 is coupled with the receiver 202. The converter 204 receives the user query or other search keywords and converts them into a different form for comparison. As described, the user query may be a Chinese related query and the converter 204 may convert the Chinese related query into a different form to aid comparison. A Chinese related query may include any Chinese characters, including Roman characters that represent a Chinese character or phrase. Chinese related queries may also include queries that originate from or are received by a Chinese search engine and may be simplified Chinese and/or traditional Chinese. -
FIG. 3 illustrates exemplary conversion forms. In particular, the converter 204 may utilize any of the conversion forms 302 to convert a Chinese related query. The converter 204 may convert a search query into any of the conversion forms 302 to compare the query with other converted queries or converted search terms. As described below, the conversion may include a transformation of the query by adding, deleting, and/or substituting characters or words in the queries. The conversion or transformation may result in a common format or common form that may be used for comparing the queries. The conversion forms 302 shown in FIG. 3 are merely exemplary. In alternate embodiments, there may be additional conversion forms 302 that are not illustrated or described. The conversion may receive a Chinese related query and convert each element or selected elements of the query into an array that represents the converted form of the Chinese related query. - A first conversion form is a conversion into
Chinese soundex 304. The Chinese characters are converted into pinyin without tone, while the roman letters remain. The query is then converted into a Chinese soundex-like representation by first retaining the first letter of a string. Second, all occurrences of a, e, h, i, o, and u are removed, unless it is the first letter. Third, characters may be replaced, such as, replacing “zh” with “z,” “ch” with “c,” “sh” with “s,” “ng” with “n,” “rd” with “d,” “rl” with “l,” “rn” with “n,” “rs” with “s,” and/or “rt” with “t.” Fourth, the remaining letters after the first letter are assigned a number, such as, (m, n, l)=1, (b, p)=2, (f, v, w, h)=3, (d, t)=4, (j, z, s, x, q, c, g, k)=5, (r)=6, (y)=7, and (a)=8. Fifth, if two or more letters are adjacent, then the first letter remains and the others are omitted. Sixth, the spaces are removed. Seventh, all characters remaining are returned. - A second conversion form is converting the Chinese characters into the keyboard input form zhuyin (Bopomofo) 306. Each element in the array is either all zhuyin characters for one corresponding Chinese character or a roman character originally in the query without transformation. A third conversion form is a similar zhuyin (Bopomofo)
conversion 308, except each element in the array is either one zhuyin character or a roman character originally in the query without transformation. - A fourth conversion form is converting Chinese characters into
radicals 310. Each element in the array is either the radical for a Chinese character or the roman character originally in the query without transformation. A radical 310 may be the semantic root (i.e., portion bearing the meaning) of a Chinese character. A radical may be part of a Chinese character and/or the semantic component of this Chinese character. For example, in the character pronounced as jie with a meaning of “sister”, the left part (pronounced nǚ in Mandarin Chinese) is the semantic component. Chinese characters may have at least one or two radicals. The radicals may be used for Chinese Hanzi. A dictionary may be used to match a Chinese character with its radical(s). When a Chinese character has multiple radicals, the most meaningful radical (which may be identified in a dictionary) may be considered for comparison. - A fifth conversion form is converting Chinese characters into pinyin without
tone 412. Each element in the array is either the complete pinyin without tone for one corresponding Chinese character or a roman character originally in the query without any transformation. A sixth conversion form is converting Chinese characters into pinyin without tone 414 in which each element in the array is either one pinyin character or a roman character originally in the query without transformation. Pinyin may be a Standard Mandarin Romanization system. In pinyin, the pin refers to a “spelling” and the yin refers to a “sound.” There may be a pinyin corresponding to each Chinese character. One pinyin may include more than two roman characters. In the fifth conversion form, each pinyin may be a unit for similarity comparison. In the sixth conversion form, each character within pinyin may be a unit for comparison. - A seventh conversion form is converting Chinese characters into pinyin with
tone 416. Each element in the array is either the complete pinyin and its tone for one corresponding Chinese character or a roman character originally in the query without transformation. An eighth conversion form is converting Chinese characters into pinyin with tone 418 in which each element in the array is either one pinyin character, its tone, or a roman character originally in the query without transformation. A ninth conversion form is converting queries into two character-based arrays 420. In particular, if a character is Chinese, its three-byte UTF-8 encoding is an element in the array. In other words, each Chinese character is represented in three bytes. If a character is roman, then the roman character itself is an element. - A tenth conversion form is the removal of
Chinese characters 422. The roman characters are left in the query and the Chinese characters are removed. Likewise, an eleventh conversion form removes the roman characters 424, and keeps the Chinese characters in the query. A twelfth conversion form includes leaving the query as inputted 426. In other words, the twelfth conversion is no conversion 426. - In one embodiment, the
receiver 202 receives two queries that are to be compared to determine the similarity between those queries. The queries are converted into at least one of the conversion forms by the converter 204. In one embodiment, both queries are converted into the twelve exemplary conversion forms 302 and the queries are compared in all twelve converted forms. Alternatively, certain conversion forms are selected for converting the queries and the queries are compared for each of those converted forms. - After being converted, the queries may be compared by the
comparator 206. The comparator 206 may be configured to perform comparison of a user's search query with other queries or with search keywords that are used by the ad server 103 for displaying relevant advertisements that are linked to particular search keywords. In one embodiment, the comparator 206 determines the similarity between two queries. The queries are first converted into a similar form or similar forms by the converter 204 and each of those forms is compared by the comparator 206. In one embodiment, the queries are converted into the twelve forms illustrated in FIG. 3 and the comparator 206 makes twelve comparisons between the queries, one for each of the twelve conversions of each query. In alternative embodiments, there may be more or fewer conversion forms that are compared by the comparator 206. - In one embodiment, a user query may be compared with a candidate set of queries to determine which of the candidate set is most similar to the user query. The candidate set may be made up of search keywords which are compared with the user query to determine which search keyword is most similar. The candidate set of queries or keywords for comparison may be chosen based on an initial analysis of the user query compared with the
search log database 112. In one embodiment, when the user query is received the candidate set is identified and each member of the candidate set is compared with the user query to determine which is most similar. As described below, a similarity score may be calculated for each member of the candidate set that represents a similarity with the user query. The member of the candidate set with the closest similarity score may be most similar to the user query. In an alternative embodiment, the candidate set may include one query or include all queries, such as those stored in the search log database 112. -
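To make the first conversion form concrete, the following Python sketch applies the Chinese soundex-like steps to a query that is assumed to have already been converted to pinyin without tone (the character-to-pinyin lookup is omitted). The function name `chinese_soundex` is hypothetical. The steps are applied in the order listed, the whole query is treated as a single string, and the fifth step is interpreted here as collapsing runs of adjacent identical codes. These are assumptions made for illustration, not details confirmed by the description.

```python
# Letters removed in the second step (unless they are the first letter).
REMOVED = set("aehiou")

# Replacements from the third step, applied in the order listed.
REPLACEMENTS = [
    ("zh", "z"), ("ch", "c"), ("sh", "s"), ("ng", "n"), ("rd", "d"),
    ("rl", "l"), ("rn", "n"), ("rs", "s"), ("rt", "t"),
]

# Letter-to-number groups from the fourth step.
CODES = {}
for letters, digit in [
    ("mnl", "1"), ("bp", "2"), ("fvwh", "3"), ("dt", "4"),
    ("jzsxqcgk", "5"), ("r", "6"), ("y", "7"), ("a", "8"),
]:
    for letter in letters:
        CODES[letter] = digit


def chinese_soundex(pinyin_query: str) -> str:
    """Soundex-like form of a query already written in toneless pinyin."""
    s = pinyin_query.lower().strip()
    if not s:
        return ""
    # Steps 1 and 2: keep the first letter, drop a/e/h/i/o/u elsewhere.
    s = s[0] + "".join(c for c in s[1:] if c not in REMOVED)
    # Step 3: replace the listed letter pairs.
    for old, new in REPLACEMENTS:
        s = s.replace(old, new)
    first, rest = s[0], s[1:]
    # Step 4: map the remaining letters to numbers; other characters pass through.
    coded = [CODES.get(c, c) for c in rest]
    # Step 5 (assumed reading): collapse runs of adjacent identical codes.
    collapsed = []
    for c in coded:
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    # Steps 6 and 7: remove spaces and return what remains.
    return first + "".join(c for c in collapsed if c != " ")
```

Under these assumptions, the pinyin strings "zhong guo" and "zong guo" reduce to the same converted form, so a comparison over this form would treat them as identical.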
FIG. 4 illustrates exemplary comparisons of queries. In particular, the comparator 206 may utilize comparison features 402 when comparing queries. The comparison features 402 shown in FIG. 4 are merely exemplary. In alternate embodiments, there may be additional comparison features 402 that are not illustrated or described. The comparison may involve comparing various forms of converted Chinese related queries. In particular, the comparator 206 may compare an array of elements that is generated by the converter 204 as a converted form of a Chinese related query. In one embodiment, the comparator 206 may compare queries as described in the commonly owned U.S. application entitled, “SYSTEM AND METHOD FOR IDENTIFYING RELATED QUERIES FOR LANGUAGES WITH MULTIPLE WRITING SYSTEMS,” U.S. Pat. Pub. No. 2007/0203894, filed Feb. 28, 2006, the disclosure of which is hereby incorporated by reference. - A first comparison feature may be an
edit distance 404 between two queries. The edit distance may be a measure of the difference between two character strings, such as queries. In one embodiment, the edit distance may be a minimum number of edit operations required to transform the first query into the second query. The edit operations may include inserting a character into a string, deleting a character from a string, or replacing a character by another character. In an alternative embodiment, weights may be assigned for different edit operations. For example, a higher weight may be placed on replacing the character s by the character p, than on replacing it by the character a. The edit distance may be the Levenshtein distance or, when a transposition of characters counts as a single edit operation, the Damerau-Levenshtein distance. In alternative embodiments, there may be other algorithms that are used for determining the edit distance between queries or there may be more or fewer edit operations that are used in determining an edit distance between queries. - A second comparison feature may be an edit distance without a
domain 406. In particular, two queries may have their domains removed before computing the edit distance. The domain may be a web domain, such as “.com” that is removed. The removal of the domain may be helpful because a user querying “yahoo.com” and “yahoo.net” is likely making the same query. A third comparison feature may be a character level prefix overlap 408. The character level prefix overlap 408 may be a measure of the characters/words that are the same at the beginning of the queries. For example, “auto cleaners” and “auto cleaning” have a prefix overlap of “auto clean.” The prefix overlap may indicate increased similarity. A fourth comparison feature may be a character level suffix overlap 410. The character level suffix overlap 410 measures the similarity between queries at the end of the query. For example, “auto insurance agent” and “home insurance agent” share a suffix overlap of “insurance agent.” Similar to the prefix overlap, the suffix overlap may indicate increased similarity. - A fifth comparison feature may be a
minimum edit distance 412 over all the conversion forms. Likewise, a sixth comparison feature may be a maximum edit distance 414 over all the conversion forms. Given twelve conversion forms and twelve edit distances, one for each conversion form, the minimum edit distance 412 and the maximum edit distance 414 may be identified. In one embodiment, the minimum and maximum may be removed as outliers. Alternatively, the minimum or maximum may be weighted higher when computing a similarity score. A seventh comparison feature may be a minimum edit distance without a domain 416 and an eighth comparison feature may be a maximum edit distance without a domain 418. As discussed above, the domains in a query may not be valuable in terms of determining what the user is searching for, so the domains are removed before comparison. - Additional comparison features may be a word
level edit distance 420, a word level prefix overlap 422, or a word level suffix overlap 424. The word level comparisons are similar to the character level comparisons, except entire words are compared rather than individual characters. A length difference 426 between two queries may also be used for comparing. - The
comparator 206 may be coupled with a calculator 208 that may calculate a similarity score. The similarity score may be a measure of the similarity between the queries. The similarity score may be calculated based on individual comparisons of different conversion forms of two queries with each individual comparison being assigned a weighted value. The multiple conversion forms described with respect to FIG. 3 may each result in a separate comparison between two queries. Accordingly, using the twelve conversion forms 302, there may be twelve different edit distances or similarity scores, one for each conversion. Those twelve converted forms may be compared and the results multiplied by a weight for each form to get an overall similarity score between the queries. Alternatively, a subset of the twelve conversion forms or additional conversion forms not described may be utilized to convert Chinese related queries into different forms for comparison. - In one embodiment, the equation presented in Table A may be used to calculate a similarity score indicating the strength of similarity between a query pair. The query pair may include a given query q and a comparison query MODS(q), either of which may be written according to one or more Chinese writing systems. MODS(q) may represent a converted query. In alternative embodiments, both q and MODS(q) may be converted to the same form for comparison, or MODS(q) is converted into a form for comparison with q. MODS(q) may represent a related query that is identified as a potential substitute for the user query q. When MODS(q) has a good similarity score with the user's original query q, MODS(q) may be used as a search keyword for fetching advertisements. MODS(q) may also be referred to as a rewritten query. The equation in Table A makes use of a subset of the conversion forms 302 and the comparison features 402 that are discussed above.
In alternative embodiments, different conversion forms or comparison features may be utilized to generate a similarity score. Those of skill in the art recognize that the equation illustrated in Table A is merely exemplary and may be modified so as to provide for the calculation of a similarity score for multiple writing systems. A formula may be optimized based on the source of the query, because queries from Taiwan may be different from queries from Hong Kong. Accordingly, the conversions, comparisons, and weights may be modified for different types of queries.
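The character level and word level comparison features described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the comparator 206: the function names and the `DOMAIN_SUFFIXES` list are hypothetical (the description names only “.com” as a removable domain), and a unit-cost Levenshtein distance is used even though weighted edit operations are also contemplated.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute x with y
        prev = curr
    return prev[-1]


# Assumed list of domain suffixes; only ".com" is named in the description.
DOMAIN_SUFFIXES = (".com", ".net", ".org")


def strip_domain(query):
    """Remove a trailing web domain before computing the edit distance."""
    for suffix in DOMAIN_SUFFIXES:
        if query.endswith(suffix):
            return query[:-len(suffix)]
    return query


def prefix_overlap(a, b):
    """Longest common prefix of two sequences (strings or lists of words)."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]


def suffix_overlap(a, b):
    """Longest common suffix, computed as the prefix overlap of the reversals."""
    return prefix_overlap(a[::-1], b[::-1])[::-1]


def min_max_edit_distance(form_pairs):
    """Minimum and maximum edit distance over (form of q1, form of q2) pairs."""
    distances = [levenshtein(x, y) for x, y in form_pairs]
    return min(distances), max(distances)
```

Because the functions only compare sequence elements, passing lists of words instead of strings gives the word level variants. Note that the character level suffix overlap of “auto insurance agent” and “home insurance agent” computed this way includes the leading space.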
TABLE A - According to the equation presented in Table A, q represents a given query written according to one or more Chinese writing systems and MODS(q) represents a query selected from a candidate set of potential queries related to query q. Alternatively, query q may be referred to as query q1 and MODS(q) may be referred to as query q2 or q′. The initial number before each feature is a weight that may be used to emphasize or deemphasize features. The exemplary features utilized in the equation presented in Table A are described below.
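A score of the kind described, in which each feature is multiplied by a weight and the products are summed, might be sketched as follows. The weights in `EXAMPLE_WEIGHTS` are placeholders chosen for illustration and are not the values of Table A; the function names, and the convention that a higher score means a closer match, are likewise assumptions.

```python
# Placeholder weights for illustration only; not the values from Table A.
EXAMPLE_WEIGHTS = {
    "levroman": -1.0,   # larger edit distance lowers the score
    "agreechar": 1.0,   # more shared characters raises the score
    "wordr": -1.0,      # more unshared words lowers the score
}


def similarity_score(features, weights=EXAMPLE_WEIGHTS):
    """Weighted sum of feature values; features missing from the dict count as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def best_candidate(candidate_features, weights=EXAMPLE_WEIGHTS):
    """Return the candidate query with the highest score (assumed to mean closest)."""
    return max(candidate_features,
               key=lambda c: similarity_score(candidate_features[c], weights))
```

Keeping the weights in a dictionary keyed by feature name makes it straightforward to swap in a different weight set, for example one tuned for a different query source.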
- Pq12min may be a function for calculating the query substitution probability of query q1 following query q2 in a log of user query sessions, such as from the
search log database 112. The search log database 112 may identify the order of the one or more queries submitted by the user, for example, to provide an indication of how the user refined a query, how the user rewrote a query, how the user utilized one or more alternate writing systems of a language with multiple writing systems to express a query Q, etc. When queries q1 and q2 follow one another in a search log database 112, it may be an indication that they are similar because q2 may be a refinement of q1. According to one embodiment, the pq12min function calculates a query substitution probability of a given query q1 following a given query q2, and may also be used to calculate a unit substitution of a unit u following a given unit u′. In one embodiment, pq12min=prob(U_i−>U_i′|U_i)/max_j prob(U_i−>U_j|U_i), where U_i is q1 or its units, U_i′ is possible U_i substitutions, and U_j is q2 or its units. For query suggestions, pq12min may be the normalized probability of q2 as q1's substitution. In one embodiment, a normalized probability is computed for each unit in q1 being substituted by the corresponding unit in q2, and the minimum of these probabilities is taken as pq12min. - Levroman is a comparison using the roman characters of a query, such as with
conversion form 322, which removes Chinese characters. For each query all non-roman characters may be removed, but spaces are left in the query. The roman character parts are changed into arrays. Each roman character is an element in the array, including any spaces. The Levenshtein distance is measured between the two arrays. In the case that neither q1 nor q2 has roman characters, levroman is set to 0. In the case that one of q1 or q2 has roman characters but the other does not have roman characters, levroman is set to 1. As an example, consider a first query q1 made up of Chinese characters followed by “map” and a second query q2 made up of Chinese characters followed by “ map.” The first query does not include a space before “map,” but the second query does. After the Chinese characters are removed, the queries are converted into arrays, in which q1 is represented as the array: [“m”, “a”, “p”]
and q2 is represented as the array: [“ ”, “m”, “a”, “p”]
The Levenshtein distance between the two arrays is one because of the space in the first element of q2. Accordingly, because there are four elements in the longer array, the normalized Levenshtein distance is ¼=0.25 and levroman is 0.25 for this query pair. - Agreechar may relate to character agreement without removing a space regardless of the order of characters. Agreechar may be similar to wordr discussed below, except it is for the character level rather than the word level. In one embodiment, agreechar is the proportion of unique characters in common between a query pair, such as: agreechar = |Cq1 ∩ Cq2| / |Cq1 ∪ Cq2|,
in which Cq1 is the set of unique characters (including space) in q1, and Cq2 is the set of unique characters (including space) in q2. In the levroman example, q1 and q2 have 7 unique characters in total, which are the Chinese characters together with “m”, “a”, “p” and a space. Queries q1 and q2 share 5 unique characters, which are the Chinese characters they have in common together with “m”, “a” and “p”. Therefore, agreechar is 0.714 (calculated by 5/7) for this query pair. - Wordr is similar to agreechar except it matches words rather than characters. The queries are separated into words, segments, or units as described above. The percentage of unique words not in common is determined for wordr. In other words, wordr=1−proportion of unique words in common, such as wordr = 1 − |wq1 ∩ wq2| / |wq1 ∪ wq2|,
in which wq1 is the set of unique words in q1, and wq2 is the set of unique words in q2. In the previous example of levroman, q1 is segmented into two words, a Chinese word and “map”, and q2 is likewise segmented into two words, a Chinese word and “map”. There are three unique words and one of them is common between q1 and q2, so wordr is 1−⅓=0.667. - Dlevpynchar utilizes the complete pinyin without
tone 312 conversion form. The first query q1 and second query q2 first have a common domain removed. Each roman character (including spaces) is kept, while each Chinese character is converted into pinyin without tone. The queries are then transformed into arrays. Each roman character is an element in the array and each Chinese character's pinyin without tone is an element in the array. The Levenshtein distance is then measured. In the example described above, there is no space before “map” in query q1, but there is a space before “map” in query q2. The first query q1 is converted into an array:
The second query q2 is converted into an array:
The Levenshtein distance is computed between the two arrays to be ⅙=0.167, which may also be the dlevpynchar value for this query pair. - Q1bidded is 1 if q1 is bidded and q1bidded is 0 if q1 is not bidded. When q1 is a user query and q1 is bidded, it may mean that an advertiser chooses q1 as a keyword for the advertisements they want to show. This bidding process may also identify a cost they would like to pay if web searchers click the ads fetched by the keyword. When q1 is not bidded that may mean there are no matched keywords in the advertisement database. Therefore, a query identifying system may identify a related query (e.g. MODS(q)) to substitute for the user query.
- Q2hasroman is 1 if q2 contains any roman characters, not counting spaces. Q2hasroman is 0 if q2 does not contain any roman characters. The queries that are analyzed may be from a Chinese search engine or from a search engine that receives Chinese related queries. A Chinese search engine may receive queries with roman characters due to the usage of roman characters in Chinese and the popularity of roman character based languages such as English. The Chinese characters and roman characters may be processed differently. For example, a Chinese character may be converted into Pinyin for a similarity comparison, while roman characters are not converted into Pinyin. Accordingly, a similarity score computation may be adjusted based on the presence of roman characters.
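The two indicator features just described, q1bidded and q2hasroman, could be computed along the following lines. The sketch assumes that the bidded keywords are available as an in-memory set and that “roman characters” means ASCII letters; both are assumptions made for the example, and the function signatures are hypothetical.

```python
def q1bidded(q1, bidded_keywords):
    """Return 1 if the user query matches an advertiser keyword, else 0."""
    return 1 if q1 in bidded_keywords else 0


def q2hasroman(q2):
    """Return 1 if q2 contains at least one roman letter; spaces do not count."""
    return 1 if any(ch.isascii() and ch.isalpha() for ch in q2) else 0
```

When `q1bidded` returns 0, a caller would go on to look for a related query to substitute for the user query.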
- Pq21max may be a function for calculating the query substitution probability of query q1 following query q2 in a log of user query sessions, such as from the
search log database 112. In one embodiment, pq21max=prob(U_i−>U_i′|U_i′)/max_j prob(U_i−>U_j|U_j), where U_i is q1 or its units, U_i′ is possible U_i substitutions, and U_j is q2 or its units. The normalized probability may be calculated according to the above equation for each unit pair in the query pair, and the maximum is used as pq21max. - Levtaiwanchar utilizes the removal of
roman characters 324 conversion. In particular, all non-Chinese characters are removed and the remaining Chinese character parts are put into an array where each Chinese character is an element in the array. The Levenshtein distance is measured between the two arrays. When neither query q1 nor query q2 includes Chinese characters, levtaiwanchar is 0. When only one of q1 or q2 has Chinese characters, levtaiwanchar is 1. In the example described above, there is no space before “map” in query q1, but there is a space before “map” in query q2. The first query q1 is converted into an array: - Accordingly, the Levenshtein distance is computed between the two arrays, which is ⅓=0.333 and levtaiwanchar is 0.333 for this query pair.
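- Lengthdiffn is the length difference in characters between q1 and q2, which is normalized by their maximum length in characters. In one embodiment, lengthdiffn is: lengthdiffn = |len(q1) − len(q2)| / max(len(q1), len(q2)), where len(q) is the length of query q in characters.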
- Entropy21min is an uncertainty that may be associated with a similarity between q1 and q2. For a whole query substitution,
where i is the number of possible q1 query substitutions with q2. For unit substitution,
where j is the number of unit substitutions between q1 and q2, and i is the number of possible unit substitutions for q1's unit j. - LengthsubstminGT3 utilizes a substitution of characters. For query suggestions, lengthsubstminGT3 is 1 if the minimum length of q1 and q2 is less than 3 in characters. Otherwise, lengthsubstminGT3 is 0. For unit suggestions, lengthsubstminGT3 is 1 if the minimum length of any of the substitution units in characters is greater than 3. Otherwise, lengthsubstminGT3 is 0. Query suggestion may refer to a generation of related queries based on an original user query. The user query may be broken into units as described above. A related unit may be found for each unit and combined to form a related query. For example, when a user enters a query for “New York hotel,” it may be split into two units “New York” and “hotel.” “New York” may be rewritten to a related query “Manhattan” and “hotel” may be rewritten to “motel.” Accordingly, “Manhattan motel” may be a related candidate query for an original user query of “New York hotel.”
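The unit-by-unit query suggestion described above, together with the lengthdiffn feature, might look like the following sketch. The `RELATED_UNITS` table is a hypothetical stand-in for whatever source supplies related units, the query is assumed to be already split into units, and taking the absolute value of the length difference is an assumption.

```python
from itertools import product

# Hypothetical table of related units; a real system would derive these from logs.
RELATED_UNITS = {
    "New York": ["Manhattan"],
    "hotel": ["motel"],
}


def suggest_queries(units, related=RELATED_UNITS):
    """Combine one related unit per original unit into candidate queries.

    Units with no known related unit are kept unchanged.
    """
    options = [related.get(unit, [unit]) for unit in units]
    return [" ".join(combo) for combo in product(*options)]


def lengthdiffn(q1, q2):
    """Length difference in characters, normalized by the longer query's length."""
    longest = max(len(q1), len(q2))
    return abs(len(q1) - len(q2)) / longest if longest else 0.0
```

Using `itertools.product` means that a unit with several related units yields one candidate query per combination, each of which can then be scored against the original query.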
- As described, the equation in Table A and the corresponding features that are used to calculate a similarity score in the
calculator 208 are exemplary. Alternatively, a different equation, different weights and different features may be utilized to compute a similarity score. For example, the edit distance may be computed for each of the conversion forms 302 and averaged to become the similarity score. Alternatively, weights may be added to each converted form, or additional comparison features 402 may be used. - In one embodiment, the equation that is used to determine the similarity score, such as the equation in Table A, is analyzed by comparing with a human or editorial control set. The editorial control set may include a human review of the similarity scores for pairs of queries to determine an accuracy of the equation used for calculating the similarity score. In one embodiment, the human review may be used to optimize the equation that calculates the similarity score. Human editors may label query pairs with a relevance score. The relevance score may be used as a training label for the similarity score calculation, such as for the weights used in the equation in Table A. The editorial score may be a response variable and/or a dependent variable. The model may be fitted using linear regression.
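Fitting feature weights to editorial relevance scores by linear regression can be illustrated with ordinary least squares. The sketch below is one possible way to do this, not the procedure used to obtain Table A: it solves the normal equations in pure Python, has no intercept term unless a constant column is added to the rows, and the function name is hypothetical.

```python
def fit_weights(rows, labels):
    """Least-squares weights w minimizing sum((row . w - label)^2).

    rows   : list of feature vectors, one per labeled query pair
    labels : editorial relevance scores, one per query pair
    Raises ZeroDivisionError if the features are linearly dependent.
    """
    n = len(rows[0])
    # Augmented normal-equation system [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * y for r, y in zip(rows, labels))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(n):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [x - factor * p for x, p in zip(a[r], a[col])]
    return [a[i][n] / a[i][i] for i in range(n)]
```

Each row would hold the feature values for one labeled query pair; appending a constant 1.0 to every row adds an intercept to the fitted model.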
-
FIG. 5 is an illustration for identifying related queries. In block 502, a user query is received. The user query may be Chinese-related and include at least one Chinese character. The user query may be received by a search engine 102. The user query may be compared with a selected candidate set of queries or search keywords in block 504. The candidate set may be selected from the search log database 112. In one embodiment, the candidate set may be chosen based on an initial comparison of similarity with the user query. The user query and/or the candidate set of queries may be converted into a different form or format for comparison, such as the conversion forms 302. The user query and a member of the candidate set are compared in block 508. In block 510, a similarity score is calculated to measure a similarity between the user query and the member of the candidate set. The similarity score may be based on utilizing any of the comparison features 402 for comparing a converted form of the user query with a converted form of the member. In block 512, another comparison at block 508 occurs for another member from the candidate set and continues until all members of the candidate set have been compared and have a similarity score. In block 514, the similarity scores for the candidate set may be reviewed to identify the member of the candidate set that is most similar to the user query. The identification of a similar member, such as a similar search keyword, may be used to identify which advertisements to display for sponsored searching. - Referring to
FIG. 6, an illustrative embodiment of a general computer system is shown and is designated 600. The user device 106, ad server 103, the search engine 102, the search log database 112, the data source 113, the unit dictionary 116, and/or the language analyzer 104 may be a computer or computing devices, such as the computer system 600 or any of its components. The computer system 600 can include a set of instructions that can be executed to cause the computer system 600 to perform any one or more of the methods or computer based functions disclosed herein. The computer system 600 may operate as a standalone device or may be connected, e.g., using a network, to other computer systems or peripheral devices. - In a networked deployment, the computer system may operate in the capacity of a server or as a client user computer in a server-client user network environment, or as a peer computer system in a peer-to-peer (or distributed) network environment. The
computer system 600 can also be implemented as or incorporated into various devices, such as a personal computer (PC), a tablet PC, a set-top box (STB), a personal digital assistant (PDA), a mobile device, a palmtop computer, a laptop computer, a desktop computer, a communications device, a wireless telephone, a land-line telephone, a control system, a camera, a scanner, a facsimile machine, a printer, a pager, a personal trusted device, a web appliance, a network router, switch or bridge, or any other machine capable of executing a set of instructions (sequential or otherwise) that specify actions to be taken by that machine. In a particular embodiment, the computer system 600 can be implemented using electronic devices that provide voice, video or data communication. Further, while a single computer system 600 is illustrated, the term “system” shall also be taken to include any collection of systems or sub-systems that individually or jointly execute a set, or multiple sets, of instructions to perform one or more computer functions. - As illustrated in
FIG. 6, the computer system 600 may include a processor 602, e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both. The processor 602 may be a component in a variety of systems. For example, the processor 602 may be part of a standard personal computer or a workstation. The processor 602 may be one or more general processors, digital signal processors, application specific integrated circuits, field programmable gate arrays, servers, networks, digital circuits, analog circuits, combinations thereof, or other now known or later developed devices for analyzing and processing data. The processor 602 may implement a software program, such as code generated manually (i.e., programmed). - The
computer system 600 may include a memory 604 that can communicate via a bus 608. The memory 604 may be a main memory, a static memory, or a dynamic memory. The memory 604 may include, but is not limited to, computer readable storage media such as various types of volatile and non-volatile storage media, including but not limited to random access memory, read-only memory, programmable read-only memory, electrically programmable read-only memory, electrically erasable read-only memory, flash memory, magnetic tape or disk, optical media and the like. In one embodiment, the memory 604 includes a cache or random access memory for the processor 602. In alternative embodiments, the memory 604 is separate from the processor 602, such as a cache memory of a processor, the system memory, or other memory. The memory 604 may be an external storage device or database for storing data. Examples include a hard drive, compact disc (“CD”), digital video disc (“DVD”), memory card, memory stick, floppy disc, universal serial bus (“USB”) memory device, or any other device operative to store data. The memory 604 is operable to store instructions executable by the processor 602. The functions, acts or tasks illustrated in the figures or described herein may be performed by the programmed processor 602 executing the instructions stored in the memory 604. The functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firm-ware, micro-code and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing and the like. - As shown, the
computer system 600 may further include a display unit 614, such as a liquid crystal display (LCD), an organic light emitting diode (OLED), a flat panel display, a solid state display, a cathode ray tube (CRT), a projector, a printer or other now known or later developed display device for outputting determined information. The display 614 may act as an interface for the user to see the functioning of the processor 602, or specifically as an interface with the software stored in the memory 604 or in the drive unit 606. - Additionally, the
computer system 600 may include an input device 616 configured to allow a user to interact with any of the components of system 600. The input device 616 may be a number pad, a keyboard, or a cursor control device, such as a mouse, or a joystick, touch screen display, remote control or any other device operative to interact with the system 600. - In a particular embodiment, as depicted in
FIG. 6, the computer system 600 may also include a disk or optical drive unit 606. The disk drive unit 606 may include a computer-readable medium 610 in which one or more sets of instructions 612, e.g. software, can be embedded. Further, the instructions 612 may embody one or more of the methods or logic as described herein. In a particular embodiment, the instructions 612 may reside completely, or at least partially, within the memory 604 and/or within the processor 602 during execution by the computer system 600. The memory 604 and the processor 602 also may include computer-readable media as discussed above. - The present disclosure contemplates a computer-readable medium that includes
instructions 612 or receives and executes instructions 612 responsive to a propagated signal, so that a device connected to a network 620 can communicate voice, video, audio, images or any other data over the network 620. Further, the instructions 612 may be transmitted or received over the network 620 via a communication port 618. The communication port 618 may be a part of the processor 602 or may be a separate component. The communication port 618 may be created in software or may be a physical connection in hardware. The communication port 618 is configured to connect with a network 620, external media, the display 614, or any other components in system 600, or combinations thereof. The connection with the network 620 may be a physical connection, such as a wired Ethernet connection or may be established wirelessly as discussed below. Likewise, the additional connections with other components of the system 600 may be physical connections or may be established wirelessly. - The
network 620 may include wired networks, wireless networks, or combinations thereof. The wireless network may be a cellular telephone network, an 802.11, 802.16, 802.20, or WiMax network. Further, the network 620 may be a public network, such as the Internet, a private network, such as an intranet, or combinations thereof, and may utilize a variety of networking protocols now available or later developed including, but not limited to TCP/IP based networking protocols. - While the computer-readable medium is shown to be a single medium, the term “computer-readable medium” includes a single medium or multiple media, such as a centralized or distributed database, and/or associated caches and servers that store one or more sets of instructions. The term “computer-readable medium” shall also include any medium that is capable of storing, encoding or carrying a set of instructions for execution by a processor or that cause a computer system to perform any one or more of the methods or operations disclosed herein.
- In a particular non-limiting, exemplary embodiment, the computer-readable medium can include a solid-state memory such as a memory card or other package that houses one or more non-volatile read-only memories. Further, the computer-readable medium can be a random access memory or other volatile re-writable memory. Additionally, the computer-readable medium can include a magneto-optical or optical medium, such as a disk or tapes or other storage device to capture carrier wave signals such as a signal communicated over a transmission medium. A digital file attachment to an e-mail or other self-contained information archive or set of archives may be considered a distribution medium that is a tangible storage medium. Accordingly, the disclosure is considered to include any one or more of a computer-readable medium or a distribution medium and other equivalents and successor media, in which data or instructions may be stored.
- In an alternative embodiment, dedicated hardware implementations, such as application specific integrated circuits, programmable logic arrays and other hardware devices, can be constructed to implement one or more of the methods described herein. Applications that may include the apparatus and systems of various embodiments can broadly include a variety of electronic and computer systems. One or more embodiments described herein may implement functions using two or more specific interconnected hardware modules or devices with related control and data signals that can be communicated between and through the modules, or as portions of an application-specific integrated circuit. Accordingly, the present system encompasses software, firmware, and hardware implementations.
- In accordance with various embodiments of the present disclosure, the methods described herein may be implemented by software programs executable by a computer system. Further, in an exemplary, non-limited embodiment, implementations can include distributed processing, component/object distributed processing, and parallel processing. Alternatively, virtual computer system processing can be constructed to implement one or more of the methods or functionality as described herein.
- Although the present specification describes components and functions that may be implemented in particular embodiments with reference to particular standards and protocols, the invention is not limited to such standards and protocols. For example, standards for Internet and other packet switched network transmission (e.g., TCP/IP, UDP/IP, HTML, HTTP) represent examples of the state of the art. Such standards are periodically superseded by faster or more efficient equivalents having essentially the same functions. Accordingly, replacement standards and protocols having the same or similar functions as those disclosed herein are considered equivalents thereof.
- The illustrations of the embodiments described herein are intended to provide a general understanding of the structure of the various embodiments. The illustrations are not intended to serve as a complete description of all of the elements and features of apparatus and systems that utilize the structures or methods described herein. Many other embodiments may be apparent to those of skill in the art upon reviewing the disclosure. Other embodiments may be utilized and derived from the disclosure, such that structural and logical substitutions and changes may be made without departing from the scope of the disclosure. Additionally, the illustrations are merely representational and may not be drawn to scale. Certain proportions within the illustrations may be exaggerated, while other proportions may be minimized. Accordingly, the disclosure and the figures are to be regarded as illustrative rather than restrictive.
- One or more embodiments of the disclosure may be referred to herein, individually and/or collectively, by the term “invention” merely for convenience and without intending to voluntarily limit the scope of this application to any particular invention or inventive concept. Moreover, although specific embodiments have been illustrated and described herein, it should be appreciated that any subsequent arrangement designed to achieve the same or similar purpose may be substituted for the specific embodiments shown. This disclosure is intended to cover any and all subsequent adaptations or variations of various embodiments. Combinations of the above embodiments, and other embodiments not specifically described herein, will be apparent to those of skill in the art upon reviewing the description.
- The Abstract of the Disclosure is provided to comply with 37 C.F.R. § 1.72(b) and is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, various features may be grouped together or described in a single embodiment for the purpose of streamlining the disclosure. This disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter may be directed to less than all of the features of any of the disclosed embodiments. Thus, the following claims are incorporated into the Detailed Description, with each claim standing on its own as defining separately claimed subject matter.
- The above disclosed subject matter is to be considered illustrative, and not restrictive, and the appended claims are intended to cover all such modifications, enhancements, and other embodiments, which fall within the true spirit and scope of the present invention. Thus, to the maximum extent allowed by law, the scope of the present invention is to be determined by the broadest permissible interpretation of the following claims and their equivalents, and shall not be restricted or limited by the foregoing detailed description. While various embodiments of the invention have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible within the scope of the invention. Accordingly, the invention is not to be restricted except in light of the attached claims and their equivalents.
Claims (23)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/948,374 US20080077588A1 (en) | 2006-02-28 | 2007-11-30 | Identifying and measuring related queries |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/365,315 US7689554B2 (en) | 2006-02-28 | 2006-02-28 | System and method for identifying related queries for languages with multiple writing systems |
US11/948,374 US20080077588A1 (en) | 2006-02-28 | 2007-11-30 | Identifying and measuring related queries |
Related Parent Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/365,315 Continuation-In-Part US7689554B2 (en) | 2006-02-28 | 2006-02-28 | System and method for identifying related queries for languages with multiple writing systems |
Publications (1)
Publication Number | Publication Date |
---|---|
US20080077588A1 (en) | 2008-03-27 |
Family
ID=38445252
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/365,315 Expired - Fee Related US7689554B2 (en) | 2006-02-28 | 2006-02-28 | System and method for identifying related queries for languages with multiple writing systems |
US11/948,374 Abandoned US20080077588A1 (en) | 2006-02-28 | 2007-11-30 | Identifying and measuring related queries |
Country Status (7)
Country | Link |
---|---|
US (2) | US7689554B2 (en) |
EP (2) | EP3301591A1 (en) |
JP (1) | JP2009528636A (en) |
KR (1) | KR101098703B1 (en) |
CN (2) | CN102750323B (en) |
HK (2) | HK1130912A1 (en) |
WO (1) | WO2007101194A2 (en) |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080312911A1 (en) * | 2007-06-14 | 2008-12-18 | Po Zhang | Dictionary word and phrase determination |
US20090248634A1 (en) * | 2008-03-31 | 2009-10-01 | International Business Machines Corporation | Method and system for a metadata driven query |
US20090299974A1 (en) * | 2008-05-29 | 2009-12-03 | Fujitsu Limited | Character sequence map generating apparatus, information searching apparatus, character sequence map generating method, information searching method, and computer product |
US20100106704A1 (en) * | 2008-10-29 | 2010-04-29 | Yahoo! Inc. | Cross-lingual query classification |
US20100217781A1 (en) * | 2008-12-30 | 2010-08-26 | Thales | Optimized method and system for managing proper names to optimize the management and interrogation of databases |
US20110295897A1 (en) * | 2010-06-01 | 2011-12-01 | Microsoft Corporation | Query correction probability based on query-correction pairs |
US20120047151A1 (en) * | 2010-08-19 | 2012-02-23 | Yahoo! Inc. | Method and system for providing contents based on past queries |
WO2012074704A2 (en) * | 2010-11-29 | 2012-06-07 | Microsoft Corporation | Display of search ads in local language |
WO2012148427A1 (en) * | 2011-04-29 | 2012-11-01 | Hewlett-Packard Development Company, L.P. | Systems and methods for in-memory processing of events |
US20130054225A1 (en) * | 2010-06-23 | 2013-02-28 | Business Objects Software Limited | Searching and matching of data |
US8417718B1 (en) * | 2011-07-11 | 2013-04-09 | Google Inc. | Generating word completions based on shared suffix analysis |
US20130090916A1 (en) * | 2011-10-05 | 2013-04-11 | Daniel M. Wang | System and Method for Detecting and Correcting Mismatched Chinese Character |
US20150213142A1 (en) * | 2007-12-03 | 2015-07-30 | Ebay Inc. | Live search chat room |
US20190295012A1 (en) * | 2018-03-23 | 2019-09-26 | International Business Machines Corporation | Predicting employee performance metrics |
US20210240751A1 (en) * | 2018-12-26 | 2021-08-05 | Paypal, Inc. | Machine learning approach to cross-language translation and search |
US11170183B2 (en) * | 2018-09-17 | 2021-11-09 | International Business Machines Corporation | Language entity identification |
US20230185857A1 (en) * | 2015-12-08 | 2023-06-15 | Yahoo Assets Llc | Method and system for providing context based query suggestions |
RU2813239C1 (en) * | 2022-12-21 | 2024-02-08 | Акционерное общество "Лаборатория Касперского" | Method for filtering events for transmission to remote device |
Families Citing this family (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7821503B2 (en) | 2003-04-09 | 2010-10-26 | Tegic Communications, Inc. | Touch screen and graphical user interface |
US7750891B2 (en) | 2003-04-09 | 2010-07-06 | Tegic Communications, Inc. | Selective input system based on tracking of motion parameters of an input device |
US7286115B2 (en) | 2000-05-26 | 2007-10-23 | Tegic Communications, Inc. | Directional input system with automatic correction |
US7030863B2 (en) * | 2000-05-26 | 2006-04-18 | America Online, Incorporated | Virtual keyboard system with automatic correction |
US7689554B2 (en) * | 2006-02-28 | 2010-03-30 | Yahoo! Inc. | System and method for identifying related queries for languages with multiple writing systems |
US8762358B2 (en) * | 2006-04-19 | 2014-06-24 | Google Inc. | Query language determination using query terms and interface language |
US8442965B2 (en) | 2006-04-19 | 2013-05-14 | Google Inc. | Query language identification |
US7689548B2 (en) * | 2006-09-22 | 2010-03-30 | Microsoft Corporation | Recommending keywords based on bidding patterns |
US7925498B1 (en) | 2006-12-29 | 2011-04-12 | Google Inc. | Identifying a synonym with N-gram agreement for a query phrase |
US8201087B2 (en) * | 2007-02-01 | 2012-06-12 | Tegic Communications, Inc. | Spell-check for a keyboard system with automatic correction |
US8225203B2 (en) | 2007-02-01 | 2012-07-17 | Nuance Communications, Inc. | Spell-check for a keyboard system with automatic correction |
US20080250008A1 (en) * | 2007-04-04 | 2008-10-09 | Microsoft Corporation | Query Specialization |
US8290921B2 (en) * | 2007-06-28 | 2012-10-16 | Microsoft Corporation | Identification of similar queries based on overall and partial similarity of time series |
US8090709B2 (en) * | 2007-06-28 | 2012-01-03 | Microsoft Corporation | Representing queries and determining similarity based on an ARIMA model |
US7831588B2 (en) * | 2008-02-05 | 2010-11-09 | Yahoo! Inc. | Context-sensitive query expansion |
US8171021B2 (en) * | 2008-06-23 | 2012-05-01 | Google Inc. | Query identification and association |
US8745051B2 (en) * | 2008-07-03 | 2014-06-03 | Google Inc. | Resource locator suggestions from input character sequence |
US9053197B2 (en) * | 2008-11-26 | 2015-06-09 | Red Hat, Inc. | Suggesting websites |
CN101464897A (en) | 2009-01-12 | 2009-06-24 | 阿里巴巴集团控股有限公司 | Word matching and information query method and device |
EP2328366A1 (en) * | 2009-11-20 | 2011-06-01 | Alcatel Lucent | Method and system for conducting surveys |
US20110153423A1 (en) * | 2010-06-21 | 2011-06-23 | Jon Elvekrog | Method and system for creating user based summaries for content distribution |
US20110153414A1 (en) * | 2009-12-23 | 2011-06-23 | Jon Elvekrog | Method and system for dynamic advertising based on user actions |
US8751305B2 (en) * | 2010-05-24 | 2014-06-10 | 140 Proof, Inc. | Targeting users based on persona data |
CN102567408B (en) | 2010-12-31 | 2014-06-04 | 阿里巴巴集团控股有限公司 | Method and device for recommending search keyword |
KR101461062B1 (en) * | 2011-10-24 | 2014-11-17 | 네이버 주식회사 | System and method for recommendding japanese language automatically using tranformatiom of romaji |
US8756241B1 (en) * | 2012-08-06 | 2014-06-17 | Google Inc. | Determining rewrite similarity scores |
US9971837B2 (en) * | 2013-12-16 | 2018-05-15 | Excalibur Ip, Llc | Contextual based search suggestion |
US9690860B2 (en) | 2014-06-30 | 2017-06-27 | Yahoo! Inc. | Recommended query formulation |
CN104572836A (en) * | 2014-12-10 | 2015-04-29 | 百度在线网络技术(北京)有限公司 | Method and device for confirming comprehensive relevancy of candidate inquiry sequence |
US10169414B2 (en) | 2016-04-26 | 2019-01-01 | International Business Machines Corporation | Character matching in text processing |
CN110162593B (en) * | 2018-11-29 | 2023-03-21 | 腾讯科技(深圳)有限公司 | Search result processing and similarity model training method and device |
US11194850B2 (en) * | 2018-12-14 | 2021-12-07 | Business Objects Software Ltd. | Natural language query system |
CN110008237B (en) * | 2019-01-14 | 2023-05-02 | 创新先进技术有限公司 | Similar query recognition method and device |
CN111629020A (en) * | 2019-12-03 | 2020-09-04 | 蘑菇车联信息科技有限公司 | Remote input method, device, PC (personal computer) terminal, android device and system |
Citations (40)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020052871A1 (en) * | 2000-11-02 | 2002-05-02 | Simpleact Incorporated | Chinese natural language query system and method |
US20020165717A1 (en) * | 2001-04-06 | 2002-11-07 | Solmer Robert P. | Efficient method for information extraction |
US20030069880A1 (en) * | 2001-09-24 | 2003-04-10 | Ask Jeeves, Inc. | Natural language query processing |
US20030144994A1 (en) * | 2001-10-12 | 2003-07-31 | Ji-Rong Wen | Clustering web queries |
US20040249801A1 (en) * | 2003-04-04 | 2004-12-09 | Yahoo! | Universal search interface systems and methods |
US20050038802A1 (en) * | 2000-12-21 | 2005-02-17 | Eric White | Method and system for platform-independent file system interaction |
US6876997B1 (en) * | 2000-05-22 | 2005-04-05 | Overture Services, Inc. | Method and apparatus for indentifying related searches in a database search system |
US20050080795A1 (en) * | 2003-10-09 | 2005-04-14 | Yahoo! Inc. | Systems and methods for search processing using superunits |
US20050102259A1 (en) * | 2003-11-12 | 2005-05-12 | Yahoo! Inc. | Systems and methods for search query processing using trend analysis |
US20050228780A1 (en) * | 2003-04-04 | 2005-10-13 | Yahoo! Inc. | Search system using search subdomain and hints to subdomains in search query statements and sponsored results on a subdomain-by-subdomain basis |
US6999932B1 (en) * | 2000-10-10 | 2006-02-14 | Intel Corporation | Language independent voice-based search system |
US7051119B2 (en) * | 2001-07-12 | 2006-05-23 | Yahoo! Inc. | Method and system for enabling a script on a first computer to communicate and exchange data with a script on a second computer over a network |
US7051023B2 (en) * | 2003-04-04 | 2006-05-23 | Yahoo! Inc. | Systems and methods for generating concept units from search queries |
US7051014B2 (en) * | 2003-06-18 | 2006-05-23 | Microsoft Corporation | Utilizing information redundancy to improve text searches |
US20060122994A1 (en) * | 2004-12-06 | 2006-06-08 | Yahoo! Inc. | Automatic generation of taxonomies for categorizing queries and search query processing using taxonomies |
US20060122979A1 (en) * | 2004-12-06 | 2006-06-08 | Shyam Kapur | Search processing with automatic categorization of queries |
US20060161520A1 (en) * | 2005-01-14 | 2006-07-20 | Microsoft Corporation | System and method for generating alternative search terms |
US20060167896A1 (en) * | 2004-12-06 | 2006-07-27 | Shyam Kapur | Systems and methods for managing and using multiple concept networks for assisted search processing |
US20060206474A1 (en) * | 2005-03-10 | 2006-09-14 | Yahoo!, Inc. | System for modifying queries before presentation to a sponsored search generator or other matching system where modifications improve coverage without a corresponding reduction in relevance |
US20060206476A1 (en) * | 2005-03-10 | 2006-09-14 | Yahoo!, Inc. | Reranking and increasing the relevance of the results of Internet searches |
US20070020705A1 (en) * | 2003-10-21 | 2007-01-25 | Shigehiko Mizutani | Method for prognostic evaluation of carcinoma using anti-p-lap antibody |
US20070038621A1 (en) * | 2005-08-10 | 2007-02-15 | Tina Weyand | System and method for determining alternate search queries |
US20070038602A1 (en) * | 2005-08-10 | 2007-02-15 | Tina Weyand | Alternative search query processing in a term bidding system |
US20070203894A1 (en) * | 2006-02-28 | 2007-08-30 | Rosie Jones | System and method for identifying related queries for languages with multiple writing systems |
US20070208709A1 (en) * | 2001-10-03 | 2007-09-06 | Malibu Engineering & Software Ltd. | Method and query application tool for searching hierarchical databases |
US20070208697A1 (en) * | 2001-06-18 | 2007-09-06 | Pavitra Subramaniam | System and method to enable searching across multiple databases and files using a single search |
US20070208701A1 (en) * | 2006-03-01 | 2007-09-06 | Microsoft Corporation | Comparative web search |
US20070208713A1 (en) * | 2006-03-01 | 2007-09-06 | Oracle International Corporation | Auto Generation of Suggested Links in a Search System |
US20070208719A1 (en) * | 2004-03-18 | 2007-09-06 | Bao Tran | Systems and methods for analyzing semantic documents over a network |
US20070208704A1 (en) * | 2006-03-06 | 2007-09-06 | Stephen Ives | Packaged mobile search results |
US20070208698A1 (en) * | 2002-06-07 | 2007-09-06 | Dougal Brindley | Avoiding duplicate service requests |
US20070208703A1 (en) * | 2006-03-03 | 2007-09-06 | Microsoft Corporation | Web forum crawler |
US20070208714A1 (en) * | 2006-03-01 | 2007-09-06 | Oracle International Corporation | Method for Suggesting Web Links and Alternate Terms for Matching Search Queries |
US20070208699A1 (en) * | 2004-09-07 | 2007-09-06 | Shigeki Uetabira | Information search provision apparatus and information search provision system |
US20070208720A1 (en) * | 2000-12-12 | 2007-09-06 | Home Box Office, Inc. | Digital asset data type definitions |
US20070208711A1 (en) * | 2005-12-21 | 2007-09-06 | Rhoads Geoffrey B | Rules Driven Pan ID Metadata Routing System and Network |
US20070208706A1 (en) * | 2006-03-06 | 2007-09-06 | Anand Madhavan | Vertical search expansion, disambiguation, and optimization of search queries |
US20070208702A1 (en) * | 2006-03-02 | 2007-09-06 | Morris Robert P | Method and system for delivering published information associated with a tuple using a pub/sub protocol |
US20070208700A1 (en) * | 2005-01-19 | 2007-09-06 | Konica Minolta Holdings, Inc. | Update detecting apparatus |
US20070214118A1 (en) * | 2005-09-27 | 2007-09-13 | Schoen Michael A | Delivery of internet ads |
Family Cites Families (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4833610A (en) * | 1986-12-16 | 1989-05-23 | International Business Machines Corporation | Morphological/phonetic method for ranking word similarities |
JP2809341B2 (en) * | 1994-11-18 | 1998-10-08 | 松下電器産業株式会社 | Information summarizing method, information summarizing device, weighting method, and teletext receiving device. |
US6006221A (en) * | 1995-08-16 | 1999-12-21 | Syracuse University | Multilingual document retrieval system and method using semantic vector matching |
US5778361A (en) * | 1995-09-29 | 1998-07-07 | Microsoft Corporation | Method and system for fast indexing and searching of text in compound-word languages |
ATE243869T1 (en) * | 1998-03-03 | 2003-07-15 | Amazon Com Inc | IDENTIFICATION OF THE MOST RELEVANT ANSWERS TO A CURRENT SEARCH QUERY BASED ON ANSWERS ALREADY SELECTED FOR SIMILAR QUERIES |
US6493709B1 (en) * | 1998-07-31 | 2002-12-10 | The Regents Of The University Of California | Method and apparatus for digitally shredding similar documents within large document sets in a data processing environment |
JP2001337980A (en) * | 2000-05-29 | 2001-12-07 | Sony Corp | Electronic program guide retrieving method and electronic program guide retrieving device |
US8706747B2 (en) * | 2000-07-06 | 2014-04-22 | Google Inc. | Systems and methods for searching using queries written in a different character-set and/or language from the target pages |
JP2003296443A (en) * | 2002-03-29 | 2003-10-17 | Konica Corp | Medical image pick-up device, display control method, and program |
JP2004280259A (en) * | 2003-03-13 | 2004-10-07 | National Institute Of Information & Communication Technology | Search device |
US6947930B2 (en) * | 2003-03-21 | 2005-09-20 | Overture Services, Inc. | Systems and methods for interactive search query refinement |
US20040260681A1 (en) * | 2003-06-19 | 2004-12-23 | Dvorak Joseph L. | Method and system for selectively retrieving text strings |
EP1692626A4 (en) * | 2003-09-17 | 2008-11-19 | Ibm | Identifying related names |
WO2005124599A2 (en) * | 2004-06-12 | 2005-12-29 | Getty Images, Inc. | Content search in complex language, such as japanese |
JP4936650B2 (en) * | 2004-07-26 | 2012-05-23 | ヤフー株式会社 | Similar word search device, method thereof, program thereof, and information search device |
US20060106769A1 (en) * | 2004-11-12 | 2006-05-18 | Gibbs Kevin A | Method and system for autocompletion for languages having ideographs and phonetic characters |
2006
- 2006-02-28 US US11/365,315 US7689554B2 (en): Expired - Fee Related
2007
- 2007-02-27 CN CN201210167021.3A CN102750323B (en): Active
- 2007-02-27 EP EP17183610.9A EP3301591A1 (en): Withdrawn
- 2007-02-27 WO PCT/US2007/062876 WO2007101194A2 (en): Application Filing
- 2007-02-27 CN CN200780006965XA CN101390097B (en): Active
- 2007-02-27 KR KR1020087023584A KR101098703B1 (en): IP Right Grant
- 2007-02-27 JP JP2008557464A JP2009528636A (en): Pending
- 2007-02-27 EP EP07757547A EP1929415A4 (en): Ceased
- 2007-11-30 US US11/948,374 US20080077588A1 (en): Abandoned
2009
- 2009-09-18 HK HK09108573.9A HK1130912A1 (en): IP Right Cessation
2013
- 2013-03-27 HK HK13103868.8A HK1176711A1 (en): IP Right Cessation
Patent Citations (41)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6876997B1 (en) * | 2000-05-22 | 2005-04-05 | Overture Services, Inc. | Method and apparatus for indentifying related searches in a database search system |
US6999932B1 (en) * | 2000-10-10 | 2006-02-14 | Intel Corporation | Language independent voice-based search system |
US20020052871A1 (en) * | 2000-11-02 | 2002-05-02 | Simpleact Incorporated | Chinese natural language query system and method |
US20070208720A1 (en) * | 2000-12-12 | 2007-09-06 | Home Box Office, Inc. | Digital asset data type definitions |
US20050038802A1 (en) * | 2000-12-21 | 2005-02-17 | Eric White | Method and system for platform-independent file system interaction |
US20020165717A1 (en) * | 2001-04-06 | 2002-11-07 | Solmer Robert P. | Efficient method for information extraction |
US20070208697A1 (en) * | 2001-06-18 | 2007-09-06 | Pavitra Subramaniam | System and method to enable searching across multiple databases and files using a single search |
US7051119B2 (en) * | 2001-07-12 | 2006-05-23 | Yahoo! Inc. | Method and system for enabling a script on a first computer to communicate and exchange data with a script on a second computer over a network |
US20030069880A1 (en) * | 2001-09-24 | 2003-04-10 | Ask Jeeves, Inc. | Natural language query processing |
US20070208709A1 (en) * | 2001-10-03 | 2007-09-06 | Malibu Engineering & Software Ltd. | Method and query application tool for searching hierarchical databases |
US20030144994A1 (en) * | 2001-10-12 | 2003-07-31 | Ji-Rong Wen | Clustering web queries |
US20070208698A1 (en) * | 2002-06-07 | 2007-09-06 | Dougal Brindley | Avoiding duplicate service requests |
US20040249801A1 (en) * | 2003-04-04 | 2004-12-09 | Yahoo! | Universal search interface systems and methods |
US7051023B2 (en) * | 2003-04-04 | 2006-05-23 | Yahoo! Inc. | Systems and methods for generating concept units from search queries |
US20050228780A1 (en) * | 2003-04-04 | 2005-10-13 | Yahoo! Inc. | Search system using search subdomain and hints to subdomains in search query statements and sponsored results on a subdomain-by-subdomain basis |
US7051014B2 (en) * | 2003-06-18 | 2006-05-23 | Microsoft Corporation | Utilizing information redundancy to improve text searches |
US20050080795A1 (en) * | 2003-10-09 | 2005-04-14 | Yahoo! Inc. | Systems and methods for search processing using superunits |
US20070020705A1 (en) * | 2003-10-21 | 2007-01-25 | Shigehiko Mizutani | Method for prognostic evaluation of carcinoma using anti-p-lap antibody |
US20050102259A1 (en) * | 2003-11-12 | 2005-05-12 | Yahoo! Inc. | Systems and methods for search query processing using trend analysis |
US7240049B2 (en) * | 2003-11-12 | 2007-07-03 | Yahoo! Inc. | Systems and methods for search query processing using trend analysis |
US20070208719A1 (en) * | 2004-03-18 | 2007-09-06 | Bao Tran | Systems and methods for analyzing semantic documents over a network |
US20070208699A1 (en) * | 2004-09-07 | 2007-09-06 | Shigeki Uetabira | Information search provision apparatus and information search provision system |
US20060122979A1 (en) * | 2004-12-06 | 2006-06-08 | Shyam Kapur | Search processing with automatic categorization of queries |
US20060167896A1 (en) * | 2004-12-06 | 2006-07-27 | Shyam Kapur | Systems and methods for managing and using multiple concept networks for assisted search processing |
US20060122994A1 (en) * | 2004-12-06 | 2006-06-08 | Yahoo! Inc. | Automatic generation of taxonomies for categorizing queries and search query processing using taxonomies |
US20060161520A1 (en) * | 2005-01-14 | 2006-07-20 | Microsoft Corporation | System and method for generating alternative search terms |
US20070208700A1 (en) * | 2005-01-19 | 2007-09-06 | Konica Minolta Holdings, Inc. | Update detecting apparatus |
US20060206474A1 (en) * | 2005-03-10 | 2006-09-14 | Yahoo!, Inc. | System for modifying queries before presentation to a sponsored search generator or other matching system where modifications improve coverage without a corresponding reduction in relevance |
US20060206476A1 (en) * | 2005-03-10 | 2006-09-14 | Yahoo!, Inc. | Reranking and increasing the relevance of the results of Internet searches |
US20070038602A1 (en) * | 2005-08-10 | 2007-02-15 | Tina Weyand | Alternative search query processing in a term bidding system |
US20070038621A1 (en) * | 2005-08-10 | 2007-02-15 | Tina Weyand | System and method for determining alternate search queries |
US20070214118A1 (en) * | 2005-09-27 | 2007-09-13 | Schoen Michael A | Delivery of internet ads |
US20070208711A1 (en) * | 2005-12-21 | 2007-09-06 | Rhoads Geoffrey B | Rules Driven Pan ID Metadata Routing System and Network |
US20070203894A1 (en) * | 2006-02-28 | 2007-08-30 | Rosie Jones | System and method for identifying related queries for languages with multiple writing systems |
US20070208713A1 (en) * | 2006-03-01 | 2007-09-06 | Oracle International Corporation | Auto Generation of Suggested Links in a Search System |
US20070208714A1 (en) * | 2006-03-01 | 2007-09-06 | Oracle International Corporation | Method for Suggesting Web Links and Alternate Terms for Matching Search Queries |
US20070208701A1 (en) * | 2006-03-01 | 2007-09-06 | Microsoft Corporation | Comparative web search |
US20070208702A1 (en) * | 2006-03-02 | 2007-09-06 | Morris Robert P | Method and system for delivering published information associated with a tuple using a pub/sub protocol |
US20070208703A1 (en) * | 2006-03-03 | 2007-09-06 | Microsoft Corporation | Web forum crawler |
US20070208706A1 (en) * | 2006-03-06 | 2007-09-06 | Anand Madhavan | Vertical search expansion, disambiguation, and optimization of search queries |
US20070208704A1 (en) * | 2006-03-06 | 2007-09-06 | Stephen Ives | Packaged mobile search results |
Cited By (31)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110282903A1 (en) * | 2007-06-14 | 2011-11-17 | Google Inc. | Dictionary Word and Phrase Determination |
US8412517B2 (en) * | 2007-06-14 | 2013-04-02 | Google Inc. | Dictionary word and phrase determination |
US20080312911A1 (en) * | 2007-06-14 | 2008-12-18 | Po Zhang | Dictionary word and phrase determination |
US20150213142A1 (en) * | 2007-12-03 | 2015-07-30 | Ebay Inc. | Live search chat room |
US8150838B2 (en) * | 2008-03-31 | 2012-04-03 | International Business Machines Corporation | Method and system for a metadata driven query |
US20090248634A1 (en) * | 2008-03-31 | 2009-10-01 | International Business Machines Corporation | Method and system for a metadata driven query |
US20090299974A1 (en) * | 2008-05-29 | 2009-12-03 | Fujitsu Limited | Character sequence map generating apparatus, information searching apparatus, character sequence map generating method, information searching method, and computer product |
US20100106704A1 (en) * | 2008-10-29 | 2010-04-29 | Yahoo! Inc. | Cross-lingual query classification |
US8117237B2 (en) * | 2008-12-30 | 2012-02-14 | Thales | Optimized method and system for managing proper names to optimize the management and interrogation of databases |
US20100217781A1 (en) * | 2008-12-30 | 2010-08-26 | Thales | Optimized method and system for managing proper names to optimize the management and interrogation of databases |
US20110295897A1 (en) * | 2010-06-01 | 2011-12-01 | Microsoft Corporation | Query correction probability based on query-correction pairs |
US8745077B2 (en) * | 2010-06-23 | 2014-06-03 | Business Objects Software Limited | Searching and matching of data |
US20130054225A1 (en) * | 2010-06-23 | 2013-02-28 | Business Objects Software Limited | Searching and matching of data |
US8442987B2 (en) * | 2010-08-19 | 2013-05-14 | Yahoo! Inc. | Method and system for providing contents based on past queries |
US20120047151A1 (en) * | 2010-08-19 | 2012-02-23 | Yahoo! Inc. | Method and system for providing contents based on past queries |
WO2012074704A2 (en) * | 2010-11-29 | 2012-06-07 | Microsoft Corporation | Display of search ads in local language |
WO2012074704A3 (en) * | 2010-11-29 | 2012-07-19 | Microsoft Corporation | Display of search ads in local language |
CN103502990A (en) * | 2011-04-29 | 2014-01-08 | 惠普发展公司,有限责任合伙企业 | Systems and methods for in-memory processing of events |
US9355148B2 (en) | 2011-04-29 | 2016-05-31 | Hewlett Packard Enterprise Development Lp | Systems and methods for in-memory processing of events |
WO2012148427A1 (en) * | 2011-04-29 | 2012-11-01 | Hewlett-Packard Development Company, L.P. | Systems and methods for in-memory processing of events |
US8417718B1 (en) * | 2011-07-11 | 2013-04-09 | Google Inc. | Generating word completions based on shared suffix analysis |
US8886662B1 (en) | 2011-07-11 | 2014-11-11 | Google Inc. | Generating word completions based on shared suffix analysis |
US8725497B2 (en) * | 2011-10-05 | 2014-05-13 | Daniel M. Wang | System and method for detecting and correcting mismatched Chinese character |
US20130090916A1 (en) * | 2011-10-05 | 2013-04-11 | Daniel M. Wang | System and Method for Detecting and Correcting Mismatched Chinese Character |
US20230185857A1 (en) * | 2015-12-08 | 2023-06-15 | Yahoo Assets Llc | Method and system for providing context based query suggestions |
US20190295012A1 (en) * | 2018-03-23 | 2019-09-26 | International Business Machines Corporation | Predicting employee performance metrics |
US10891578B2 (en) * | 2018-03-23 | 2021-01-12 | International Business Machines Corporation | Predicting employee performance metrics |
US11170183B2 (en) * | 2018-09-17 | 2021-11-09 | International Business Machines Corporation | Language entity identification |
US20210240751A1 (en) * | 2018-12-26 | 2021-08-05 | Paypal, Inc. | Machine learning approach to cross-language translation and search |
US11914626B2 (en) * | 2018-12-26 | 2024-02-27 | Paypal, Inc. | Machine learning approach to cross-language translation and search |
RU2813239C1 (en) * | 2022-12-21 | 2024-02-08 | Акционерное общество "Лаборатория Касперского" | Method for filtering events for transmission to remote device |
Also Published As
Publication number | Publication date |
---|---|
CN101390097B (en) | 2012-07-04 |
CN102750323B (en) | 2016-05-11 |
JP2009528636A (en) | 2009-08-06 |
HK1176711A1 (en) | 2013-08-02 |
WO2007101194A2 (en) | 2007-09-07 |
EP3301591A1 (en) | 2018-04-04 |
US7689554B2 (en) | 2010-03-30 |
WO2007101194A3 (en) | 2008-03-13 |
US20070203894A1 (en) | 2007-08-30 |
CN101390097A (en) | 2009-03-18 |
CN102750323A (en) | 2012-10-24 |
KR101098703B1 (en) | 2011-12-23 |
EP1929415A2 (en) | 2008-06-11 |
HK1130912A1 (en) | 2010-01-08 |
EP1929415A4 (en) | 2011-06-15 |
KR20080114764A (en) | 2008-12-31 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
US20080077588A1 (en) | Identifying and measuring related queries | |
US10325033B2 (en) | Determination of content score | |
TWI471737B (en) | System and method for trail identification with search results | |
US8676827B2 (en) | Rare query expansion by web feature matching | |
US8346754B2 (en) | Generating succinct titles for web URLs | |
JP5281405B2 (en) | Selecting high-quality reviews for display | |
US9542476B1 (en) | Refining search queries | |
JP5990178B2 (en) | System and method for keyword extraction | |
US8620745B2 (en) | Selecting advertisements for placement on related web pages | |
US7877389B2 (en) | Segmentation of search topics in query logs | |
US20120323968A1 (en) | Learning Discriminative Projections for Text Similarity Measures | |
US9251206B2 (en) | Generalized edit distance for queries | |
AU2019366858B2 (en) | Method and system for decoding user intent from natural language queries | |
US9798820B1 (en) | Classification of keywords | |
JP5507469B2 (en) | Providing content using stored query information | |
US20090216710A1 (en) | Optimizing query rewrites for keyword-based advertising | |
Chang et al. | Integrating a semantic-based retrieval agent into case-based reasoning systems: A case study of an online bookstore | |
AU2018250372B2 (en) | Method to construct content based on a content repository | |
US20110131093A1 (en) | System and method for optimizing selection of online advertisements | |
US20090327877A1 (en) | System and method for disambiguating text labeling content objects | |
JP5427694B2 (en) | Related content presentation apparatus and program | |
US20090248627A1 (en) | System and method for query substitution for sponsored search | |
CN108319586B (en) | Information extraction rule generation and semantic analysis method and device | |
JP4883644B2 (en) | RECOMMENDATION DEVICE, RECOMMENDATION SYSTEM, RECOMMENDATION DEVICE CONTROL METHOD, AND RECOMMENDATION SYSTEM CONTROL METHOD | |
WO2014169857A1 (en) | Data processing device, data processing method and electronic equipment |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: YAHOO! INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: ZHANG, WEI VIVIAN; JONES, ROSIE; REY, BENJAMIN; REEL/FRAME: 020257/0539; SIGNING DATES FROM 20071127 TO 20071129 |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |
AS | Assignment | Owner name: YAHOO HOLDINGS, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: YAHOO! INC.; REEL/FRAME: 042963/0211; Effective date: 20170613 |
AS | Assignment | Owner name: OATH INC., NEW YORK; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNOR: YAHOO HOLDINGS, INC.; REEL/FRAME: 045240/0310; Effective date: 20171231 |