WO2015076662A1 - A system and method for predicting query in a search engine - Google Patents

A system and method for predicting query in a search engine

Info

Publication number
WO2015076662A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
keywords
classifications
historical data
keyword
Application number
PCT/MY2014/000179
Other languages
French (fr)
Inventor
Bin Mat Nor FAZLI
Ysrin Bin Amruddin AMRU
Bin Mohammad Ali MOHAMMAD AZAM
Original Assignee
Mimos Berhad
Application filed by Mimos Berhad

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3322 Query formulation using system suggestions

Definitions

  • the present invention relates to a system and method for predicting query in a search engine. More particularly, the present invention relates to a system and method for predicting query in a search engine based on a similarity of an input query with historical data of keywords and classifications.
  • a query search is usually processed based on the entered keywords by users. However, sometimes it is also processed by using predicted keywords or queries which are based on similarity, semantics or popularity. These predicted keywords and queries provide a better understanding on the information to the users. For a predicted query, users usually will have to select only one query from the predicted list and a search engine will then process the query before returning a set of results on the query.
  • the present invention relates to a system (100) and method for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications.
  • the system (100) comprises at least one client server (101); a query server (102); and a database (103) to store prediction rules.
  • the query server (102) further comprises a query manager (104) for processing queries submitted by the at least one client server (101) by extracting keywords and keyword classifications; a data mining engine (106) for providing query patterns; and a historical data generator (105) for updating a historical data set based on the extracted keywords and keyword classifications.
  • the method for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications is characterised by the steps of submitting the input query to a query server (102); extracting keywords from the query; performing query classifications on the extracted keywords; and performing a query executing process and performing an updating process based on the extracted keywords and classifications.
  • the step of performing query classifications is based on the definition of the query by using named entity recognition and lexical database.
  • the query executing process includes the steps of obtaining associated keywords by looking up query prediction rules which are stored in a database (103); executing query based on extracted keywords and associated keywords; and sending results to user.
  • the updating process includes the steps of retrieving keywords and their classifications based on the query; separating each keyword and its classifications; combining keywords and classifications; storing all combinations as mining data; executing mining process based on the updated historical data to generate predictive relationships between the keywords; and storing prediction rules in a database (103).
  • the step of combining keywords and classifications further includes the steps of selecting at least one keyword to be searched based on the query; selecting at least one classification for the at least one keyword; retrieving a list of other keywords based on the query; retrieving a list of related words for the other keywords based on the query; retrieving a list of related words of the at least one classification for the at least one keyword; determining similarity measurement between the list of related words for other keywords and the list of related words of the at least one classification for the at least one keyword using cosine similarity; and selecting combination of classification and other keywords.
  • the list of related words for other keywords and the list of related words of the at least one classification for the at least one keyword are retrieved using a reverse dictionary.
  • FIG. 1 illustrates a block diagram of a system (100) for predicting query in a search engine according to an embodiment of the present invention.
  • FIG. 2 illustrates a flowchart of an overall process for predicting query in a search engine according to an embodiment of the present invention.
  • FIG. 3 illustrates a flowchart of a method to update historical data by a historical data generator (105) of the system (100) of FIG. 1.
  • FIG. 4 illustrates a flowchart of a method to combine the keywords and classifications by the historical data generator (105) of the system (100) of FIG. 1.
  • FIG. 5 illustrates an example of a process of selecting the highest similarity of related keywords and classifications based on the method of FIG. 4.
  • FIG. 1 illustrates a system (100) for predicting query in a search engine according to an embodiment of the present invention.
  • the system (100) comprises a client server (101), a query server (102) and a database (103).
  • the client server (101) is connected to the query server (102), while the query server (102) is connected to the client server (101) and the database (103).
  • the query server (102) comprises a query manager (104), a historical data generator (105) and a data mining engine (106).
  • the client server (101) which can either be a server or a client device such as a laptop or a mobile device, is used to submit queries to the query server (102) while the database (103) is used to store all prediction rules.
  • the function of the query manager (104) is to process the queries by extracting keywords and performing query classifications based on their definition by using named entity recognition and lexical database.
  • the historical data generator (105) updates historical data set based on the extracted keywords and keywords classifications, while the data mining engine (106) provides query patterns by looking at user's past experience. This past experience includes the previous query keywords used by the user and the classifications of the keywords based on named entity recognition and lexical database.
  • FIG. 2 illustrates a flowchart of an overall process for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications according to an embodiment of the present invention.
  • a user submits an input query from the client server (101) to the query server (102) as in step 201, wherein the query server (102) passes the query to the query manager (104).
  • the query manager (104) then processes the query by extracting keywords and performing query classifications based on the definition of the query by using named entity recognition and lexical database as in step 202.
  • the process then splits into two separate processes which are a query executing process (the query executing process is referred herein below as process A) and an updating process (the updating process is referred herein below as process B). These two processes are executed simultaneously.
  • the query manager (104) obtains a plurality of associated keywords by searching the previously stored query prediction rules in the database (103) as in step 203. For example, if the previous query entered by the user is "Najib Tun Razak Prime Minister Malaysia," the keyword classification result is based on named entity recognition and a lexical database, wherein "Najib Tun Razak" is classified as a Person; "Prime Minister" is classified as a Position; and "Malaysia" is classified as a Country.
  • the process continues by executing the query as in step 204 based on the keywords and associated keywords found in steps 202 and 203 respectively. Once the results of the query have been gathered, the results are sent back to the client server (101) as in step 205.
  • process B starts by updating the historical data set as in step 300 based on the extracted keywords and keyword classifications by the historical data generator (105).
  • the mining process is then executed to generate predictive relationships or associations between keywords by the data mining engine (106) as in step 400.
  • the data mining engine (106) uses association rules to predict new keywords based on the prediction rules explained above.
  • FIG. 3 shows a flow chart of the method to update historical data by the historical data generator (105) as in step 300.
  • the historical data generator (105) retrieves the keywords and classifications as in step 310 which are extracted previously from the user's past queries from step 202.
  • the historical data generator (105) separates the current keywords based on their classifications as in step 330 to get the keywords' current classifications. It is similar to the query prediction rules as explained in process A, wherein the keywords classification results are based on named entity recognition and lexical database.
  • the historical data generator (105) then combines each keyword with its relevant classification as in step 350 to combine the user's current query keywords with the previous keywords and classifications from user's past queries.
  • the relevant classification here is based on user's past experience.
  • the historical data generator (105) then stores the combination as historical data in the database (103) as in step 370 for the data mining engine (106) to discover the potential predictive relationships between keywords as in step 400.
  • the step of combining the keywords and classifications has an advantage over the existing search engines as it provides a resource of extracting predictive relationships between words in producing an enhanced keyword search.
  • FIG. 4 details step 350 of FIG. 3 and illustrates a flowchart of the method to combine the keywords and classifications by the historical data generator (105).
  • the historical data generator (105) selects a keyword Kx starting with the first keyword from a set of keywords {K1, ..., Kn} as in step 351.
  • This set of keywords is generated after the user's current query keywords are separated and classified in step 330, wherein referring to the previous example, after "Najib Tun Razak" is classified as a Person; "Prime Minister" is classified as a Position; and "Malaysia" is classified as a Country, the set of keywords is {Najib Tun Razak, Prime Minister, Malaysia}.
  • If the historical data generator (105) has finished selecting all keywords from the set of keywords during this process as in decision 352, the historical data generator (105) returns a collection of HKx, i.e. {{HK1}, ..., {HKn}}, as in step 363, wherein HKx is a collection of Hm, and wherein Hm is a combination of keywords and their relevant classifications. On the other hand, if there are more keywords to be selected from the set of keywords, the historical data generator (105) retrieves a list of other keywords, Dm, as in decision 352 and step 353, wherein Dm is a list of all keywords except the one chosen in step 351. After retrieving the list of other keywords, Dm, the historical data generator (105) then retrieves a list of related words for each keyword in list Dm as in step 354 by using a reverse dictionary, wherein the list of related words is represented by WDm.
  • the historical data generator (105) selects a classification Cm of keyword Kx retrieved from step 202 starting with the first classification as in step 355. Referring to the previous example, the set of classifications associated with the keyword "Najib Tun Razak" is {Person}.
  • a keyword can also have more than one classification, wherein for this example, besides a "Person," "Najib Tun Razak" can also be a "Malay Name" and a "Politician."
  • If the historical data generator (105) has finished selecting all the classifications of Kx as in decision 356, it returns the set of Hm for keyword Kx, i.e. HKx, as in step 362, wherein Hm is a combination of keyword and relevant classification and HKx is a collection of Hm.
  • the historical data generator (105) continues to retrieve a list of related words of Cm by using a reverse dictionary as in decision 356 and step 357, wherein the list is represented by WCm.
  • the historical data generator (105) selects a set of related words WDm starting with the first set of related words based on the list of related words in WDm as in step 358.
  • the process continues by calculating the similarity Vi between WCm and WDm by using the cosine similarity until all the set of related words from WDm have been selected as in decision 359 and step 360.
  • the steps of selecting the set of related words WDm and calculating the similarity of WCm and WDm are repeated until there are no more related words to be selected.
  • the historical data generator (105) returns the set of combinations Hm for each keyword, wherein the combination of Cm with other keywords is selected as the combination {Kx, Cm, Dm} having the highest similarity value, as in decision 359 and step 361.
  • the result of set Hm is stored as HKx as in step 362.
  • the set of HKx is transferred to step 370 to be stored in the database as historical data, which is used for the updating process of process B.
  • the historical data generator (105) selects a keyword Kx starting with the first keyword from a set of keywords {K1, ..., Kn} as in step 351.
  • the set of 3 keywords {K1, K2, K3} is represented as {A, B, C}.
  • This set of keywords is generated after the user's current query keywords are separated and classified in step 330. For example, if the 3 keywords from a current query are "Najib Tun Razak," "Prime Minister," and "Malaysia," A represents "Najib Tun Razak," B represents "Prime Minister," and C represents "Malaysia."
  • when the historical data generator (105) selects the first keyword K1, i.e. A, from the set of keywords {A, B, C} as in step 351, it retrieves a list of other keywords, Dm, as in decision 352 and step 353, wherein Dm is {B, C}.
  • if there are more keywords to be selected from the set of keywords, the historical data generator (105) retrieves a list of related words for each keyword in list Dm as in step 354 by using a reverse dictionary, wherein the list of related words is represented by WDm, and wherein WDm is represented by the small dots in the circles related to B and C.
  • the historical data generator (105) selects a classification, Cm of keyword A retrieved from step 202 starting with the first classification as in step 355.
  • the example shows that there are 3 classifications {C1, C2, C3} of A, which are represented as A1, A2 and A3 respectively, with the first classification C1 being A1.
  • if there are more classifications to be selected, the historical data generator (105) continues to retrieve a list of related words for Cm by using a reverse dictionary as in decision 356 and step 357, wherein the list is represented by WCm, i.e. {WC1, WC2, WC3}, and wherein WCm is represented by the small dots in the circles related to A1, A2 and A3.
  • WC1 is a collection of keywords related to A1
  • WC2 is a collection of keywords related to A2
  • WC3 is a collection of keywords related to A3.
  • the historical data generator (105) selects a set of related words WDm starting with the first set of related words based on the list of related words in WDm i.e. the small dots in the circles related to B.
  • the process continues by calculating the similarity Vi between WCm, i.e. {WC1, WC2, WC3}, which are the collections of keywords related to A1, A2 and A3 respectively (represented by the small dots in the circles related to A1, A2 and A3), and WDm, i.e. {WD1, WD2}, which are the collections of related words for keyword B and keyword C respectively (represented by the small dots in the circles related to B and C).
  • the similarity measurement is calculated by using the cosine similarity until all the set of related words from WDm have been selected as in decision 359 and step 360.
  • Numbers such as 0.8, 0.5, 0.2 and 0.6 represent the similarity Vi between WCm and WDm.
  • the similarity Vi between the first set of related words WDm i.e. the small dots in the circles related to B, and the first keyword's first classification i.e. the small dots in the circles related to A1 is shown as 0.8.
  • the steps of selecting the set of related words WDm and calculating the similarity of WCm and WDm are repeated until there are no more related words to be selected.
  • the steps of calculating the similarity between the classification of the first keyword i.e. A, with other keywords i.e. B and C are shown in the first row, Row 1.
  • the historical data generator (105) returns the set of combinations Hm for each keyword, wherein the combination of Cm with other keywords is selected as the combination {Kx, Cm, Dm} having the highest similarity value, as in decision 359 and step 361.
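
The text names cosine similarity for the value Vi but does not spell it out. As a reading aid, the standard definition is given below, assuming each list of related words is represented as a term-count vector. The symbols c (vector for WCm), d (vector for WDm) and the index t over distinct words are notational assumptions and do not appear in the publication.

```latex
V_i = \cos(\mathbf{c}, \mathbf{d})
    = \frac{\sum_{t} c_t \, d_t}
           {\sqrt{\sum_{t} c_t^{2}} \; \sqrt{\sum_{t} d_t^{2}}}
```

Under this definition a value such as 0.8 indicates that the two related-word lists share most of their words, while a value of 0 indicates that they share none.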

Abstract

The present invention relates to a system (100) and method for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications. The system (100) comprises a client server (101), a query server (102) and a database (103). The query server (102) further comprises a query manager (104), a historical data generator (105) and a data mining engine (106). The system (100) provides predicted queries or keywords based on users' past experience, which is trackable by the data mining engine (106). Thus, the system (100) is capable of improving search efficiency based on its predicted queries.

Description

A SYSTEM AND METHOD FOR PREDICTING QUERY IN A SEARCH ENGINE
FIELD OF INVENTION
The present invention relates to a system and method for predicting query in a search engine. More particularly, the present invention relates to a system and method for predicting query in a search engine based on a similarity of an input query with historical data of keywords and classifications.
BACKGROUND OF THE INVENTION
A query search is usually processed based on the entered keywords by users. However, sometimes it is also processed by using predicted keywords or queries which are based on similarity, semantics or popularity. These predicted keywords and queries provide a better understanding on the information to the users. For a predicted query, users usually will have to select only one query from the predicted list and a search engine will then process the query before returning a set of results on the query.
An example of the above-mentioned search engine is disclosed in United States Patent Publication No. 2007/0239703 A1, which relates to a system and method for generating forecasts of keyword search by providing one or more seasonal categories. After determining a category for the keywords, the system generates a forecast of a keyword search volume for one or more keywords having a seasonal correlation value greater than or equal to a predetermined threshold. Another United States Patent Publication No. 2004/0049499 also discloses a search engine, wherein the system extracts keywords from a query and similarly classifies the keywords into a classification type for the system to retrieve the results. These retrieved results are then ranked in order of similarity based on the classification result.
These types of existing search engines usually use a set of training data or mining data to predict the next keyword that might be useful to the user. However, the mining data used is purely raw historical data which is logged based on the user's previous query. The data has not been enhanced or enriched with value-added information. Consequently, the predicted queries which are returned by the search engine are irrelevant and inaccurate, hence returning search results which are also inaccurate. Therefore, there is a need to provide a system and method that can address the above-mentioned drawbacks of the existing search engines.
SUMMARY OF INVENTION
The present invention relates to a system (100) and method for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications. The system (100) comprises at least one client server (101); a query server (102); and a database (103) to store prediction rules. The query server (102) further comprises a query manager (104) for processing queries submitted by the at least one client server (101) by extracting keywords and keyword classifications; a data mining engine (106) for providing query patterns; and a historical data generator (105) for updating a historical data set based on the extracted keywords and keyword classifications.
The method for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications is characterised by the steps of submitting the input query to a query server (102); extracting keywords from the query; performing query classifications on the extracted keywords; and performing a query executing process and performing an updating process based on the extracted keywords and classifications.
Preferably, the step of performing query classifications is based on the definition of the query by using named entity recognition and lexical database.
Preferably, the query executing process includes the steps of obtaining associated keywords by looking up query prediction rules which are stored in a database (103); executing query based on extracted keywords and associated keywords; and sending results to user.
Preferably, the updating process includes the steps of retrieving keywords and their classifications based on the query; separating each keyword and its classifications; combining keywords and classifications; storing all combinations as mining data; executing mining process based on the updated historical data to generate predictive relationships between the keywords; and storing prediction rules in a database (103).
Preferably, the step of combining keywords and classifications further includes the steps of selecting at least one keyword to be searched based on the query; selecting at least one classification for the at least one keyword; retrieving a list of other keywords based on the query; retrieving a list of related words for the other keywords based on the query; retrieving a list of related words of the at least one classification for the at least one keyword; determining similarity measurement between the list of related words for other keywords and the list of related words of the at least one classification for the at least one keyword using cosine similarity; and selecting combination of classification and other keywords.
Preferably, the list of related words for other keywords and the list of related words of the at least one classification for the at least one keyword are retrieved using a reverse dictionary.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention.
FIG. 1 illustrates a block diagram of a system (100) for predicting query in a search engine according to an embodiment of the present invention.
FIG. 2 illustrates a flowchart of an overall process for predicting query in a search engine according to an embodiment of the present invention.
FIG. 3 illustrates a flowchart of a method to update historical data by a historical data generator (105) of the system (100) of FIG. 1.
FIG. 4 illustrates a flowchart of a method to combine the keywords and classifications by the historical data generator (105) of the system (100) of FIG. 1.
FIG. 5 illustrates an example of a process of selecting the highest similarity of related keywords and classifications based on the method of FIG. 4.
DESCRIPTION OF THE PREFERRED EMBODIMENT
A preferred embodiment of the present invention will be described herein below with reference to the accompanying drawings. In the following description, well known functions or constructions are not described in detail since they would obscure the description with unnecessary detail.
Reference is made initially to FIG. 1 which illustrates a system (100) for predicting query in a search engine according to an embodiment of the present invention. The system (100) comprises a client server (101), a query server (102) and a database (103). The client server (101) is connected to the query server (102), while the query server (102) is connected to the client server (101) and the database (103). The query server (102) comprises a query manager (104), a historical data generator (105) and a data mining engine (106).
The client server (101) which can either be a server or a client device such as a laptop or a mobile device, is used to submit queries to the query server (102) while the database (103) is used to store all prediction rules. The function of the query manager (104) is to process the queries by extracting keywords and performing query classifications based on their definition by using named entity recognition and lexical database. The historical data generator (105) updates historical data set based on the extracted keywords and keywords classifications, while the data mining engine (106) provides query patterns by looking at user's past experience. This past experience includes the previous query keywords used by the user and the classifications of the keywords based on named entity recognition and lexical database. Since the system (100) provides predicted queries or keywords based on user's past experience which are trackable by the data mining engine (106), the system (100) is capable of improving search efficiency based on its predicted queries. The method to predict queries is further explained in FIG. 2 which illustrates a flowchart of an overall process for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications according to an embodiment of the present invention.
Initially, a user submits an input query from the client server (101) to the query server (102) as in step 201, wherein the query server (102) passes the query to the query manager (104). The query manager (104) then processes the query by extracting keywords and performing query classifications based on the definition of the query by using named entity recognition and a lexical database as in step 202. The process then splits into two separate processes: a query executing process (referred to herein below as process A) and an updating process (referred to herein below as process B). These two processes are executed simultaneously.
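
The flow of steps 201 and 202 and the split into process A and process B might be sketched as follows. This is a minimal illustration only: the `ENTITY_TYPES` table is a hypothetical stand-in for named entity recognition plus a lexical database (the publication specifies neither), the function names are invented for this sketch, and a thread pool is just one possible way to run the two processes at the same time.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical gazetteer standing in for named entity recognition plus a
# lexical database; the publication does not specify either component.
ENTITY_TYPES = {
    "Najib Tun Razak": ["Person"],
    "Prime Minister": ["Position"],
    "Malaysia": ["Country"],
}


def extract_and_classify(query):
    """Step 202: find known keywords in the query and attach classifications."""
    found = {}
    for keyword, classes in ENTITY_TYPES.items():
        if keyword.lower() in query.lower():
            found[keyword] = list(classes)
    return found


def process_a(classified):
    """Placeholder for the query executing process (steps 203 to 205)."""
    return sorted(classified)


def process_b(classified):
    """Placeholder for the updating process (steps 300 to 500)."""
    return [(kw, c) for kw, cs in sorted(classified.items()) for c in cs]


def handle_query(query):
    """Steps 201 and 202, then process A and process B side by side."""
    classified = extract_and_classify(query)
    # The two processes do not depend on each other, so they can run
    # concurrently; both futures are resolved before returning.
    with ThreadPoolExecutor(max_workers=2) as pool:
        results = pool.submit(process_a, classified)
        history = pool.submit(process_b, classified)
        return results.result(), history.result()
```

Calling `handle_query("Najib Tun Razak Prime Minister Malaysia")` returns the extracted keywords from the process A placeholder and the keyword and classification pairs from the process B placeholder.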
For process A, the query manager (104) obtains a plurality of associated keywords by searching the previously stored query prediction rules in the database (103) as in step 203. For example, if the previous query entered by the user is "Najib Tun Razak Prime Minister Malaysia," the keyword classification result is based on named entity recognition and a lexical database, wherein "Najib Tun Razak" is classified as a Person; "Prime Minister" is classified as a Position; and "Malaysia" is classified as a Country. Thereon, the data mining engine (106) generates query prediction rules which comprise {Najib Tun Razak} = {Najib Tun Razak, Person, Position, Country}; {Prime Minister} = {Prime Minister, Person, Position, Country}; and {Malaysia} = {Malaysia, Person, Position, Country}. The process continues by executing the query as in step 204 based on the keywords and associated keywords found in steps 202 and 203 respectively. Once the results of the query have been gathered, the results are sent back to the client server (101) as in step 205.
On the other hand, process B starts by updating the historical data set as in step 300 based on the extracted keywords and keyword classifications by the historical data generator (105). Once the historical data is populated by combining the keywords and the classifications from past and recent queries, the mining process is then executed to generate predictive relationships or associations between keywords by the data mining engine (106) as in step 400. Referring to the previous example, if a user keys in a new query of "Malaysia Position," the data mining engine (106) uses association rules to predict new keywords based on the prediction rules explained above. An example of the association rules is {Malaysia Position} = {Najib Tun Razak, Person, Country}. Although the user does not mention "Najib Tun Razak" in his recent query, "Najib Tun Razak" is associated with the new query based on the user's past experience or past query. Finally, these relationships or association rules are stored in the database (103) for future consumption as in step 500.
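
The rule lookup of step 203 can be illustrated with the "Malaysia Position" example from the text. The sketch below is an assumption about representation, not the stored format used by the system: `PREDICTION_RULES` is a hypothetical in-memory stand-in for the database (103), each rule maps a set of antecedent terms to associated terms, and a rule fires when all of its antecedent terms appear in the query.

```python
# Hypothetical in-memory stand-in for the prediction rules in database (103).
# Each rule maps a set of antecedent terms to the associated terms.
PREDICTION_RULES = {
    frozenset({"Malaysia", "Position"}): {"Najib Tun Razak", "Person", "Country"},
}


def associated_keywords(query_terms, rules=PREDICTION_RULES):
    """Step 203: collect terms from every rule whose antecedent is in the query."""
    terms = set(query_terms)
    associated = set()
    for antecedent, consequent in rules.items():
        if antecedent <= terms:  # all antecedent terms occur in the query
            associated |= consequent
    return associated - terms


def expand_query(query_terms, rules=PREDICTION_RULES):
    """Input for step 204: extracted keywords followed by associated keywords."""
    extra = sorted(associated_keywords(query_terms, rules))
    return list(query_terms) + extra
```

With these rules, `expand_query(["Malaysia", "Position"])` adds "Najib Tun Razak" to the query even though the user did not type it, which mirrors the behaviour described above.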
FIG. 3 shows a flowchart of the method to update historical data by the historical data generator (105) as in step 300. Initially, the historical data generator (105) retrieves the keywords and classifications as in step 310, which were extracted previously from the user's past queries in step 202. Next, the historical data generator (105) separates the current keywords based on their classifications as in step 330 to get the keywords' current classifications. This is similar to the query prediction rules explained in process A, wherein the keyword classification results are based on named entity recognition and a lexical database. The historical data generator (105) then combines each keyword with its relevant classification as in step 350 to combine the user's current query keywords with the previous keywords and classifications from the user's past queries. The relevant classification here is based on the user's past experience. For example, if the user's current keyword is "Prime Minister," the relevant classification is "Najib Tun Razak." Although the user does not mention "Najib Tun Razak" in his current query, "Najib Tun Razak" is associated with the new query based on the user's past experience or past query. Once all the combinations are generated, the historical data generator (105) stores the combinations as historical data in the database (103) as in step 370 for the data mining engine (106) to discover the potential predictive relationships between keywords as in step 400. The step of combining the keywords and classifications has an advantage over the existing search engines as it provides a resource for extracting predictive relationships between words in producing an enhanced keyword search.
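
Steps 310 to 370 can be read as a small pipeline, sketched below under stated assumptions. The list `history` is a hypothetical stand-in for the historical data set in the database (103), the record shape (keyword, classification) is an assumption, and `combine` is left as a pluggable callable for step 350. The `pair_all` helper is a deliberately naive placeholder and is not the similarity-based combination of FIG. 4.

```python
def update_historical_data(history, current, combine):
    """Sketch of steps 310 to 370 using hypothetical names.

    history: list of stored records (the historical data set)
    current: dict mapping each current keyword to its classifications
    combine: callable implementing step 350
    """
    # Step 310: retrieve keywords and classifications from past queries.
    past = list(history)
    # Step 330: separate each current keyword from its classifications.
    separated = [(kw, tuple(classes)) for kw, classes in current.items()]
    # Step 350: combine keywords with their relevant classifications.
    combinations = combine(separated, past)
    # Step 370: store the combinations as historical (mining) data.
    history.extend(combinations)
    return history


def pair_all(separated, past):
    """Naive stand-in for step 350: pair every keyword with each of its classes."""
    return [(kw, c) for kw, classes in separated for c in classes]
```

Keeping step 350 as a parameter makes it possible to swap the naive pairing for a similarity-based selection without touching the retrieval and storage steps.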
A detailed process of step 350 in FIG. 3 is shown in FIG. 4, which illustrates a flowchart of the method to combine the keywords and classifications by the historical data generator (105). Initially, the historical data generator (105) selects a keyword Kx starting with the first keyword from a set of keywords {K1, ..., Kn} as in step 351. This set of keywords is generated after the user's current query keywords are separated and classified in step 330, wherein referring to the previous example, after "Najib Tun Razak" is classified as a Person; "Prime Minister" is classified as a Position; and "Malaysia" is classified as a Country, the set of keywords is {Najib Tun Razak, Prime Minister, Malaysia}.
If the historical data generator (105) has finished selecting all keywords from the set of keywords during this process as in decision 352, the historical data generator (105) returns a collection of HKx, i.e. {{HK1}, ..., {HKn}}, as in step 363, wherein HKx is a collection of Hm, and wherein Hm is a combination of a keyword and its relevant classification. On the other hand, if there are more keywords to be selected from the set of keywords, the historical data generator (105) retrieves a list of other keywords, Dm, as in decision 352 and step 353, wherein Dm is a list of all keywords except the one chosen in step 351. After retrieving the list of other keywords, Dm, the historical data generator (105) then retrieves a list of related words for each keyword in list Dm as in step 354 by using a reverse dictionary, wherein the list of related words is represented by WDm.
Next, the historical data generator (105) selects a classification Cm of keyword Kx retrieved from step 202 starting with the first classification as in step 355. Referring to the previous example, the set of classifications associated with the keyword "Najib Tun Razak" is {Person}. However, a keyword can also have more than one classification, wherein for this example, besides a "Person," "Najib Tun Razak" can also be a "Malay Name" and a "Politician." If the historical data generator (105) has finished selecting all the classifications of Kx as in decision 356, it returns the set of Hm for keyword Kx, i.e. HKx, as in step 362, wherein Hm is a combination of a keyword and its relevant classification and HKx is a collection of Hm. On the other hand, if there are more classifications to be selected, the historical data generator (105) continues to retrieve a list of related words of Cm by using a reverse dictionary as in decision 356 and step 357, wherein the list is represented by WCm.
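The gathering of Dm, WDm and WCm in steps 351 to 357 can be sketched as below. This is a minimal illustration, not the implementation of the invention: the reverse dictionary is modelled as a plain Python mapping from a term to its related words, the function names `related_words` and `gather_related_words` are hypothetical, and every entry in `toy_reverse_dictionary` is invented purely for the example.

```python
def related_words(term, reverse_dictionary):
    """Look up related words for a term; an unknown term yields an empty list."""
    return reverse_dictionary.get(term, [])


def gather_related_words(keywords, classifications, reverse_dictionary):
    """Yield (Kx, Dm, WDm, WCm) for each keyword Kx in the query."""
    for kx in keywords:                                   # step 351: select Kx
        dm = [k for k in keywords if k != kx]             # step 353: other keywords
        wdm = {d: related_words(d, reverse_dictionary)    # step 354: related words of Dm
               for d in dm}
        wcm = {c: related_words(c, reverse_dictionary)    # steps 355-357: related words
               for c in classifications.get(kx, [])}      # of each classification Cm
        yield kx, dm, wdm, wcm


# Invented entries, for illustration only.
toy_reverse_dictionary = {
    "Prime Minister": ["leader", "government", "politician"],
    "Malaysia": ["country", "nation", "asia"],
    "Person": ["human", "individual"],
    "Politician": ["leader", "government", "election"],
}
keywords = ["Najib Tun Razak", "Prime Minister", "Malaysia"]
classifications = {"Najib Tun Razak": ["Person", "Politician"]}

results = list(gather_related_words(keywords, classifications, toy_reverse_dictionary))
first_kx, first_dm, first_wdm, first_wcm = results[0]
```

For the first keyword, `first_dm` holds the two remaining keywords, and `first_wdm` and `first_wcm` hold the related-word lists that the similarity calculation would then compare.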
The historical data generator (105) then selects a set of related words WDm starting with the first set of related words based on the list of related words in WDm as in step 358. The process continues by calculating the similarity Vi between WCm and WDm by using the cosine similarity until all the sets of related words from WDm have been selected as in decision 359 and step 360. The steps of selecting the set of related words WDm and calculating the similarity of WCm and WDm are repeated until there are no more related words to be selected. Once these steps are completed, the historical data generator (105) returns the set of combinations Hm for each keyword, wherein the combination {Kx, Cm, Dm} of Cm with the other keywords is selected based on the highest value of similarity as in decision 359 and step 361. Once each of the keywords is processed from step 351 to step 361, the resulting set of Hm is stored as HKx as in step 362. Thereon, the set of HKx is transferred to step 370 to be stored in the database as historical data, which is used for the updating process of process B.
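The similarity calculation and selection of steps 358 to 361 can be illustrated with the following sketch. The specification names cosine similarity but does not state how a list of related words is turned into a vector, so this example assumes simple term-frequency vectors; the function names `cosine_similarity` and `best_combination` and the sample word lists are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between two word lists treated as term-frequency vectors."""
    a, b = Counter(words_a), Counter(words_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty list has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def best_combination(kx, cm, wcm, wdm):
    """Return {Kx, Cm, Dm} for the other keyword whose related words are most similar.

    wcm: related words of classification Cm.
    wdm: mapping from each other keyword to its related words.
    """
    if not wdm:
        return None
    best = max(wdm, key=lambda d: cosine_similarity(wcm, wdm[d]))
    return (kx, cm, best)


# Invented word lists, for illustration only.
wcm = ["leader", "government", "politician"]
wdm = {
    "Prime Minister": ["leader", "government", "head"],
    "Malaysia": ["country", "nation"],
}
chosen = best_combination("Najib Tun Razak", "Politician", wcm, wdm)
```

Here the related words of "Politician" share two terms with those of "Prime Minister" and none with those of "Malaysia", so the combination with "Prime Minister" is chosen.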
Referring now to FIG. 5, it illustrates an example of a process of selecting the highest similarity of related keywords and classifications as in step 351 to step 363 of FIG. 4 according to an embodiment of the present invention. Initially, the historical data generator (105) selects a keyword Kx starting with the first keyword from a set of keywords {K1, ..., Kn} as in step 351. The set of 3 keywords {K1, K2, K3} is represented as {A, B, C}. This set of keywords is generated after the user's current query keywords are separated and classified in step 330. For example, if the 3 keywords from a current query are "Najib Tun Razak," "Prime Minister," and "Malaysia," A represents "Najib Tun Razak," B represents "Prime Minister," and C represents "Malaysia."
After the historical data generator (105) selects the first keyword K1, i.e. A, from the set of keywords {A, B, C} as in step 351, and since there are more keywords to be selected from the set of keywords, the historical data generator (105) retrieves a list of other keywords, Dm, as in decision 352 and step 353, wherein Dm is {B, C}. Next, the historical data generator (105) retrieves a list of related words for each keyword in list Dm as in step 354 by using a reverse dictionary, wherein the list of related words is represented by WDm, and wherein WDm is represented by the small dots in the circles related to B and C.
Next, the historical data generator (105) selects a classification Cm of keyword A retrieved from step 202 starting with the first classification as in step 355. The example shows that there are 3 classifications {C1, C2, C3} of A, which are represented as A1, A2 and A3 respectively, with the first classification C1 being A1. For example, if the 3 classifications of "Najib Tun Razak" are "Person," "Malay Name," and "Politician," A1 represents "Person," A2 represents "Malay Name," and A3 represents "Politician." If there are more classifications to be selected, the historical data generator (105) continues to retrieve a list of related words for each classification in Cm by using a reverse dictionary as in decision 356 and step 357, wherein the list is represented by WCm, i.e. {WC1, WC2, WC3}, and wherein WCm is represented by the small dots in the circles related to A1, A2 and A3. In other words, WC1 is a collection of keywords related to A1; WC2 is a collection of keywords related to A2; and WC3 is a collection of keywords related to A3.
As in step 358, the historical data generator (105) then selects a set of related words WDm starting with the first set of related words based on the list of related words in WDm, i.e. the small dots in the circles related to B. The process continues by calculating the similarity Vi between WCm, i.e. {WC1, WC2, WC3}, which are the collections of keywords related to A1, A2 and A3 respectively, represented by the small dots in the circles related to A1, A2 and A3, and WDm, i.e. {WD1, WD2}, which are the collections of related words for keyword B and keyword C respectively, represented by the small dots in the circles related to B and C. The similarity measurement is calculated by using the cosine similarity until all the sets of related words from WDm have been selected as in decision 359 and step 360. Numbers such as 0.8, 0.5, 0.2 and 0.6 represent the similarity Vi between WCm and WDm. For example, the similarity Vi between the first set of related words WDm, i.e. the small dots in the circles related to B, and the first keyword's first classification, i.e. the small dots in the circles related to A1, is shown as 0.8. The steps of selecting the set of related words WDm and calculating the similarity of WCm and WDm are repeated until there are no more related words to be selected. The steps of calculating the similarity between the classifications of the first keyword, i.e. A, and the other keywords, i.e. B and C, are shown in the first row, Row 1.
Once these steps are completed, the historical data generator (105) returns the set of combinations Hm for each keyword, wherein the combination {Kx, Cm, Dm} of Cm with the other keywords is selected based on the highest value of similarity as in decision 359 and step 361. The combinations Hm of FIG. 5 are shown as outputs H1 = {A, A1, B}, H2 = {A, A2, C} and H3 = {A, A3, B}. These outputs can be translated as H1 = {"Najib Tun Razak", "Person", "Prime Minister"}; H2 = {"Najib Tun Razak", "Malay Name", "Malaysia"}; and H3 = {"Najib Tun Razak", "Politician", "Prime Minister"}. Similarly, the same process is done from step 351 to step 361 on the other keywords, i.e. B and C, to get the combination {Kx, Cm, Dm}. Once each of the keywords is processed from step 351 to step 361, the resulting set of Hm is stored as HKx as in step 362. Thereon, the set of HKx is transferred to step 370 to be stored in the database as historical data, which is used for the updating process of process B.
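The selection for Row 1 of FIG. 5 can be reproduced with a small worked example. Only the value 0.8 for the pair (A1, B) is stated in the description; the remaining similarity values in the table below are assumed placeholders, chosen so that the selection agrees with the outputs H1, H2 and H3 given above. The function name `select_combinations` is hypothetical.

```python
# Similarity Vi between each classification of A and each other keyword.
# Only (A1, B) = 0.8 comes from the description; all other values are assumed.
similarity = {
    "A1": {"B": 0.8, "C": 0.5},
    "A2": {"B": 0.2, "C": 0.6},
    "A3": {"B": 0.7, "C": 0.3},
}


def select_combinations(keyword, similarity):
    """For each classification, keep the other keyword with the highest similarity."""
    return [
        (keyword, classification, max(scores, key=scores.get))
        for classification, scores in similarity.items()
    ]


row1 = select_combinations("A", similarity)
for combination in row1:
    print(combination)
```

With these assumed values, `row1` contains the combinations (A, A1, B), (A, A2, C) and (A, A3, B), in that order. The same selection would then be repeated for keywords B and C.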
While embodiments of the invention have been illustrated and described, it is not intended that these embodiments illustrate and describe all possible forms of the invention. Rather, the words used in the specifications are words of description rather than limitation and various changes may be made without departing from the scope of the invention.

Claims

1. A system (100) for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications comprises of:
a) at least one client server (101);
b) a query server (102); and
c) a database (103),
characterised in that the query server (102) further comprises of:
i. a query manager (104) for processing queries submitted by the at least one client server (101) by extracting keywords and keywords classifications;
ii. a data mining engine (106) for providing query patterns; and
iii. a historical data generator (105) for updating historical data set based on the extracted keywords and keywords.
2. A method for predicting query in a search engine based on the similarity of an input query with historical data of keywords and classifications is characterised by the steps of:
a) submitting the input query to a query server (102);
b) extracting keywords from the query;
c) performing query classifications on the extracted keywords; and
d) performing a query executing process and performing an updating process based on the extracted keywords and classifications.
3. The method as claimed in claim 2, wherein the step of performing query classifications is based on the definition of the query by using named entity recognition and lexical database.
4. The method as claimed in claim 2, wherein the query executing process includes the steps of:
a) obtaining associated keywords by looking up query prediction rules which are stored in a database (103);
b) executing query based on extracted keywords and associated keywords; and
c) sending results to user.
5. The method as claimed in claim 2, wherein the updating process includes the steps of:
a) retrieving keywords and their classifications based on the query;
b) separating each keyword and its classifications;
c) combining keywords and classifications;
d) storing all combinations as mining data;
e) executing mining process based on the updated historical data to generate predictive relationships between the keywords; and
f) storing prediction rules in a database (103).
6. The method as claimed in claim 5, wherein the step of combining keywords and classifications further includes the steps of:
a) selecting at least one keyword to be searched based on the query;
b) selecting at least one classification for the at least one keyword;
c) retrieving a list of other keywords based on the query;
d) retrieving a list of related words for the other keywords based on the query;
e) retrieving a list of related words of the at least one classification for the at least one keyword;
f) determining similarity measurement between the list of related words for other keywords and the list of related words of the at least one classification for the at least one keyword using cosine similarity; and
g) selecting combination of classification and other keywords.
7. The method as claimed in claim 6, wherein the list of related words for the other keywords based on the query is retrieved using a reverse dictionary.
8. The method as claimed in claim 6, wherein the list of related words of the at least one classification for the at least one keyword is retrieved using a reverse dictionary.
PCT/MY2014/000179 2013-11-20 2014-06-12 A system and method for predicting query in a search engine WO2015076662A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2013702212 2013-11-20
MYPI2013702212A MY168793A (en) 2013-11-20 2013-11-20 A system and method for predicting query in a search engine

Publications (1)

Publication Number Publication Date
WO2015076662A1 (en) 2015-05-28

Family

ID=51703369

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2014/000179 WO2015076662A1 (en) 2013-11-20 2014-06-12 A system and method for predicting query in a search engine

Country Status (2)

Country Link
MY (1) MY168793A (en)
WO (1) WO2015076662A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017095510A1 (en) * 2015-11-30 2017-06-08 Intel Corporation Multi-scale computer vision
CN110633305A (en) * 2018-06-06 2019-12-31 中国石油化工股份有限公司 Chemical accident data mining method based on rule retrieval and keyword retrieval

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6006225A (en) * 1998-06-15 1999-12-21 Amazon.Com Refining search queries by the suggestion of correlated terms from prior searches

Also Published As

Publication number Publication date
MY168793A (en) 2018-12-04

Similar Documents

Publication Publication Date Title
US10853360B2 (en) Searchable index
CN109829104B (en) Semantic similarity based pseudo-correlation feedback model information retrieval method and system
CN103106282B (en) A kind of method of Webpage search and displaying
KR101078864B1 (en) The query/document topic category transition analysis system and method and the query expansion based information retrieval system and method
US8843470B2 (en) Meta classifier for query intent classification
US10289717B2 (en) Semantic search apparatus and method using mobile terminal
JP5623431B2 (en) Identifying query aspects
JP6216467B2 (en) Visual-semantic composite network and method for forming the network
US20100131485A1 (en) Method and system for automatic construction of information organization structure for related information browsing
US20170220589A1 (en) Item recommendation method, device, and system
US20100131496A1 (en) Predictive indexing for fast search
CN104199875A (en) Search recommending method and device
JP6185379B2 (en) RECOMMENDATION DEVICE AND RECOMMENDATION METHOD
US11675845B2 (en) Identifying merchant data associated with multiple data structures
CN103049495A (en) Method, device and equipment for providing searching advice corresponding to inquiring sequence
WO2017028395A1 (en) Method and device for providing search result
KR20080037413A (en) On line context aware advertising apparatus and method
Kim et al. Building concept network-based user profile for personalized web search
CN106294358A (en) The search method of a kind of information and system
US20210406291A1 (en) Dialog driven search system and method
WO2015076662A1 (en) A system and method for predicting query in a search engine
JP5486667B2 (en) Method and apparatus for diversifying query results
US20130091131A1 (en) Meta-model distributed query classification
CN106934007B (en) Associated information pushing method and device
US11281736B1 (en) Search query mapping disambiguation based on user behavior

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14784121

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14784121

Country of ref document: EP

Kind code of ref document: A1