WO2000022493A2 - Finding querying patterns in querying applications - Google Patents
- Publication number
- WO2000022493A2 (PCT/US1999/024029)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- query
- valued
- queries
- string
- result
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
Definitions
- the present invention relates to a query-based software system or an on-line analytical processing system, particularly to a software application that provides a front-end for a user to interact with the query based software system, e.g., a Structured Query Language (SQL) interface, a World Wide Web-enabled interface to databases. More particularly, the present invention relates to a computer implementation of a mathematical function which maps a finite set of strings to a set of numbers to alert the user to maximally-valued results that lie inside and outside of the line of the user's query.
- SQL Structured Query Language
- Typical attributes and their values for the security log application may include: type-of-incident, having values such as theft, assault, break-in, attempted-break-in, etc.; gravity-of-incident, having values such as minor, serious, etc.; location-of-incident, having values such as corporate-headquarters, region-x-headquarters, manufacturing-plant-a, etc.; shift-discovered, having values such as day, night, etc.; shift-occurred, having values such as day, night, unknown, etc.; year, having values such as 1998, 1997, etc.; quarter, having values such as first, second, third, or fourth; date-occurred, date-discovered, time-occurred, time-discovered, with sets of corresponding values such as 3/4/98 or 2300 hours, and so on.
- computations for the security log application may include: counting the number-of-incidents, determining the change-in-the-number of incidents in a year selected by the user (if the user does not select a year, the present year will be used) as compared with the previous year, determining the average-time-elapsed between the time of occurrence and the time of discovery of an incident, etc.
- a relational database can be used to represent the logical model of the security log data.
- the programmer can create a table of incidents, the rows of the table representing specific incidents, and the columns of the table representing the attributes used to describe those incidents.
- a specific location in the table contains the value of the attribute which describes the incident.
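- The table layout described above can be sketched as follows. This is only an illustrative sketch using Python's built-in sqlite3 module; the table name, the column names (attribute names with hyphens replaced by underscores), the incident_id key and the sample rows are all assumptions made for the example and are not taken from the source.

```python
import sqlite3

# Hypothetical schema: one row per incident, one column per attribute.
# Column names are the attribute names with hyphens replaced by underscores.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE incidents ("
    " incident_id INTEGER PRIMARY KEY,"
    " type_of_incident TEXT,"
    " location_of_incident TEXT,"
    " shift_occurred TEXT,"
    " year INTEGER)"
)

# Made-up sample rows, for illustration only.
sample_rows = [
    (1, "theft", "corporate-headquarters", "night", 1998),
    (2, "assault", "manufacturing-plant-a", "day", 1998),
    (3, "break-in", "corporate-headquarters", "night", 1997),
]
conn.executemany("INSERT INTO incidents VALUES (?, ?, ?, ?, ?)", sample_rows)

ALLOWED_COLUMNS = {
    "type_of_incident", "location_of_incident", "shift_occurred", "year",
}


def attribute_value(incident_id, attribute):
    """Return the value stored at one (row, column) location of the table."""
    if attribute not in ALLOWED_COLUMNS:
        raise ValueError(f"unknown attribute: {attribute}")
    cur = conn.execute(
        f"SELECT {attribute} FROM incidents WHERE incident_id = ?",
        (incident_id,),
    )
    row = cur.fetchone()
    return None if row is None else row[0]


print(attribute_value(1, "type_of_incident"))
```

- Each call to attribute_value reads one cell, that is, the value of one attribute for one incident, which mirrors the row/column description above.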
- GUI graphical user interface
- the programmer can also use the tools to create a GUI that allows a user to ask questions about the security log data by selecting suitable values for the attributes and computations that are of interest.
- the interface can present the user with two lists, one for the attributes and one for the computations. The user can select one or more items in each of these lists. When the user selects an attribute, the user is presented with a further list of the possible values for that attribute. The user must select at least one value for every attribute that is of interest to him.
- the querying application allows even a non-technical user to ask questions about data without writing a single line of programming code. Instead, the user selects (i.e., points and clicks) items on a computer screen to obtain the desired information about the data.
- Those items have labels which correspond to physical events that the user is familiar with. For example, if the security guard wants to find out how many incidents have occurred in 1998 at the corporate headquarters during the night shift, the guard selects the computation number-of-incidents, and the values: night for the attribute shift-occurred, 1998 for the attribute year, and corporate-headquarters for the attribute location-of-incident (hereinafter referred to as the number-of-incident query).
- the querying application computes a result for the present selection by processing the data in the security log.
- the answer to the query will be an integer, such as 5, 10 or 151.
- all computations supported by the security log querying application will output a number. For example, a positive integer for number-of-incidents, a positive or negative integer for change-in-the-number (of incidents in 1998 vs. 1997), and a real number for average-time-elapsed (between occurrence and discovery). Since the set of real numbers subsumes the set of integers, in general, we can say that computations in querying applications will map a string of attributes and their values to a real number.
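- As a rough sketch of computations that map a selection of attribute values to a number, the following Python fragment implements number-of-incidents and change-in-the-number over a small in-memory log. The log entries, the function names, the dictionary representation of a selection, and the default_year parameter (standing in for "the present year") are assumptions made for illustration.

```python
# Made-up log entries; the attribute names follow the security log example.
LOG = [
    {"type-of-incident": "theft", "location-of-incident": "corporate-headquarters",
     "shift-occurred": "night", "year": 1998},
    {"type-of-incident": "theft", "location-of-incident": "corporate-headquarters",
     "shift-occurred": "night", "year": 1998},
    {"type-of-incident": "assault", "location-of-incident": "corporate-headquarters",
     "shift-occurred": "day", "year": 1998},
    {"type-of-incident": "break-in", "location-of-incident": "corporate-headquarters",
     "shift-occurred": "night", "year": 1997},
    {"type-of-incident": "theft", "location-of-incident": "manufacturing-plant-a",
     "shift-occurred": "night", "year": 1997},
]


def matches(incident, selection):
    """True if the incident has every attribute value the user selected."""
    return all(incident.get(a) == v for a, v in selection.items())


def number_of_incidents(selection, log=LOG):
    """Count the incidents that match the selection (a non-negative integer)."""
    return sum(1 for incident in log if matches(incident, selection))


def change_in_the_number(selection, log=LOG, default_year=1998):
    """Count for the selected year minus the count for the previous year.

    default_year is an assumed stand-in for "the present year", used when
    the selection does not name a year.
    """
    year = selection.get("year", default_year)
    this_year = number_of_incidents({**selection, "year": year}, log)
    last_year = number_of_incidents({**selection, "year": year - 1}, log)
    return this_year - last_year


query = {
    "shift-occurred": "night",
    "year": 1998,
    "location-of-incident": "corporate-headquarters",
}
print(number_of_incidents(query), change_in_the_number(query))
```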
- The block diagram of the logical functioning of a querying application in accordance with the prior art is shown in Fig. 1.
- An input module 101 transmits a query 104 to a computation module 102.
- the computation module 102 executes the computations invoked by the query 104 and outputs the computation results 106 to an output module 103.
- the number-of-incident query example must be implemented by a stream of data that are logically partitioned into different fields.
- the query 104 may consist of two data fields, which contain information about the user's selections. One field specifies the set of attributes along with the values selected by the user for those attributes (hereinafter referred to as the attribute field), and the other field specifies the set of computations that will be presented with the selected values to the computation module 102 and executed to produce results.
- the contents of the attribute field may be further partitioned into sub-fields that correspond to the individual attributes selected by the user.
- the query 104 is a stream of bits that is partitioned logically into fields and sub-fields to identify the user's selections.
- the input module 101 is primarily a number of computer storage locations, e.g., computer memory, disk storage, tape storage, etc., that are logically partitioned to capture the fields and sub-fields of the query 104, thus making it possible for it to receive and store the different selections of the user as well as to transmit the query 104 to the computation module 102 in a manner which preserves that differentiation.
- the output module 103 is also primarily a number of computer storage locations that are logically partitioned into fields corresponding to the different computations that are supported by the querying application. The results of the computations invoked by the query 104 are stored in the appropriate fields.
- the security log application employed three computations: a count of incidents (number-of-incidents), a difference between two counts (change-in-the-number), and the average value of a difference between two time intervals (average-time-elapsed).
- Other commonly used computations producing numeric results may include percentages, products (obtained by the multiplication of two or more numbers), etc.
- a computation is defined to be a computer implementation of a mathematical function that maps an n-tuple of attribute-value pairs, referred to as an attribute-valued string, to a real number, referred to as a computation result, where n is the number of attributes in the querying application.
- the number-of-incidents computation maps that attribute-valued string to a number such as 5, 10 or 151.
- the computation module 102 is a collection of a pre-specified number of computations (as defined above), each of which can be executed to produce a numeric result.
- the computation result 106 described in Fig. 1 is a stream of bits partitioned into fields that contain numeric results, a field for each computation in the set of computations.
- the security guard did not specify a value for every attribute but only specified those attributes that the security guard was interested in. Although there was no value specified for the type-of-incident attribute, the number-of-incident query is still considered a well-defined query. The query requests that the number of incidents that occurred at night in 1998 at the corporate headquarters be computed. Those skilled in the art will recognize that there are several ways to handle such partial input. For example, a default value, namely, the "*" value, can be assigned to each attribute not specified by the user.
- the query 104 contains user-specified values in the data fields corresponding to the user-selected attributes and a "*" in the data fields corresponding to non-specified attributes.
- the query 104 may contain a special field that specifies the number of attribute-value pairs, so that the computation module 102 can interpret the query 104, even though only some of the attributes are specified. Logically, these two schemes are equivalent. It is appreciated that the "*" value implementation is used without any loss of generality.
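- The two schemes for handling partial input can be sketched as below. This is a minimal illustration, assuming a shortened attribute list and hypothetical function names; it is not the patent's encoding of the query 104, which is described only as a stream of bits partitioned into fields.

```python
WILDCARD = "*"

# Hypothetical, shortened attribute list for the security log example.
ATTRIBUTES = (
    "type-of-incident",
    "gravity-of-incident",
    "location-of-incident",
    "shift-occurred",
    "year",
)


def to_attribute_valued_string(selection, attributes=ATTRIBUTES):
    """Scheme 1: one (attribute, value) pair per attribute, "*" if unspecified."""
    unknown = set(selection) - set(attributes)
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return tuple((a, selection.get(a, WILDCARD)) for a in attributes)


def to_counted_pairs(selection):
    """Scheme 2: a count field followed by only the specified pairs."""
    return len(selection), tuple(sorted(selection.items()))


def specified_pairs(string):
    """Recover the user's selections from a scheme-1 string."""
    return {a: v for a, v in string if v != WILDCARD}


selection = {
    "shift-occurred": "night",
    "year": "1998",
    "location-of-incident": "corporate-headquarters",
}
string = to_attribute_valued_string(selection)
count, pairs = to_counted_pairs(selection)

# Both encodings carry the same user selections.
print(specified_pairs(string) == dict(pairs), count)
```

- In this sketch the scheme-1 string always has n entries (n being the number of attributes), which matches the definition of an attribute-valued string as an n-tuple.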
- the security logs would reflect the counter measures and querying the security log about the number of break-ins at that point of entry would likely return an answer of zero. Instead, it would be desirable to analyze the security logs to identify vulnerable points of entry that are unknown to the security guard.
- a general question may read "Under what conditions does Team 1 outscore Team 2"? Those conditions, which would include which player is playing what position on the court, which player is guarding whom, and so on, are left unspecified by the user.
- the computer program will find the conditions that are meaningful and call them to the attention of the user.
- the problem with the above approach is that the developer of the data mining application must anticipate the general questions that the user is interested in, express and answer those questions in terms the user will understand, and finally, code that knowledge into the data mining application. This is only possible if the developer invests a lot of time in understanding a particular application of the data mining program. Consequently, this approach is costly and time consuming, as becomes evident if it is applied to the security log querying application discussed herein.
- the first approach permits the users to link a querying application to a data mining application.
- a user asks questions in a querying application, and uses the knowledge of the answers to generate a subset of the data.
- the data mining program operates only on that subset of the data.
- a user reviews the results of a data mining program and generates a query based on the knowledge of that review.
- the user then employs the querying application to issue the query on the data.
- this type of linking is known as bundling of querying applications with data mining applications.
- the second approach pre-computes data structures that are useful for certain querying applications and data mining applications.
- the common data structures provide a link between the querying application and the data mining application.
- OLAM On-line Analytical Mining
- a user utilizes a querying application to ask questions about the result of a data mining program. Since the results of computations used in data mining programs are often numeric in nature and every result often refers to a specific selection of attributes, a querying application can be designed that allows a user to ask questions about those results. The difference between such a querying application for analysts and a typical querying application for non-technical users is that the former employs mathematically sophisticated computations, e.g., determining correlation between attributes, finding factors that influence an attribute, etc.
- Bundling a data mining program with a querying application simply packages two different programs together. Therefore, it does nothing to simplify the complex computations of the data mining program. Also, the user still must learn two interfaces, one for the querying application and one for the data mining program. In fact, the user may have to learn a third interface as well, which links the data mining program to the querying application.
- designing a point-and-click interface to allow the user to ask questions about the results of data mining programs also does not simplify the mathematical primitives.
- the point-and-click applications simplify traditional querying applications such as the security-log querying application because the non-technical user does not have to write programming code to ask questions and the questions generally pertain to simple mathematical primitives, such as sums, averages and percentages, that are easy to relate to physical events that the non-technical user understands, e.g., number-of-incidents. If the mathematical primitives are complicated, such as correlation or influences, the point-and-click interfaces do not provide a mechanism to simplify that intrinsic complexity for the non-technical user.
- An example is a query to find the circumstances that correlated with large numbers of thefts or to determine the statistically significant factors in predicting a likelihood of theft.
- location-of-incident corporate-headquarters
- the output of the correlation query is a real number between 0 and 1 indicating the correlation between the incidents that occurred in 1998 and the incidents that occurred at night. But this approach has two drawbacks.
- correlation is an abstract mathematical concept (as are the prediction of likelihood and determination of statistical significance). Unlike the simpler number-of-incidents computations, the security guard is no longer dealing with familiar concepts that he or she can immediately relate to physical events. In other words, the mathematical primitive used in this computation, namely, correlation, is not easy to explain to a nontechnical user. It would be far more desirable if the security guard had to interpret only the simpler computations, which the security guard uses on a daily basis.
- It is an object of the present invention to provide a data mining technique that can be incorporated easily into the traditional querying applications represented by Fig. 1. It is another object of this invention to provide data mining programs that are simple to use even for non-technical users and that make it easy for the database querying application developers to add data mining capabilities to their offerings.
- a method and apparatus are provided to find the greatest-valued and/or least-valued results that are related to the queries issued by a user of a class of querying applications.
- a querying application includes means for the user to issue a query, namely, to select a computation, to specify the values of attributes that are input to that computation and to receive the numeric result of that computation for that input.
- Such a model fits the mould of easy-to-use, point-and-click querying applications that use computations such as counts, sums, averages, percentages, etc., to map the user's input into a number.
- the greatest-valued and/or least-valued results corresponding to inputs that overlap with the user's input are also computed and made available to the user in the form of alerts, along with the numeric result for the user query. Further, the user is alerted to whether the inputs corresponding to these greatest-valued and/or least-valued results overlap with still other inputs that yield even greater and/or lesser results respectively. The user is given the means to learn about those other inputs, should they exist.
- the user is also provided with the means to identify other greatest-valued and/or least-valued results whose corresponding inputs did not overlap with any input that was provided by the user or that was presented to the user in the form of an alert during the querying session, and which are greater than and/or lesser than (respectively) the results of all other inputs that overlap with said corresponding inputs.
- the invention increases the likelihood that these results will be of interest to the user. After all, these results are more extreme than the result of the user's original query.
- By relating the maximally-valued results to the user's queries by making sure that there are common elements in the user's query and the queries that yield the maximally-valued results, the invention increases the likelihood that the results will be of relevance to the user and that the user will understand the results.
- the invention is not limited to applications for the non-technical user. It also applies to more complicated computations that map attribute values to numbers, such as measures of correlation, influence of factors and statistical significance.
- Our invention suggests a new way to develop a querying application for analysts to review the results of data mining programs in a way that lets the user decide what should be reviewed but still manages to alert the user to important results that may be in danger of being overlooked.
- Fig. 1 is a block diagram of the logical functioning of a typical querying application, in accordance with the prior art
- Fig. 2 is a block diagram of a querying application utilizing an alert generator in accordance with the present invention
- Fig. 3 is a flowchart describing the functioning of the alert generator in the querying application depicted in Fig. 2;
- Fig. 4 is a flowchart describing the method by which the alert generator generates MAXINT alerts
- Fig. 5 is a flowchart describing the method by which the alert generator generates MININT alerts
- Fig. 6 is a flowchart that describes the method by which the alert generator generates LOQMAX alerts
- Fig. 7 is a flowchart that describes the method by which the alert generator generates LOQMIN alerts
- Fig. 8 is a flowchart describing the method by which the alert generator generates MAXTRACE alerts
- Fig. 9 is a flowchart describing the method by which the alert generator generates MINTRACE alerts
- Fig. 10 describes a specific implementation of the alert generator, in which alert results are pre-computed
- Fig. 11 describes a specific implementation showing how the user could be alerted to unseen MAXINT and MININT strings at the end of a session
- Fig. 12 displays the flowchart for generating MAXINT, LOQMAX and MAXTRACE alerts using the optimized operation of the alert generator;
- Fig. 13 displays the flowchart for generating MININT, LOQMIN and MINTRACE alerts using the optimized operation of the alert generator
- Fig. 14 describes a specific implementation of the output module, namely, a graphical user interface for displaying the alerts
- Fig. 15 describes a specific implementation of the input module, for use in a soccer data querying application utilizing an alert generator and a computation module, in accordance with the present invention
- Fig. 16 describes a specific implementation of the output module, for use in a soccer data querying application utilizing an alert generator and a computation module, in accordance with the present invention
- Fig. 17 describes a specific implementation of the input module, for use in a tennis data querying application utilizing an alert generator and a computation module, in accordance with the present invention
- Fig. 18 describes a specific implementation of the output module, for use in a tennis data querying application utilizing an alert generator and a computation module, in accordance with the present invention.
- Fig. 19 describes a specific implementation of a querying application utilizing an alert generator in accordance with the present invention, functioning within a multi-user environment.
- the present invention enables the user to find hidden patterns in the data using only the simple computations (such as counts, sums, percentages, averages, etc.) supported by the computation module 102 of Fig. 1. It is not necessary to use mathematically-sophisticated computations involving correlation, factors, influences, etc. to find hidden patterns in the data. Consequently, the non-technical user can also utilize the advanced functionality found in the sophisticated querying application since he or she is never exposed to computations that they are not already familiar with.
- the following is an illustration of how the present invention adds the capability of finding hidden patterns in the data to the security log querying application in a manner that does not overwhelm the non-technical user (i.e., the security guard). It is appreciated that the security-log example is used only for the purpose of illustration and the present invention has a wide range of applicability.
- the security guard, dealing with only the simple computations (e.g., number-of-incidents, change-in-number, and average-time-elapsed), is able to find hidden patterns using the present invention.
- This is a hidden pattern, in that the security guard was not concerned with the type of incident when he issued the query.
- the result of the modified query is clearly relevant to any conclusions that the security guard may reach regarding the ten incidents. In fact, not knowing that these incidents involved theft may mislead the security guard, since the ten thefts at corporate-headquarters may pose a serious problem while other types of incidents may not be as serious.
- the above examples also demonstrate why the present invention simplifies the data analysis and motivates a user to make use of such data analysis.
- the querying application of the present invention presents the mathematically-sophisticated concept of correlation (e.g., thefts are clearly correlated with the conditions in the user's query, such as night, 1998, corporate-headquarters), but the user does not need to formally understand the mathematical concept of correlation to appreciate the significance of the result.
- the querying application of the present invention advantageously presents (or expresses) the correlation in terms of the physical events that the user (i.e., the security guard) is familiar with, i.e., all of the specific events were thefts.
- the present technique of juxtaposing the result of the user's query with the answer to a related query makes it possible to convey the sophisticated mathematical concepts in terms of physical events that the user understands.
- the present technique also provides a more accurate data analysis, thereby encouraging more users to utilize data analysis.
- the security guard reaches an erroneous conclusion if he or she is not alerted to the fact that all the events pertaining to his or her query were thefts.
- the present invention advantageously minimizes the probability that the user makes an erroneous decision. Therefore, the user will be motivated to review the results of a related query that may potentially uncover hidden patterns in the data along with the result of his or her query.
- the present invention provides a rule or mechanism to find such hidden patterns automatically by computing the attribute-value strings that overlap with the user's query completely, i.e., they are longer than and contain the attribute-value string of the user's query.
- the present invention informs the user of the attribute-value string that has the greatest-valued result (i.e., the most interesting or informative result) for the number-of-incidents computation.
- This rule allows the present invention to add conditions to the user's query that enable the user to interpret the data in a more informative manner than just based on the user's original query.
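- The rule described above can be illustrated with a brute-force sketch: enumerate every attribute-value string that is longer than and contains the user's query, evaluate number-of-incidents on each, and report the one with the greatest result. The data, the valid values, the function names and the exhaustive enumeration are assumptions for illustration; ties are resolved here simply by taking the first maximum found, which is not necessarily how the tiebreak rules of the invention behave.

```python
from itertools import product

# Hypothetical valid values for a shortened attribute list.
VALID_VALUES = {
    "type-of-incident": ["theft", "assault"],
    "location-of-incident": ["corporate-headquarters", "manufacturing-plant-a"],
    "shift-occurred": ["day", "night"],
    "year": [1997, 1998],
}


def incident(kind, place, shift, year):
    return {"type-of-incident": kind, "location-of-incident": place,
            "shift-occurred": shift, "year": year}


# Made-up log entries, for illustration only.
LOG = [
    incident("theft", "corporate-headquarters", "night", 1998),
    incident("theft", "corporate-headquarters", "night", 1998),
    incident("theft", "manufacturing-plant-a", "night", 1998),
    incident("assault", "corporate-headquarters", "day", 1998),
    incident("theft", "corporate-headquarters", "day", 1997),
]


def number_of_incidents(selection):
    return sum(
        1 for entry in LOG
        if all(entry.get(a) == v for a, v in selection.items())
    )


def proper_superstrings(selection):
    """Yield every selection that contains `selection` plus at least one more pair."""
    free = [a for a in VALID_VALUES if a not in selection]
    # Each free attribute either stays unspecified (None) or takes a valid value.
    for combo in product(*[[None] + VALID_VALUES[a] for a in free]):
        extra = {a: v for a, v in zip(free, combo) if v is not None}
        if extra:
            yield {**selection, **extra}


def greatest_valued_superstring(selection, computation):
    """Return (string, result) for the greatest-valued proper superstring."""
    candidates = list(proper_superstrings(selection))
    if not candidates:
        return None
    best = max(candidates, key=computation)  # first maximum wins on ties
    return best, computation(best)


query = {"shift-occurred": "night", "year": 1998}
best, value = greatest_valued_superstring(query, number_of_incidents)
print(number_of_incidents(query), best, value)
```

- With this made-up data, the query counts three incidents and the alerted string adds the condition type-of-incident = theft, since all three matching incidents were thefts.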
- the following is an illustration of how a similar rule for the change-in-number computation leads to the identification of other hidden patterns in the data.
- the present invention provides a rule or mechanism to find such hidden patterns automatically by computing the attribute-value strings that overlap with the user's query completely to find the greatest-valued result and the attribute-value strings that are overlapped completely by the user's query to find the least-valued result.
- the present invention informs the user of the existence of two attribute-value strings that have elements in common with the user's query and that have the greatest-valued and least-valued results for the change-in-number computation. It is appreciated that the greatest-valued and least-valued results provide the most interesting information in the data that is also pertinent to the user's query.
- the present invention builds further on the above idea of identifying the greatest-valued and least-valued queries that overlap with the user's query in three ways.
- the present invention presents the user with all queries that overlap with the user query and have greatest-valued and/or least-valued results (hereinafter referred to as overlapping queries).
- the result of the query Q1 is 4.25 hours (the average time for discovering incidents that occurred at night in 1998).
- the present invention alerts the user (i.e., security guard) to the existence of query Q2, which has the greatest-valued result of all queries that overlap the user's query.
- the present invention provides the security guard with the means to trace all overlapping queries to find a maximally-valued query Q4.
- the maximally-valued query has the greatest-valued result of all overlapping queries.
- the present invention also provides the means to trace all overlapping queries to find a minimally-valued query.
- the minimally-valued query has the least-valued result of all overlapping queries.
- the present invention alerts the user to other queries that have the following two properties: the alerted queries are not related to the user's query (i.e., the alerted queries do not overlap with the user's query and cannot be traced from the user's query as described above) and the alerted queries have the greatest-valued (or least-valued) result of all queries that overlap with them.
- the present invention alerts the user to other maximally-valued (or minimally-valued) queries.
- the present invention alerts the user to the queries having the greatest-valued and/or least-valued results. It is appreciated that only one of the specified computations may actually yield queries having the greatest-valued and least-valued results.
- the querying application adds a computation to the computations in the user query and presents the results of that added computation in addition to the computations selected by the user.
- the present invention alerts the user to queries having results that are greater than (or less than) the user's queries.
- the alerts ensure the user gains information about these abstract relationships implicitly when he compares the attribute-value strings of his queries to the attribute-value strings of the alerted queries to find results that are greater than or less than the results of his original query.
- the present invention alerts the user to the maximally-valued (or minimally-valued) results, providing more insightful information about the data without subjecting the user to abstract, complex mathematical primitives.
- the user still deals with the computations that he or she can associate a physical event with, e.g., number-of-incidents, etc. Therefore, the interface for the querying application of the present invention is no more complex than the interface for the traditional querying application.
- the querying application of the present invention is effective in finding and alerting the user to meaningful hidden patterns in the data for three different types of computations.
- the querying application of the present invention can be used with any computation that maps an attribute-value string to a number.
- the present invention is generally applicable to all querying applications defined by the logical functioning of Fig. 1.
- Fig. 2 illustrates a block diagram of a querying application which incorporates the present invention.
- the system of Fig. 2 adds an alert generator 201 to the querying application of Fig. 1.
- the alert generator 201 alerts the user to greatest-valued and/or least-valued results that overlap with the user's queries (i.e., the first extension described herein), or that lie completely outside the user's line of questioning (i.e., the second extension described herein).
- the input module 101 provides the alert generator 201 and the computation module 102 with a query 104 consisting of the user's inputs (e.g., attribute-valued string S and the set of computations C to be performed). Since the possible values for every attribute are known a priori, all valid attribute values can be determined and pre-stored in a valid attribute values database 206. The stored valid attribute values are made available to the alert generator 201.
- the overlap and tiebreak rules can be pre-stored in an overlap rules database 204 and a tiebreak rules database 205, respectively.
- the alert generator 201 generates alerts (e.g., alert notifications 202 and alert data 203) in response to the user's query 104 based on the valid attribute values in the valid attribute values database 206, computations in the computation module 102, the overlap rules in the overlap rules database 204, and the tiebreak rules in the tiebreak rules database 205.
- the alert data 203 contains information about the alerts generated and presented to the user in response to the user's query 104.
- the alert generator 201 provides alert notifications 202 and alert data 203, and the computation module 102 provides the computation results 106 to the output module 103.
- the computation module 102 and the alert generator 201 can synchronize their output using a communication path 207.
- the direct communication path between the input module 101 and the computation module 102 namely, the query 104
- all communication between input module 101 and the computation module 102 can be routed through the alert generator 201, since the alert generator 201 has direct communication paths with the input module 101 and the computation module 102.
- different communication paths between the computation module 102 and the alert generator 201 can be used for synchronizing output and executing computations.
- the same communication path 207 can be used for synchronizing output and invoking the execution of computations.
- the input module 101 provides the alert generator 201 with the query 104, consisting of an attribute-valued string S and a set C containing M computations, $\{C_1, C_2, \ldots, C_M\}$, at step 301.
- the alert generator 201 determines if the string S is a valid input using the valid attribute values stored in the valid attribute values database 206 at step 302. The value of each attribute in the string S is verified against the valid attribute values.
- the string S is declared to be invalid and all alert notifications are set to FALSE for all computations in the set C at step 304.
- the string S is declared to be a valid attribute-valued string and alerts are generated for the query 104 at step 303.
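By way of illustration only, the validation of steps 302 to 304 can be sketched as follows. The dictionary `VALID_VALUES` is a hypothetical in-memory stand-in for the valid attribute values database 206, the function names are invented for this sketch, and the acceptance of the "*" value for any known attribute is an assumption rather than something the description states.

```python
# Hypothetical stand-in for the valid attribute values database 206.
VALID_VALUES = {
    "type-of-incident": {"theft", "assault", "break-in", "attempted-break-in"},
    "shift-occurred": {"day", "night", "unknown"},
}
ALERT_TYPES = ("MAXINT", "MININT", "LOQMAX", "LOQMIN", "MAXTRACE", "MINTRACE")


def is_valid_string(s, valid_values=VALID_VALUES):
    """Step 302: check each attribute value in s against the valid values."""
    for attribute, value in s.items():
        if attribute not in valid_values:
            return False
        # Assumption: the "*" (unspecified) value is accepted for any attribute.
        if value != "*" and value not in valid_values[attribute]:
            return False
    return True


def validate_query(s, computations):
    """Return (valid, notifications) for the string s and the set C.

    For an invalid string (step 304) every alert notification is False for
    every computation. For a valid string (step 303) the notifications are
    left empty so that alert generation can fill them in.
    """
    if is_valid_string(s):
        return True, {}
    return False, {c: {t: False for t in ALERT_TYPES} for c in computations}
```

A caller would proceed to alert generation only when the first element of the returned pair is true.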
- Overlap Rule O is a function that has two attribute-valued strings R and S as inputs, and produces a Boolean result (true or false). If the overlap rule returns a true result, then string R is said to "overlap" string S.
- the alert generator 201 (Fig. 2) applies three types of overlap rules: substring, superstring and commonality rules. In all three rules, a comparison is true only if the values of attributes being compared are not the "*" value.
- the substring rule, O(R, S), returns a true result if and only if ("iff") every attribute in string R also occurs (or exists) in string S, and has the same value.
- String R is a substring of string S if the substring rule returns a true result. It is appreciated that the substring rule will always return a true result for O(R, R).
- the superstring rule, O(R, S), returns a true result if and only if every attribute in string S also occurs (or exists) in string R, and has the same value.
- String R is a superstring of string S if the superstring rule returns a true result. It is appreciated that the superstring rule will always return a true result for O(R, R).
- the commonality rule, O(R, S), returns a true result if and only if strings R and S satisfy the following constraints: i) there is at least one attribute A having the same value in both strings R and S; ii) there is at least one attribute B in string R, but not necessarily in string S and the value of the attribute B differs in strings R and S; and iii) there is at least one attribute C in string S, but not necessarily in string R and the value of the attribute C differs in strings R and S.
- Strings R and S have certain values in common if the commonality rule returns a true result. It is appreciated that the commonality rule will always return a false result for O(R, R).
- An attribute-valued string R overlaps another attribute-valued string S if and only if at least one overlap rule returns a true result.
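The three overlap rules can be illustrated with a minimal sketch, assuming that an attribute-valued string is represented as a Python dictionary mapping attribute names to values and that "*" marks an unselected attribute. The function names are hypothetical. In the commonality rule, an attribute that is absent from the other string is treated as having a differing value, which is one reading of constraints ii) and iii).

```python
WILDCARD = "*"


def _selected(s):
    """Keep only the attributes whose value is not the "*" value."""
    return {a: v for a, v in s.items() if v != WILDCARD}


def substring_rule(r, s):
    """True iff every selected attribute of r occurs in s with the same value.

    Note: a string with no selected attributes is a substring of every string.
    """
    r_sel, s_sel = _selected(r), _selected(s)
    return all(s_sel.get(a) == v for a, v in r_sel.items())


def superstring_rule(r, s):
    """True iff every selected attribute of s occurs in r with the same value."""
    return substring_rule(s, r)


def commonality_rule(r, s):
    """True iff r and s share a value and each also differs from the other."""
    r_sel, s_sel = _selected(r), _selected(s)
    shared = any(s_sel.get(a) == v for a, v in r_sel.items())
    r_differs = any(s_sel.get(a) != v for a, v in r_sel.items())
    s_differs = any(r_sel.get(a) != v for a, v in s_sel.items())
    return shared and r_differs and s_differs


def overlaps(r, s):
    """R overlaps S iff at least one overlap rule returns a true result."""
    return substring_rule(r, s) or superstring_rule(r, s) or commonality_rule(r, s)
```

Under this sketch, the substring and superstring rules return true for O(R, R) and the commonality rule returns false for O(R, R), in agreement with the properties noted above.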
- the overlap rules are not limited to the three specified herein and other overlap rules may be utilized by the alert generator 201. For example, instead of determining if one or more values of attributes are common to two strings, the overlap rules can determine if one or more attributes are common to two strings.
Tiebreak Rule
- the alert generator 201 applies two types of tiebreak rules: longest string and shortest string rules.
- the longest string rule selects the string with the largest number of attributes selected by the user.
- the length of an attribute-valued string S is denoted as L(S)
- the shortest string rule selects the string with the smallest number of attributes selected by the user.
- the tiebreak rules are not limited to the two specified herein and other tiebreak rules may be utilized by the alert generator 201 (Fig. 2).
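The two tiebreak rules can be sketched as follows, assuming that the length L(S) counts the attributes carrying a user-selected (non-"*") value and that strings are dictionaries of attribute names to values. The function names are hypothetical, and the behaviour when several strings share the same length (the first one listed is returned) is a choice of this sketch only.

```python
def length(s):
    """L(S): number of attributes in s with a selected (non-"*") value."""
    return sum(1 for value in s.values() if value != "*")


def longest_string_rule(tied):
    """Select the string with the largest number of selected attributes."""
    return max(tied, key=length)


def shortest_string_rule(tied):
    """Select the string with the smallest number of selected attributes."""
    return min(tied, key=length)
```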
- the alert generator 201 (Fig. 2) generates six types of alerts: LOQMAX, LOQMIN, MAXINT, MININT, MAXTRACE and MINTRACE alerts. Each of the alerts will be described in detail herein.
- a query Q consisting of an attribute-valued string S and a set C containing M computations, {C_1, C_2, ..., C_M}, is provided as input to the alert generator 201.
- Based on the input query Q, the alert generator 201 generates the LOQMAX, LOQMIN, MAXINT, MININT, MAXTRACE, and MINTRACE alerts when strings are found satisfying the LOQMAX, LOQMIN, MAXINT, MININT, MAXTRACE, and MINTRACE conditions, respectively. Preferably, the alert generator 201 generates the alerts in the following order for the query Q: MAXINT, MININT, LOQMAX, LOQMIN, MAXTRACE, and MINTRACE alerts.
- Each alert notification 202 (Fig. 2) contains a field for identifying (or describing) the computation associated with the notification.
- the alert notification also includes six other fields, one for each type of alert. Each of the alert fields can be set to true or false, indicating whether or not the corresponding alert is being generated for the particular computation.
- the alert data 203 (Fig. 2) contains a field specifying the type of alert data (i.e., MAXINT Alert Data, MININT Alert Data, LOQMAX Alert Data, LOQMIN Alert Data, MAXTRACE Alert Data or MINTRACE Alert Data), another field that contains the actual alert data itself and, preferably, a third field containing alert data error information.
- the content of the third field depends on the alert type (i.e., the content of the first field): the third field contains "MAXINT Alert Data Error! Reference source not found" for the MAXINT alert, "MININT Alert Data Error! Reference source not found" for the MININT alert and so on.
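As a non-limiting illustration, the alert notification 202 and the alert data 203 could be modelled as two small record types. The class and field names below are hypothetical; only the number and purpose of the fields follow the description.

```python
from dataclasses import dataclass, field
from typing import Any

ALERT_TYPES = ("MAXINT", "MININT", "LOQMAX", "LOQMIN", "MAXTRACE", "MINTRACE")


@dataclass
class AlertNotification:
    """One notification per computation: a description plus six Boolean fields."""
    computation: str
    # Each instance gets its own dictionary, with every alert initially False.
    alerts: dict = field(default_factory=lambda: {t: False for t in ALERT_TYPES})


@dataclass
class AlertData:
    """Type of alert data, the data itself, and optional error information."""
    alert_type: str
    data: Any
    error_info: str = ""

    def __post_init__(self):
        if self.alert_type not in ALERT_TYPES:
            raise ValueError(f"unknown alert type: {self.alert_type}")
```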
- the LOQMAX condition is used to find (or unearth) an attribute-valued string having the greatest-valued result that overlaps with the string in the user's query.
- the terms "string" and "attribute-valued string" are used interchangeably herein.
- the input module 101 provides the alert generator 201 with the query 104, consisting of an attribute-valued string S and a set C containing M computations, {C_1, C_2, ..., C_M}, at step 601.
- the alert generator attempts to find a string S_iLOQMAX that satisfies the LOQMAX condition for the string S and the computation C_i, utilizing the overlap rules and the tiebreak rules stored in the overlap rules database 204 and the tiebreak rules database 205, respectively, at step 602.
- if the string S_iLOQMAX is found, the LOQMAX alert notification for C_i is set to "true" at step 603 and the string S_iLOQMAX is output as the LOQMAX alert data for computation C_i at step 604.
- if the alert generator 201 does not find the string S_iLOQMAX for the computation C_i, the LOQMAX alert notification for C_i is set to "false" at step 605.
- the variable "i" is incremented by one at step 606 until all of the computations in C have been dealt with (i.e., i > M) at step 607.
- the LOQMAX alert data itself consists of three sub-fields of data for each computation C_i.
- the first field contains the description of the computation C_i
- the second field contains the attribute-valued string S_iLOQMAX that satisfies the LOQMAX condition for the input query 104
- the third field contains the computation result C_i(S_iLOQMAX).
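The loop of steps 601 to 607 can be sketched as follows, purely as an illustration. Several points are assumptions of this sketch and are not stated in the description: strings are modelled as frozensets of (attribute, value) pairs, the overlap test uses only the substring and superstring rules, a LOQMAX string must score strictly higher than the user's own string, and the toy results and the computation name "average-loss" are invented.

```python
def find_loqmax(s, compute, candidates, overlaps, tiebreak):
    """Return the overlapping string with the greatest result, or None."""
    base = compute(s)
    # Assumption: only strings scoring strictly above the user's string qualify.
    better = [r for r in candidates
              if r != s and overlaps(r, s) and compute(r) > base]
    if not better:
        return None
    top = max(compute(r) for r in better)
    return tiebreak([r for r in better if compute(r) == top])


def loqmax_alerts(s, computations, candidates, overlaps, tiebreak):
    """Run the search once per computation C_i (i = 1 .. M)."""
    notifications, alert_data = {}, {}
    for description, compute in computations.items():
        found = find_loqmax(s, compute, candidates, overlaps, tiebreak)
        notifications[description] = found is not None  # steps 603 and 605
        if found is not None:
            # Three sub-fields: description, S_iLOQMAX, C_i(S_iLOQMAX) (step 604).
            alert_data[description] = (description, found, compute(found))
    return notifications, alert_data


# Invented toy data for demonstration.
THEFT = frozenset({("type-of-incident", "theft")})
THEFT_NIGHT = THEFT | {("shift-occurred", "night")}
THEFT_DAY = THEFT | {("shift-occurred", "day")}
RESULTS = {THEFT: 10, THEFT_NIGHT: 25, THEFT_DAY: 4}


def sub_or_super(r, s):
    """Simplified overlap test: substring or superstring rule only."""
    return r <= s or r >= s


def longest(tied):
    """Longest string tiebreak rule."""
    return max(tied, key=len)


notifications, alert_data = loqmax_alerts(
    THEFT, {"average-loss": RESULTS.get}, list(RESULTS), sub_or_super, longest)
```

In this toy run the LOQMAX string for the query on thefts is the string for thefts at night, whose result is 25.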
- the LOQMIN condition is used to find (or unearth) an attribute-valued string having the least-valued result that overlaps with the string in the user's query.
- the LOQMIN condition can be mathematically characterized as the local minimum condition, i.e., finding a local minimum string.
- the input module 101 provides the alert generator 201 with the query 104, consisting of an attribute-valued string S and a set C containing M computations, {C_1, C_2, ..., C_M}, at step 701.
- the alert generator attempts to find a string S_iLOQMIN that satisfies the LOQMIN condition for the string S and the computation C_i, utilizing the overlap rules and the tiebreak rules stored in the overlap rules database 204 and the tiebreak rules database 205, respectively, at step 702.
- if the string S_iLOQMIN is found, the LOQMIN alert notification for C_i is set to "true" at step 703 and the string S_iLOQMIN is output as the LOQMIN alert data for computation C_i at step 704.
- if the alert generator 201 does not find the string S_iLOQMIN for the computation C_i, the LOQMIN alert notification for C_i is set to "false" at step 705.
- the variable "i" is incremented by one at step 706 until all of the computations in C have been dealt with (i.e., i > M) at step 707.
- the LOQMIN alert data itself consists of three sub-fields of data for each computation C_i.
- the first field contains the description of the computation C_i
- the second field contains the attribute-valued string S_iLOQMIN that satisfies the LOQMIN condition for the input query 104
- the third field contains the computation result C_i(S_iLOQMIN).
- the MAXINT condition is used to locate a string q having the greatest-valued result r_max that overlaps with the string in the user's query, such that there is no query with a string having a result greater than r_max that overlaps the string q.
- the MAXINT condition can be generalized to the following: the string S_iMAXINT overlaps the string S and C_i(S_iMAXINT) ≥ C_i(P) for all strings P overlapping the string S_iMAXINT.
- It is appreciated that the MAXINT condition can be mathematically characterized as the global maximum condition, i.e., finding one global maximum string.
- the more general MAXINT condition may find more than one global maximum string.
- the input module 101 provides the alert generator 201 with the query 104, consisting of an attribute-valued string S and a set C containing M computations, {C_1, C_2, ..., C_M}, at step 401.
- the alert generator 201 attempts to find a string S_iMAXINT that satisfies the MAXINT condition for the string S and the computation C_i, utilizing the overlap rules and the tiebreak rules stored in the overlap rules database 204 and the tiebreak rules database 205, respectively, at step 402.
- the MAXINT alert data itself consists of three sub-fields of data for each computation C_i.
- the first field contains the description of the computation C_i
- the second field contains the attribute-valued string S_iMAXINT that satisfies the MAXINT condition for the input query 104
- the third field contains the computation result C_i(S_iMAXINT).
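The general MAXINT condition stated above (the candidate overlaps S, and no string overlapping the candidate has a greater result) can be checked directly over a finite set of candidate strings. The following is a sketch under stated assumptions: strings are frozensets of (attribute, value) pairs, the overlap test is reduced to the substring and superstring rules, and the toy results are invented. The function names are hypothetical.

```python
def satisfies_maxint(q, s, compute, candidates, overlaps):
    """True iff q overlaps s and compute(q) >= compute(p) for all p overlapping q."""
    if not overlaps(q, s):
        return False
    return all(compute(q) >= compute(p) for p in candidates if overlaps(p, q))


def find_maxint(s, compute, candidates, overlaps):
    """Return every candidate satisfying the general MAXINT condition for s.

    The general condition may be met by more than one string, so a list is
    returned. A tiebreak rule could then be applied to select a single string.
    """
    return [q for q in candidates
            if satisfies_maxint(q, s, compute, candidates, overlaps)]


# Invented toy data for demonstration.
THEFT = frozenset({("type-of-incident", "theft")})
THEFT_NIGHT = THEFT | {("shift-occurred", "night")}
THEFT_DAY = THEFT | {("shift-occurred", "day")}
RESULTS = {THEFT: 10, THEFT_NIGHT: 25, THEFT_DAY: 4}


def sub_or_super(r, s):
    """Simplified overlap test: substring or superstring rule only."""
    return r <= s or r >= s


maxint_strings = find_maxint(THEFT, RESULTS.get, list(RESULTS), sub_or_super)
```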
- the MININT condition is used to locate a string q having the least-valued result r_min that overlaps with the string in the user's query, such that there is no query with a string having a result less than r_min that overlaps the string q.
- a string S_iMININT satisfies the MININT (S, C_i) condition if and only if the following constraints are satisfied (for some pre-specified overlap and tiebreak rules):
- the MININT condition can be mathematically characterized as the global minimum condition, i.e., finding one global minimum string.
- the general MININT condition may find more than one global minimum string.
- the input module 101 provides the alert generator 201 with the query 104, consisting of an attribute-valued string S and a set C containing M computations, {C_1, C_2, ..., C_M}, at step 501.
- the alert generator 201 attempts to find a string S_iMININT that satisfies the MININT condition for the string S and the computation C_i, utilizing the overlap rules and the tiebreak rules stored in the overlap rules database 204 and the tiebreak rules database 205, respectively, at step 502.
- if the string S_iMININT is found, the MININT alert notification for C_i is set to "true" at step 503 and the string S_iMININT is output as the MININT alert data for computation C_i at step 504.
- if the alert generator 201 does not find the string S_iMININT for the computation C_i, the MININT alert notification for C_i is set to "false" at step 505.
- the variable "i" is incremented by one at step 506 until all of the computations in C have been dealt with (i.e., i > M) at step 507.
- the MININT alert data itself consists of three sub-fields of data for each computation C_i.
- the first field contains the description of the computation C_i
- the second field contains the attribute-valued string S_iMININT that satisfies the MININT condition for the input query 104
- the third field contains the computation result C_i(S_iMININT).
- the MAXTRACE function enables the user to trace through all overlapping queries to find a maximally-valued query.
- the maximally-valued query has the greatest-valued result of all overlapping queries.
- the MAXTRACE alert data itself consists of two sub-fields of data for each computation C_i and a varying number of additional data fields depending on the number of strings T in the MAXTRACE.
- the first field contains the description of the computation C_i
- the second field contains the number of strings T in the MAXTRACE.
- for each string T_j in the MAXTRACE, the MAXTRACE alert data additionally contains one data field for identifying the string T_j and a second data field for the computation result C_i(T_j).
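One plausible way to build a MAXTRACE is to step repeatedly from the current string to the overlapping string with the greatest result until no overlapping string scores higher. This construction is an assumption of the sketch, since the description here specifies only the purpose and the layout of the MAXTRACE alert data. Strings are modelled as frozensets of (attribute, value) pairs, the overlap test is reduced to the substring and superstring rules, the user's own string is not counted among the strings T, and the toy results are invented.

```python
def max_trace(s, description, compute, candidates, overlaps):
    """Follow ever greater overlapping results starting from the string s."""
    steps = []
    current = s
    while True:
        better = [r for r in candidates
                  if overlaps(r, current) and compute(r) > compute(current)]
        if not better:
            break  # no overlapping string scores higher: the trace ends here
        current = max(better, key=compute)
        steps.append((current, compute(current)))  # (T_j, C_i(T_j))
    # Layout: description of C_i, number of strings T, then one pair per T_j.
    return {"computation": description, "count": len(steps), "strings": steps}


# Invented toy data for demonstration.
THEFT = frozenset({("type-of-incident", "theft")})
NIGHT = frozenset({("shift-occurred", "night")})
THEFT_NIGHT = THEFT | NIGHT
THEFT_DAY = THEFT | {("shift-occurred", "day")}
RESULTS = {THEFT: 10, THEFT_NIGHT: 25, THEFT_DAY: 4, NIGHT: 30}


def sub_or_super(r, s):
    """Simplified overlap test: substring or superstring rule only."""
    return r <= s or r >= s


trace = max_trace(THEFT, "average-loss", RESULTS.get, list(RESULTS), sub_or_super)
```

Because each step strictly increases the result and the candidate set is finite, the loop terminates. In the toy run the trace passes from thefts to thefts at night (25) and ends at incidents at night (30).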
- the MINTRACE function enables the user to trace through all overlapping queries to find a minimally-valued query.
- the minimally-valued query has the least-valued result of all overlapping queries.
- the input module 101 provides the alert generator 201 with the query 104, consisting of an attribute-valued string S and a set C of M computations, {C_1, C_2, ..., C_M}, at step 901.
- the alert generator attempts to find a MINTRACE for the string S and the computation C_i, utilizing the overlap rules and the tiebreak rules stored in the overlap rules database 204 and the tiebreak rules database 205, respectively, at step 902.
- the MINTRACE alert data itself consists of two sub-fields of data for each computation C_i and a varying number of additional data fields depending on the number of strings T in the MINTRACE.
- the first field contains the description of the computation C_i and the second field contains the number of strings T in the MINTRACE.
- for each string T_j in the MINTRACE, the MINTRACE alert data additionally contains one data field for identifying the string T_j and a second data field for the computation result C_i(T_j).
- the alert generator primarily performs the tasks of searching for attribute-valued strings that satisfy one or more of the six types of alerts, for a particular input query.
- the querying application can compute and list all the attribute-valued strings satisfying the alert conditions for all potential user queries in advance, i.e., offline and prior to accepting user queries.
- After receiving a specific user query, the alert generator searches for attribute-valued strings satisfying the alert conditions for the specific user query from the stored pre-computed results, thereby improving the efficiency of the alert generator.
- Fig. 10 illustrates a querying application of the present invention which pre-computes the alert conditions for all possible user queries in advance.
- the querying application of Fig. 10 adds a pre-computed results database 1001 to the querying application of Fig. 2.
- elements shown in Fig. 10 corresponding to those shown in Fig. 2 are denoted by the same reference numerals and their descriptions are omitted.
- the alert generator 201 now has access to the pre-computed results stored in the pre-computed results database 1001.
- the alert generator 201 searches the pre-computed results database 1001 to check if any of the stored results satisfies any of the six alert criteria for the user query 104. As in Fig. 2, the alert generator 201 outputs (or provides) the alert notifications 202 and alert data 203 to the output module 103. In contrast to Fig. 2, the link 207 of Fig. 10 is used only for synchronizing the transmission of the output from the computation module 102 and the alert generator 201 to the output module 103. That is, the link 207 is no longer necessary to invoke the execution of computations. Alternatively, the pre-computed results database 1001 may not contain the alert conditions for all possible user queries.
- that is, the pre-computed results in the pre-computed results database 1001 may be incomplete. This may occur if a data mining program or statistical analysis package is used to obtain the pre-computed results, as such programs generally do not provide an exhaustive, time-consuming and complete list of the alert conditions for all possible user queries. Persons skilled in the art will recognize that there are many data mining programs and statistical analysis packages that can be used in conjunction with the querying application of the present invention. Fundamentally, there is no difference in the operation of the alert generator 201 whether the pre-computed results are partial or complete. The alert generator 201 searches for the attribute-valued strings satisfying the alert conditions for a specific user query from the stored pre-computed results, thus resulting in a faster response.
- since the alert generator 201 generates the alert notifications 202 and the alert data 203 based on the incomplete pre-computed results stored in the pre-computed results database 1001, some of the alerts obtained with the complete pre-computed results may not be found in the alert notifications 202 and the alert data 203.
- the alert generator 201 may utilize both the pre-computed results database 1001 and the computation module 102 to generate the alert notifications 202 and the alert data 203.
- the alert generator 201 utilizes the computation module 102 if the attribute-valued strings satisfying the alert conditions for a specific user query are not found in the pre-computed results database 1001.
- the alert generator 201 utilizes the knowledge of the computations in the computation module 102 by executing these computations to discover the maximally-valued results in real-time or executing these computations in advance to discover the maximally-valued results by searching the pre-computed results database 1001. Also, the alert generator 201 uses the valid attribute values known a priori and pre-stored in the valid attribute values database 206.
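The combined use of the pre-computed results database 1001 and the computation module 102 can be sketched as a lookup with a fallback. The dictionary `precomputed` and the callable `compute_live` are hypothetical stand-ins for the database 1001 and for a real-time search that invokes the computation module 102. The keying of the stored results by (query, alert type) is likewise an assumption of this sketch.

```python
def find_alert_string(query, alert_type, precomputed, compute_live):
    """Return (alert string, source) for the given query and alert type.

    The stored pre-computed results are searched first. The real-time search
    is invoked only when no stored result exists for this query.
    """
    key = (query, alert_type)
    if key in precomputed:
        return precomputed[key], "pre-computed"
    return compute_live(query, alert_type), "computed"


# Invented toy data: a partial set of pre-computed results.
precomputed = {("type-of-incident=theft", "MAXINT"): "type-of-incident=theft,shift-occurred=night"}
live_calls = []


def compute_live(query, alert_type):
    """Stand-in for a real-time search; records that it was invoked."""
    live_calls.append((query, alert_type))
    return None  # None means no string satisfying the condition was found


hit = find_alert_string("type-of-incident=theft", "MAXINT", precomputed, compute_live)
miss = find_alert_string("type-of-incident=assault", "MAXINT", precomputed, compute_live)
```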
- the alert generator 201 alerts the user to the existence of these unseen strings at the end of his or her current session (i.e., before the user exits the querying application).
- Fig. 11 illustrates the querying application of the present invention incorporating the "end-of-session" alert feature. It is appreciated that the "end-of-session" alert feature can be incorporated into the querying applications of Figs. 2 and 10. For simplicity, elements shown in Fig. 11 corresponding to those shown in Figs. 2 and 10 are denoted by the same reference numerals and their descriptions are omitted.
- the input module 101 now, in addition to the user query 104, also provides session information 1101 to the alert generator 201.
- the session information 1101 indicates the status of the user's interaction with the querying application (or simply, the status of the user's session).
- the session information 1101 consists of three Boolean fields (i.e., the content of each field is either "true" or "false").
- the first field indicates whether a new session has started, the second field indicates whether the current session has ended, and the third field indicates whether the user has issued a query. It is appreciated that only one of the three fields may be set to "true" at any given time.
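The session information 1101 could be modelled, for illustration, as a record of three Boolean fields that rejects any combination in which more than one field is true. The class and field names are hypothetical, and allowing all three fields to be false at once is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionInfo:
    """Three Boolean fields describing the status of the user's session."""
    session_started: bool = False
    session_ended: bool = False
    query_issued: bool = False

    def __post_init__(self):
        # At most one of the three fields may be true at any given time.
        if sum((self.session_started, self.session_ended, self.query_issued)) > 1:
            raise ValueError("only one session field may be true at a time")
```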
- the alert generator 201 provides an end-of-session alert notification 1102 to the output module 103.
- the alert generator 201 also provides end-of-session alert data 1103 to the output module 103.
- the end-of-session alert data 1103 consists of a field specifying the number of MAXINT alerts (or N_MAX) that have not been alerted to the user and another field specifying the number of MININT alerts (or N_MIN) that have not been alerted to the user.
- the end-of-session alert data 1103 contains additional (N_MAX + N_MIN) fields identifying each
- the alert generator 201 alerts the user via the output module 103 of strings satisfying the MAXINT or MININT criteria that have not been alerted to the user in the current session by setting the end-of-session alert notification 1102 to "true".
- the alert generator 201 maintains a list of MAXINT and MININT strings provided (or alerted) to the user and a second list containing all MAXINT and MININT strings. It is appreciated that the list of alerted MAXINT and MININT strings can grow with each issuance of the user query.
- the alert generator 201 compares the list of alerted MAXINT and MININT strings with the list containing all MAXINT or MININT strings.
- the alert generator 201 sets the end-of-session alert notification 1102 to "true," and outputs appropriate end-of-session alert data 1103 to the output module 103.
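The end-of-session comparison of the two lists can be sketched as follows. Strings are represented by plain text labels for brevity, and the function name and dictionary keys are hypothetical. The layout of the returned data (N_MAX, N_MIN, followed by the unseen strings) follows the description of the end-of-session alert data 1103.

```python
def end_of_session_alert(all_maxint, all_minint, alerted):
    """Compare the alerted strings with all MAXINT and MININT strings.

    Returns (notification, data): the notification is True when at least one
    MAXINT or MININT string has not yet been alerted to the user.
    """
    unseen_max = [s for s in all_maxint if s not in alerted]
    unseen_min = [s for s in all_minint if s not in alerted]
    data = {
        "n_max": len(unseen_max),            # N_MAX
        "n_min": len(unseen_min),            # N_MIN
        "strings": unseen_max + unseen_min,  # (N_MAX + N_MIN) identifying fields
    }
    return bool(unseen_max or unseen_min), data
```

For example, if the lists of all MAXINT and MININT strings contain three strings in total and only one of them has been alerted during the session, the notification is true and the data identifies the two remaining strings.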
- the alert generator 201 may alert the user of the unseen MAXINT and MININT strings during the session (i.e., presented with the user's query results) instead of waiting until the session is over.
- the querying application of the present invention can vary the number of alerts that are generated by the alert generator 201. For example, instead of alerting the user to only the strings that overlap the user's query and produce the greatest-valued and least-valued results, the alert generator 201 can also alert the user to the strings that produce the next greatest-valued and least-valued results; or the top five greatest-valued and bottom two least-valued results that overlap the user's query.
Optimizing the Alert Generation
- the querying application of the present invention can optimize the manner in which the alert generator 201 generates the alerts to take advantage of the various relationships between the alerts. This is illustrated below for the case when the MAXINT and MININT conditions are applied. Similar relationships can be obtained for the case when the general MAXINT and general MININT conditions are applied.
- For each computation in C, the alert generator 201 first generates MAX alerts (e.g., LOQMAX, MAXINT and MAXTRACE) and then generates MIN alerts (e.g., LOQMIN, MININT and MINTRACE) for the string S and the computation C_i.
- the alert generator 201 first attempts to find a string S_iMAXINT that satisfies the MAXINT condition for the string S and the computation C_i, utilizing the overlap rules and the tiebreak rules stored in the overlap rules database 204 and the tiebreak rules database 205, respectively, at step 1202. If the string S_iMAXINT is found, the alert generator 201 sets the MAXINT alert notification for C_i to "true," and outputs the string S_iMAXINT as the MAXINT alert data for computation C_i at step 1205.
- if the string S_iMAXINT is not found, the alert generator 201 sets the MAXINT alert notification for C_i to "false" at step 1203.
- the alert generator 201 attempts to find a string S_iLOQMAX that satisfies the LOQMAX condition and the corresponding MAXTRACE for the string S and the computation C_i at step 1204. If the string S_iLOQMAX is found, the alert generator 201 sets the LOQMAX and MAXTRACE alert notifications for C_i to "true," and outputs the string S_iLOQMAX as the LOQMAX alert data and the MAXTRACE as the MAXTRACE alert data at step 1207. However, if the string S_iLOQMAX is not found, the alert generator 201 sets the LOQMAX and MAXTRACE alert notifications to "false" at step 1206.
- the manner in which the alert generator 201 generates MIN alerts is described in conjunction with the flow chart of Fig. 13.
- the input module 101 provides the attribute-valued string S and the computation C_i to the alert generator 201 at step 1301.
- the alert generator 201 first attempts to find a string S_iMININT that satisfies the MININT condition for the string S and the computation C_i, utilizing the overlap rules and the tiebreak rules stored in the overlap rules database 204 and the tiebreak rules database 205, respectively, at step 1302. If the string S_iMININT is found, the alert generator 201 sets the MININT alert notification for C_i to "true," and outputs the string S_iMININT as the MININT alert data for computation C_i at step 1305.
- if the string S_iMININT is not found, the alert generator 201 sets the MININT alert notification for C_i to "false" at step 1303.
- the alert generator 201 attempts to find a string S_iLOQMIN that satisfies the LOQMIN condition and the corresponding MINTRACE for the string S and the computation C_i at step 1304. If the string S_iLOQMIN is found, the alert generator 201 sets the LOQMIN and MINTRACE alert notifications for C_i to "true," and outputs the string S_iLOQMIN as the LOQMIN alert data and the MINTRACE as the MINTRACE alert data at step 1307. However, if the string S_iLOQMIN is not found, the alert generator 201 sets the LOQMIN and MINTRACE alert notifications to "false" at step 1306.
- the input and output modules include user interfaces for displaying the alerts, as shown in Fig. 14. It is appreciated that the input and output modules fundamentally represent computer storage locations. Consequently, there are many ways of incorporating the user interfaces into the input and output modules, and Fig. 14 depicts one such possible incorporation or implementation for the case when the MAXINT and MININT conditions are applied. Similar interfaces can be obtained for the case when the general MAXINT and general MININT conditions are applied.
- the input module 101 (Fig. 2) displays the attribute-valued string S contained in the user query in a text box 1401 and a list of computation descriptions for the user query in a drop-down list 1402.
- the output module 103 (Fig. 2) displays the results of the computation C_i selected from the drop-down list 1402 by the user. That is, the output module 103 displays the computation result C_i(S) in the text box 1403.
- the output module 103 provides the user with "clickable" buttons (i.e., a "MAX" button 1404 that provides the MAXTRACE when clicked and a "MIN" button 1405 that provides the MINTRACE when clicked) and "clickable" pictorial (image) alerts 1406 (which, if enabled, indicates that the user query is a MAXINT or MININT string), 1407 (which, when clicked, provides the MAXINT string at the end of a MAXTRACE) and 1408 (which, when clicked, provides the MININT string at the end of a MINTRACE) to easily access the alerts generated by the alert generator 201.
- the output module 103 determines if the MAXINT alert data is the same as the attribute-valued string S in the user query (indicating that the user query is the maximally (or most) interesting string in the data). If it is determined that the MAXINT alert data is equivalent to the string S in the user query, the output module 103 activates the pictorial alert 1406 that indicates that a MAXINT or MININT string has been found, disables the "MAX" button 1404 and disables the pictorial alert 1407 that provides the MAXINT string at the end of the MAXTRACE, indicating that the user query is the maximally-valued query. In other words, no other query related to the user's query has a result that is greater than the result to the user query. Consequently, there is no need to trace to the MAXINT.
- however, if it is determined that the MAXINT alert data is not equivalent to the string S in the user query, the output module 103 deactivates the pictorial alert 1406 (since the user query is not a MAXINT string), and enables or activates the "MAX" button 1404 and the pictorial alert 1407 (since it is possible to trace to a MAXINT string).
- if the alert generator 201 does not issue any MAXINT alert notification, the alert generator 201 de-activates the pictorial alerts 1406 and 1407, and enables the "MAX" button 1404.
- the output module 103 displays the details of the string S_iLOQMAX satisfying the LOQMAX condition for (S, C_i) when the user clicks on the "MAX" button 1404. In other words, the user can issue a query for the string S_iLOQMAX by simply using the "MAX" button feature. Also, the output module 103 displays the corresponding MAXTRACE for the string S.
- the output module 103 displays the computation result C_i(S_iMAXINT) for the maximally valued string S_iMAXINT satisfying the MAXINT condition for (S, C_i) when the user clicks on either pictorial alert 1406 or 1407.
- the output module 103 presents the computation results of the maximally valued string S_iMAXINT along with the attribute-valued string S in the user query, thereby advantageously allowing the user to examine both strings together and, hence, to appreciate the relevance of the maximally-valued string to his query.
- the output module 103 determines if the MININT alert data is the same as the attribute-valued string S in the user query (indicating that the user query is the most interesting string in the data). If it is determined that MININT alert data is equivalent to the string S in the user query, the output module 103 activates the pictorial alert 1406, and disables the "MIN" button 1405 and the pictorial alert 1408. However, if it is determined that the MININT alert data is not equivalent to the string S in the user query, the output module 103 de-activates the pictorial alert 1406, enables or activates the "MIN" button 1405 and the pictorial alert 1408.
- If the alert generator 201 does not issue any MININT alert notification, the alert generator de-activates the pictorial alerts 1406 and 1408, and enables the "MIN" button 1405.
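The three cases described for the "MIN" side of the interface of Fig. 14 can be summarized in a small decision function. The function name and dictionary keys are hypothetical. Because the pictorial alert 1406 is shared with the MAXINT case, the value returned for it here reflects the MININT case only.

```python
def min_side_ui_state(minint_notified, minint_string, user_string):
    """Return the enabled/activated state of alerts 1406, 1408 and button 1405."""
    if not minint_notified:
        # No MININT alert notification was issued.
        return {"alert_1406": False, "alert_1408": False, "min_button_1405": True}
    if minint_string == user_string:
        # The user query is itself the MININT string: nothing left to trace to.
        return {"alert_1406": True, "alert_1408": False, "min_button_1405": False}
    # A different MININT string exists and can be traced to.
    return {"alert_1406": False, "alert_1408": True, "min_button_1405": True}
```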
- the output module 103 displays the details of the string S_iLOQMIN satisfying the LOQMIN condition for (S, C_i) when the user clicks on the "MIN" button 1405. In other words, the user can issue a query for the string S_iLOQMIN by simply using the "MIN" button feature. Also, the output module 103 displays the corresponding MINTRACE for the string S.
- the output module 103 displays the computation result C_i(S_iMININT) for the minimally valued string S_iMININT satisfying the MININT condition for (S, C_i) when the user clicks on either pictorial alert 1406 or 1408.
- the output module 103 presents the computation results of the minimally valued string S_iMININT and the attribute-valued string S in the user query, thereby advantageously allowing the user to examine both strings together and, hence, to appreciate the relevance of the minimally-valued string to his query.
- the querying application of the present invention can be used to query and obtain information about maximally valued strings in various kinds of data, including sports data.
- the graphical user interfaces specifically incorporated in the input and output modules to handle sports data are described herein. Also, certain computations designed specifically to process various sports data are described herein.
- the querying application of the present invention can provide professional or amateur soccer players, coaching personnel, fans and the like with the means to ask questions and obtain insightful information about soccer games. Accordingly, a logical model of the data collected for the soccer games (hereinafter referred to as the soccer data) is generated by treating the soccer data as a collection of touches and possessions and specifying attributes to describe these touches and possessions.
- a possession is defined as the activity occurring on the field between the time that a particular team gets the ball to the time that it loses the ball to the other team.
- the entire game is divided into a set of possessions, with one of the two teams controlling the ball during a possession.
- the attributes and their values for the various touches may include: "team-1-name" being the name of team 1; "team-2-name" being the name of team 2; "location-of-game" being the site of the soccer game; "match-score" having values such as 0-0, 0-1, 1-0, 2-0, 2-2, etc.; "player-who-has-ball" being the name of the player who has the ball; "game-period" having values such as first-half, second-half or overtime; "team-ahead" being the name of the team that is currently leading (or winning) in the game; "team-ahead-by" having values such as 1, 2, etc.; "how-player-receives-ball" having values such as pass, interception, steal, free-kick, throw-in, corner-kick, goal-kick, etc.; "first-touch-with" having values such as left-foot, right-foot, head, chest, etc.; "location-ball-received-vertical" having values such as defensive-zone,
- the attributes and their values for possessions may include: "team-1-name" being the name of team 1; "team-2-name" being the name of team 2; "location-of-game" being the site of the soccer game; "match-score" having values such as 0-0, 0-1, 1-0, 2-0, 2-2, etc.; "players-who-touched-ball-at-least-once-in-possession" being the names of the players who touched the ball at least once in the possession; "game-period" having values such as first-half, second-half or overtime; "goal-difference" indicating whether the team is currently ahead, behind or tied in the game; "team-ahead-by" having values such as 1, 2, etc.; "how-team-began-possession" having values such as pass, interception, steal, free-kick, throw-in, corner-kick, goal-kick, etc.; "player-who-began-possession" being the name of the player that began the possession; "location-possession-began-vertical" having
- Computations specifically designed for the soccer application may include: "number-of-touches", "number-of-touches-where-ball-turned-over", "number-of-touches-where-scoring-opportunity-created", "percentage-of-touches-where-ball-turned-over", "percentage-of-touches-where-scoring-opportunity-created", "number-of-touches-where-pass-completed", "number-of-touches-where-pass-attempted", "percentage-of-touches-where-pass-completed",
- a user can issue a query to determine the percentage of possessions that led to scoring opportunities when Player 1 began the possession after an interception.
- the user can use the graphical user interfaces of the input module 101, as shown in Fig. 15, to select the computation percentage-of-possessions-where-scoring-opportunity-created, and the values player 1 for the attribute player-who-began-possession and interception for the attribute how-team-began-possession.
- the computation module 102 (Fig. 2) computes a result such as 25% (9 scoring opportunities created out of 36 possessions) for the user query based on the soccer data.
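The percentage computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the helper names `matches` and `percentage_where`, and the boolean field "scoring-opportunity-created" are hypothetical and do not come from the specification; the sample data is constructed so that it reproduces the 9-out-of-36 figure.

```python
from typing import Callable, Iterable, Mapping

Record = Mapping[str, object]


def matches(record: Record, query: Mapping[str, object]) -> bool:
    """True when the record carries every attribute value the query asks for."""
    return all(record.get(attr) == value for attr, value in query.items())


def percentage_where(records: Iterable[Record],
                     query: Mapping[str, object],
                     outcome: Callable[[Record], bool]) -> float:
    """Percentage of query-matching records for which `outcome` holds."""
    selected = [r for r in records if matches(r, query)]
    if not selected:
        return 0.0  # assumption: an empty selection is reported as 0%
    hits = sum(1 for r in selected if outcome(r))
    return 100.0 * hits / len(selected)


# Hypothetical possession records: 36 match the query, 9 of them create
# a scoring opportunity; 10 further records do not match the query.
possessions = (
    [{"player-who-began-possession": "player 1",
      "how-team-began-possession": "interception",
      "scoring-opportunity-created": i < 9} for i in range(36)]
    + [{"player-who-began-possession": "player 2",
        "how-team-began-possession": "pass",
        "scoring-opportunity-created": True} for _ in range(10)]
)

query = {"player-who-began-possession": "player 1",
         "how-team-began-possession": "interception"}

result = percentage_where(
    possessions, query, lambda r: bool(r["scoring-opportunity-created"]))
print(f"{result:.0f}%")
```

Treating a query as a plain attribute-to-value mapping keeps the filter independent of any particular sport, so the same helper would apply to the other applications described here.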
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued and/or least-valued results for the computation percentage-of-possessions-where-scoring-opportunity-created.
- the querying application of the present invention alerts the user to related scenarios where the team had the most (greatest-valued result) and/or the least (least-valued result) scoring opportunities, thereby providing the user with a better understanding of the team's performance.
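One way to picture the alerting step is the sketch below. It assumes, purely for illustration, that a "related" query is the user query with exactly one attribute value swapped for another value present in the data; the actual procedure used by the alert generator 201 is defined elsewhere in the specification and may differ. The function names, the field "scoring-opportunity-created", and the sample data are all hypothetical.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Record = Dict[str, object]
Query = Dict[str, object]


def percentage(records: List[Record], query: Query, outcome: str) -> Optional[float]:
    """Percentage of matching records whose `outcome` field is true."""
    selected = [r for r in records
                if all(r.get(a) == v for a, v in query.items())]
    if not selected:
        return None  # no data for this query
    return 100.0 * sum(1 for r in selected if r[outcome]) / len(selected)


def related_queries(records: List[Record], query: Query) -> Iterator[Query]:
    """Yield queries that differ from `query` in exactly one attribute value."""
    for attr, current in query.items():
        values = sorted({r[attr] for r in records if attr in r}, key=str)
        for value in values:
            if value != current:
                modified = dict(query)
                modified[attr] = value
                yield modified


def alerts(records: List[Record], query: Query,
           outcome: str) -> Optional[Tuple[Tuple[float, Query], Tuple[float, Query]]]:
    """Return the (greatest-valued, least-valued) related results, if any."""
    scored = []
    for q in related_queries(records, query):
        value = percentage(records, q, outcome)
        if value is not None:
            scored.append((value, q))
    if not scored:
        return None
    return (max(scored, key=lambda pair: pair[0]),
            min(scored, key=lambda pair: pair[0]))


def make(player: str, how: str, total: int, created: int) -> List[Record]:
    """Build `total` hypothetical possessions, `created` of them successful."""
    return [{"player-who-began-possession": player,
             "how-team-began-possession": how,
             "scoring-opportunity-created": i < created} for i in range(total)]


data = (make("player 1", "interception", 4, 1)
        + make("player 1", "throw-in", 4, 3)
        + make("player 1", "pass", 4, 2)
        + make("player 2", "interception", 5, 0))

user_query = {"player-who-began-possession": "player 1",
              "how-team-began-possession": "interception"}

base = percentage(data, user_query, "scoring-opportunity-created")
found = alerts(data, user_query, "scoring-opportunity-created")
print(base, found)
```

With this toy data the user's own query yields 25%, while the greatest-valued related query swaps interception for throw-in and the least-valued one swaps player 1 for player 2.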
- the output module 103 may incorporate the graphical user interfaces, as shown in Fig. 16, for the soccer application.
- the output module 103 displays the query in a textual form in a text box 1601, and computation result in text box 1602.
- the output module 103 alerts the user to the hidden patterns in the soccer data using the "clickable" MAX and MIN buttons and “clickable” pictorial alerts as discussed herein with Fig. 14.
- the query application of the present invention can provide professional or amateur tennis players, coaching personnel, fans and the like with the means to ask questions and obtain insightful information about the game of tennis. Accordingly, a logical model of the data collected for tennis games (hereinafter referred to as the tennis data) is generated by treating the tennis data as a collection of shots and specifying attributes to describe these various shots.
- the attributes and their values for the various shots may include: "player-1-name" being the name of player 1; "player-2-name" being the name of player 2; "location-of-game" being the site of the match; "set-number" having values such as 1, 2, 3, 4 or 5; "game-number-in-set" having values such as 1, 2, 3, 4, 5, 6, etc.; "match-score" having values such as 0-0, 1-0, 0-1, 2-0, 0-2, 1-1, 1-2, 1-3, 2-2, etc.; "set-score" having values such as 0-0, 1-0, 3-2, 5-5, etc.; "game-score" having values such as 0-0, 15-0, 30-30, 40-15, etc.; "server" being the name of the player who is serving in the point; "serve-from" having a left-side or right-side value; "serve-number" having a 1 or 2 value; "breaks" having values such as up, down or even; "number-of-
- Computations specially designed for the tennis application may include: "number-of-winners", "number-of-unforced-errors", "number-of-shots", "percentage-of-winners", "percentage-of-unforced-errors", "number-of-winning-shots", "number-of-losing-shots", "percentage-of-winning-shots", "percentage-of-losing-shots", and "difference-in-contributions-to-points-won-and-lost" (assuming that each shot in a point contributes equally to the outcome of the point).
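The equal-contribution assumption behind difference-in-contributions-to-points-won-and-lost can be made concrete with a small sketch. The interpretation here is an assumption, not the specification's definition: each shot in a point of n shots (counting both players) is given weight 1/n, and the result is the player's total weight in points won minus the total weight in points lost. The `Shot` and `Point` types and the `selector` parameter are hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List


@dataclass
class Shot:
    player: str
    stroke: str  # hypothetical values: "forehand" or "backhand"


@dataclass
class Point:
    winner: str
    shots: List[Shot]


def contribution_difference(points: List[Point], player: str,
                            selector: Callable[[Shot], bool]) -> Fraction:
    """Contribution to points won minus contribution to points lost.

    Assumption: every shot in a point weighs 1 / (number of shots in it).
    """
    won = Fraction(0)
    lost = Fraction(0)
    for point in points:
        if not point.shots:
            continue  # nothing to attribute for an empty point
        weight = Fraction(1, len(point.shots))
        count = sum(1 for s in point.shots
                    if s.player == player and selector(s))
        if point.winner == player:
            won += weight * count
        else:
            lost += weight * count
    return won - lost


points = [
    # Player 1 wins a four-shot point with two forehands: share 2/4.
    Point("player 1", [Shot("player 1", "forehand"), Shot("player 2", "backhand"),
                       Shot("player 1", "forehand"), Shot("player 2", "forehand")]),
    # Player 1 loses a three-shot point with one forehand: share 1/3.
    Point("player 2", [Shot("player 2", "forehand"), Shot("player 1", "forehand"),
                       Shot("player 2", "backhand")]),
]

difference = contribution_difference(
    points, "player 1", lambda s: s.stroke == "forehand")
print(difference)
```

Exact fractions are used so that the per-shot weights add up without rounding error; a production system could equally use floating point.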
- a user can issue a query to determine how player 1 performed against player 2 while hitting forehand shots in the second set when player 1 was up 1 break.
- the user can use the graphical user interfaces of the input module 101, as shown in Fig. 17, to select the computation difference-in-contributions-to-points-won-and-lost, and values player 1 for the attribute player-1-name, forehand for the attribute forehand-backhand, 2 for the attribute set-number, up for the attribute breaks, and 1 for the attribute number-of-breaks-up.
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued and/or least-valued result for the computation difference-in-contributions-to-points-won-and-lost.
- the querying application of the present invention alerts the user to related scenarios where player 1 had the best (greatest-valued) and/or worst (least-valued) performance, thereby providing the user with a better understanding of player 1's performance.
- the output module 103 may incorporate the graphical user interfaces, as shown in Fig. 18, for the tennis application.
- the output module 103 displays the query in a textual form in a text box 1801, and computation results in text box 1802.
- the output module 103 alerts the user to the hidden patterns in the tennis data using the "clickable" MAX and MIN buttons and "clickable" pictorial alerts as discussed herein with Fig. 14.
Golf Application
- the querying application of the present invention can provide professional or amateur golfers, coaching personnel, fans, and the like with the means to ask questions and obtain insightful information about golf games. Accordingly, a logical model of the data collected for golf games (hereinafter referred to as the golf data) is generated by treating the golf data as a collection of shots on different holes on various golf courses and specifying attributes to describe these various shots and holes.
- the attributes and their values for the various shots may include: "type-of-shot" having values such as putts, chips, drives, shots-from-fringe, long-iron-shots, etc.; "club-selection" having values such as driver, woods, long irons, short irons, pitching wedge, putter, etc.; "distance-from-hole" having values such as 5-10 feet, 20-25 feet, 100-150 yards, 300-350 yards, etc.; "location-of-shot" having values such as green, light rough, heavy rough, fringe, tee, fairway, etc.; "location-of-pin-horizontal" having values such as left, right, or center; "location-of-pin-vertical" having values such as top, bottom, or middle; "quality-of-previous-shot" having values such as good, moderate, or bad; "previous-hole-performance" having values such as eagle, birdie, par, bogie, double-bogie, etc.; "club-off-tee"
- the attributes and their values for the various holes on a course may include: "par" having values such as 3, 4, or 5; "length-of-hole" having values such as 148 yards, 208 yards, 409 yards, etc.; "dog-leg-in-hole" having a yes or no value; "width-of-fairway" having a narrow or wide value; "side-of-hole-with-trouble" having values such as left, right, or both; "number-of-bunkers-around-green" having values such as 0, 1, 2, etc.; "speed-of-green" having values such as 9, 10, 11, 12, etc.; "number-of-fairway-bunkers" having values such as 0, 1, 2, etc.; "tiered-green" having a yes or no value; "side-with-out-of-bounds" having values such as left, right, both, or none; "elevated-tee" having a yes or
- computations specifically designed for the golf application may include: "number-of-shots-made", "number-of-shots-attempted", "percentage-of-shots-made", "average-score-for-one-round-of-golf", "quality-of-shot", "location-of-next-shot", "lie-of-next-shot" (this determines the quality of the current shot), standard-golf-performance-metrics for players (e.g., putting average, driving distance, par breakers, birdie average, driving accuracy percentage, scoring average, putts/round, greens in regulation, sand save percentage, scrambling, etc.), and the like.
- a user can issue a query to determine a particular player's putting performance in the first round of a specific tournament when the player was 5-10 feet away from the pin and the speed of the green was 10.
- the user can select the computation percentage-of-shots-made, and values 5-10 feet for the attribute distance-from-hole, putts for the attribute type-of-shot and 10 for the attribute speed-of-green.
- the computation module 102 (Fig. 2) computes a result, such as 50% (4 shots made out of 8 shots attempted) for the user query based on the golf data.
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued and/or least-valued result for the computation percentage-of-shots-made.
- the querying application of the present invention alerts the user to related scenarios where the player had the best (greatest-valued result) and/or worst (least-valued result) putting performance, thereby providing the user with a better understanding of the player's putting performance.
- the querying application of the present invention can provide professional or amateur basketball players, coaching personnel, fans, and the like with the means to ask questions and obtain insightful information about basketball games. Accordingly, a logical model of the data collected for basketball games (hereinafter referred to as the basketball data) is generated by treating the basketball data as a collection of possessions and specifying attributes to describe these various possessions. A possession for a particular team (i.e., team 1) is defined from the time team 1 gets the ball to the time that team 1 loses the ball to team 2.
- the attributes and their values for various possessions may include: "home-team-name" being the name of the home team or team 1; "away-team-name" being the name of the visiting team or team 2; "location-of-game" being the site of the game; "team-1-point-guard" being the name of the player playing the point guard position for team 1; "team-1-shooting-guard" being the name of the player playing the shooting guard position for team 1; "team-1-small-forward" being the name of the player playing the small forward position for team 1; "team-1-power-forward" being the name of the player playing the power forward position for team 1; "team-1-center" being the name of the player playing the center position for team 1; "team-2-point-guard" being the name of the player playing the point guard position for team 2; "team-2-shooting-guard" being the name of the player playing the shooting guard position for team 2; "team-2-small-forward" being the name of the player playing the small forward position
- computations specifically designed for the basketball application may include: "number-of-shots-made", "number-of-shots-attempted", "percentage-of-shots-made", "points-plus-minus" (the difference between the points scored by the two teams), "rebounding-plus-minus" (the difference between the rebounds collected by the two teams), standard-basketball-performance-metrics for players (e.g., number-of-minutes-played, number-of-offensive-rebounds, number-of-defensive-rebounds, total-number-of-rebounds, number-of-steals, number-of-turnovers, number-of-assists, number-of-free-throw-attempts, number-of-free-throw-made, free-throw-shooting-percentage, etc.), and the like.
- a user can issue a query to determine how team 1 with player 1 as the shooting guard performed against team 2 at home in the fourth quarter.
- the user can select the computation points-plus-minus, and values team 1 for the attribute home-team-name, team 2 for the attribute away-team-name, fourth quarter for the attribute period-of-game and player 1 for the attribute team-1-shooting-guard.
- the computation module 102 (Fig. 2) computes a result, such as -1 (team 1 scored 21 points and team 2 scored 22 points) for the user query based on the basketball data.
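As an illustration of points-plus-minus, the sketch below sums the points scored by each team over the possessions that match the query and subtracts one total from the other. The fields "team-with-ball" and "points-scored", the helper names, and the sample possessions are assumptions made for this example; the data is built so that team 1 scores 21 points and team 2 scores 22 in the selected possessions.

```python
from typing import Dict, List

Possession = Dict[str, object]


def points_plus_minus(possessions: List[Possession], query: Dict[str, object],
                      team: str, opponent: str) -> int:
    """Points scored by `team` minus points scored by `opponent`."""
    selected = [p for p in possessions
                if all(p.get(attr) == value for attr, value in query.items())]
    scored = sum(int(p["points-scored"]) for p in selected
                 if p["team-with-ball"] == team)
    allowed = sum(int(p["points-scored"]) for p in selected
                  if p["team-with-ball"] == opponent)
    return scored - allowed


def possession(team_with_ball: str, points: int,
               period: str = "fourth quarter") -> Possession:
    """Build one hypothetical possession record."""
    return {"home-team-name": "team 1", "away-team-name": "team 2",
            "period-of-game": period, "team-1-shooting-guard": "player 1",
            "team-with-ball": team_with_ball, "points-scored": points}


data = ([possession("team 1", 3) for _ in range(7)]       # 21 points
        + [possession("team 2", 2) for _ in range(11)]    # 22 points
        + [possession("team 1", 2, period="third quarter") for _ in range(5)])

query = {"home-team-name": "team 1", "away-team-name": "team 2",
         "period-of-game": "fourth quarter",
         "team-1-shooting-guard": "player 1"}

plus_minus = points_plus_minus(data, query, "team 1", "team 2")
print(plus_minus)
```

The third-quarter possessions are filtered out by the query, so they do not affect the result.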
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued and/or least-valued result for the computation points-plus-minus.
- the querying application of the present invention alerts the user to related scenarios where team 1 had the best (greatest-valued result) and/or worst (least-valued result) points-plus-minus, thereby providing the user with a better understanding of the team's performance.
- the querying application of the present invention can provide professional or amateur baseball players, coaching personnel, fans, and the like with the means to ask questions and obtain insightful information about baseball games. Accordingly, a logical model of the data collected for the baseball games (hereinafter referred to as the baseball data) is generated by treating the baseball data as a collection of pitches and specifying attributes to describe these various pitches.
- the attributes and their values for the various pitches may include: "team-1-name" being the name of team 1; "team-2-name" being the name of team 2; "location-of-game" being the site of the match; "side-batting" being the name of the team that is batting; "team-ahead" being the name of the team that is currently leading the game; "pitcher-name" being the name of the player pitching, i.e., the pitcher; "batter-name" being the name of the player batting during this pitch; "game-score" having values such as 0-0, 1-0, 0-1, 2-0, 0-2, 1-1, 1-2, 1-3, 2-2, 7-3, etc.; "innings" having values such as 1, 2, 3, 8, 9, etc.; "pitching-side" having a left-handed or right-handed value; "batter-side" having a left-handed or right-handed value; "number-of-outs" having values such as 0, 1, 2 or 3; "number-of-strikes" having values such as
- computations specifically designed for the baseball application may include: "number-of-pitches", "number-of-strikes", "number-of-balls", "number-of-hits", "percentage-of-strikes", "percentage-of-balls", "percentage-of-hits", "impact-of-pitch", "type-of-pitch", standard-baseball-performance-metrics for batters (e.g., runs, hits, runs batted in, walks, strikeouts, stolen bases, home runs, etc.), standard-baseball-performance-metrics for pitchers (e.g., innings pitched, runs, earned runs, walks, strikeouts, home runs allowed, pitches, ground balls/fly balls ratio, strikes/pitches ratio, etc.) and standard-baseball-performance-metrics for the team (e.g., runs scored, runs allowed, hits,
- a user can issue a query to determine the different types of pitches that pitcher 1 throws when he is behind in the count while his team is leading.
- the user can select the computation type-of-pitch and values pitcher 1 for the attribute pitcher-name, behind for the attribute pitcher-count, and team 1 for the attribute team-ahead.
- the computation module 102 (Fig. 2) computes a result, such as a set of numbers (one for each type of pitch that pitcher 1 has thrown): fastballs-75% (60 fastballs out of a total of 80 pitches), curveballs-10% (8 out of 80), sliders-5% (4 out of 80) and changeups-10% (8 out of 80) for the user query based on the baseball data.
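Unlike the single-number computations, type-of-pitch returns one value per category. A minimal sketch, assuming each pitch record carries a hypothetical "pitch-type" field and using the attribute names from the example query, could look like this; the sample data simply reproduces the 60/8/4/8 split out of 80 pitches.

```python
from collections import Counter
from typing import Dict, List

Pitch = Dict[str, object]


def type_of_pitch(pitches: List[Pitch], query: Dict[str, object]) -> Dict[str, float]:
    """Percentage of query-matching pitches for each pitch type."""
    selected = [p for p in pitches
                if all(p.get(attr) == value for attr, value in query.items())]
    counts = Counter(str(p["pitch-type"]) for p in selected)
    total = sum(counts.values())
    if total == 0:
        return {}  # no pitches match the query
    return {kind: 100.0 * n / total for kind, n in counts.items()}


def pitch(kind: str, pitcher: str = "pitcher 1") -> Pitch:
    """Build one hypothetical pitch record."""
    return {"pitcher-name": pitcher, "pitcher-count": "behind",
            "team-ahead": "team 1", "pitch-type": kind}


data = ([pitch("fastball")] * 60 + [pitch("curveball")] * 8
        + [pitch("slider")] * 4 + [pitch("changeup")] * 8
        + [pitch("fastball", pitcher="pitcher 2")] * 20)

query = {"pitcher-name": "pitcher 1", "pitcher-count": "behind",
         "team-ahead": "team 1"}

distribution = type_of_pitch(data, query)
for kind, share in distribution.items():
    print(f"{kind}: {share:.0f}%")
```

Because the result is a set of numbers, an alert generator working on it would presumably compare each pitch type's share separately; how the specification handles that is not shown in this sketch.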
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued and/or least-valued result for the computation type-of-pitch.
- the querying application of the present invention alerts the user to related scenarios where pitcher 1 has relied more (greatest-valued result) and/or less (least-valued result) on different types of pitches, thereby providing the user with a better understanding of the pitcher's pitching performance, i.e., pitch selection.
- the querying application of the present invention can provide professional or amateur football players, coaching personnel, fans, and the like with the means to ask questions and to obtain insightful information about football games. Accordingly, a logical model of the data collected for football games (hereinafter referred to as football data) is generated by treating the football data as a collection of plays and specifying attributes to describe these various plays.
- the attributes and their values for the various plays may include: "team-1-name" being the name of team 1; "team-2-name" being the name of team 2; "location-of-game" being the site of the match; "day-of-week" having values such as Sunday, Monday, etc.; "type-of-surface" having values such as grass, turf, etc.; "game-period" having values such as first-quarter, second-quarter, third-quarter, fourth-quarter or overtime; "side-with-possession" being the name of the team having possession of the ball for a particular play; "game-score" having values such as 0-3, 14-7, 21-3, 21-14, etc.; "team-ahead" being the name of the team that is currently leading in the game; "team-ahead-by" having values such as 3, 7, 10, etc.; "quarterback-name" being the name of the quarterback; "running-back-name" being the name of the running back for this game; "down" having values such as 1, 2, 3, or 4; "
- computations specifically designed for the football application may include: "number-of-plays", "average-yards-gained-per-play", "average-yards-gained-per-passing-play", "average-yards-gained-per-running-play", "number-of-carries", "percentage-of-first-down-completion", "percentage-of-third-down-conversion", "percentage-of-plays-turned-over", "pass-play-outcome", standard-football-performance-metrics (e.g., number-of-touch-downs, number-of-fumbles, number-of-interceptions, number-of-carries, number-of-sacks, number-of-yards-in-offense, etc.), and the like.
- the users are now able to use the querying application of the present invention to find hidden patterns in the football data by employing various football-specific computations defined herein.
- a user can issue a query to determine the average yards gained on passing plays from the shotgun formation by team 1 playing on turf.
- the user can select the computation average-yards-gained-per-passing-play, and values team 1 for the attribute team-1-name, yes for the attribute shotgun, pass for the attribute play-type, and turf for the attribute type-of-surface.
- the computation module 102 (Fig. 2) computes a result, such as 7.2 (144 yards gained on 20 passing plays) for the user query based on the football data.
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued or least-valued result for the computation average-yards-gained-per-passing-play.
- the querying application of the present invention alerts the user to related scenarios where the team had the best (greatest-valued) and/or worst (least-valued) passing performance.
- the querying application of the present invention can provide professional or amateur cricket players, coaching personnel, fans, and the like with the means to ask questions and obtain insightful information about cricket games. Accordingly, a logical model of the data collected for cricket games (hereinafter referred to as the cricket data) is generated by treating the cricket data as a collection of matches and balls, and specifying attributes to describe these various matches and balls.
- the attributes and their values for the various matches may include: "team-1-name" being the name of team 1; "team-2-name" being the name of team 2; "location-of-match" (site of the match) having values such as home, away or neutral; "month-of-match" having values such as January, February, etc.; "tournament-match" having a yes or no value; "name-of-tournament" being the name of the tournament, if any, in which the match was played; "pitch-conditions" (nature of the pitch) having values such as slow, fast, wet, dry, etc.; "weather-conditions" having values such as windy, wet, sunny, etc.; "team-winning-toss" being the name of the team that won the toss; "team-batting-first" being the name of the team that batted first; "team-1-runs-made" indicating the number of runs made by the first team; "team-1-wickets-lost
- the attributes and their values for the various balls may include: "batsman-name" being the name of the batsman; "batsman-hand" (hand used to bat) having a right or left value; "type-of-batsman" having values such as opener, middle-order, or tail-ender; "non-striker-name" being the name of the non-striker; "non-striker-hand" (hand used to bat) having a right or left value; "type-of-non-striker" having values such as opener, middle-order, or tail-ender; "runner-name" being the name of the runner, if any; "runner-hand" (hand used to bat) having a right or left value; "bowler-name" being the name of the bowler; "bowler-hand" (hand used to bowl) having a right or left value; "type-of-bowler" having values such as fast, fast-medium, medium, spin, etc.; "number-of-fielders
- a user can issue a query to determine the winning percentage of team 1 against team 2. For such a winning percentage query, the user can select the computation win-percentage, and values team 1 for the attribute team-1-name and team 2 for the attribute team-2-name.
- the computation module 102 (Fig. 2) computes a result, such as 25.0 (50 wins out of 200 matches) for the user query based on the cricket data.
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued or least-valued result for the computation win-percentage.
- the querying application of the present invention alerts the user to related scenarios where team 1 had the highest (greatest-valued result) and lowest (least-valued result) winning percentage against team 2, thereby providing the user with a better understanding of team 1's performance against team 2.
- a user can issue a query to determine player 1's contribution to the required runs per over when the ball type is a yorker. For such a contribution query, the user can select the computation contribution-to-required-runs-per-over, and values player1 for the attribute batsman-name and yorker for the attribute type-of-ball.
- the computation module 102 (Fig. 2) computes a result, such as -0.23 for the user query based on the cricket data. Since the input module 101 (Fig. 2) also provides the user query to the alert generator 201, the user is alerted to a set of circumstances related to the user query having the greatest-valued or least-valued result for the computation contribution-to-required-runs-per-over. In other words, the querying application of the present invention alerts the user to related scenarios where player 1 contributed most (greatest-valued result) and least (least-valued result) to the required number of runs per over, thereby providing the user with a better understanding of player 1's performance. For example, the user would be alerted to the result of the following related query: i) Least-valued result: -1.2 for the modified query
- the querying application of the present invention can be used in numerous banking applications to query and obtain insightful information from the banking data.
- the querying application of the present invention can provide investment bankers with the means to ask questions and obtain insightful information about companies raising money in the market through various instruments such as bonds, private placements, etc.
- a logical model of the data collected for the banking application (hereinafter referred to as the banking data) is generated by treating the banking data as a collection of deals and specifying attributes to describe these various deals.
- the attributes and their values for the various deals may include: "product" indicating the type of instrument being used, and having values such as private placement, non-convertible bond, etc.; "sales-segment" indicating the relative performance of the issuing company in terms of sales, and having values such as excellent, good, etc.; "growth-in-sales" indicating the relative performance of the issuing company in terms of growth in sales, and having values such as excellent, good, etc.; "industry" indicating the industry to which the issuing company belongs, and having values such as electronics, construction, etc.; "captive-finance-company" indicating whether the issuing company is a financial arm of a larger company, and having a yes or no value; "customer-status" indicating whether the issuing company is an existing customer of the bank, and having a yes or no value; "customer-priority" indicating the priority of the issuing company as a customer of the bank, and having values such as not-a-customer, high, low, etc.
- computations specifically designed for the banking application may include: "number-of-deals", "amount-raised", "percentage-of-deals", "percentage-of-amount-raised", and the like.
- a user can issue a query to determine the percentage of deals in the electronics industry for which bank1 was the lead manager. For such a deal percentage query, the user can select the computation percentage-of-deals and values bank1 for the attribute lead-manager and electronics for the attribute industry.
- the computation module 102 (Fig. 2) computes a result, such as 5.8% (58 deals from a total of 1000) for the user query based on the banking data.
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued or least-valued result for the computation percentage-of-deals.
- the querying application of the present invention alerts the user to related scenarios where bank1 had the highest (greatest-valued result) and/or the lowest (least-valued result) percentage of deals, thereby providing the user with a better understanding of the bank's performance in winning the deals.
- a call center is a place where customers of a company call in to get support and information about products, report problems and get solutions for their problems. Call centers are also used for direct marketing to end-users.
- call centers have been automated by integrating the telephony systems with computer hardware and software to assist in managing the call center effectively and efficiently.
- data is collected about the customers, agents, nature of the call, and outcome of the calls.
- the querying application of the present invention can be used in numerous call center management applications to query and obtain insightful information from the call center data.
- the querying application of the present invention can provide call center managers and personnel with the means to ask questions and obtain insightful information about the call center calls.
- a logical model of the data collected for the call center management application (hereinafter referred to as the call center data) is generated by treating the call center data as a collection of calls between agents and customers and specifying attributes to describe these various calls, agents and customers.
- the attributes and their values for the various calls may include: "customer-telephone-number" having values such as 910-761-2380, 715-656-2278, etc.; "id-of-first-agent-on-call" indicating a uniquely assigned id number for the first agent on the call; "incoming-outgoing" having an outgoing or incoming value; "time-of-call" having values such as 12:30 AM, 4:30 PM, etc.; "call-about-which-product-or-service" having values such as product X, product Y, service X, etc.; "how-product-or-service-advertised" having values such as TV, Internet, newspaper, etc.; "switch-that-call-went-through" having values such as switch 12, switch 35, etc.; "agent-ids-of-all-agents-on-call" indicating the agent ids of all the agents on the call; "number-of-transfer-in-call" having values such as 0, 1, 2, etc.; "num
- the attributes and their values for the various agents may include: "agent-id-number” having values such as 016657, 200765, etc.; “agent-skill” indicating the skill of the agent; “agent-experience” indicating the experience of the agent; “agent-training” indicating the level of agent's experience; “agent-gender” indicating the gender of the agent; “agent-age” having values such as 45, 28, etc.; and so on.
- the attributes and their values for the various customers may include: "customer-gender" having values such as male or female; "customer-age" having values such as 45, 28, etc.; "customer-income" having values such as $35,000 - $45,000 per year, >$100,000 per year, etc.; "customer-duration" having values such as 1 year, 2 years, etc.; and so on.
- computations specifically designed for the call center management application may include: "number-of-calls", "average-call-duration", "difference-in-average-call-duration-from-overall-average", "average-number-of-transfers", "average-number-of-holds", "average-number-of-agents-on-call", "number-of-satisfied-customers", "number-of-successful-sale-calls", "average-number-of-calls-taken-by-an-agent", "average-duration-of-calls-taken-by-an-agent", "number-of-calls-transferred-by-an-agent", and the like.
- a user can issue a query to determine the average call duration for all incoming calls occurring early in the morning. For such an average call duration query, the user can select the computation average-call-duration, and values incoming for the attribute incoming-outgoing and early-morning for the attribute time-of-call.
- the computation module 102 (Fig. 2) computes a result, such as
- since the input module 101 also provides the user query to the alert generator 201 (Fig. 2), the user is alerted to a set of circumstances related to the user query having the greatest-valued and/or least-valued result for the computation average-call-duration.
- the querying application of the present invention alerts the user to related scenarios having the longest (greatest-valued result) and/or the shortest (least-valued result) average call duration, thereby providing the user with a better understanding of the average call duration.
- the user can select the computation difference-in-average-call-duration-from-overall-average, and values 016657 for the attribute agent-id-number and product X for the attribute call-about-which-product-or-service.
- the computation module 102 (Fig. 2) computes a result, such as 200.1 seconds (the average call duration to agent 016657 being 1209.6 seconds and the average call duration for all agents being 1009.5 seconds) for the user query based on the call center data. Since the input module 101 also provides the user query to the alert generator 201, the user is alerted to a set of circumstances related to the user query having the greatest-valued and/or least-valued result for the computation difference-in-average-call-duration-from-overall-average.
- the querying application of the present invention alerts the user to related scenarios where the agent had the largest (greatest-valued result) and/or the smallest (least-valued result) difference in the average call duration from the overall average for all agents, thereby providing the user with a better understanding of the agent's performance.
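The two call center computations named above can be illustrated with a minimal Python sketch. The call records, the `duration` field, and the function names are hypothetical and chosen only for illustration; the patent does not prescribe a data layout. A query is modeled as a set of attribute values that a call must carry.

```python
from statistics import mean

# Hypothetical call records; attribute names follow the text, values are invented.
CALLS = [
    {"incoming-outgoing": "incoming", "time-of-call": "early-morning",
     "agent-id-number": "016657", "duration": 300.0},
    {"incoming-outgoing": "incoming", "time-of-call": "early-morning",
     "agent-id-number": "020001", "duration": 500.0},
    {"incoming-outgoing": "incoming", "time-of-call": "afternoon",
     "agent-id-number": "016657", "duration": 700.0},
    {"incoming-outgoing": "outgoing", "time-of-call": "early-morning",
     "agent-id-number": "020001", "duration": 100.0},
]


def matches(record, query):
    """True when the record carries every attribute value in the query."""
    return all(record.get(attr) == value for attr, value in query.items())


def average_call_duration(calls, query):
    """average-call-duration over the calls selected by the query (None if none match)."""
    durations = [c["duration"] for c in calls if matches(c, query)]
    return mean(durations) if durations else None


def difference_from_overall_average(calls, query):
    """difference-in-average-call-duration-from-overall-average for the query."""
    selected = average_call_duration(calls, query)
    overall = average_call_duration(calls, {})  # the empty query selects every call
    if selected is None or overall is None:
        return None
    return selected - overall


early_incoming = {"incoming-outgoing": "incoming", "time-of-call": "early-morning"}
print(average_call_duration(CALLS, early_incoming))
print(difference_from_overall_average(CALLS, {"agent-id-number": "016657"}))
```

With the figures quoted in the text, the same subtraction gives 1209.6 - 1009.5 = 200.1 seconds.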
- Customer relationship management is very critical to the operation of any business. An important area of customer relationship management is understanding and managing customer churn.
- the querying application of the present invention can be used in numerous customer relationship management applications to query and obtain insightful information from customer relationship management data.
- the querying application of the present invention can provide customer relationship managers and personnel with the means to ask questions and obtain insightful information about customer relationships.
- a logical model of the data collected for the customer relationship management application (hereinafter referred to as the customer relationship management data) is generated by treating the data as a collection of customers and by specifying attributes to describe the customers.
- the attributes and their values for customers may include: “customer-gender” having a male or female value; “customer-age” having values such as 45, 28, etc.; “customer-income” having values such as $35,000 - $45,000 per year, >$100,000 per year, etc.; “customer-duration” having values such as <1 year, 2 years, etc.; “customer-marital-status” having values such as single, married, divorced, etc.; “customer-of-which-product-or-service” having values such as Product X, Product Y, Service X, etc.; “how-product-or-service-advertised” having values such as TV, Internet, newspaper, etc.
- customer-satisfaction-at-time-of-sale having values such as satisfied, very satisfied, dissatisfied, very dissatisfied, etc.
- customer-satisfaction-at-time-of-churn having values such as satisfied, very satisfied, dissatisfied, very dissatisfied, etc.
- duration-of-customer having values such as <1 year, 1-2 years, 2-5 years, etc.
- churn-date having values such as Dec 19th 1998, etc.
- reason-of-churn having values such as unhappy-with-product, unhappy-with-service, found-better-product, etc.
- computations specially designed for the customer relationship management application may include: “number-of-customers”, “number-of-customers-that-churn”, “number-of-customers-that-are-retained”, “number-of-high-risk-customers”, “churn-percentage”, and the like.
- the users are now able to use the querying application of the present invention to find hidden patterns in the customer relationship management data by employing various customer relationship management-specific computations defined herein. For example, a user can issue a query to determine the percentage of customers that churned in the fourth quarter of 1998.
- the user can select the computation churn-percentage, and the value 4Q 1998 for the attribute churn-date.
- the computation module 102 (Fig. 2) computes a result such as 9% for the user query based on the customer relationship management data.
- the input module 101 (Fig. 2) also provides the user query to the alert generator 201 (Fig.
- the user is alerted to a set of circumstances related to the user query having the greatest-valued and/or least-valued result for the computation churn-percentage.
- the querying application of the present invention alerts the user to related scenarios having the highest (greatest-valued result) and/or lowest (least-valued result) churn-percentage, thereby providing the user with a better understanding of the customer churn.
- the user may be alerted to the result of the following related query: i) Greatest-valued result: 18% for the modified query
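The churn example can be sketched in Python as follows. Several things here are assumptions made only for illustration and are not taken from the patent: the customer records, the rule that churn-percentage is measured against the customers sharing the query's attributes other than churn-date, and the notion that a "related" query is the user's query extended by one further attribute value. The patent's own search for greatest-valued and least-valued results may differ.

```python
# Hypothetical customers; churn-date is None for customers that were retained.
CUSTOMERS = [
    {"customer-gender": "male", "how-product-or-service-advertised": "TV", "churn-date": "4Q 1998"},
    {"customer-gender": "male", "how-product-or-service-advertised": "TV", "churn-date": None},
    {"customer-gender": "male", "how-product-or-service-advertised": "Internet", "churn-date": None},
    {"customer-gender": "male", "how-product-or-service-advertised": "Internet", "churn-date": None},
    {"customer-gender": "female", "how-product-or-service-advertised": "TV", "churn-date": "4Q 1998"},
    {"customer-gender": "female", "how-product-or-service-advertised": "Internet", "churn-date": None},
    {"customer-gender": "female", "how-product-or-service-advertised": "Internet", "churn-date": None},
    {"customer-gender": "female", "how-product-or-service-advertised": "Internet", "churn-date": "3Q 1998"},
]


def matches(record, query):
    """True when the record carries every attribute value in the query."""
    return all(record.get(attr) == value for attr, value in query.items())


def churn_percentage(customers, query):
    """Assumed definition: churned customers matching the query, as a percentage
    of customers matching the query's attributes other than churn-date."""
    population_query = {a: v for a, v in query.items() if a != "churn-date"}
    population = [c for c in customers if matches(c, population_query)]
    if not population:
        return None
    churned = [c for c in population if matches(c, query)]
    return 100.0 * len(churned) / len(population)


def related_queries(customers, query):
    """Assumed notion of 'related': the query plus one attribute value not already in it."""
    seen = set()
    for customer in customers:
        for attr, value in customer.items():
            if attr in query or value is None or (attr, value) in seen:
                continue
            seen.add((attr, value))
            yield {**query, attr: value}


def alerts(customers, query):
    """Return ((greatest result, query), (least result, query)) over the related queries."""
    scored = [(churn_percentage(customers, q), q) for q in related_queries(customers, query)]
    scored = [(result, q) for result, q in scored if result is not None]
    if not scored:
        return None, None
    greatest = max(scored, key=lambda pair: pair[0])
    least = min(scored, key=lambda pair: pair[0])
    return greatest, least


user_query = {"churn-date": "4Q 1998"}
base_result = churn_percentage(CUSTOMERS, user_query)
greatest, least = alerts(CUSTOMERS, user_query)
print(base_result, greatest, least)
```

An exhaustive scan like this is only practical for small attribute sets; it is meant to show what the alert reports, not how a production system would search.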
- the querying application of the present invention can be incorporated into a multi-user environment, such as a communication network of computers, an intranet, or the Internet (world-wide-web).
- Fig. 19 illustrates a multi-user environment incorporating the querying application of the present invention.
- elements shown in Fig. 19 corresponding to those shown in Fig. 2 are denoted by the same reference numerals and their descriptions are omitted.
- the querying application of Fig. 19 replaces the input module 101 and the output module 103 of the querying application of Fig. 2 with multiple input modules 1902 and multiple output modules 1903.
- the alert generator 201 is now connected to the input and output modules via a communications network 1901, such as a computer network.
- the alert generator 201 of Fig. 19 receives a plurality of different queries from the multiple input modules 1902 (i.e., multiple users) via the communications network 1901.
- the alert generator 201 accesses the computation module 102, and generates alert notifications 202 and alert data 203 for each output module 1903.
- the alert generator 201 transmits (or sends) the appropriate alert notifications 202 and the alert data 203 to each output module 1903 over the communications network 1901.
- web browsers can serve as the input and output modules, and the alert generator 201 and the computation module 102 can reside on a server (i.e., an Internet server).
- the communications network 1901 connecting the web browsers to the server can be the Internet, a company intranet, a local-area network, a wide-area network and the like.
- the querying application of the present invention can be used as a multimedia querying application.
- the multimedia querying application retrieves multimedia objects, such as images, video clips, and audio clips.
- the multimedia objects are generally annotated and stored in a multimedia data repository.
- the annotation can be done in one of two ways.
- a list of pre-specified attributes each having a finite set of pre-specified values, can be used to classify the multimedia objects.
- a textual description is used to label the object.
- the conventional multimedia database generally supports both types of annotations.
- the annotations can be manually generated, e.g., a person looks at an image that is a picture of a mountain and types in the annotation: mountain or selects the value mountain for a suitable attribute such as topography.
- the annotations can be automatically generated, e.g., an image analysis algorithm processes the image and classifies the image as a mountain.
- a multimedia querying application can be used to retrieve the multimedia object.
- a querying application allows the user to specify the objects in one of two ways.
- the user selects the values of attributes of interest and the querying application searches the annotations of the multimedia objects in its repository to retrieve all objects that have an annotation matching the user's requirements. This is equivalent to a user issuing a query in the traditional (non-multimedia) querying application in Fig. 1.
- the user types in keywords of interest and the querying application searches the annotations, including textual descriptions, and retrieves the multimedia objects having the keywords contained in their annotations.
- the manner in which the textual annotations can be represented by an attribute-value classification scheme is described herein.
- the first step is to remove the common, non-informational words that occur frequently in a textual description in a natural language such as English. Examples include such words as a, the, is, that, etc. Those skilled in the art appreciate that exhaustive lists of such words have already been compiled and are available. Once these words have been removed, the remaining words are added to a list of unique words for every annotation in the multimedia repository. Finally, every unique word in this list is made into an attribute that can have two possible values, present or absent. All attributes are used to classify a textual description, the value for a given attribute being present if the corresponding word is present in the textual description, and absent otherwise. One skilled in the art will be able to implement such a method for textual descriptions. Note that this method could also be applied to a collection of documents, each document corresponding to a textual description.
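The steps above can be sketched in Python as follows. The stop-word list is a tiny illustrative stand-in for the compiled lists the text mentions, and the function names and sample descriptions are hypothetical.

```python
import re

# Small illustrative stop-word list; real compiled lists are far longer.
STOP_WORDS = {"a", "an", "the", "is", "that", "of", "and", "in"}


def informative_words(description):
    """Lower-case the description, split it into words, and drop the stop words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return {word for word in words if word not in STOP_WORDS}


def build_vocabulary(descriptions):
    """List of unique informative words across every annotation in the repository."""
    vocabulary = set()
    for description in descriptions:
        vocabulary |= informative_words(description)
    return sorted(vocabulary)


def classify(description, vocabulary):
    """Map every vocabulary word (attribute) to the value 'present' or 'absent'."""
    words = informative_words(description)
    return {word: ("present" if word in words else "absent") for word in vocabulary}


annotations = ["A picture of a mountain", "The lake is calm"]
vocabulary = build_vocabulary(annotations)
print(vocabulary)
print(classify(annotations[0], vocabulary))
```

Each resulting dictionary is an attribute-value description of one annotation, so it can be queried in the same way as objects classified with pre-specified attributes.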
- the input module 101 can support the user interface for both traditional and multimedia querying applications.
- the former outputs a numeric result
- the multimedia querying application retrieves a multimedia object.
- a common starting point is the attribute-value string that is selected by the user.
- at least one computation must be defined for the multimedia application such that the computation maps the strings to a numeric result that will be useful to the user.
- the multimedia querying application can be modified to execute that computation for the user's query string and to output the numeric result (with associated alerts) to the user along with the multimedia objects that are retrieved.
- For example, the count or percentage of multimedia objects that have a particular value. Or, the difference in count between a particular value and all other values of the same attribute, when such a difference is computed for the subset of objects that have the user's query string.
- the latter scheme will be particularly useful when specific attributes are known to represent outcomes of interest.
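The candidate computations for a multimedia application can be sketched as below. The annotated objects and function names are hypothetical, and the difference is read here as the count of objects with the chosen value minus the count of objects with any other value of the same attribute, which is one plausible interpretation of the text.

```python
# Hypothetical annotated multimedia objects (attribute-value annotations only).
OBJECTS = [
    {"topography": "mountain", "season": "winter"},
    {"topography": "mountain", "season": "summer"},
    {"topography": "lake", "season": "winter"},
    {"topography": "mountain", "season": "winter"},
    {"topography": "desert", "season": "winter"},
    {"topography": "mountain", "season": "winter"},
]


def select(objects, query):
    """Objects whose annotations carry every attribute value in the query string."""
    return [o for o in objects if all(o.get(a) == v for a, v in query.items())]


def count_with_value(objects, query, attr, value):
    """Count of selected objects that have the given value for the attribute."""
    return sum(1 for o in select(objects, query) if o.get(attr) == value)


def percentage_with_value(objects, query, attr, value):
    """Percentage of selected objects that have the given value (None if none selected)."""
    subset = select(objects, query)
    if not subset:
        return None
    return 100.0 * count_with_value(objects, query, attr, value) / len(subset)


def count_difference(objects, query, attr, value):
    """Count for the value minus the count for all other values of the same attribute."""
    subset = select(objects, query)
    with_value = sum(1 for o in subset if o.get(attr) == value)
    with_other = sum(1 for o in subset if attr in o and o[attr] != value)
    return with_value - with_other


winter = {"season": "winter"}
print(count_with_value(OBJECTS, winter, "topography", "mountain"))
print(percentage_with_value(OBJECTS, winter, "topography", "mountain"))
print(count_difference(OBJECTS, winter, "topography", "mountain"))
```

Each function maps the user's query string to a single number, which is the property the alert mechanism relies on.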
- our invention can be used to generate alerts in the manner described in the present invention. Consequently, our invention can be embedded easily in a multimedia querying application.
- the apparatus of Fig. 2 is simply added to the apparatus of the multimedia querying application. The two apparatus can share the same Input Module 101 and the Output Module 103 is incorporated within the output mechanisms of the multimedia querying application.
- the present invention can also be used as is within a real-time environment.
- the underlying data change, but that does not change the fundamentals of our invention. All that is happening is that the computations in the Computation Module 102 change over time. At any given point in time, the situation can be frozen, i.e., the state of the Computation Module 102 at that point in time is used.
- One skilled in the art will easily be able to define such a point in time in the context of the querying application in Fig. 1, e.g., the point in time when the user issues a query. It follows that our invention can easily be applied in a real-time environment.
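One way to picture the "frozen" state is a snapshot taken when the user issues a query, as in the sketch below. The `ComputationModule` class, its methods, and the sample records are hypothetical stand-ins; the patent does not specify how the state is captured.

```python
import copy


class ComputationModule:
    """Hypothetical stand-in for Computation Module 102 with data that keeps arriving."""

    def __init__(self):
        self._records = []

    def ingest(self, record):
        """Add a newly received record to the live data."""
        self._records.append(record)

    def freeze(self):
        """Return an independent copy of the state at this point in time."""
        return tuple(copy.deepcopy(self._records))


def count_matching(snapshot, query):
    """Count the records in the snapshot that carry every attribute value in the query."""
    return sum(1 for r in snapshot if all(r.get(a) == v for a, v in query.items()))


module = ComputationModule()
module.ingest({"type-of-incident": "theft"})
module.ingest({"type-of-incident": "assault"})

query = {"type-of-incident": "theft"}
snapshot = module.freeze()                    # the point in time when the query is issued
module.ingest({"type-of-incident": "theft"})  # the underlying data keeps changing

frozen_result = count_matching(snapshot, query)
live_result = count_matching(module.freeze(), query)
print(frozen_result, live_result)
```

The query result and any alerts are computed against `snapshot`, so later arrivals do not change the answer the user was given.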
- a real-time environment, e.g., a military intelligence environment in which threats are being evaluated by inputting the latest information received in the form of a user's query to a querying application.
- the querying application outputs a number between 0 and 1, with 0 representing a definite threat and 1 representing a non-threat.
- the MAXINT string and/or MININT string do not overlap directly with the attribute-value string of the query but instead can be traced from the user's query.
- strings having the greatest-valued and least-valued results respectively, can reflect the strongest measures in the data that underlie the querying application. Consequently, if the user's query does not overlap directly with those strings, it could indicate that the user may have hit upon a weakness in the underlying data to respond to the current query, a fact that may be relevant when evaluating threats.
- This invention can be used to introduce data mining functionality in database systems and database applications such as data marts and data warehousing systems.
- any querying mechanism such as a Graphical User Interface, HTML forms on a web browser, etc.
- any reporting mechanism, such as HTTP pages, a graphical interface, textual reports, etc.
- Computations required could either be performed by the database system or application on the fly, or the results could be looked up from previously computed results.
- This invention provides a method for linking the two classes of products together by enabling the output of OLAP query tools to be fed into data mining suites to extract meaningful information along the user's line of questioning.
- the interface provided by the OLAP Query tool could function as the Input Module.
- the OLAP tool could itself provide the Computation Module.
- the Data Mining suite provides the search space for the Alert Generator.
- the OLAP reporting tools provide the Output Module for receiving the alerts.
- An agent is a computer program that monitors the activity of another computer program.
- This invention can be used in the context of an agent provided that the inputs received by the computer program that is being monitored can be represented in terms of attribute-value strings and the outputs of that program can be represented as real numbers.
- the monitored program is the querying application and the user is not a human but instead a user process represented by the agent.
- contents of the databases may be stored in a single database or stored in memory or the like. It is intended that the appended claims be interpreted as including the embodiments discussed above, those various alternatives which have been described and all equivalents thereto.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU18075/00A AU768745B2 (en) | 1998-10-13 | 1999-10-12 | Method and apparatus for finding hidden patterns in the context of querying applications |
CA002344818A CA2344818A1 (en) | 1998-10-13 | 1999-10-12 | Finding querying patterns in querying applications |
EP99961517A EP1129393A4 (en) | 1998-10-13 | 1999-10-12 | Method and apparatus for finding hidden patterns in the context of querying applications |
JP2000576332A JP2002527805A (en) | 1998-10-13 | 1999-10-12 | Method and apparatus for finding hidden patterns in the context of a query application |
IL14201399A IL142013A0 (en) | 1998-10-13 | 1999-10-12 | Method and apparatus for finding hidden patterns in the context of querying applications |
NZ510844A NZ510844A (en) | 1998-10-13 | 1999-10-12 | Method and apparatus for finding hidden patterns in the context of querying applications |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US10394898P | 1998-10-13 | 1998-10-13 | |
US60/103,948 | 1998-10-13 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2000022493A2 (en) | 2000-04-20 |
WO2000022493A3 (en) | 2000-08-31 |
Family
ID=22297856
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/US1999/024029 WO2000022493A2 (en) | 1998-10-13 | 1999-10-12 | Finding querying patterns in querying applications |
Country Status (8)
Country | Link |
---|---|
EP (1) | EP1129393A4 (en) |
JP (1) | JP2002527805A (en) |
AU (1) | AU768745B2 (en) |
CA (1) | CA2344818A1 (en) |
IL (1) | IL142013A0 (en) |
NZ (1) | NZ510844A (en) |
WO (1) | WO2000022493A2 (en) |
ZA (1) | ZA200102556B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6947929B2 (en) | 2002-05-10 | 2005-09-20 | International Business Machines Corporation | Systems, methods and computer program products to determine useful relationships and dimensions of a database |
US7447687B2 (en) | 2002-05-10 | 2008-11-04 | International Business Machines Corporation | Methods to browse database query information |
US20200387990A1 (en) * | 2017-12-08 | 2020-12-10 | Real Estate Equity Exchange Inc. | Systems and methods for performing automated feedback on potential real estate transactions |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5737734A (en) * | 1995-09-15 | 1998-04-07 | Infonautics Corporation | Query word relevance adjustment in a search of an information retrieval system |
US5802515A (en) * | 1996-06-11 | 1998-09-01 | Massachusetts Institute Of Technology | Randomized query generation and document relevance ranking for robust information retrieval from a database |
US6006225A (en) * | 1998-06-15 | 1999-12-21 | Amazon.Com | Refining search queries by the suggestion of correlated terms from prior searches |
US6012053A (en) * | 1997-06-23 | 2000-01-04 | Lycos, Inc. | Computer system with user-controlled relevance ranking of search results |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4490811A (en) * | 1979-03-14 | 1984-12-25 | Yianilos Peter N | String comparator device system circuit and method |
1999
- 1999-10-12 AU AU18075/00A patent/AU768745B2/en not_active Ceased
- 1999-10-12 IL IL14201399A patent/IL142013A0/en unknown
- 1999-10-12 NZ NZ510844A patent/NZ510844A/en unknown
- 1999-10-12 CA CA002344818A patent/CA2344818A1/en not_active Abandoned
- 1999-10-12 JP JP2000576332A patent/JP2002527805A/en active Pending
- 1999-10-12 WO PCT/US1999/024029 patent/WO2000022493A2/en not_active Application Discontinuation
- 1999-10-12 EP EP99961517A patent/EP1129393A4/en not_active Withdrawn
2001
- 2001-03-28 ZA ZA200102556A patent/ZA200102556B/en unknown
Non-Patent Citations (1)
Title |
---|
See also references of EP1129393A2 * |
Also Published As
Publication number | Publication date |
---|---|
CA2344818A1 (en) | 2000-04-20 |
AU1807500A (en) | 2000-05-01 |
IL142013A0 (en) | 2002-03-10 |
WO2000022493A3 (en) | 2000-08-31 |
EP1129393A4 (en) | 2006-06-21 |
EP1129393A2 (en) | 2001-09-05 |
NZ510844A (en) | 2003-11-28 |
ZA200102556B (en) | 2002-07-01 |
JP2002527805A (en) | 2002-08-27 |
AU768745B2 (en) | 2004-01-08 |
Legal Events
Code | Title | Description |
---|---|---|
ENP | Entry into the national phase | Ref document number: 2000 18075; Country of ref document: AU; Kind code of ref document: A |
AK | Designated states | Kind code of ref document: A2; Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2; Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
121 | Ep: the epo has been informed by wipo that ep was designated in this application | |
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | |
AK | Designated states | Kind code of ref document: A3; Designated state(s): AE AL AM AT AU AZ BA BB BG BR BY CA CH CN CU CZ DE DK EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MD MG MK MN MW MX NO NZ PL PT RO RU SD SE SG SI SK SL TJ TM TR TT UA UG UZ VN YU ZA ZW |
AL | Designated countries for regional patents | Kind code of ref document: A3; Designated state(s): GH GM KE LS MW SD SL SZ TZ UG ZW AM AZ BY KG KZ MD RU TJ TM AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE BF BJ CF CG CI CM GA GN GW ML MR NE SN TD TG |
WWE | Wipo information: entry into national phase | Ref document number: 142013; Country of ref document: IL; Ref document number: IN/PCT/2001/00219/DE; Country of ref document: IN |
ENP | Entry into the national phase | Ref document number: 2344818; Country of ref document: CA; Ref document number: 2344818; Country of ref document: CA; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 2001/02556; Country of ref document: ZA; Ref document number: 200102556; Country of ref document: ZA |
WWE | Wipo information: entry into national phase | Ref document number: 510844; Country of ref document: NZ |
WWE | Wipo information: entry into national phase | Ref document number: 18075/00; Country of ref document: AU |
ENP | Entry into the national phase | Ref document number: 2000 576332; Country of ref document: JP; Kind code of ref document: A |
WWE | Wipo information: entry into national phase | Ref document number: 1999961517; Country of ref document: EP |
REG | Reference to national code | Ref country code: DE; Ref legal event code: 8642 |
WWP | Wipo information: published in national office | Ref document number: 1999961517; Country of ref document: EP |
WWG | Wipo information: grant in national office | Ref document number: 18075/00; Country of ref document: AU |
WWW | Wipo information: withdrawn in national office | Ref document number: 1999961517; Country of ref document: EP |