WO2003021477A2 - A sampling approach for data mining of association rules - Google Patents
- Publication number
- WO2003021477A2 (PCT/EP2002/008335)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- association rules
- sample
- transactions
- multitude
- sample size
- Prior art date
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/18—Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2216/00—Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
- G06F2216/03—Data mining
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99931—Database or file accessing
- Y10S707/99933—Query processing, i.e. searching
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10—TECHNICAL SUBJECTS COVERED BY FORMER USPC
- Y10S—TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y10S707/00—Data processing: database and file management or data structures
- Y10S707/99941—Database schema or data structure
- Y10S707/99942—Manipulating data structure, e.g. compression, compaction, compilation
Definitions
- The present invention relates generally to a method, system and program product for uncovering relationships or association rules between items in large databases.
- Data mining is an emerging technical area whose goal is to extract significant patterns or interesting rules from large databases; in general, the area of data mining comprises all methods which are applicable to extract "knowledge" from large amounts of existing data. The whole process is known as knowledge discovery in databases. Finding association rules is one task for which data mining methods have been developed.
- Association rule mining was introduced by Agrawal et al. (refer for instance to R. Agrawal and R. Srikant, Fast algorithms for mining association rules, in Proc. 20th VLDB Conf., Sept. 1994) and was motivated by shopping basket analysis. The rules were generated to find out which articles or items in a shop are bought together. More generally, association rules can be used to discover dependencies among attribute values of records in a database. More specifically, basket data usually consists of a record per customer with a transaction date, along with the items bought by the customer. An example of an association rule over such a database could be that 80% of the customers that bought bread and milk also bought eggs. The data mining task for association rules can be broken into two steps.
- The first step consists of finding all the sets of items, called itemsets, that occur in the database with at least a certain user-specified frequency, called minimum support. Such itemsets are called large itemsets. An itemset of k items is called a k-itemset.
- The second step consists of forming implication rules among the large itemsets found in the first step.
- Several algorithms have been developed to generate association rules efficiently.
- The well-known and very successful APRIORI algorithm has been disclosed by Agrawal et al., for instance in the above-mentioned document.
- The most important value with which association rules are measured is the support value, which is the relative frequency of occurrence of one item or several items together in one rule.
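The two steps and the support measure can be illustrated with a minimal sketch. It uses brute-force enumeration rather than APRIORI, and the function names (`support`, `large_itemsets`, `confidence`) and the toy baskets are assumptions made for illustration only; the "80% of the customers" figure in the example above corresponds to what is commonly called the confidence of a rule.

```python
from itertools import combinations

def support(transactions, itemset):
    """Relative frequency of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def large_itemsets(transactions, min_support, max_size=3):
    """Step 1 (brute force, not APRIORI): itemsets reaching the minimum support."""
    items = sorted({item for t in transactions for item in t})
    found = {}
    for k in range(1, max_size + 1):
        for candidate in combinations(items, k):
            s = support(transactions, candidate)
            if s >= min_support:
                found[frozenset(candidate)] = s
    return found

def confidence(transactions, antecedent, consequent):
    """Step 2: share of transactions with the antecedent that also hold the consequent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    return joint / support(transactions, antecedent)

# Hypothetical toy data, not taken from the patent.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
frequent = large_itemsets(baskets, min_support=0.5)
rule_conf = confidence(baskets, {"bread", "milk"}, {"eggs"})
```

Here `frequent` holds every itemset of up to three items whose support reaches 0.5, and `rule_conf` is the confidence of the rule "bread and milk implies eggs" on the toy data.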
- Today, generating association rules for very large databases (several million records and above) can be extremely time-consuming. Many algorithms proposed for data mining of association rules make repeated passes over the database to determine the commonly occurring itemsets (or sets of items). For large databases, the I/O overhead in scanning the database can be extremely high. Processing time is required not only for executing the mining algorithms themselves; a lot of time is also spent during the preprocessing steps. This includes the processing time for importing data and for transforming data before applying the algorithm. This preparation can take several hours of expensive CPU time even on large MVS systems. To improve this performance, it has been suggested to draw a sample and generate the association rules on that basis instead of taking the whole database.
- Toivonen et al. presented an algorithm for detecting "exact" (not based on some sample) association rules. Within this teaching, sampling has been used only for the precalculation of the support values of the rules as one step in the algorithm; Toivonen et al. are completely silent about the idea of data mining for "estimated" (approximate) association rules based on some sample. Toivonen et al. also disclosed necessary bounds for sample sizes. Using a univariate approach, the support value of an arbitrary association rule has been estimated. Toivonen et al. calculated the probability that the error between the true support value and the estimated support value exceeds a given threshold by using the binomial distribution and applying Chernoff bounds. With this they derived a formula for a sufficient sample size.
- Zaki et al. took this idea up and published these bounds for approximate association rules generated under sampling. These bounds were also calculated using the univariate approach suggested by Toivonen, including Chernoff bounds. These investigations showed that the bounds are not very efficient, since the required sample size can be huge. As shown by Zaki et al., the required sample sizes can even become greater than the original database. Thus the current state-of-the-art teaching is completely unsatisfactory and actually cannot be applied to real-world problems.
- The invention is based on the objective of improving the performance of the technologies for data mining of association rules.
- The current invention relates to a data mining technology for determining association rules within a multitude of N transactions, each transaction comprising up to p different items.
- A sample size n of the multitude of N transactions is determined based on precision requirements.
- The sample size n is chosen such that it is at least in the order of magnitude of an estimated sample size n*.
- Association rules are computed based on a sample of the multitude of N transactions with sample size n according to any methodology for mining of association rules, and these association rules are used as estimated association rules of the multitude of N transactions.
- Sample sizes determined according to the current invention are much lower than the number of the original transactions and much lower than those of the known state-of-the-art approaches. Therefore, the current teaching results in very significant performance improvements for data mining of association rules.
- Figure 3 visualizes the process flow for sampling of association rules in multivariate case. This process flow could be applied also to the univariate model accordingly without any further problem.
- Figure 4 depicts a distributed processing model for mining of association rules .
- The present invention can be realized in hardware, software, or a combination of hardware and software. Any kind of computer system, or other apparatus adapted for carrying out the methods described herein, is suited.
- A typical combination of hardware and software could be a general-purpose computer system with a computer program that, when loaded and executed, controls the computer system such that it carries out the methods described herein.
- The present invention can also be embedded in a computer program product which comprises all the features enabling the implementation of the methods described herein, and which, when loaded in a computer system, is able to carry out these methods.
- Computer program means or computer program in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: a) conversion to another language, code or notation; b) reproduction in a different material form.
- The term transaction record refers to just a tuple of items; of course it is not required that such a record has been part of any computer transaction.
- The wording of transaction records is used for historical reasons only.
- An item may be represented by any type of attribute, not necessarily related to an item within the real world.
- Association rules are a methodology for uncovering unknown relations or rules in usually very large data sets.
- This methodology proceeds as follows. A set of so-called items is given. These items could be purchases from supermarket basket data. Subsets of this set of items are so-called transactions, for example beer and crisps as one transaction, whereas another transaction could consist of bread and butter. A set of items is often called an itemset too. Therefore every transaction contains an itemset.
- Let $I = \{i_1, i_2, \ldots, i_p\}$ be a set of p distinct attribute values, also called items.
- Each itemset is said to have a support s if s% of the transactions in D contain the itemset (thus, the support measure represents a relative frequency).
- The first question aims to eliminate possible systematic errors. For instance, selecting every n-th record from a database could introduce such serious systematic errors.
- The second question deals with how many transactions should be taken from the whole population, that is, the sample size. Intuitively it is quite clear that this problem is related to the precision which can be achieved by the sample. This means that a sample of 100 transactions guarantees a much lower precision of the estimators than a sample of 10000 transactions.
- The third question deals with the following: Assume the set of all transactions contains 1,000,000 transactions. If one takes a sample of 100 transactions at random, one can calculate the relative frequency of an item (say) A as an estimator for the relative frequency of item A in the whole population. If one takes at random a second sample of 100 transactions, one can also calculate the relative frequency of item A based on this second sample as an estimator, but the two calculated frequencies will generally differ. If one repeats this procedure several hundred times, the relative frequencies so calculated will scatter more or less around the relative frequency of item A in the whole population.
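The scatter described in the third question can be made visible with a small simulation. This is only a sketch: the true frequency of 0.3 for item A, the seed and the function name `repeated_estimates` are assumptions chosen for illustration.

```python
import random
import statistics

def repeated_estimates(N, true_count, n, repetitions, seed=0):
    """Draw `repetitions` samples of size n without replacement and return
    the relative frequency of item A observed in each sample."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(repetitions):
        # Records 0 .. true_count-1 stand for the transactions containing item A.
        picked = rng.sample(range(N), n)
        hits = sum(1 for idx in picked if idx < true_count)
        estimates.append(hits / n)
    return estimates

# Assumed setting: 1,000,000 transactions, 30% of which contain item A.
estimates = repeated_estimates(N=1_000_000, true_count=300_000, n=100, repetitions=300)
print("mean of estimates:", round(statistics.mean(estimates), 4))
print("std of estimates :", round(statistics.stdev(estimates), 4))
```

The individual estimates differ from sample to sample, while their mean stays close to the assumed true frequency; the standard deviation quantifies the scatter that the sample size has to control.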
- One of the most well-known sampling schemes is the so-called Bernoulli sampling. This assumes that the data is given, for example, as a sequential file or database whose records can be numbered from 1 to N and which can be traversed along this ordering. Given a probability $\pi$ with which every element can be chosen, this sampling scheme works as follows: for the i-th element a random experiment is made in which this element is chosen with probability $\pi$. This can be done by generating a random number on the interval (0,1).
- The i-th element is taken if the random number is smaller than $\pi$; otherwise this element is rejected.
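A minimal sketch of this scheme follows; the function name `bernoulli_sample`, the parameter name `pi` and the seed are assumptions made for illustration.

```python
import random

def bernoulli_sample(records, pi, rng=None):
    """Keep each record independently with probability pi, in a single pass."""
    if not 0.0 <= pi <= 1.0:
        raise ValueError("pi must lie in [0, 1]")
    rng = rng or random.Random()
    # random() returns a value in [0, 1); the record is taken if it is below pi.
    return [rec for rec in records if rng.random() < pi]

rng = random.Random(42)
sample = bernoulli_sample(range(10_000), 0.1, rng)
print(len(sample))  # varies around 1000 from run to run when the seed changes
```

Only one pass over the data and one random number per record are needed, which is why the scheme is easy to implement; the price is that the number of records returned is not fixed in advance.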
- $u$ is the percentile of the standard normal distribution.
- The inclusion probability $\pi_i$ of the i-th observation to be put into the sample is then given by $\pi_i = \pi$.
- The inclusion probability $\pi_{ij}$ for the i-th as well as the j-th element to be put simultaneously into the sample is in this case equal to $\pi^2$.
- The big advantage of this sampling scheme is that it can be easily implemented on a computer.
- The disadvantage is that the sample size is no longer a fixed quantity but a random variable.
- Here n is the sample size and N the size of the population.
- $n_k$ is the number of elements already taken among the first $k-1$ elements of the whole population. If $\xi_k < \frac{n - n_k}{N - k + 1}$ holds for the k-th random number $\xi_k$, then the k-th element is taken, otherwise not.
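The sequential selection rule can be sketched as follows, assuming the acceptance test is "random number below (n - n_k) / (N - k + 1)". This is the classical selection-sampling rule for a fixed sample size; the function name `sequential_sample` is an assumption made for illustration.

```python
import random

def sequential_sample(records, n, rng=None):
    """One pass over the records that returns exactly n of them."""
    N = len(records)
    if not 0 <= n <= N:
        raise ValueError("n must lie between 0 and the population size")
    rng = rng or random.Random()
    sample = []
    for k, rec in enumerate(records, start=1):
        taken = len(sample)  # n_k: elements taken among the first k-1
        # Take the k-th element with probability (n - n_k) / (N - k + 1).
        if rng.random() < (n - taken) / (N - k + 1):
            sample.append(rec)
    return sample

picked = sequential_sample(list(range(100)), 10, random.Random(1))
print(len(picked))
```

Unlike Bernoulli sampling, the acceptance probability adapts to how many elements are still needed: it becomes 1 when all remaining records are required and 0 once n records have been taken, so the sample size is fixed.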
- This invention describes a methodology for computing association rules based on a sample instead of the whole population. It is suggested to use these association rules as estimated association rules of the whole population of transactions. Since the actual methodology for mining of association rules can then be limited to the sample only, significant performance improvements are achieved.
- An important feature of the proposed invention is the technique for determining the sample size while at the same time achieving prescribed precision requirements.
- The essential concept of the current invention is the observation that much smaller sample sizes can be determined, which still satisfy the precision requirements, if further parameters characterizing the multitude of transactions are introduced in the sample size determination.
- In one embodiment it is suggested to use the size N of the multitude of transactions as such a characterizing property; in another embodiment of the current invention the number p of different items occurring within the transactions is used as the characterizing property.
- Additional approximation techniques may be applied to eliminate these characterizing properties again.
- Estimations, for instance estimations of the support values, calculated according to the current state of the art are based only on a univariate analysis.
- A univariate analysis means that only a single value is estimated.
- A multivariate analysis means that a vector undergoes an estimation analysis, wherein each component of that vector is an estimator and wherein all components are estimated simultaneously.
- The idea of this approach is to have a sample size which estimates the support of all of the single items to a specified precision simultaneously.
- The proposed multivariate approach is based on confidence ellipsoids to determine the necessary sample sizes and has several advantages.
- The basic concept is that the support value of any rule R can be seen as a relative frequency. This value can be approximately measured by an estimator as follows.
- The whole database consists of N sequentially ordered elements (where each element is represented by a record).
- Each element is assigned a binary attribute which is 1 if the element supports the rule, i.e. when the item(s) of a rule appear(s) in the record, and 0 if the rule is not supported by the element.
- The mean value of this binary attribute is the support (denoted by $p$).
- When drawing a sample without replacement, an unbiased estimator for this support value (an estimator is unbiased if its expectation value equals the parameter to be estimated) is the mean of the binary attribute measured over all elements in the sample (this mean is denoted by $\hat{p}$).
- A confidence interval for the support value can be constructed.
- Here N is the size of the whole population and n the sample size.
- Suppose the Minsup value is 0.01 and the estimator for a rule R shall not differ from the true value by more than 1% with a probability of 90%. Then it is necessary to draw a sample of 1415204 elements.
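The order of magnitude of this figure can be checked with the textbook sample-size formula for a proportion under sampling without replacement, obtained by solving d = u * sqrt(p(1-p)/n * (N-n)/(N-1)) for n. This is a sketch under stated assumptions and not necessarily the patent's own equation: the population size N = 3,000,000 does not appear in the text and is assumed here, the 1% is read as a relative error (so d = 0.01 * p), and u = 1.645 is the rounded two-sided 90% percentile of the standard normal distribution.

```python
def required_sample_size(N, p, d, u):
    """Sample size n solving d = u * sqrt(p(1-p)/n * (N-n)/(N-1)) for n."""
    spread = u * u * p * (1.0 - p)
    return N * spread / ((N - 1) * d * d + spread)

# p = 0.01 (Minsup), relative error 1% -> absolute error d = 0.01 * p,
# confidence 90% two-sided -> u = 1.645 (rounded table value).
# N = 3_000_000 is an ASSUMPTION; the population size is not given in the text.
n = required_sample_size(N=3_000_000, p=0.01, d=0.01 * 0.01, u=1.645)
print(round(n))
```

Under these assumptions the result lies within one record of the 1415204 quoted above, which illustrates how strongly the finite-population term limits the required sample size when N itself is only a few million.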
- Zaki suggested using the following formulae for bounding the probability that the estimated value $\hat{p}$ derived from a sample of size n is less {greater} than $(1-\varepsilon)p$ {$(1+\varepsilon)p$}, where $p$ is the true value; this means that the relative error is less {greater} than a factor $\varepsilon$ from the true value:
- The other possibility for calculating the sample size is to specify an absolute error d between the estimator and the true value. Based on the absolute error measure d the following formula may be deduced:
- This confidence interval can be used if one only requires that the probability of the true support value being less than the lower bound is less than the error probability $\alpha$. This can be the case when one wants to control the error that a rule has a support value greater than the Minsup threshold in the sample whereas its true value is less than this threshold. For example, if in the sample a rule has a support value such that the lower bound of the corresponding confidence interval is greater than the Minsup threshold, then the true value is smaller than this bound only with an error probability of at most $\alpha$.
- $n \ge u_{1-\alpha}^{2}\,\dfrac{p(1-p)}{d^{2}}$ (eq. 5)
- p can be chosen as 0.5, which gives the largest value for n, resulting in
- The sample sizes obtained by these formulae are smaller than the sample size calculated for the corresponding closed confidence interval. Since the latter sample sizes have been shown to be smaller than the sample sizes suggested by Zaki et al. and Toivonen et al., so are the sample sizes derived here.
- A generalization (in the sense of considering p items simultaneously) of this idea consists in the construction of a so-called confidence ellipsoid at confidence level $(1-\alpha)$.
- A confidence ellipsoid in p dimensions defines a region in p dimensions such that the true value is enclosed in this region with a certain probability $(1-\alpha)$.
- The width (the area or the volume, respectively) is a measure of the precision. Therefore, if one requires a certain precision, one could choose the sample size such that the width (the area or the volume, respectively) does not exceed a prescribed bound (for a desired confidence level).
- Every component of such a vector corresponds to an item, where a value 1 means that the considered item exists in the considered transaction and a value 0 that the item is not present. Note that the dimension of the binary vectors is implied by the number p of all possible single items.
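The binary-vector representation can be sketched as follows; the helper names `to_binary_vectors` and `item_supports` and the toy transactions are assumptions made for illustration.

```python
def to_binary_vectors(transactions, items=None):
    """Encode each transaction as a 0/1 vector with one component per item."""
    if items is None:
        items = sorted({item for t in transactions for item in t})
    vectors = [[1 if item in t else 0 for item in items] for t in transactions]
    return items, vectors

def item_supports(vectors):
    """Component-wise mean of the binary vectors = support of each single item."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical toy transactions.
transactions = [{"beer", "crisps"}, {"bread", "butter"}, {"beer", "bread"}]
items, vectors = to_binary_vectors(transactions)
supports = item_supports(vectors)
```

The vector of component-wise means is the quantity that the multivariate approach estimates simultaneously; its dimension equals the number p of distinct items.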
- const(p) is a constant depending on the dimension p.
- $V^{*} = \mathrm{const}(p)\, n^{-p/2}\, \bigl(\chi^{2}_{1-\alpha}(p)\bigr)^{p/2} \sqrt{\det \Sigma}$. From this equation we get for the necessary n as sample size:
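As a hedged sketch of this step: assume the volume of the confidence ellipsoid has the standard form $V = \mathrm{const}(p)\, n^{-p/2}\, (\chi^{2}_{1-\alpha}(p))^{p/2} \sqrt{\det \Sigma}$, where $\Sigma$ (the covariance matrix of the binary item vector) and $\chi^{2}_{1-\alpha}(p)$ (the chi-square percentile with p degrees of freedom) are notational assumptions, and $V^{*}$ is the prescribed bound on the volume. Solving the requirement $V \le V^{*}$ for n then gives:

```latex
V \le V^{*}
\;\Longleftrightarrow\;
n^{p/2} \ge \frac{\mathrm{const}(p)\,\bigl(\chi^{2}_{1-\alpha}(p)\bigr)^{p/2}\sqrt{\det\Sigma}}{V^{*}}
\;\Longleftrightarrow\;
n \ge \chi^{2}_{1-\alpha}(p)\,
      \left(\frac{\mathrm{const}(p)\,\sqrt{\det\Sigma}}{V^{*}}\right)^{2/p}
```

Under this assumed form, the required sample size grows linearly with the chi-square percentile and only with the power 2/p of the ratio between the ellipsoid's unscaled volume and the permitted volume.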
- Simultaneous confidence intervals can be obtained from a constructed confidence ellipsoid as follows.
- Fig. 3 visualizes the process flow for sampling of association rules in multivariate case outlined in the previous chapter. This process flow could be applied also to the univariate model accordingly without any further problem.
- In step 301 a decision is made whether data mining for association rules should be performed based on the complete multitude of transaction records (choosing path 302) or based on a sample (choosing path 303).
- Along path 302, the methodology for mining of association rules is applied within step 304, followed by a step 305 to visualize the computed association rules.
- Along path 303, the sample size has to be determined within step 306.
- One approach would consist in specifying the sample size directly.
- Another approach would consist in a step to calculate the sample size.
- The sample size would be calculated based on: a. the number p of different items occurring within the multitude of transactions, as a parameter characterizing the multitude of transactions more thoroughly; b. further precision requirements for the quality of the approximation, comprising for example: b1. the confidence $(1-\alpha)$ for an estimation based on a sample; b2.
- An estimated sample size will be calculated within step 307 according to the approximation formulas eq. 11 or 12.
- This estimated sample size can be used directly as the sample size or may be used as an orientation only. In the latter case the final sample size would have to be chosen at least in the order of magnitude of the estimated sample size.
- Based on the sample, the state-of-the-art methodologies for mining of association rules may be applied within step 304 to determine the estimated association rules, followed by a step 305 to visualize the estimated association rules.
- If step 306 also comprises the specification of a required minimal support value, then in step 305 even a decision would be possible whether a considered association rule is of interest or not. For this purpose the simultaneous confidence intervals calculated within eq. 13 can be exploited. The following decision process would be applied:
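The decision process itself is not reproduced in this text. One plausible reading, sketched below purely as an assumption, compares the confidence interval of a rule's support with the minimal support value: the rule is of interest if the whole interval lies above the threshold, not of interest if it lies entirely below, and undecided otherwise. The function name `classify_rule` and the three labels are hypothetical.

```python
def classify_rule(lower, upper, minsup):
    """Three-way decision from a confidence interval [lower, upper] for the support.

    Assumed decision rule, not taken from the patent text.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    if lower > minsup:
        return "of interest"      # true support exceeds minsup with high confidence
    if upper < minsup:
        return "not of interest"  # true support is below minsup with high confidence
    return "undecided"            # interval straddles the threshold

print(classify_rule(0.012, 0.018, minsup=0.01))
print(classify_rule(0.004, 0.008, minsup=0.01))
print(classify_rule(0.008, 0.013, minsup=0.01))
```

The undecided case marks rules for which the sample cannot settle the question at the chosen confidence level; a larger sample or a check against the full data would be needed for those.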
- Fig. 4 depicts a distributed processing model for mining of association rules.
- A client computer 401 is shown for controlling the determination of association rules.
- The client stores the multitude of N transaction records 402.
- Within step 403 the client computer draws a sample 404 from the multitude of N transactions with a sample size n.
- The sample size may be determined by any of the previously disclosed approaches.
- The sample is transmitted to the server computer 406, which provides a specific service for mining of association rules.
- The association rules are calculated based on the provided sample and returned to the client computer across the communication network. Since the time for the analysis is now small (being based on a small sample only), it is possible to send back the resulting approximate rules very quickly. Finally these rules may then be analyzed for further activities on the client system within step 408.
- In one variant the client computer itself determines the sample size;
- in another variant the server computer is responsible for determining the sample size. In any case the technology disclosed within the current specification for determining the sample size is exploited.
Priority Applications (6)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2003525499A JP2005502130A (en) | 2001-09-04 | 2002-07-26 | Sampling method for data mining of association rules |
KR10-2004-7003281A KR20040029157A (en) | 2001-09-04 | 2002-07-26 | A sampling approach for data mining of association rules |
CA002459758A CA2459758A1 (en) | 2001-09-04 | 2002-07-26 | A sampling approach for data mining of association rules |
IL16073102A IL160731A0 (en) | 2001-09-04 | 2002-07-26 | A sampling approach for data mining of association rules |
US10/489,138 US7289984B2 (en) | 2001-09-04 | 2002-07-26 | Sampling approach for data mining of association rules |
US11/865,775 US7668793B2 (en) | 2001-09-04 | 2007-10-02 | Method of multivariate estimation analysis and sampling for data mining |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP01121122.4 | 2001-09-04 | ||
EP01121122 | 2001-09-04 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/865,775 Continuation US7668793B2 (en) | 2001-09-04 | 2007-10-02 | Method of multivariate estimation analysis and sampling for data mining |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2003021477A2 true WO2003021477A2 (en) | 2003-03-13 |
WO2003021477A3 WO2003021477A3 (en) | 2004-02-12 |
Family
ID=8178526
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2002/008335 WO2003021477A2 (en) | 2001-09-04 | 2002-07-26 | A sampling approach for data mining of association rules |
Country Status (7)
Country | Link |
---|---|
US (2) | US7289984B2 (en) |
JP (1) | JP2005502130A (en) |
KR (1) | KR20040029157A (en) |
CN (1) | CN1578955A (en) |
CA (1) | CA2459758A1 (en) |
IL (1) | IL160731A0 (en) |
WO (1) | WO2003021477A2 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101655857A (en) * | 2009-09-18 | 2010-02-24 | 西安建筑科技大学 | Method for mining data in construction regulation field based on associative regulation mining technology |
CN108805755A (en) * | 2018-07-04 | 2018-11-13 | 山东汇贸电子口岸有限公司 | A kind of vacation packages generation method and device |
Families Citing this family (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100500329B1 (en) * | 2001-10-18 | 2005-07-11 | 주식회사 핸디소프트 | System and Method for Workflow Mining |
US7680685B2 (en) * | 2004-06-05 | 2010-03-16 | Sap Ag | System and method for modeling affinity and cannibalization in customer buying decisions |
CN101145030B (en) * | 2006-09-13 | 2011-01-12 | 新鼎系统股份有限公司 | Method and system for increasing variable amount, obtaining rest variable, dimensionality appreciation and variable screening |
CA2702406C (en) * | 2007-10-12 | 2018-07-24 | Patientslikeme, Inc. | Processor-implemented method and system for facilitating a user-initiated clinical study to determine the efficacy of an intervention |
CN101149751B (en) * | 2007-10-29 | 2012-06-06 | 浙江大学 | Generalized relating rule digging method for analyzing traditional Chinese medicine recipe drug matching rule |
CN101453360B (en) * | 2007-12-06 | 2011-08-31 | 中国移动通信集团公司 | Method and equipment for obtaining related object information |
US8170974B2 (en) * | 2008-07-07 | 2012-05-01 | Yahoo! Inc. | Forecasting association rules across user engagement levels |
JP5501445B2 (en) | 2009-04-30 | 2014-05-21 | ペイシェンツライクミー, インコーポレイテッド | System and method for facilitating data submission within an online community |
US8812543B2 (en) * | 2011-03-31 | 2014-08-19 | Infosys Limited | Methods and systems for mining association rules |
CN102195899B (en) * | 2011-05-30 | 2014-05-07 | 中国人民解放军总参谋部第五十四研究所 | Method and system for information mining of communication network |
CN102999496A (en) * | 2011-09-09 | 2013-03-27 | 北京百度网讯科技有限公司 | Method for building requirement analysis formwork and method and device for searching requirement recognition |
US9110969B2 (en) * | 2012-07-25 | 2015-08-18 | Sap Se | Association acceleration for transaction databases |
CN102930372A (en) * | 2012-09-25 | 2013-02-13 | 浙江图讯科技有限公司 | Data analysis method for association rule of cloud service platform system orienting to safe production of industrial and mining enterprises |
US8977587B2 (en) | 2013-01-03 | 2015-03-10 | International Business Machines Corporation | Sampling transactions from multi-level log file records |
CN103678540A (en) * | 2013-11-30 | 2014-03-26 | 武汉传神信息技术有限公司 | In-depth mining method for translation requirements |
CN104182527B (en) * | 2014-08-27 | 2017-07-18 | 广西财经学院 | Association rule mining method and its system between Sino-British text word based on partial order item collection |
US10037361B2 (en) * | 2015-07-07 | 2018-07-31 | Sap Se | Frequent item-set mining based on item absence |
US20180005120A1 (en) * | 2016-06-30 | 2018-01-04 | Futurewei Technologies, Inc. | Data mining interest generator |
CN106156316A (en) * | 2016-07-04 | 2016-11-23 | 长江大学 | Special name under a kind of big data environment and native place correlating method and system |
KR101987687B1 (en) * | 2017-10-24 | 2019-06-11 | 강원대학교산학협력단 | Variable size sampling method for supporting uniformity confidence in data stream environment |
US11894139B1 (en) | 2018-12-03 | 2024-02-06 | Patientslikeme Llc | Disease spectrum classification |
CN109858805B (en) * | 2019-01-29 | 2022-12-16 | 浙江力嘉电子科技有限公司 | Farmer garbage collection quantity calculation method based on interval estimation |
Family Cites Families (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4132614A (en) * | 1977-10-26 | 1979-01-02 | International Business Machines Corporation | Etching by sputtering from an intermetallic target to form negative metallic ions which produce etching of a juxtaposed substrate |
US5229300A (en) * | 1991-02-19 | 1993-07-20 | The Dow Chemical Company | Membrane method for the determination of an organic acid |
US5272910A (en) * | 1992-05-13 | 1993-12-28 | The Regents Of The University Of California | Vadose zone monitoring system having wick layer enhancement |
US6134555A (en) * | 1997-03-10 | 2000-10-17 | International Business Machines Corporation | Dimension reduction using association rules for data mining application |
US6032146A (en) * | 1997-10-21 | 2000-02-29 | International Business Machines Corporation | Dimension reduction for data mining application |
US6189005B1 (en) * | 1998-08-21 | 2001-02-13 | International Business Machines Corporation | System and method for mining surprising temporal patterns |
US6260038B1 (en) * | 1999-09-13 | 2001-07-10 | International Businemss Machines Corporation | Clustering mixed attribute patterns |
AU4733601A (en) * | 2000-03-10 | 2001-09-24 | Cyrano Sciences Inc | Control for an industrial process using one or more multidimensional variables |
US6636862B2 (en) * | 2000-07-05 | 2003-10-21 | Camo, Inc. | Method and system for the dynamic analysis of data |
US6905827B2 (en) * | 2001-06-08 | 2005-06-14 | Expression Diagnostics, Inc. | Methods and compositions for diagnosing or monitoring auto immune and chronic inflammatory diseases |
-
2002
- 2002-07-26 IL IL16073102A patent/IL160731A0/en unknown
- 2002-07-26 US US10/489,138 patent/US7289984B2/en not_active Expired - Fee Related
- 2002-07-26 CN CNA028172469A patent/CN1578955A/en active Pending
- 2002-07-26 CA CA002459758A patent/CA2459758A1/en not_active Abandoned
- 2002-07-26 KR KR10-2004-7003281A patent/KR20040029157A/en active Search and Examination
- 2002-07-26 WO PCT/EP2002/008335 patent/WO2003021477A2/en active Application Filing
- 2002-07-26 JP JP2003525499A patent/JP2005502130A/en active Pending
-
2007
- 2007-10-02 US US11/865,775 patent/US7668793B2/en not_active Expired - Fee Related
Non-Patent Citations (7)
Title |
---|
BEEKMAN F, RUDOLPH A: "Zeitoptimierte Assoziationsanalyse durch Stichprobenauswahl dargestellt am Beispiel aus der Telekommunikationsbranche" PROCEEDINGS OPERATION RESEARCH 2001, DUISBURG, GERMANY, [Online] 3 - 5 September 2001, XP002261142 Retrieved from the Internet: <URL:http://www.uni-duisburg.de/or2001/pdf /Sek%2014%20-%20Beekmann%20Rudolf.pdf> [retrieved on 2003-11-13] * |
GU B, HU F, LIU H: "Sampling and its application in data mining: a survey" TECHNICAL REPORT TRA6/00, [Online] June 2000 (2000-06), XP002261143 National University of Singapore, School of Computing Retrieved from the Internet: <URL:http://techrep.comp.nus.edu.sg/techre portsÖ2000ÖTRA6-00.pdf> [retrieved on 2003-11-13] * |
KREIENBROCK L: "Einführung in die Stichprobenverfahren" 1989 , OLDENBOURG VERLAG , MÜNCHEN XP002261411 page 74 page 155, lines 14-22 * |
KREIENBROCK L: "Einfache und geschichtete Zufallsauswahl aus endlichen Grundgesamtheiten bei multivariaten Beobachtungen" 1987 , DISSERTATION, FACHBEREICH STATISTIK , UNIVERSITÄT DORTMUND XP002261412 Sections 2.6.1, 2.6.3, 2.7.3 * |
MONTGOMERY D C, RUNGER G C: "Applied Statistics and Probability for Engineers" 1994 , JOHN WILEY & SONS, INC. , NEW YORK XP002261413 Section 7.9 * |
WATANABE O: "Simple sampling techniques for discovery science" TECHNICAL REPORTS ON MATHEMATICAL AND COMPUTING SCIENCES: TR-C137, [Online] October 1999 (1999-10), XP002261410 Tokyo Institute of Technology Retrieved from the Internet: <URL:ftp://ftp.is.titech.ac.jp/pub/tech-re ports/C/C-137.ps.gz> [retrieved on 2003-11-13] * |
ZAKI M J ET AL: "Evaluation of sampling for data mining of association rules" RESEARCH ISSUES IN DATA ENGINEERING, 1997. PROCEEDINGS. SEVENTH INTERNATIONAL WORKSHOP ON BIRMINGHAM, UK 7-8 APRIL 1997, LOS ALAMITOS, CA, USA, IEEE COMPUT. SOC, US, 7 April 1997 (1997-04-07), pages 42-50, XP010219673 ISBN: 0-8186-7849-6 cited in the application * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101655857A (en) * | 2009-09-18 | 2010-02-24 | 西安建筑科技大学 | Method for mining data in construction regulation field based on associative regulation mining technology |
CN108805755A (en) * | 2018-07-04 | 2018-11-13 | 山东汇贸电子口岸有限公司 | A kind of vacation packages generation method and device |
Also Published As
Publication number | Publication date |
---|---|
JP2005502130A (en) | 2005-01-20 |
CN1578955A (en) | 2005-02-09 |
US7289984B2 (en) | 2007-10-30 |
IL160731A0 (en) | 2004-08-31 |
US20050027663A1 (en) | 2005-02-03 |
US7668793B2 (en) | 2010-02-23 |
WO2003021477A3 (en) | 2004-02-12 |
CA2459758A1 (en) | 2003-03-13 |
KR20040029157A (en) | 2004-04-03 |
US20080147688A1 (en) | 2008-06-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2003021477A2 (en) | A sampling approach for data mining of association rules | |
Aggarwal et al. | A framework for projected clustering of high dimensional data streams | |
Cérou et al. | Sequential Monte Carlo for rare event estimation | |
Ruiz et al. | Graphon signal processing | |
Whittaker et al. | A Markov chain model for statistical software testing | |
Chen et al. | Stochastic root finding via retrospective approximation | |
Chen et al. | On data labeling for clustering categorical data | |
Lai et al. | The optimality box in uncertain data for minimising the sum of the weighted job completion times | |
Tikhomirov | Invertibility via distance for noncentered random matrices with continuous distributions | |
Meyer | Density estimation with distribution element trees | |
Cohen | Min-Hash Sketches. | |
Papadimitriou et al. | Adaptive, unsupervised stream mining | |
AU2002325371A1 (en) | A sampling approach for data mining of association rules | |
Ordonez et al. | Accelerating EM clustering to find high-quality solutions | |
Minartz et al. | Multivariate correlations discovery in static and streaming data | |
Wang et al. | On modeling influence maximization in social activity networks under general settings | |
Heidergott et al. | Single-run gradient estimation via measure-valued differentiation | |
Yusuf et al. | Towards cryptanalysis of a variant prime numbers algorithm | |
Broersen et al. | Costs of order selection in time series analysis | |
Chuang et al. | Feature-preserved sampling over streaming data | |
Fletcher et al. | Ranked sparse signal support detection | |
Hajiaghayi et al. | Optimal algorithms for free order multiple-choice secretary | |
Eickhoff et al. | Analysis of the time evolution of quantiles in simulation | |
Delcoigne et al. | Large deviations rate function for polling systems | |
Zhang et al. | Efficient single-source SimRank query by path aggregation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states | Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VN YU ZA ZM; Kind code of ref document: A2 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW |
AL | Designated countries for regional patents | Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG; Kind code of ref document: A2 Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG |
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101) | ||
WWE | WIPO information: entry into national phase | Ref document number: 2003525499 Country of ref document: JP |
WWE | WIPO information: entry into national phase | Ref document number: 2002325371 Country of ref document: AU; Ref document number: 20028172469 Country of ref document: CN |
WWE | WIPO information: entry into national phase | Ref document number: 160731 Country of ref document: IL; Ref document number: 2459758 Country of ref document: CA; Ref document number: 472/CHENP/2004 Country of ref document: IN; Ref document number: 1020047003281 Country of ref document: KR |
WWE | WIPO information: entry into national phase | Ref document number: 10489138 Country of ref document: US |
122 | EP: PCT application non-entry in European phase |