WO2003021477A3 - A sampling approach for data mining of association rules - Google Patents

Publication number
WO2003021477A3
Authority
WO
WIPO (PCT)
Prior art keywords
association rules
multitude
transactions
data mining
sample size
Application number
PCT/EP2002/008335
Other languages
French (fr)
Other versions
WO2003021477A2 (en)
Inventor
Frank Beekmann
Roland Grund
Andreas Rudolph
Original Assignee
IBM
IBM Deutschland
Frank Beekmann
Roland Grund
Andreas Rudolph
Priority date
2001-09-04
Filing date
2002-07-26
Publication date
2004-02-12
Application filed by IBM, IBM Deutschland, Frank Beekmann, Roland Grund, Andreas Rudolph
Priority to KR10-2004-7003281A (KR20040029157A)
Priority to CA002459758A (CA2459758A1)
Priority to JP2003525499A (JP2005502130A)
Priority to US10/489,138 (US7289984B2)
Priority to IL16073102A (IL160731A0)
Publication of WO2003021477A2
Publication of WO2003021477A3
Priority to US11/865,775 (US7668793B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99931 Database or file accessing
    • Y10S707/99933 Query processing, i.e. searching
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10 TECHNICAL SUBJECTS COVERED BY FORMER USPC
    • Y10S TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y10S707/00 Data processing: database and file management or data structures
    • Y10S707/99941 Database schema or data structure
    • Y10S707/99942 Manipulating data structure, e.g. compression, compaction, compilation

Abstract

The invention relates to data mining technology for determining association rules within a multitude of N transactions, each transaction comprising up to p different items. According to the invention, a sample size n for the multitude of N transactions is determined based on precision requirements. The sample size n is chosen such that it is at least of the order of magnitude of an estimated sample size n*. Finally, association rules are computed from a sample of size n drawn from the multitude of N transactions, using any methodology for mining association rules, and the resulting rules are used as estimated association rules for the full multitude of N transactions.
PCT/EP2002/008335 2001-09-04 2002-07-26 A sampling approach for data mining of association rules WO2003021477A2 (en)
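
The three steps named in the abstract (derive a sample size from precision requirements, draw a sample of that size, mine rules on the sample) can be sketched roughly as follows. This is an illustration only: the abstract does not give the formula for n*, so the sketch substitutes the common textbook sample size for estimating a single proportion with a finite population correction. All function names, the parameters `half_width` and `confidence`, and the naive pair-based rule miner are assumptions introduced here and are not taken from the patent.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def estimated_sample_size(N, half_width, confidence=0.95, p=0.5):
    """Assumed stand-in for n*: textbook sample size for estimating one
    proportion (e.g. an itemset's support) to within +/- half_width,
    with finite population correction. Not the patent's own formula."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z * z * p * (1 - p) / (half_width ** 2)   # infinite-population size
    n = n0 / (1 + (n0 - 1) / N)                    # correct for finite N
    return min(N, math.ceil(n))


def support_counts(transactions, max_len=2):
    """Count how many transactions contain each itemset of up to max_len items."""
    counts = {}
    for t in transactions:
        items = sorted(set(t))
        for k in range(1, max_len + 1):
            for combo in combinations(items, k):
                counts[combo] = counts.get(combo, 0) + 1
    return counts


def mine_rules(transactions, min_support, min_confidence):
    """Naive miner for rules between single items (hypothetical helper);
    any association-rule algorithm could be used in its place."""
    n = len(transactions)
    counts = support_counts(transactions, 2)
    rules = []
    for itemset, c in counts.items():
        if len(itemset) != 2 or c / n < min_support:
            continue
        a, b = itemset
        for antecedent, consequent in ((a, b), (b, a)):
            conf = c / counts[(antecedent,)]
            if conf >= min_confidence:
                rules.append((antecedent, consequent, c / n, conf))
    return rules


def mine_on_sample(transactions, half_width, confidence,
                   min_support, min_confidence, seed=0):
    """Pick n from the precision requirements, sample without replacement,
    and return the rules mined on the sample as estimates for all N."""
    n = estimated_sample_size(len(transactions), half_width, confidence)
    sample = random.Random(seed).sample(transactions, n)
    return n, mine_rules(sample, min_support, min_confidence)
```

With `p=0.5` the sketch takes the most conservative case for a single proportion. Under this assumed formula the required n grows only slowly once N is large, which is why mining a sample can be much cheaper than mining all N transactions.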

Priority Applications (6)

Application Number Priority Date Filing Date Title
KR10-2004-7003281A KR20040029157A (en) 2001-09-04 2002-07-26 A sampling approach for data mining of association rules
CA002459758A CA2459758A1 (en) 2001-09-04 2002-07-26 A sampling approach for data mining of association rules
JP2003525499A JP2005502130A (en) 2001-09-04 2002-07-26 Sampling method for data mining of association rules
US10/489,138 US7289984B2 (en) 2001-09-04 2002-07-26 Sampling approach for data mining of association rules
IL16073102A IL160731A0 (en) 2001-09-04 2002-07-26 A sampling approach for data mining of association rules
US11/865,775 US7668793B2 (en) 2001-09-04 2007-10-02 Method of multivariate estimation analysis and sampling for data mining

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP01121122.4 2001-09-04
EP01121122 2001-09-04

Related Child Applications (1)

Application Number Priority Date Filing Date Title
US11/865,775 Continuation US7668793B2 (en) 2001-09-04 2007-10-02 Method of multivariate estimation analysis and sampling for data mining

Publications (2)

Publication Number Publication Date
WO2003021477A2 (en) 2003-03-13
WO2003021477A3 (en) 2004-02-12

Family

ID=8178526

Family Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2002/008335 WO2003021477A2 (en) 2001-09-04 2002-07-26 A sampling approach for data mining of association rules

Country Status (7)

Country Link
US (2) US7289984B2 (en)
JP (1) JP2005502130A (en)
KR (1) KR20040029157A (en)
CN (1) CN1578955A (en)
CA (1) CA2459758A1 (en)
IL (1) IL160731A0 (en)
WO (1) WO2003021477A2 (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100500329B1 (en) * 2001-10-18 2005-07-11 주식회사 핸디소프트 System and Method for Workflow Mining
US7680685B2 (en) * 2004-06-05 2010-03-16 Sap Ag System and method for modeling affinity and cannibalization in customer buying decisions
CN101145030B (en) * 2006-09-13 2011-01-12 新鼎系统股份有限公司 Method and system for increasing variable amount, obtaining rest variable, dimensionality appreciation and variable screening
CA2702408C (en) * 2007-10-12 2019-08-06 Patientslikeme, Inc. Self-improving method of using online communities to predict health-related outcomes
CN101149751B (en) * 2007-10-29 2012-06-06 浙江大学 Generalized relating rule digging method for analyzing traditional Chinese medicine recipe drug matching rule
CN101453360B (en) * 2007-12-06 2011-08-31 中国移动通信集团公司 Method and equipment for obtaining related object information
US8170974B2 (en) * 2008-07-07 2012-05-01 Yahoo! Inc. Forecasting association rules across user engagement levels
EP2430574A1 (en) 2009-04-30 2012-03-21 Patientslikeme, Inc. Systems and methods for encouragement of data submission in online communities
CN101655857B (en) * 2009-09-18 2013-05-08 西安建筑科技大学 Method for mining data in construction regulation field based on associative regulation mining technology
US8812543B2 (en) * 2011-03-31 2014-08-19 Infosys Limited Methods and systems for mining association rules
CN102195899B (en) * 2011-05-30 2014-05-07 中国人民解放军总参谋部第五十四研究所 Method and system for information mining of communication network
CN102999496A (en) * 2011-09-09 2013-03-27 北京百度网讯科技有限公司 Method for building requirement analysis formwork and method and device for searching requirement recognition
US9110969B2 (en) * 2012-07-25 2015-08-18 Sap Se Association acceleration for transaction databases
CN102930372A (en) * 2012-09-25 2013-02-13 浙江图讯科技有限公司 Data analysis method for association rule of cloud service platform system orienting to safe production of industrial and mining enterprises
US8977587B2 (en) 2013-01-03 2015-03-10 International Business Machines Corporation Sampling transactions from multi-level log file records
CN103678540A (en) * 2013-11-30 2014-03-26 武汉传神信息技术有限公司 In-depth mining method for translation requirements
CN104182527B (en) * 2014-08-27 2017-07-18 广西财经学院 Association rule mining method and its system between Sino-British text word based on partial order item collection
US10037361B2 (en) * 2015-07-07 2018-07-31 Sap Se Frequent item-set mining based on item absence
US20180005120A1 (en) * 2016-06-30 2018-01-04 Futurewei Technologies, Inc. Data mining interest generator
CN106156316A (en) * 2016-07-04 2016-11-23 长江大学 Special name under a kind of big data environment and native place correlating method and system
KR101987687B1 (en) * 2017-10-24 2019-06-11 강원대학교산학협력단 Variable size sampling method for supporting uniformity confidence in data stream environment
CN108805755B (en) * 2018-07-04 2021-11-23 浪潮卓数大数据产业发展有限公司 Tourism package generation method and device
US11894139B1 (en) 2018-12-03 2024-02-06 Patientslikeme Llc Disease spectrum classification
CN109858805B (en) * 2019-01-29 2022-12-16 浙江力嘉电子科技有限公司 Farmer garbage collection quantity calculation method based on interval estimation

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4132614A (en) * 1977-10-26 1979-01-02 International Business Machines Corporation Etching by sputtering from an intermetallic target to form negative metallic ions which produce etching of a juxtaposed substrate
US5229300A (en) * 1991-02-19 1993-07-20 The Dow Chemical Company Membrane method for the determination of an organic acid
US5272910A (en) * 1992-05-13 1993-12-28 The Regents Of The University Of California Vadose zone monitoring system having wick layer enhancement
US6134555A (en) * 1997-03-10 2000-10-17 International Business Machines Corporation Dimension reduction using association rules for data mining application
US6032146A (en) * 1997-10-21 2000-02-29 International Business Machines Corporation Dimension reduction for data mining application
US6189005B1 (en) * 1998-08-21 2001-02-13 International Business Machines Corporation System and method for mining surprising temporal patterns
US6260038B1 (en) * 1999-09-13 2001-07-10 International Business Machines Corporation Clustering mixed attribute patterns
DE60113073T2 (en) * 2000-03-10 2006-08-31 Smiths Detection Inc., Pasadena CONTROL FOR AN INDUSTRIAL PROCESS WITH ONE OR MULTIPLE MULTIDIMENSIONAL VARIABLES
WO2002003256A1 (en) * 2000-07-05 2002-01-10 Camo, Inc. Method and system for the dynamic analysis of data
US6905827B2 (en) * 2001-06-08 2005-06-14 Expression Diagnostics, Inc. Methods and compositions for diagnosing or monitoring auto immune and chronic inflammatory diseases

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
BEEKMAN F, RUDOLPH A: "Zeitoptimierte Assoziationsanalyse durch Stichprobenauswahl dargestellt am Beispiel aus der Telekommunikationsbranche", PROCEEDINGS OPERATION RESEARCH 2001, DUISBURG, GERMANY, 3 September 2001 (2001-09-03) - 5 September 2001 (2001-09-05), XP002261142, Retrieved from the Internet <URL:http://www.uni-duisburg.de/or2001/pdf/Sek%2014%20-%20Beekmann%20Rudolf.pdf> [retrieved on 20031113] *
GU B, HU F, LIU H: "Sampling and its application in data mining: a survey", TECHNICAL REPORT TRA6/00, June 2000 (2000-06-01), National University of Singapore, School of Computing, XP002261143, Retrieved from the Internet <URL:http://techrep.comp.nus.edu.sg/techreports\2000\TRA6-00.pdf> [retrieved on 20031113] *
KREIENBROCK L: "Einfache und geschichtete Zufallsauswahl aus endlichen Grundgesamtheiten bei multivariaten Beobachtungen", 1987, DISSERTATION, FACHBEREICH STATISTIK, UNIVERSITÄT DORTMUND, XP002261412 *
KREIENBROCK L: "Einführung in die Stichprobenverfahren", 1989, OLDENBOURG VERLAG, MÜNCHEN, XP002261411 *
MONTGOMERY D C, RUNGER G C: "Applied Statistics and Probability for Engineers", 1994, JOHN WILEY & SONS, INC., NEW YORK, XP002261413 *
WATANABE O: "Simple sampling techniques for discovery science", TECHNICAL REPORTS ON MATHEMATICAL AND COMPUTING SCIENCES: TR-C137, October 1999 (1999-10-01), Tokyo Institute of Technology, XP002261410, Retrieved from the Internet <URL:ftp://ftp.is.titech.ac.jp/pub/tech-reports/C/C-137.ps.gz> [retrieved on 20031113] *
ZAKI M J ET AL: "Evaluation of sampling for data mining of association rules", RESEARCH ISSUES IN DATA ENGINEERING, 1997. PROCEEDINGS. SEVENTH INTERNATIONAL WORKSHOP ON BIRMINGHAM, UK 7-8 APRIL 1997, LOS ALAMITOS, CA, USA,IEEE COMPUT. SOC, US, 7 April 1997 (1997-04-07), pages 42 - 50, XP010219673, ISBN: 0-8186-7849-6 *

Also Published As

Publication number Publication date
KR20040029157A (en) 2004-04-03
IL160731A0 (en) 2004-08-31
CN1578955A (en) 2005-02-09
US7668793B2 (en) 2010-02-23
CA2459758A1 (en) 2003-03-13
US7289984B2 (en) 2007-10-30
WO2003021477A2 (en) 2003-03-13
JP2005502130A (en) 2005-01-20
US20050027663A1 (en) 2005-02-03
US20080147688A1 (en) 2008-06-19

Similar Documents

Publication Publication Date Title
WO2003021477A3 (en) A sampling approach for data mining of association rules
WO2005048050A3 (en) System and method for evaluating underwriting requirements
WO2005119442A3 (en) Methods and systems for cross-probing in integrated circuit design
WO2003029804A1 (en) Measurement instrument and concentration measurement apparatus
WO2007038587A3 (en) Company and contact information system and method
WO2005050380A3 (en) Generating flight schedules using fare routings and rules
WO2004029790A3 (en) Load sensing surface as pointing device
EP1253495A3 (en) Method and system for assessing adjustment factors in testing or monitoring process
AU2002358088A1 (en) Method and sensing device for motion detection in an optical pointing device, such as an optical mouse
WO2005096762A3 (en) Systems and methods of electronic trading using automatic book updates
AU2003272234A1 (en) Integrated spectral data processing, data mining, and modeling system for use in diverse screening and biomarker discovery applications
WO2008033480A3 (en) Security vulnerability determination in a computing system
WO2007084062A8 (en) Image processing
WO2006113291A3 (en) Registration of applications and complimentary features for interactive user interfaces
WO2001055812A3 (en) Fully flexible financial instrument pricing system with intelligent user interfaces
EP1524611A3 (en) System and method for providing information to a user
WO2005038707A3 (en) Personalized automatic publishing extensible layouts
WO2007062408A3 (en) Systems and methods of conducting clinical research
WO2005076900A3 (en) Data and metadata linking form mechanism and method
WO2002097439A3 (en) A differential labelling method
FI20031719A (en) Systems, methods and goods for the production of an interface with select and scroll functions
EP1336916A3 (en) Position-direction measuring apparatus and information processing method
EP1273348A3 (en) Graduated pipette
WO2002059737A3 (en) Method for moving a graphical pointer on a computer display
Martell et al. An improved bandstrength index for the CH G band of globular cluster giants

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BY BZ CA CH CN CO CR CU CZ DE DM DZ EC EE ES FI GB GD GE GH HR HU ID IL IN IS JP KE KG KP KR LC LK LR LS LT LU LV MA MD MG MN MW MX MZ NO NZ OM PH PL PT RU SD SE SG SI SK SL TJ TM TN TR TZ UA UG US UZ VN YU ZA ZM

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ OM PH PL PT RO RU SD SE SG SI SK SL TJ TM TN TR TT TZ UA UG US UZ VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR IE IT LU MC NL PT SE SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ UG ZM ZW AM AZ BY KG KZ RU TJ TM AT BE BG CH CY CZ DK EE ES FI FR GB GR IE IT LU MC PT SE SK TR BF BJ CF CG CI GA GN GQ GW ML MR NE SN TD TG

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
WWE WIPO information: entry into national phase

Ref document number: 2003525499

Country of ref document: JP

WWE WIPO information: entry into national phase

Ref document number: 2002325371

Country of ref document: AU

Ref document number: 20028172469

Country of ref document: CN

WWE WIPO information: entry into national phase

Ref document number: 160731

Country of ref document: IL

Ref document number: 2459758

Country of ref document: CA

Ref document number: 472/CHENP/2004

Country of ref document: IN

Ref document number: 1020047003281

Country of ref document: KR

WWE WIPO information: entry into national phase

Ref document number: 10489138

Country of ref document: US

122 EP: PCT application non-entry in European phase