[PS91]
G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. In G. Piatetsky-Shapiro and W.J. Frawley, editors, Knowledge Discovery in Databases. AAAI/MIT Press, Cambridge, MA, 1991.
Introduces LEVERAGE, the simplest function that satisfies the author's three principles for rule-interest functions: it is 0 if the variables are statistically independent, it increases monotonically as the variables occur together more often, and it decreases monotonically as one of the variables alone occurs more often.
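As an illustration, leverage is commonly written as supp(X and Y) - supp(X) * supp(Y). The following is a minimal sketch under that definition; the helper names and the toy transactions are hypothetical and do not come from the paper.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def leverage(transactions, x, y):
    """Joint support minus the joint support expected under independence."""
    joint = support(transactions, set(x) | set(y))
    return joint - support(transactions, x) * support(transactions, y)


# Hypothetical toy data: each transaction is a set of item names.
dependent = [{"a", "b"}, {"a", "b"}, {"a"}, {"c"}]
independent = [{"a", "b"}, {"a"}, {"b"}, set()]

print(leverage(dependent, {"a"}, {"b"}))    # 0.5 - 0.75 * 0.5 = 0.125
print(leverage(independent, {"a"}, {"b"}))  # 0.25 - 0.5 * 0.5 = 0.0
```

The second data set is constructed so that the two items are exactly independent, which is why its leverage is 0.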
[BMUT97]
Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255-264, Tucson, Arizona, USA, May 1997.
Introduces CONVICTION (as an improvement to confidence based on implication rules) and INTEREST (later called LIFT).
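For orientation, the two measures are usually stated as lift = supp(X and Y) / (supp(X) * supp(Y)) and conviction = (1 - supp(Y)) / (1 - conf(X -> Y)). The sketch below assumes these common definitions and works directly on counts; the function signatures and the example counts are hypothetical.

```python
import math


def lift(n, n_x, n_y, n_xy):
    """Ratio of observed joint support to the value expected under independence."""
    return (n_xy / n) / ((n_x / n) * (n_y / n))


def conviction(n, n_x, n_y, n_xy):
    """(1 - supp(Y)) / (1 - conf(X -> Y)); infinite if the rule never fails."""
    confidence = n_xy / n_x
    if confidence == 1.0:
        return math.inf
    return (1 - n_y / n) / (1 - confidence)


# Hypothetical counts: 8 transactions, X in 4, Y in 4, both in 3.
print(lift(8, 4, 4, 3))        # 0.375 / 0.25 = 1.5
print(conviction(8, 4, 4, 3))  # 0.5 / 0.25 = 2.0
print(conviction(8, 4, 4, 4))  # inf, the rule holds in every transaction with X
```

Both measures equal 1 when X and Y are independent. Unlike lift, conviction is not symmetric in X and Y.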
[AY98]
C. C. Aggarwal and P. S. Yu. A new framework for itemset generation. In PODS 98, Symposium on Principles of Database Systems, pages 18-24, Seattle, WA, USA, 1998.
Points out weaknesses of the support-based frequent itemset method (spurious itemsets, dense data sets) and shows that lift takes only values close to one for very frequent items, even if they are perfectly positively correlated. Introduces COLLECTIVE STRENGTH, which is based on the violation rate of an itemset: the fraction of transactions that contain some, but not all, items of the itemset. The violation rate is compared to the violation rate expected under independence. Collective strength is downward closed.
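A minimal sketch of the computation, assuming the usual form C(I) = (1 - v) / (1 - E[v]) * E[v] / v, where v is the observed violation rate and E[v] = 1 - prod(p_i) - prod(1 - p_i) is its expectation under independence. The function name and toy data are hypothetical, and the degenerate case E[v] = 1 is not handled.

```python
import math


def collective_strength(transactions, itemset):
    itemset = set(itemset)
    n = len(transactions)
    # A transaction violates the itemset if it holds some, but not all, items.
    violations = sum(0 < len(itemset & t) < len(itemset) for t in transactions)
    v = violations / n
    # Expected violation rate if the items occurred independently:
    # 1 - P(all items present) - P(no item present).
    p = [sum(item in t for t in transactions) / n for item in itemset]
    expected_v = 1 - math.prod(p) - math.prod(1 - q for q in p)
    if v == 0:
        return math.inf  # no violations observed
    return (1 - v) / (1 - expected_v) * expected_v / v


# Hypothetical toy data: only the third transaction violates {"a", "b"}.
data = [{"a", "b"}, {"a", "b"}, {"a"}, set()]
print(collective_strength(data, {"a", "b"}))  # (0.75 / 0.5) * (0.5 / 0.25) = 3.0
```

Values above 1 indicate fewer violations than expected under independence, that is, positive association among the items.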
[LHM99]
Bing Liu, Wynne Hsu, and Yiming Ma. Pruning and summarizing the discovered associations. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining (KDD-99), pages 125-134. ACM Press, 1999.
Removes insignificant rules by using the chi-square test to test for correlation between the antecedent and the consequent of a rule. Also introduces DIRECTION SETTING (DS) RULES. A DS rule has a positively correlated antecedent and consequent and is not built from a rule with a shorter antecedent that is itself a DS rule. Normally, only a small and concise fraction of the rules are DS rules.
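The significance check can be sketched as a standard Pearson chi-square test on the 2x2 contingency table of antecedent X and consequent Y. The version below assumes a 5% significance level (critical value 3.841 for one degree of freedom); the function names and counts are hypothetical, and the DS-rule pruning itself is not shown.

```python
def chi_square_2x2(n, n_x, n_y, n_xy):
    """Pearson chi-square statistic for the 2x2 table of X and Y."""
    if not (0 < n_x < n and 0 < n_y < n):
        raise ValueError("both margins must be strictly between 0 and n")
    observed = [
        n_xy,                  # X and Y
        n_x - n_xy,            # X without Y
        n_y - n_xy,            # Y without X
        n - n_x - n_y + n_xy,  # neither X nor Y
    ]
    expected = [
        n_x * n_y / n,
        n_x * (n - n_y) / n,
        (n - n_x) * n_y / n,
        (n - n_x) * (n - n_y) / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_5_PERCENT_1_DF = 3.841  # assumed significance level of 5%


def is_positively_correlated(n, n_x, n_y, n_xy):
    """Significant dependence with more co-occurrences than expected."""
    significant = chi_square_2x2(n, n_x, n_y, n_xy) > CRITICAL_5_PERCENT_1_DF
    return significant and n_xy > n_x * n_y / n


# Hypothetical counts: 100 transactions, X in 50, Y in 50, both in 40.
print(chi_square_2x2(100, 50, 50, 40))            # 4 * (15 ** 2 / 25) = 36.0
print(is_positively_correlated(100, 50, 50, 40))  # True
print(is_positively_correlated(100, 50, 50, 25))  # False, exactly independent
```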
[BD01]
Dario Bruzzese and Cristina Davino. Pruning of discovered association rules. Computational Statistics, 16:387-398, 2001.
The authors construct several statistical tests to evaluate the significance of discovered associations.
[BH03]
Brock Barber and Howard J. Hamilton. Extracting share frequent itemsets with infrequent subsets. Data Mining and Knowledge Discovery, 7:153-185, 2003.
ITEMSET SHARE is the fraction of some measure (e.g., sales, profit) contributed by the items in the set. An itemset is share frequent if its share exceeds a threshold. Share frequency is not downward closed! The article presents several algorithms and heuristics to mine share frequent itemsets.
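The sketch below shows one reading of this definition: the measure values of the itemset's items, summed over the transactions that contain the whole itemset, divided by the total measure value in the database. The data layout (one dict of item to measure value per transaction), the names, and the threshold are assumptions made for illustration.

```python
def itemset_share(transactions, itemset):
    """Share of the total measure value contributed by `itemset`."""
    itemset = set(itemset)
    total = sum(sum(t.values()) for t in transactions)
    contributed = sum(
        sum(t[item] for item in itemset)
        for t in transactions
        if itemset.issubset(t)  # only transactions containing the whole itemset
    )
    return contributed / total


# Hypothetical data: item -> quantity sold in each transaction.
sales = [
    {"a": 2, "b": 2},
    {"a": 1, "c": 3},
    {"b": 4, "c": 4},
]
MIN_SHARE = 0.2  # hypothetical threshold

print(itemset_share(sales, {"a", "b"}))  # (2 + 2) / 16 = 0.25
print(itemset_share(sales, {"a"}))       # (2 + 1) / 16 = 0.1875
```

With the threshold of 0.2, {a, b} is share frequent although its subset {a} is not, which is the failure of downward closure noted above.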
[TKS04]
Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava. Selecting the right objective measure for association analysis. Information Systems, 29(4):293-313, 2004.
Compares the properties of 21 objective measures of interest. In general, the measures do not agree with each other. However, the authors show that the measures become highly correlated if support-based pruning or standardization of the contingency tables is used.
[Sch05]
Tobias Scheffer. Finding association rules that trade support optimally against confidence. Intelligent Data Analysis, 9(4):381-395, 2005.
Introduces predictive accuracy, which is the expected value of the confidence of a rule with respect to the process underlying the database. The author shows how predictive accuracy can be calculated from the confidence and support measured on a data set using a Bayesian frequency correction (very simplified: confidence is discounted for rules with low support). Also presents an algorithm that finds the top n most predictive association rules (redundant rules with a predictive accuracy improvement of 0 are removed) and shows how to estimate the prior distribution needed for the correction.
[BGBG05]
Julien Blanchard, Fabrice Guillet, Henri Briand, and Regis Gras. Assessing rule interestingness with a probabilistic measure of deviation from equilibrium. In Proceedings of the 11th international symposium on Applied Stochastic Models and Data Analysis ASMDA-2005, pages 191-200. ENST, 2005.
Presents a statistical test for the deviation from the equilibrium of a rule. The equilibrium for rule a -> b is defined as: the number of transactions which contain a and b together is equal to the number of transactions which contain a and not b.
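One simple way to test against this equilibrium is a one-sided binomial test: under the null hypothesis, each transaction containing a contains b with probability 0.5. The sketch below follows that assumption and is not necessarily the exact index defined in the paper; the function name and counts are hypothetical.

```python
import math


def equilibrium_p_value(n_a, n_ab):
    """P(at most the observed number of counterexamples) at equilibrium.

    Assumes the number of transactions with a but not b follows a
    Binomial(n_a, 0.5) distribution under the null hypothesis.
    """
    counterexamples = n_a - n_ab
    favourable = sum(math.comb(n_a, k) for k in range(counterexamples + 1))
    return favourable / 2 ** n_a


# Hypothetical counts: a occurs in 10 transactions.
print(equilibrium_p_value(10, 8))  # 56 / 1024 = 0.0546875
print(equilibrium_p_value(10, 5))  # 638 / 1024 = 0.623046875
```

A small value means the rule a -> b has far fewer counterexamples than the equilibrium would suggest; at exact equilibrium (5 of 10) the value is unremarkable.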
[GH06]
Liqiang Geng and Howard J. Hamilton. Interestingness measures for data mining: A survey. ACM Computing Surveys, 38(3):9, 2006.
[Li06]
Jiuyong Li. On optimal rule discovery. IEEE Transactions on Knowledge and Data Engineering, 18(4):460-471, 2006.
An optimal rule set (with respect to a metric of interestingness) contains all rules except those with no greater interestingness than one of their more general rules. An optimal rule set is a subset of a nonredundant rule set. The author presents an algorithm called ORD to find an optimal rule set. Classifiers built on optimal class association rules are at least as accurate as those built from CBA and C4.5 rules.
[HH07]
Michael Hahsler and Kurt Hornik. New probabilistic interest measures for association rules. Intelligent Data Analysis, 11(5):437-455, 2007.
Presents a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. Uses such data and a real-world grocery database to explore the behavior of confidence and lift, two popular interest measures used for rule mining. Also introduces the new probabilistic measures hyper-lift and hyper-confidence.
