# Michael Hahsler

## A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules


On this page I collected some commonly used measures of significance and interestingness for association rules and itemsets. Agrawal, Imielinski, and Swami introduced the problem of association rule mining as follows: Let $I=\{i_1, i_2,\ldots,i_n\}$ be a set of $n$ binary attributes called items. Let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database. Each transaction $t \in D$ has a unique transaction ID and contains a subset of the items in $I$, i.e., $t \subseteq I$. A rule is defined as an implication of the form $X \Rightarrow Y$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The sets of items (itemsets for short) $X$ and $Y$ are called the antecedent (left-hand side or LHS) and the consequent (right-hand side or RHS) of the rule, respectively.

To make the measures comparable, all measures are defined not only in terms of itemset support (see Support below) but also in terms of probabilities. The probability $P(E_X)$ of the event that all items in itemset $X$ are contained in an arbitrarily chosen transaction can be estimated from a database $D$ using maximum likelihood estimation (MLE) by $$P(E_X) = \frac{|\{t \in D; X \subseteq t\}|}{|D|}$$ where $|\{t \in D; X \subseteq t\}|$ is the number of transactions that contain the itemset $X$ and $|D|$ is the size (number of transactions) of the database. Note: This probability estimate is very poor for itemsets with low observed frequencies. This always needs to be taken into account since it affects all measures discussed below.
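The MLE above is just a count divided by $|D|$. The following minimal sketch assumes that transactions are represented as Python sets of item labels; the function name `prob` and the toy database `D` are hypothetical and only serve as an illustration.

```python
from typing import Iterable, List, Set


def prob(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """MLE of P(E_X): fraction of transactions that contain every item of X."""
    if not transactions:
        raise ValueError("the database must contain at least one transaction")
    x = set(itemset)
    count = sum(1 for t in transactions if x <= t)  # |{t in D; X subset of t}|
    return count / len(transactions)                # divide by |D|


# Hypothetical toy database with |D| = 5 transactions.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"beer"},
]

print(prob({"milk"}, D))           # 3 of 5 transactions contain milk: 0.6
print(prob({"milk", "bread"}, D))  # 2 of 5 contain milk and bread: 0.4
```

With only five transactions, a single transaction more or less changes an estimate by 0.2, which illustrates how unreliable estimates based on few observations are.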

A good overview of different association rule measures is provided by: Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava. Selecting the right objective measure for association analysis. Information Systems, 29(4):293-313, 2004 and Liqiang Geng and Howard J. Hamilton. Interestingness measures for data mining: A survey. ACM Computing Surveys, 38(3):9, 2006.

Please cite this document as: Michael Hahsler, A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015, URL: http://michael.hahsler.net/research/association_rules/measures.html

## Support

Introduced by R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, Washington D.C., May 1993. $$supp(X) = \frac{|\{t \in D; X \subseteq t\}|}{|D|} = P(E_X)$$ Support is defined on itemsets and gives the proportion of transactions that contain $X$. It is used as a measure of significance (importance) of an itemset. Since support is essentially a normalized count of transactions, a minimum support requirement is often called a frequency constraint. An itemset with a support greater than a set minimum support threshold, $supp(X) > \sigma$, is called a frequent or large itemset.

Support's main feature is that it possesses the downward closure property (anti-monotonicity), which means that all subsets of a frequent set are also frequent. This property (actually, the fact that no superset of an infrequent set can be frequent) is used to prune the search space (usually thought of as a lattice or tree of itemsets with increasing size) in level-wise algorithms (e.g., the Apriori algorithm).
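The level-wise pruning can be sketched as follows. This is a simplified Apriori-style sketch, not Agrawal et al.'s exact algorithm and not an optimized implementation; it assumes transactions are Python sets, uses the threshold rule $supp(X) > \sigma$ from above, and the names `support`, `frequent_itemsets`, and the toy database `D` are hypothetical. Candidates of size $k$ are built from frequent itemsets of size $k-1$, and any candidate with an infrequent subset is discarded before its support is counted.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def support(itemset: FrozenSet[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain all items of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(
    transactions: List[Set[str]], min_supp: float
) -> Dict[FrozenSet[str], float]:
    """Level-wise search for all itemsets with support > min_supp."""
    items = sorted(set().union(*transactions))
    # Level 1: frequent single items.
    level = {}
    for i in items:
        s = support(frozenset([i]), transactions)
        if s > min_supp:
            level[frozenset([i])] = s
    result = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join: combine frequent (k-1)-itemsets into candidates of size k.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (downward closure): a candidate with an infrequent
        # (k-1)-subset cannot be frequent, so its support is never counted.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s > min_supp:
                level[c] = s
        result.update(level)
        k += 1
    return result


# Hypothetical toy database with |D| = 5 transactions.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"beer"},
]

# {milk, bread, butter} is pruned without counting its support because
# its subset {bread, butter} (support 0.2) is not frequent.
fs = frequent_itemsets(D, min_supp=0.3)
for itemset, s in sorted(fs.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```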

The disadvantage of support is the rare item problem. Items that occur very infrequently in the data set are pruned although they could still produce interesting and potentially valuable rules. The rare item problem is important for transaction data, which usually have a very uneven distribution of support for the individual items (typically a power-law distribution where a few items are used all the time and most items are rarely used).

## Confidence (also called strength)

Introduced by R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, Washington D.C., May 1993. $$conf(X \rightarrow Y) = \frac{supp(X \rightarrow Y)}{supp(X)} = \frac{supp(X \cup Y)}{supp(X)} = \frac{P(E_X \cap E_Y)}{P(E_X)} = P(E_Y | E_X)$$ Confidence is defined as the probability of seeing the rule's consequent under the condition that the transactions also contain the antecedent. Confidence is directed and, in general, gives different values for the rules $X \rightarrow Y$ and $Y \rightarrow X$. Association rules have to satisfy a minimum confidence constraint, $conf(X \rightarrow Y) \ge \gamma$.
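A minimal sketch of the confidence formula, assuming transactions are Python sets; the helper names `supp` and `conf` and the toy database `D` are hypothetical. It also shows that swapping antecedent and consequent changes the value.

```python
from typing import Iterable, List, Set


def supp(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing all items of the itemset."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t) / len(transactions)


def conf(lhs: Iterable[str], rhs: Iterable[str], transactions: List[Set[str]]) -> float:
    """conf(X -> Y) = supp(X u Y) / supp(X), an estimate of P(E_Y | E_X).

    Raises ZeroDivisionError if X never occurs (confidence is undefined).
    """
    x, y = set(lhs), set(rhs)
    return supp(x | y, transactions) / supp(x, transactions)


# Hypothetical toy database with |D| = 5 transactions.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"beer"},
]

print(conf({"butter"}, {"milk"}, D))  # every butter transaction has milk: 1.0
print(conf({"milk"}, {"butter"}, D))  # 2 of 3 milk transactions have butter
```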

Confidence is not downward closed and was developed together with support by Agrawal et al. (the so-called support-confidence framework). Support is first used to find frequent (significant) itemsets, exploiting its downward closure property to prune the search space. Then confidence is used in a second step to produce rules from the frequent itemsets that exceed a min. confidence threshold.

A problem with confidence is that it is sensitive to the frequency of the consequent $Y$ in the database. Because of the way confidence is calculated, consequents with higher support automatically produce higher confidence values even if there is no association between the items.
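To make this dependence explicit: if $E_X$ and $E_Y$ are independent, then $P(E_X \cap E_Y) = P(E_X)P(E_Y)$, and confidence reduces to the support of the consequent:

$$conf(X \rightarrow Y) = \frac{P(E_X \cap E_Y)}{P(E_X)} = \frac{P(E_X)P(E_Y)}{P(E_X)} = P(E_Y) = supp(Y)$$

For example, a consequent contained in 90% of all transactions yields a confidence of about 0.9 with any antecedent that is independent of it, although such an antecedent carries no information about the consequent.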

## All-confidence

Introduced by Edward R. Omiecinski. Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15(1):57-69, Jan/Feb 2003.

All-confidence is defined on itemsets (not rules) as $$allconfidence(X) = \frac{supp(X)}{\max_{x \in X} supp(\{x\})} = \frac{P(E_X)}{\max_{x \in X} P(E_x)}$$ where $\max_{x \in X} supp(\{x\})$ is the support of the item with the highest support in $X$. All-confidence means that all rules which can be generated from itemset $X$ have a confidence of at least $allconfidence(X)$. All-confidence possesses the downward closure property.
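A minimal sketch of all-confidence, assuming transactions are Python sets; the names `supp`, `all_confidence`, and the toy database `D` are hypothetical. For $X = \{milk, butter\}$ the two possible rules have confidences 2/3 and 1, and all-confidence returns the smaller one.

```python
from typing import Iterable, List, Set


def supp(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing all items of the itemset."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t) / len(transactions)


def all_confidence(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """supp(X) divided by the largest single-item support in X."""
    x = set(itemset)
    max_item_supp = max(supp({i}, transactions) for i in x)
    return supp(x, transactions) / max_item_supp


# Hypothetical toy database with |D| = 5 transactions.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"beer"},
]

# 0.4 / max(0.6, 0.4); equals conf(milk -> butter), the weaker of the two rules.
print(all_confidence({"milk", "butter"}, D))
```

Adding items can only lower the value (the numerator cannot grow and the denominator cannot shrink), which is the downward closure property mentioned above.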

## Collective strength

Introduced by C. C. Aggarwal and P. S. Yu. A new framework for itemset generation. In PODS 98, Symposium on Principles of Database Systems, pages 18-24, Seattle, WA, USA, 1998. $$C(X) = \frac{1-v(X)}{1-E[v(X)]} \frac{E[v(X)]}{v(X)}$$ where $v(X)$ is the violation rate and $E[\cdot]$ is the expected value under the assumption that the items are independent. The violation rate is defined as the fraction of transactions that contain some, but not all, of the items in an itemset. Collective strength gives 0 for perfectly negatively correlated items, infinity for perfectly positively correlated items, and 1 if the items co-occur exactly as expected under independence.

A problem is that for items with medium to low probabilities, both the observed and the expected violation rate are dominated by the proportion of transactions that do not contain any of the items in $X$. For such itemsets, collective strength produces values close to one, even if the items appear together several times more often than expected.
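The effect can be reproduced with a small sketch. It assumes transactions are Python sets and computes the expected violation rate under independence as $E[v(X)] = 1 - \prod_{x \in X} P(E_x) - \prod_{x \in X} (1 - P(E_x))$, i.e., one minus the probabilities that all or none of the items occur; this formula, the function name `collective_strength`, and the synthetic database are assumptions made for this illustration.

```python
import math
from typing import Iterable, List, Set


def collective_strength(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """C(X) = (1 - v) / (1 - E[v]) * E[v] / v for the itemset X."""
    x = set(itemset)
    n = len(transactions)
    # Observed violation rate: transactions with some, but not all, items of X.
    v = sum(1 for t in transactions if 0 < len(x & t) < len(x)) / n
    if v == 0:
        return math.inf  # the items never occur without each other
    # MLE of each item's probability P(E_x).
    p = [sum(1 for t in transactions if i in t) / n for i in x]
    # Expected violation rate if the items were independent:
    # 1 - P(all items present) - P(no item present).
    ev = 1 - math.prod(p) - math.prod(1 - pi for pi in p)
    return (1 - v) / (1 - ev) * ev / v


# Hypothetical database: 10,000 transactions; items a and b each occur in 100
# transactions (P = 0.01) and together in 5, i.e., 5 times the single
# co-occurrence expected under independence (lift 5). Empty sets stand for
# transactions that contain neither a nor b.
n = 10_000
db = [{"a", "b"}] * 5 + [{"a"}] * 95 + [{"b"}] * 95 + [set()] * (n - 195)

print(round(collective_strength({"a", "b"}, db), 3))  # 1.043
```

Although $a$ and $b$ co-occur five times more often than expected, collective strength stays close to one.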

## Coverage

Coverage is sometimes called antecedent support. It measures how often a rule $X \rightarrow Y$ is applicable in a database. $$coverage(X \rightarrow Y) = supp(X) = P(E_X)$$

## Conviction

Introduced by Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255-264, Tucson, Arizona, USA, May 1997. $$conviction(X \rightarrow Y) = \frac{1-supp(Y)}{1-conf(X \rightarrow Y)} = \frac{P(E_X)P(E_{\neg Y})}{P(E_X \cap E_{\neg Y})}$$ where $E_{\neg Y}$ is the event that $Y$ does not appear in a transaction. Conviction was developed as an alternative to confidence, which was found not to capture the direction of associations adequately. Conviction compares the probability that $X$ appears without $Y$ if they were independent with the actual frequency of the appearance of $X$ without $Y$. In that respect it is similar to lift (see the section about lift on this page); however, in contrast to lift, it is a directed measure since it also uses the information of the absence of the consequent. An interesting fact is that conviction is monotone in confidence and lift.
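A minimal sketch that computes conviction both from support and confidence and from the probability form, assuming transactions are Python sets and that $E_{\neg Y}$ means "not all items of $Y$ are in the transaction". The function names and the toy database `D` are hypothetical. Returning infinity when $X$ never appears without $Y$ is a choice made for this sketch.

```python
import math
from typing import Iterable, List, Set


def supp(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing all items of the itemset."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t) / len(transactions)


def conviction(lhs: Iterable[str], rhs: Iterable[str], transactions: List[Set[str]]) -> float:
    """conviction(X -> Y) = (1 - supp(Y)) / (1 - conf(X -> Y))."""
    x, y = set(lhs), set(rhs)
    confidence = supp(x | y, transactions) / supp(x, transactions)
    if confidence == 1.0:
        return math.inf  # X never appears without Y
    return (1 - supp(y, transactions)) / (1 - confidence)


def conviction_prob(lhs: Iterable[str], rhs: Iterable[str], transactions: List[Set[str]]) -> float:
    """Same measure via P(E_X) P(E_notY) / P(E_X and E_notY)."""
    x, y = set(lhs), set(rhs)
    n = len(transactions)
    p_x = sum(1 for t in transactions if x <= t) / n
    p_not_y = sum(1 for t in transactions if not y <= t) / n
    p_x_not_y = sum(1 for t in transactions if x <= t and not y <= t) / n
    return math.inf if p_x_not_y == 0 else p_x * p_not_y / p_x_not_y


# Hypothetical toy database with |D| = 5 transactions.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"beer"},
]

# Both forms give about 1.8 for milk -> butter.
print(conviction({"milk"}, {"butter"}, D), conviction_prob({"milk"}, {"butter"}, D))
# The reverse rule butter -> milk never fails, so its conviction is infinite.
print(conviction({"butter"}, {"milk"}, D))
```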

## Leverage

Introduced by G. Piatetsky-Shapiro. Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, pages 229-248, 1991. $$leverage(X \rightarrow Y) = P(E_X \cap E_Y) - P(E_X)P(E_Y)$$ Leverage measures the difference between how often $X$ and $Y$ appear together in the data set and what would be expected if $X$ and $Y$ were statistically independent. The rationale in a sales setting is to find out how many more units (items $X$ and $Y$ together) are sold than expected from the independent sales.
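A minimal sketch of leverage, assuming transactions are Python sets; `supp`, `leverage`, and the toy database `D` are hypothetical names. Multiplying leverage by $|D|$ translates it into the number of additional co-occurrences compared to independence, which matches the sales interpretation above.

```python
from typing import Iterable, List, Set


def supp(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing all items of the itemset."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t) / len(transactions)


def leverage(lhs: Iterable[str], rhs: Iterable[str], transactions: List[Set[str]]) -> float:
    """P(E_X and E_Y) - P(E_X) P(E_Y), estimated from the transactions."""
    x, y = set(lhs), set(rhs)
    return supp(x | y, transactions) - supp(x, transactions) * supp(y, transactions)


# Hypothetical toy database with |D| = 5 transactions.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"beer"},
]

lev = leverage({"milk"}, {"bread"}, D)
print(lev)                                 # about 0.04 = 0.4 - 0.6 * 0.6
print(lev * len(D))                        # about 0.2 more co-occurrences than expected
print(leverage({"bread"}, {"butter"}, D))  # about -0.04: less often than expected
```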

Using a min. leverage threshold at the same time incorporates an implicit frequency constraint. E.g., when setting a min. leverage threshold of 0.01% (corresponding to 10 occurrences in a data set with 100,000 transactions), one can first use an algorithm to find all itemsets with a min. support of 0.01% and then filter the found itemsets using the leverage constraint. Because of this property, leverage can also suffer from the rare item problem.
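The implicit frequency constraint follows directly from the definition. Since $P(E_X)P(E_Y) \ge 0$,

$$leverage(X \rightarrow Y) = P(E_X \cap E_Y) - P(E_X)P(E_Y) \le P(E_X \cap E_Y) = supp(X \cup Y)$$

so any rule that reaches a min. leverage of $\delta$ (a symbol used only in this illustration) is generated from an itemset $X \cup Y$ with $supp(X \cup Y) \ge \delta$. With $\delta = 0.01\% = 0.0001$ and $|D| = 100{,}000$, this corresponds to $0.0001 \cdot 100{,}000 = 10$ transactions.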

## Lift (originally called interest)

Introduced by S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD '97), pages 255-264, 1997. $$lift(X \rightarrow Y) = lift(Y \rightarrow X) = \frac{conf(X \rightarrow Y)}{supp(Y)} = \frac{conf(Y \rightarrow X)}{supp(X)} = \frac{P(E_X \cap E_Y)}{P(E_X)P(E_Y)}$$ Lift measures how many times more often $X$ and $Y$ occur together than expected if they were statistically independent.
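A minimal sketch of lift, assuming transactions are Python sets; `supp`, `lift`, and the toy database `D` are hypothetical names. Because the formula is symmetric in $X$ and $Y$, swapping antecedent and consequent returns exactly the same value.

```python
from typing import Iterable, List, Set


def supp(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions containing all items of the itemset."""
    x = set(itemset)
    return sum(1 for t in transactions if x <= t) / len(transactions)


def lift(lhs: Iterable[str], rhs: Iterable[str], transactions: List[Set[str]]) -> float:
    """P(E_X and E_Y) / (P(E_X) P(E_Y)), estimated from the transactions."""
    x, y = set(lhs), set(rhs)
    return supp(x | y, transactions) / (supp(x, transactions) * supp(y, transactions))


# Hypothetical toy database with |D| = 5 transactions.
D = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread"},
    {"milk", "butter"},
    {"beer"},
]

print(lift({"milk"}, {"butter"}, D))   # about 1.67 = 0.4 / (0.6 * 0.4)
print(lift({"bread"}, {"butter"}, D))  # about 0.83 = 0.2 / (0.6 * 0.4)
```

Values above 1 indicate that the items co-occur more often than under independence, values below 1 less often.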

Lift is not downward closed and does not suffer from the rare item problem. However, lift is susceptible to noise in small databases: rare itemsets with low counts (low probability) which by chance occur together a few times (or only once) can produce enormous lift values.
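As a worked illustration with hypothetical numbers, consider a database with $|D| = 100{,}000$ transactions in which two items $a$ and $b$ each occur only once, in the same transaction. Then

$$lift(\{a\} \rightarrow \{b\}) = \frac{P(E_{\{a\}} \cap E_{\{b\}})}{P(E_{\{a\}})P(E_{\{b\}})} = \frac{10^{-5}}{10^{-5} \cdot 10^{-5}} = 10^{5}$$

A single chance co-occurrence thus yields a lift of 100,000.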

## Other measures

Cross-support ratio, chi-square statistic, cosine/IS measure, Gini index, hyper-lift, hyper-confidence, improvement, phi (correlation), odds ratio.