A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules
This page contains a collection of commonly used measures of significance and interestingness for association rules and itemsets. All measures discussed on this page are implemented in the freely available R extension package arules in the function interestMeasure(). Please contact me for corrections or if a measure is missing (either on this page or in arules).
This work is licensed under the Creative Commons Attribution Share Alike 4.0 International License.
Please cite this document as Michael Hahsler, A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules, 2015, URL: http://michael.hahsler.net/research/association_rules/measures.html
Introduction and Definitions
Agrawal, Imielinski, and Swami introduced the problem of association rule mining in the following way: Let $I=\{i_1, i_2,\ldots,i_m\}$ be a set of $m$ binary attributes called items. Let $D = \{t_1, t_2, \ldots, t_n\}$ be a set of transactions called the database. Each transaction $t \in D$ has a unique transaction ID and contains a subset of the items in $I$, i.e., $t \subseteq I$. A rule is defined as an implication of the form $X \Rightarrow Y$ where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The sets of items (for short itemsets) $X$ and $Y$ are called antecedent (left-hand-side or LHS) and consequent (right-hand-side or RHS) of the rule, respectively.
To make the measures comparable, all measures are defined not only in terms of itemset support (see Support below) but also in terms of probabilities and, where helpful, counts. The probability $P(E_X)$ of the event that all items in itemset $X$ are contained in an arbitrarily chosen transaction can be estimated from a database $D$ using maximum likelihood estimation (MLE) by $$\hat{P}(E_X) = \frac{|\{t \in D; X \subseteq t\}|}{n}$$ where $c_X = |\{t \in D; X \subseteq t\}|$ is the count of the number of transactions that contain the itemset $X$ and $n=|D|$ is the size (number of transactions) of the database. For conciseness of notation, we will drop the hat and the $E$ from the notation. We will use in the following just $P(X)$ to mean $\hat{P}(E_X)$, and $P(X \cap Y)$ to mean $\hat{P}(E_X \cap E_Y)$, the probability that a transaction contains all items in $X$ and $Y$.
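A minimal sketch of this estimate in base R (the toy transaction database and the helper functions below are made up for illustration):

```r
## Toy database: a list of transactions, each a character vector of items.
D <- list(c("a", "b", "c"), c("a", "b"), c("b", "c"),
          c("a", "c"), c("a", "b", "c"))

## c_X: the number of transactions that contain all items in itemset X.
count <- function(X, D) sum(sapply(D, function(t) all(X %in% t)))

## MLE of P(E_X) as defined above.
p_hat <- function(X, D) count(X, D) / length(D)

p_hat(c("a", "b"), D)  # 3 of the 5 transactions contain both items: 0.6
```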
Note on probability estimation: The probability estimates will be very poor for itemsets with low observed frequencies. This always needs to be taken into account since it affects most measures discussed below.
Note on null-transactions: Transaction datasets typically contain a large number of transactions that contain neither $X$ nor $Y$. These transactions are called null-transactions, and it is desirable that measures of rule strength are not influenced by a change in the number of null-transactions. However, most measures are affected by the number of null-transactions since the total number of transactions is used for probability estimation. Measures that are not influenced by a change in the number of null-transactions are called null-invariant (see Tan et al., 2004 and Wu et al., 2010).
A good overview of different association rules measures is provided by Pang-Ning Tan, Vipin Kumar, and Jaideep Srivastava. Selecting the right objective measure for association analysis. Information Systems, 29(4):293-313, 2004 and Liqiang Geng and Howard J. Hamilton. Interestingness measures for data mining: A survey. ACM Computing Surveys, 38(3):9, 2006.
Measures for the support-confidence framework for mining association rules:
- Support
- Confidence
Alternative interest measures:
- Added Value
- All-confidence
- Casual Confidence
- Casual Support
- Certainty Factor
- Chi-Squared
- Cross-Support Ratio
- Collective Strength
- Conviction
- Cosine
- Coverage
- Descriptive Confirmed Confidence
- Difference of Confidence
- Example and Counter-Example Rate
- Fisher's Exact Test
- Gini Index
- Hyper-Confidence
- Hyper-Lift
- Imbalance Ratio
- Implication Index
- Importance
- Improvement
- Jaccard Coefficient
- J-Measure
- Kappa
- Klosgen
- Kulczynski
- Goodman-Kruskal Lambda
- Laplace Corrected Confidence
- Least Contradiction
- Lerman Similarity
- Leverage
- Lift
- MaxConf
- Mutual Information
- Odds Ratio
- Phi Correlation Coefficient
- Ralambondrainy Measure
- Relative Linkage Disequilibrium
- Rule Power Factor
- Sebag-Schoenauer Measure
- Standardized Lift
- Varying Rates Liaison
- Yule's Q
- Yule's Y
Support
Introduced by R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, Washington D.C., May 1993. $$supp(X) = \frac{|\{t \in D; X \subseteq t\}|}{|D|} = \frac{c_X}{|D|}= P(X)$$ Support is defined on itemsets and gives the proportion of transactions which contain $X$. It is used as a measure of significance (importance) of an itemset. Since it basically uses the count of transactions it is often called a frequency constraint. An itemset with a support greater than a set minimum support threshold, $supp(X) > \sigma$, is called a frequent or large itemset.
For rules, support is defined as the support of all items in the rule, i.e., $supp(X \Rightarrow Y) = supp(X \cup Y) = P(X \cap Y)$.
Support's main feature is that it possesses the downward closure property (anti-monotonicity), which means that all subsets of a frequent itemset are also frequent. This property (actually, the fact that no superset of an infrequent itemset can be frequent) is used to prune the search space (usually thought of as a lattice or tree of itemsets with increasing size) in level-wise algorithms (e.g., the Apriori algorithm).
The disadvantage of support is the rare item problem: items that occur very infrequently in the data set are pruned even though they could still produce interesting and potentially valuable rules. The rare item problem is important for transaction data, which usually have a very uneven support distribution over the individual items (typically a power-law distribution where few items are used all the time and most items are rarely used).
Range: $[0, 1]$
Confidence, Strength
Introduced by R. Agrawal, T. Imielinski, and A. Swami. Mining associations between sets of items in large databases. In Proc. of the ACM SIGMOD Int'l Conference on Management of Data, pages 207-216, Washington D.C., May 1993. $$conf(X \Rightarrow Y) = \frac{supp(X \Rightarrow Y)}{supp(X)} = \frac{supp(X \cup Y)}{supp(X)} = \frac{c_{XY}}{c_X} = \frac{P(X \cap Y)}{P(X)} = P(Y | X)$$ Confidence is defined as the probability of seeing the rule's consequent under the condition that the transactions also contain the antecedent. Confidence is directed and gives different values for the rules $X \Rightarrow Y$ and $Y \Rightarrow X$. Association rules have to satisfy a minimum confidence constraint, $conf(X \Rightarrow Y) \ge \gamma$.
Confidence is not downward closed and was developed together with support by Agrawal et al. (the so-called support-confidence framework). Support is first used to find frequent (significant) itemsets, exploiting its downward closure property to prune the search space. Then confidence is used in a second step to produce, from the frequent itemsets, rules that exceed a minimum confidence threshold.
A problem with confidence is that it is sensitive to the frequency of the consequent $Y$ in the database. Because of the way confidence is calculated, consequents with higher support automatically produce higher confidence values even if no association between the items exists.
Range: $[0, 1]$
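The two-step support-confidence framework can be tried directly with the arules package mentioned at the top of this page. A minimal sketch (the Groceries dataset ships with arules; the thresholds are arbitrary choices for this example):

```r
library(arules)
data(Groceries)

## Step 1: find frequent itemsets (minimum support constraint);
## Step 2: generate rules from them that satisfy minimum confidence.
rules <- apriori(Groceries,
                 parameter = list(support = 0.001, confidence = 0.5))

## Other measures discussed on this page can be added with interestMeasure().
head(interestMeasure(rules, measure = "lift", transactions = Groceries))
```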
Added Value (AV), Pavillon Index, Centered Confidence
Defined as $$AV(X \Rightarrow Y) = conf(X \Rightarrow Y) - supp(Y)$$
Range: $[-0.5, 1]$
All-confidence
Introduced by Edward R. Omiecinski. Alternative interest measures for mining associations in databases. IEEE Transactions on Knowledge and Data Engineering, 15(1):57-69, Jan/Feb 2003.
All-confidence is defined on itemsets (not rules) as $$\textit{all-confidence}(X) = \frac{supp(X)}{max_{x \in X}(supp(x))} = \frac{P(X)}{max_{x \in X}(P(x))} = min\{P(X|Y), P(Y|X)\}$$ where $max_{x \in X}(supp(x))$ is the support of the item with the highest support in $X$. All-confidence means that all rules which can be generated from itemset $X$ have at least a confidence of $\textit{all-confidence}(X)$. All-confidence possesses the downward closure property and thus can be effectively used inside mining algorithms. All-confidence is null-invariant.
Range: $[0, 1]$
Casual Confidence
Kodratoff, Y. (1999). Comparing Machine Learning and Knowledge Discovery in Databases: An Application to Knowledge Discovery in Texts. Lecture Notes on AI (LNAI) - Tutorial series.
Confidence reinforced by negatives, given by $$\textit{casual-conf}(X \Rightarrow Y) = \frac{1}{2} [conf(X \Rightarrow Y) + conf(\overline{X} \Rightarrow \overline{Y})] = \frac{1}{2} [P(Y|X) + P(\overline{Y}|\overline{X})]$$
Range: $[0, 1]$
Casual Support
Kodratoff, Y. (1999). Comparing Machine Learning and Knowledge Discovery in Databases: An Application to Knowledge Discovery in Texts. Lecture Notes on AI (LNAI) - Tutorial series.
Support improved by negatives, given by $$\textit{casual-supp}(X \Rightarrow Y) = supp(X \cup Y) + supp(\overline{X} \cup \overline{Y}) = P(X \cap Y) + P(\overline{X} \cap \overline{Y})$$
Range: $[0, 2]$
Certainty Factor (CF), Loevinger
Berzal, Fernando, Ignacio Blanco, Daniel Sanchez and Maria-Amparo Vila (2002). Measuring the accuracy and interest of association rules: A new framework. Intelligent Data Analysis 6, 221-235.
The certainty factor measures the variation of the probability that $Y$ is in a transaction when only transactions with $X$ are considered. An increasing CF means a decreasing probability that $Y$ is not in a transaction that contains $X$. Negative CFs have a similar interpretation. $$CF(X \Rightarrow Y) = \frac{conf(X \Rightarrow Y)-supp(Y)}{supp(\overline{Y})} = \frac{P(Y|X)-P(Y)}{1-P(Y)}$$
Range: $[-1, 1]$ (0 indicates independence)
Chi-Squared
The chi-squared statistic of the test for independence between the LHS and RHS of the rule. Summing over all four cells of the 2x2 contingency table, the statistic is $$\textit{chi-squared}(X \Rightarrow Y) = \sum_{i \in \{X, \overline{X}\}} \sum_{j \in \{Y, \overline{Y}\}} \frac{(\textit{observed}_{ij} - \textit{expected}_{ij})^2}{\textit{expected}_{ij}} = \sum_{i \in \{X, \overline{X}\}} \sum_{j \in \{Y, \overline{Y}\}} \frac{(c_{ij}- \frac{c_i c_j}{n})^2}{\frac{c_i c_j}{n}}.$$ The statistic approximately follows a chi-squared distribution with 1 degree of freedom (2x2 contingency table), and the critical value for $\alpha=0.05$ is $3.84$; values above the critical value indicate that the null-hypothesis of independence between the LHS and the RHS should be rejected (i.e., the rule is not spurious), and larger values indicate stronger evidence for a relationship. The statistic can be converted into a p-value using the chi-squared distribution. Note that the contingency tables for some rules may contain cells with low expected values (less than 5), in which case Fisher's exact test is more appropriate.
Range: $[0, \infty]$
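As a sketch, the statistic and its p-value can be computed in R with chisq.test() on the 2x2 contingency table (the counts below are made up for illustration):

```r
## Toy counts: n transactions; c_X contain X, c_Y contain Y, c_XY contain both.
n <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90

## 2x2 contingency table of X/not-X versus Y/not-Y.
tbl <- matrix(c(c_XY,       c_X - c_XY,
                c_Y - c_XY, n - c_X - c_Y + c_XY), nrow = 2)

chisq.test(tbl, correct = FALSE)  # statistic (1 df) and p-value
```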
Cross-Support Ratio
Introduced by Xiong et al., 2003. Defined on itemsets as the ratio of the support of the least frequent item to the support of the most frequent item, i.e., $$\textit{cross-support}(X) = \frac{min_{x \in X}(supp(x))}{max_{x \in X}(supp(x))}$$ Itemsets with a ratio smaller than a set threshold are called cross-support patterns: patterns that contain frequent as well as rare items. Normally, many of the found patterns are cross-support patterns, and such patterns often tend to be spurious.
Range: $[0, 1]$
Collective Strength (S)
Introduced by C. C. Aggarwal and P. S. Yu. A new framework for itemset generation. In PODS 98, Symposium on Principles of Database Systems, pages 18-24, Seattle, WA, USA, 1998. $$S(X) = \frac{1-v(X)}{1-E[v(X)]} \cdot \frac{E[v(X)]}{v(X)} = \frac{P(X \cap Y)+P(\overline{X} \cap \overline{Y})}{P(X)P(Y)+P(\overline{X})P(\overline{Y})} \cdot \frac{P(X)P(\overline{Y})+P(\overline{X})P(Y)}{P(X \cap \overline{Y})+P(\overline{X} \cap Y)}$$ where $v(X)$ is the violation rate and $E[v(X)]$ is the expected violation rate for independent items. The violation rate is defined as the fraction of transactions which contain some of the items in an itemset but not all. Collective strength gives 0 for perfectly negatively correlated items, infinity for perfectly positively correlated items, and 1 if the items co-occur as expected under independence.
A problem is that, for items with medium to low probabilities, the violation rate and its expected value are dominated by the proportion of transactions which do not contain any of the items in $X$. For such itemsets, collective strength produces values close to one, even if the items appear together several times more often than expected.
Range: $[0, \infty]$
Conviction
Introduced by Sergey Brin, Rajeev Motwani, Jeffrey D. Ullman, and Shalom Tsur. Dynamic itemset counting and implication rules for market basket data. In SIGMOD 1997, Proceedings ACM SIGMOD International Conference on Management of Data, pages 255-264, Tucson, Arizona, USA, May 1997. $$conviction(X \Rightarrow Y) =\frac{1-supp(Y)}{1-conf(X \Rightarrow Y)} = \frac{P(X)P(\overline{Y})}{P(X \cap \overline{Y})}$$ where $\overline{Y} = E_{\neg Y}$ is the event that $Y$ does not appear in a transaction. Conviction was developed as an alternative to confidence, which was found to not capture the direction of associations adequately. Conviction compares the probability that $X$ appears without $Y$ if they were independent with the actual frequency of the appearance of $X$ without $Y$. In that respect it is similar to lift (see the section about lift on this page); however, in contrast to lift it is a directed measure since it also uses the information of the absence of the consequent. An interesting fact is that conviction is monotone in confidence and lift.
Range: $[0, \infty]$ (1 indicates independence; rules that always hold have $\infty$)
Cosine
Cosine is a null-invariant measure of correlation between the items in $X$ and $Y$ defined as (Tan et al. (2004)) $$cosine(X \Rightarrow Y) = \frac{supp(X \cup Y)}{\sqrt{(supp(X)supp(Y))}} = \frac{P(X \cap Y)}{\sqrt{P(X)P(Y)}} = \sqrt{P(X | Y) P(Y | X)}$$
Range: $[0, 1]$ ($0.5$ means no correlation)
Coverage, LHS Support
Coverage measures the probability that a rule $X \Rightarrow Y$ applies to a randomly selected transaction. It is estimated by the proportion of transactions that contain the antecedent of the rule. Therefore, coverage is sometimes called antecedent support or LHS support. $$cover(X \Rightarrow Y) = supp(X) = P(X)$$
Range: $[0, 1]$
Descriptive Confirmed Confidence
Kodratoff, Y. (1999). Comparing Machine Learning and Knowledge Discovery in Databases: An Application to Knowledge Discovery in Texts. Lecture Notes on AI (LNAI) - Tutorial series.
Confidence confirmed by its negative, defined as $$\textit{confirmed-conf}(X \Rightarrow Y) = conf(X \Rightarrow Y) - conf(X \Rightarrow \overline{Y}) = P(Y|X) - P(\overline{Y}|X)$$
Range: $[-1, 1]$
Difference of Confidence
Hofmann, Heike and Adalbert Wilhelm (2001). Visual comparison of association rules. Computational Statistics, 16(3):399-415.
Defined as $$doc(X \Rightarrow Y) = conf(X \Rightarrow Y) - conf(\overline{X} \Rightarrow Y) = P(Y|X) - P(Y|\overline{X})$$
Range: $[-1, 1]$
Example and Counter-Example Rate
Defined as $$ecr(X \Rightarrow Y) = \frac{c_{XY} - c_{X\overline{Y}}}{c_{XY}} = \frac{P(X \cap Y) - P(X \cap \overline{Y})}{P(X \cap Y)}$$
Range: $[0, 1]$
Fisher's Exact Test
The p-value of Fisher's exact test for independence. It is used instead of the chi-squared test for contingency tables that contain cells with low expected counts.
Range: $[0, 1]$ (p-value scale)
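A sketch using fisher.test() in R on the same kind of 2x2 table as in the chi-squared example (toy counts; the one-sided alternative tests for positive association):

```r
n <- 1000; c_X <- 20; c_Y <- 30; c_XY <- 5
tbl <- matrix(c(c_XY, c_X - c_XY, c_Y - c_XY, n - c_X - c_Y + c_XY), nrow = 2)

## One-sided test: do X and Y co-occur more often than expected?
fisher.test(tbl, alternative = "greater")$p.value
```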
Gini Index
Measures quadratic entropy as $$ gini(X \Rightarrow Y) = P(X) [P(Y|X)^2+P(\overline{Y}|X)^2] + P(\overline{X}) [P(Y|\overline{X})^2+P(\overline{Y}|\overline{X})^2] - P(Y)^2 - P(\overline{Y})^2 $$
Range: $[0, 1]$ (0 means that the rule does not provide any information for the dataset)
Hyper-Confidence
The confidence level for the observation of too high/low counts for rules $X \Rightarrow Y$ using the hypergeometric model. Since the counts are drawn from a hypergeometric distribution (represented by the random variable $C_{XY}$) with known parameters given by the counts $c_X$ and $c_Y$, we can calculate how likely the observed count $c_{XY}$ is under this distribution. Hyper-confidence reports the confidence level as $$ \textit{hyper-conf}(X \Rightarrow Y) = 1 - P[C_{XY} \ge c_{XY} \mid c_X, c_Y]$$ A confidence level of, e.g., $> 0.95$ indicates that there is only a 5% chance that the high count for the rule has occurred randomly. Note that hyper-confidence is equivalent to the statistic used to calculate the p-value in Fisher's exact test.
Hyper-confidence can also be used to evaluate whether $X$ and $Y$ are complementary (i.e., the count is too low to have occurred randomly). $$ \textit{hyper-conf}_\textit{complement}(X \Rightarrow Y) = 1 - P[C_{XY} < c_{XY} \mid c_X, c_Y]$$
Range: $[0, 1]$
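Since $C_{XY}$ follows a hypergeometric distribution, hyper-confidence can be computed with R's phyper(). A sketch with made-up counts:

```r
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90  # toy counts

## 1 - P[C_XY >= c_XY | c_X, c_Y] = P[C_XY <= c_XY - 1]
phyper(c_XY - 1, m = c_Y, n = N - c_Y, k = c_X)  # > 0.95 flags the rule
```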
Hyper-Lift
An adaptation of the lift measure which is more robust for low counts by using a hypergeometric count model. Hyper-lift is defined as $$\textit{hyper-lift}_\delta(X \Rightarrow Y) = \frac{c_{XY}}{Q_{\delta}[C_{XY}]} $$ where $c_{XY}$ is the number of transactions containing $X$ and $Y$, and $Q_{\delta}[C_{XY}]$ is the quantile of the hypergeometric distribution with parameters $c_X$ and $c_Y$ given by $\delta$ (typically the 99% or 95% quantile).
Hyper-lift can be used to calculate confidence intervals for the lift measure. For example, the $\gamma = .95$ confidence interval is given by $[\textit{hyper-lift}_{\delta=\frac{1-\gamma}{2}}, \textit{hyper-lift}_{\delta=\frac{1+\gamma}{2}}]$.
Range: $[0, \infty]$ (1 indicates independence)
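A sketch of hyper-lift with $\delta = 0.99$ using R's qhyper() and the same made-up counts as in the hyper-confidence example:

```r
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90  # toy counts

## Observed count divided by the 99% quantile of the hypergeometric model.
c_XY / qhyper(0.99, m = c_Y, n = N - c_Y, k = c_X)
```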
Imbalance Ratio (IR)
IR, introduced by Wu, Chen and Han (2010), gauges the degree of imbalance between the two events that the LHS and the RHS are contained in a transaction. The ratio is close to 0 if the conditional probabilities are similar (i.e., very balanced) and close to 1 if they are very different. It is defined as $$IR(X \Rightarrow Y) = \frac{|P(X|Y) - P(Y|X)|}{P(X|Y) + P(Y|X) - P(X|Y)P(Y|X)} = \frac{|supp(X) - supp(Y)|}{supp(X) + supp(Y) - supp(X \cup Y)}$$
Range: $[0, 1]$ (0 indicates a balanced, typically uninteresting rule)
Implication Index
Gras R., L'implication statistique. Nouvelle methode exploratoire de donnees. La Pensee Sauvage, Grenoble, 1996. A variation of the Lerman similarity defined as $$gras(X \Rightarrow Y) = \sqrt{n} \frac{supp(X \cup \overline{Y}) - supp(X)supp(\overline{Y})}{\sqrt{supp(X)supp(\overline{Y})}}$$
Range: $[0, 1]$
Importance
In MS Analysis Services confidence is called "probability" and a measure called importance is defined as the log likelihood of the right-hand side of the rule, given the left-hand side of the rule: $$importance(X \Rightarrow Y) = log_{10}(L(X \Rightarrow Y) / L(X \Rightarrow \overline{Y}))$$ where $L$ is the Laplace corrected confidence.
Range: $[-\infty, \infty]$
Improvement
The improvement of a rule is the minimum difference between its confidence and the confidence of any proper sub-rule with the same consequent. The idea is that we only want to extend the LHS of the rule if this improves the rule sufficiently. $$improvement(X \Rightarrow Y) = min_{X' \subset X}(conf(X \Rightarrow Y) - conf(X' \Rightarrow Y))$$
Range: $[0, 1]$
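In arules, this is available as the measure name "improvement" for interestMeasure(); a one-line sketch, assuming the rules and Groceries objects from the confidence example above:

```r
## Minimum confidence gain of each rule over its proper sub-rules.
imp <- interestMeasure(rules, measure = "improvement", transactions = Groceries)
```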
Jaccard coefficient
Tan et al. (2004), also called Coherence by Wu et al. (2010)
A null-invariant measure for dependence using the Jaccard similarity between the two sets of transactions that contain the items in $X$ and $Y$, respectively. Defined as $$jaccard(X \Rightarrow Y) = \frac{supp(X \cup Y)}{supp(X) + supp(Y) - supp(X \cup Y)}= \frac{P(X \cap Y)}{P(X)+P(Y)-P(X \cap Y)} $$
Range: $[0, 1]$
J-Measure (J)
Smyth, Padhraic and Rodney M. Goodman (1991). Rule Induction Using Information Theory. Knowledge Discovery in Databases, 159-176.
A scaled measure of cross entropy that quantifies the information content of a rule. $$J(X \Rightarrow Y) = P(X \cap Y) log(\frac{P(Y|X)}{P(Y)}) + P(X \cap \overline{Y})log(\frac{P(\overline{Y}|X)}{P(\overline{Y})}) $$
Range: $[0, 1]$ (0 means that $X$ does not provide information for $Y$)
Kappa ($\kappa$)
Cohen's kappa of the rule (seen as a classifier), given as the rule's observed accuracy (i.e., confidence) corrected by the accuracy expected from a random classifier. Kappa is defined as $$\kappa(X \Rightarrow Y) = \frac{P(X \cap Y) + P(\overline{X} \cap \overline{Y}) - P(X)P(Y) - P(\overline{X})P(\overline{Y})}{1- P(X)P(Y) - P(\overline{X})P(\overline{Y})} $$
Range: $[-1,1]$ (0 means the rule is not better than a random classifier)
Klosgen
Defined as $$klosgen(X \Rightarrow Y) = \sqrt{supp(X \cup Y)}\,(conf(X \Rightarrow Y) - supp(Y)) = \sqrt{P(X \cap Y)}\, (P(Y|X) - P(Y)) $$
Range: $[-1, 1]$ (0 for independence)
Kulczynski
Wu, Chen and Han, (2010) based on Kulczynski (1927)
The null-invariant Kulczynski measure is the average confidence of the two rules $X \Rightarrow Y$ and $Y \Rightarrow X$; Wu et al. recommend it (together with the imbalance ratio) for data with skewed support distributions. $$kulc(X \Rightarrow Y) = \frac{1}{2} \left(conf(X \Rightarrow Y) + conf(Y \Rightarrow X) \right) = \frac{1}{2} \left(\frac{supp(X \cup Y)}{supp(X)} + \frac{supp(X \cup Y)}{supp(Y)} \right) = \frac{1}{2} \left(P(X | Y) + P(Y | X) \right) $$
Range: $[0, 1]$ (0.5 means neutral and typically uninteresting)
Goodman-Kruskal's Lambda ($\lambda$), Predictive Association
Goodman and Kruskal's lambda to assess the association between the LHS and RHS of the rule. $$\lambda(X \Rightarrow Y) = \frac{\Sigma_{x \in \{X, \overline{X}\}}\, max_{y \in \{Y, \overline{Y}\}} P(x \cap y) - max_{y \in \{Y, \overline{Y}\}} P(y)}{1 - max_{y \in \{Y, \overline{Y}\}} P(y)} $$
Range: $[0, 1]$
Laplace Corrected Confidence / Laplace Accuracy (L)
$$L(X \Rightarrow Y) = \frac{c_{XY}+1}{c_X+k},$$ where $k$ is the number of classes in the domain; for association rules, $k$ is often set to 2. It is an approximate measure of the expected rule accuracy, representing 1 minus the Laplace expected error estimate of the rule. The Laplace-corrected accuracy estimate decreases with lower support to account for estimation uncertainty with low counts.
Range: $[0, 1]$
Least Contradiction
Aze, J. and Y. Kodratoff (2004). Extraction de pepites de connaissances dans les donnees: Une nouvelle approche et une etude de sensibilite au bruit. In Mesures de Qualite pour la fouille de donnees. Revue des Nouvelles Technologies de l'Information, RNTI.
$$\textit{least-contradiction}(X \Rightarrow Y) = \frac{supp(X \cup Y) - supp(X \cup \overline{Y})}{supp(Y)}= \frac{P(X \cap Y) - P(X \cap \overline{Y})}{P(Y)}$$
Range: $[-1, 1]$
Lerman Similarity
Lerman, I.C. (1981). Classification et analyse ordinale des donnees. Paris.
Defined as $$lerman(X \Rightarrow Y) = \sqrt{n} \frac{supp(X \cup Y) - supp(X)supp(Y)}{\sqrt{supp(X)supp(Y)}} = \frac{c_{XY} - \frac{c_X c_Y}{n}}{\sqrt{\frac{c_X c_Y}{n}}}$$
Range: $[0, 1]$
Leverage, Piatetsky-Shapiro Measure (PS)
Introduced by Piatetsky-Shapiro, G., Discovery, analysis, and presentation of strong rules. Knowledge Discovery in Databases, 1991: p. 229-248. $$PS(X \Rightarrow Y) = leverage(X \Rightarrow Y) = supp(X \Rightarrow Y) - supp(X)supp(Y) = P(X \cap Y) - P(X)P(Y)$$ Leverage measures the difference between the observed frequency of $X$ and $Y$ appearing together in the data set and the frequency that would be expected if $X$ and $Y$ were statistically independent. The rationale in a sales setting is to find out how many more units (items $X$ and $Y$ together) are sold than expected from independent sales.
Using a minimum leverage threshold also incorporates an implicit frequency constraint. For example, for a minimum leverage threshold of 0.01% (which corresponds to 10 occurrences in a data set with 100,000 transactions), one can first use an algorithm to find all itemsets with a minimum support of 0.01% and then filter the found itemsets using the leverage constraint. Because of this property, leverage can also suffer from the rare item problem.
Range: $[-1, 1]$ (0 indicates independence)
Lift, Interest
Introduced by S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of the ACM SIGMOD Int'l Conf. on Management of Data (ACM SIGMOD '97), pages 255-264, 1997.
Lift was originally called interest. It is defined as $$lift(X \Rightarrow Y) = lift(Y \Rightarrow X) = \frac{conf(X \Rightarrow Y)}{supp(Y)} = \frac{conf(Y \Rightarrow X)}{supp(X)} = \frac{P(X \cap Y)}{P(X)P(Y)}$$ Lift measures how many times more often $X$ and $Y$ occur together than expected if they were statistically independent. A lift value of 1 indicates independence between $X$ and $Y$. For statistical tests, see chi-squared, Fisher's exact test, and hyper-confidence.
Lift is not downward closed and does not suffer from the rare item problem. However, lift is susceptible to noise in small databases. Rare itemsets with low counts (low probability) which by chance occur a few times (or only once) together can produce enormous lift values.
Range: $[0, \infty]$ (1 means independence)
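A quick sketch of lift computed from counts (toy counts, as in the examples above):

```r
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90  # toy counts

## Observed joint probability relative to what independence predicts.
(c_XY / N) / ((c_X / N) * (c_Y / N))  # 1.5: co-occur 1.5x more than expected
```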
MaxConfidence
Symmetric version of confidence (Wu et al., 2010, Tan et al. (2004)). Null-invariant measure defined as $$ maxConf(X \Rightarrow Y) = max\{ conf(X \Rightarrow Y),\ conf(Y \Rightarrow X) \} = max\{ P(Y | X),\ P(X | Y) \} $$
Range: $[0, 1]$
Mutual Information (M), Uncertainty
Measures the information gain for $Y$ provided by $X$. $$M(X \Rightarrow Y) = \frac{\sum_{i \in \{X, \overline{X}\}} \sum_{j \in \{Y, \overline{Y}\}} \frac{c_{ij}}{n} log \frac{n\, c_{ij}}{c_i c_j}}{min(-\sum_{i \in \{X, \overline{X}\}} \frac{c_i}{n} log \frac{c_i}{n}, -\sum_{j \in \{Y, \overline{Y}\}} \frac{c_j}{n} log \frac{c_j}{n})} = \frac{\sum_{i \in \{X, \overline{X}\}} \sum_{j \in \{Y, \overline{Y}\}} P(i \cap j) log \frac{P(i \cap j)}{P(i) P(j)}}{min(-\sum_{i \in \{X, \overline{X}\}} P(i) log P(i), -\sum_{j \in \{Y, \overline{Y}\}} P(j) log P(j))} $$
Range: $[0, 1]$ (0 means that X does not provide information for Y)
Odds Ratio ($\alpha$)
The odds of finding $X$ in transactions which contain $Y$ divided by the odds of finding $X$ in transactions which do not contain $Y$. $$ \alpha(X \Rightarrow Y) = \frac{c_{XY} c_{\overline{X}\overline{Y}}} {c_{X\overline{Y}} c_{\overline{X}Y}} = \frac{P(X \cap Y) P(\overline{X} \cap \overline{Y})} {P(X \cap \overline{Y}) P(\overline{X} \cap Y)} $$
Range: $[0, \infty]$ (1 indicates that Y is not associated with X)
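A sketch from the four cells of the 2x2 table (toy counts; note that a zero cell makes the estimate degenerate, which is why 0.5 is sometimes added to each cell):

```r
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90  # toy counts

(c_XY * (N - c_X - c_Y + c_XY)) /
  ((c_X - c_XY) * (c_Y - c_XY))  # approx. 2.3: odds of X are 2.3x higher given Y
```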
Correlation Coefficient $\phi$, Phi
The correlation coefficient between the transactions containing $X$ and $Y$ represented as two binary vectors. Phi correlation is equivalent to Pearson's product moment correlation coefficient $\rho$ for 0-1 values. $$\phi(X \Rightarrow Y) = \frac{n\, c_{XY} - c_X c_Y}{\sqrt{c_X c_Y c_\overline{X} c_\overline{Y}}} $$
Range: $[-1, 1]$ (0 when $X$ and $Y$ are independent)
Ralambondrainy Measure
J. Diatta, H. Ralambondrainy, and A. Totohasina (2007). Towards a unifying probabilistic implicative normalized quality measure for association rules. In Quality Measures in Data Mining, 237-250, 2007.
$$ ralambondrainy(X \Rightarrow Y) = \frac{c_{X\overline{Y}}}{n} = P(X \cap \overline{Y}) $$
Range: $[0, 1]$ (smaller is better)
Relative Linkage Disequilibrium (RLD)
Kenett, Ron and Silvia Salini (2008). Relative Linkage Disequilibrium: A New measure for association rules. In 8th Industrial Conference on Data Mining ICDM 2008, July 16-18, 2008, Leipzig/Germany.
RLD is an association measure motivated by indices used in population genetics. It evaluates the deviation of the support of the whole rule from the support expected under independence given the supports of X and Y.
Range: $[0, 1]$
Rule Power Factor
Ochin, Suresh, and Kumar, Nisheeth Joshi (2016). Rule Power Factor: A New Interest Measure in Associative Classification. 6th International Conference On Advances In Computing and Communications, ICACC 2016, 6-8 September 2016, Cochin, India.
Weights the confidence of a rule by its support. Defined as $$rpf(X \Rightarrow Y) = supp(X \cup Y) \cdot conf(X \Rightarrow Y) $$
Range: $[0, 1]$
Sebag-Schoenauer measure
Sebag, M. and M. Schoenauer (1988). Generation of rules with certainty and confidence factors from incomplete and incoherent learning bases. In Proceedings of the European Knowledge Acquisition Workshop (EKAW'88), Gesellschaft fuer Mathematik und Datenverarbeitung mbH, 28.1-28.20.
Defined as $$sebag(X \Rightarrow Y) = \frac{conf(X \Rightarrow Y)}{conf(X \Rightarrow \overline{Y})} = \frac{supp(X \cup Y)}{supp(X \cup \overline{Y})} = \frac{P(X \cap Y)}{P(X \cap \overline{Y})} $$
Range: $[0, \infty]$ (values greater than 1 mean the rule holds more often than it fails)
Standardized Lift
P. D. McNicholas, T. B. Murphy, and M. O'Regan. 2008. Standardising the lift of an association rule. Comput. Stat. Data Anal. 52, 10 (June 2008), 4712-4721. DOI:10.1016/j.csda.2008.03.013
Standardized lift uses the minimum and maximum values that lift can reach for a given rule to standardize lift to the interval $[0, 1]$. The possible range of lift is given by the minimum $$\lambda = \frac{\mathrm{max}\{P(X) + P(Y) - 1, 1/n\}}{P(X)P(Y)}$$ and the maximum $$\upsilon = \frac{1}{\mathrm{max}\{P(X), P(Y)\}}.$$ The standardized lift is defined as $$\mathrm{stdLift}(X \Rightarrow Y) = \frac{\mathrm{lift}(X \Rightarrow Y) - \lambda}{ \upsilon - \lambda}.$$ The measure can be corrected for the minimum support $s$ and minimum confidence $c$ used in rule mining by replacing the minimum bound $\lambda$ with $$\lambda^* = \mathrm{max}\left\{\lambda, \frac{4s}{(1+s)^2}, \frac{s}{P(X)P(Y)}, \frac{c}{P(Y)}\right\}.$$
Range: $[0, 1]$
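A sketch of the uncorrected standardized lift from counts (toy counts; the $\lambda^*$ correction is omitted):

```r
N <- 1000; c_X <- 200; c_Y <- 300; c_XY <- 90  # toy counts
pX <- c_X / N; pY <- c_Y / N

lift    <- (c_XY / N) / (pX * pY)
lambda  <- max(pX + pY - 1, 1 / N) / (pX * pY)  # minimum possible lift
upsilon <- 1 / max(pX, pY)                      # maximum possible lift
(lift - lambda) / (upsilon - lambda)            # approx. 0.45
```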
Varying Rates Liaison
Bernard, Jean-Marc and Charron, Camilo (1996). L'analyse implicative bayesienne, une methode pour l'etude des dependances orientees. II : modele logique sur un tableau de contingence Mathematiques et Sciences Humaines, Volume 135 (1996), p. 5-18.
Defined as the lift of a rule minus 1, so that 0 represents independence. $$VRL(X \Rightarrow Y) = lift(X \Rightarrow Y) - 1 $$
Range: $[-1, \infty]$ (0 for independence)
Yule's Q and Yule's Y
Defined as $$ Q(X \Rightarrow Y) = \frac{\alpha-1}{\alpha+1} $$ $$ Y(X \Rightarrow Y) = \frac{\sqrt{\alpha}-1}{\sqrt{\alpha}+1} $$ where $\alpha$ is the odds ratio.
Range: $[-1, 1]$
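Both follow directly from the odds ratio; a sketch reusing the toy counts from the odds ratio section above ($\alpha \approx 2.3$):

```r
alpha <- 2.3  # odds ratio from the toy counts above

(alpha - 1) / (alpha + 1)              # Yule's Q, approx. 0.39
(sqrt(alpha) - 1) / (sqrt(alpha) + 1)  # Yule's Y, approx. 0.21
```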
© Michael Hahsler