Code examples for the paper:

Bing Liu, Wynne Hsu and Yiming Ma, Integrating Classification and Association Rule Mining, ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 1998.

set.seed(1234)
library("tidyverse")

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

## ✓ ggplot2 3.3.5     ✓ purrr   0.3.4
## ✓ tibble  3.1.6     ✓ dplyr   1.0.7
## ✓ tidyr   1.1.4     ✓ stringr 1.4.0
## ✓ readr   2.1.0     ✓ forcats 0.5.1

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library("arulesCBA")

## Loading required package: Matrix

## 
## Attaching package: 'Matrix'

## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack

## Loading required package: arules

## 
## Attaching package: 'arules'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

## 
## Attaching package: 'arulesCBA'

## The following object is masked from 'package:arules':
## 
##     rules

library("arulesViz")
library("caret")  # for confusionMatrix

## Loading required package: lattice

## 
## Attaching package: 'caret'

## The following object is masked from 'package:purrr':
## 
##     lift

Examples Using the Iris Data Set

Prepare Data

data(iris)

Train/test split: train the classifier on 90% of the data and hold out the remaining 10% for testing.

train <- sample(1:nrow(iris), size = as.integer(nrow(iris) * .9))
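As a minimal sketch of what this split produces, the following base R snippet draws the training indices and derives the held-out test indices. The names `train_idx` and `test_idx` are illustrative and not part of the original code, and the indices drawn here are not guaranteed to match the ones used in the rest of this document.

```r
# Sketch of a 90/10 hold-out split on row indices (base R only).
set.seed(1234)                        # fixed seed so the split is repeatable
n <- nrow(iris)                       # 150 observations

# Sample 90% of the row indices without replacement for training.
train_idx <- sample(seq_len(n), size = as.integer(n * 0.9))

# Everything not sampled is held out for testing.
test_idx <- setdiff(seq_len(n), train_idx)

length(train_idx)   # number of training rows
length(test_idx)    # number of test rows
```

Negative indexing, as in `iris[-train, ]`, selects the same held-out rows without storing the test indices explicitly.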

Discretize using Fayyad and Irani’s MDL method

iris.disc <- iris %>% discretizeDF.supervised(Species ~ ., data = ., method = "mdlp")
iris.disc %>% head()

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1  [-Inf,5.55) [3.35, Inf]  [-Inf,2.45)  [-Inf,0.8)  setosa
## 2  [-Inf,5.55) [2.95,3.35)  [-Inf,2.45)  [-Inf,0.8)  setosa
## 3  [-Inf,5.55) [2.95,3.35)  [-Inf,2.45)  [-Inf,0.8)  setosa
## 4  [-Inf,5.55) [2.95,3.35)  [-Inf,2.45)  [-Inf,0.8)  setosa
## 5  [-Inf,5.55) [3.35, Inf]  [-Inf,2.45)  [-Inf,0.8)  setosa
## 6  [-Inf,5.55) [3.35, Inf]  [-Inf,2.45)  [-Inf,0.8)  setosa

iris.transactions <- transactions(iris.disc)

What is the support that requires a minimum of 5 observations?

5 / length(train)

## [1] 0.03703704
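To make the support threshold concrete, here is a small base R sketch that computes support and confidence by hand for a single class association rule, {Petal.Length < 2.45} => {Species = setosa}. The cut point 2.45 is taken from the discretization output above. For simplicity the rule is evaluated on the full iris data rather than on the training sample, so the numbers are illustrative only.

```r
# Rule: {Petal.Length < 2.45} => {Species = setosa}, evaluated on all of iris.
lhs <- iris$Petal.Length < 2.45      # rows matching the rule's antecedent
rhs <- iris$Species == "setosa"      # rows matching the rule's consequent

n          <- nrow(iris)
support    <- sum(lhs & rhs) / n          # fraction of rows matching LHS and RHS
confidence <- sum(lhs & rhs) / sum(lhs)   # fraction of LHS rows that also match RHS

# Relative support equivalent to a minimum of 5 observations in 135 training rows.
min_support <- 5 / 135

support
confidence
support >= min_support   # does the rule clear the threshold?
```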

Build A Naive Association Rule-Based Classifier

Let’s just

1. mine class association rules,
2. find all non-redundant rules,
3. create a rule base prioritized by confidence, and then
4. use the first matching rule for classification.
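The last two steps (ordering by confidence and classifying with the first matching rule) can be sketched in base R without any packages. The rule base, item labels such as "short" or "medium", and the helper functions `matches()` and `predict_first()` are hypothetical and only illustrate the idea; they are not how arulesCBA stores or applies rules.

```r
# Hypothetical rule base: each rule has an LHS (named list of required item
# values), a predicted class (rhs), and a confidence.
rules <- list(
  list(lhs = list(Petal.Width = "wide"),   rhs = "virginica",  confidence = 0.90),
  list(lhs = list(Petal.Length = "short"), rhs = "setosa",     confidence = 1.00),
  list(lhs = list(Petal.Width = "medium"), rhs = "versicolor", confidence = 0.95)
)

# Prioritize the rule base by confidence (highest first).
conf  <- vapply(rules, function(r) r$confidence, numeric(1))
rules <- rules[order(conf, decreasing = TRUE)]

# A rule matches if every item in its LHS agrees with the observation.
matches <- function(rule, obs) {
  all(vapply(names(rule$lhs),
             function(a) identical(obs[[a]], rule$lhs[[a]]),
             logical(1)))
}

# Return the class of the first matching rule, or NA if no rule fires.
predict_first <- function(rules, obs) {
  for (r in rules) {
    if (matches(r, obs)) return(r$rhs)
  }
  NA_character_   # no rule fires and there is no default class
}

obs <- list(Petal.Length = "long", Petal.Width = "medium")
predict_first(rules, obs)   # "versicolor"
```

Without a default class, an observation that matches no rule gets `NA`, which is why a rule with an empty LHS is often placed at the end of a rule base.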

cars <- mineCARs(Species ~ ., transactions = iris.transactions[train,] , support = 0.04)

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE           FALSE       5    0.04      1
##  maxlen target  ext
##       5  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 5 
## 
## set item appearances ...[15 item(s)] done [0.00s].
## set transactions ...[15 item(s), 135 transaction(s)] done [0.00s].
## sorting and recoding items ... [15 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [82 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

cars <- cars[!is.redundant(cars)]
rulebase <- cars %>% sort(by = "confidence")
inspectDT(rulebase)

classifier <- CBA_ruleset(Species ~ ., rulebase)
classifier

## CBA Classifier Object
## Formula: Species ~ .
## Number of rules: 26
## Default Class: NA
## Classification method: first  
## Description: Custom rule set

pr <- predict(classifier, iris.transactions[-train,])

confusionMatrix(reference = iris[-train, ]$Species, data = pr)

## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   setosa versicolor virginica
##   setosa          7          0         0
##   versicolor      0          3         1
##   virginica       0          2         2
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8             
##                  95% CI : (0.5191, 0.9567)
##     No Information Rate : 0.4667          
##     P-Value [Acc > NIR] : 0.009047        
##                                           
##                   Kappa : 0.6875          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                 1.0000            0.6000           0.6667
## Specificity                 1.0000            0.9000           0.8333
## Pos Pred Value              1.0000            0.7500           0.5000
## Neg Pred Value              1.0000            0.8182           0.9091
## Prevalence                  0.4667            0.3333           0.2000
## Detection Rate              0.4667            0.2000           0.1333
## Detection Prevalence        0.4667            0.2667           0.2667
## Balanced Accuracy           1.0000            0.7500           0.7500
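The headline numbers in this output can be reproduced by hand from the confusion matrix. The sketch below re-enters the table shown above and computes accuracy, the no-information rate, and Cohen's kappa in base R; the variable names are illustrative.

```r
# Confusion matrix copied from the output above (rows: prediction, cols: reference).
classes <- c("setosa", "versicolor", "virginica")
cm <- matrix(c(7, 0, 0,
               0, 3, 1,
               0, 2, 2),
             nrow = 3, byrow = TRUE,
             dimnames = list(Prediction = classes, Reference = classes))

n <- sum(cm)                                  # 15 test observations

accuracy <- sum(diag(cm)) / n                 # share of correct predictions
nir      <- max(colSums(cm)) / n              # always predicting the largest class

# Cohen's kappa: agreement beyond what the row/column totals produce by chance.
expected    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - expected) / (1 - expected)

accuracy      # 0.8
nir           # 0.4667
cohen_kappa   # 0.6875
```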

This is not a very good classifier, and it uses many rules. CBA reduces the number of rules in the rule base, pruning them based on the misclassification error, to produce a better and more compact classifier.
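The core of this pruning is a database-coverage pass over the sorted rules. The base R sketch below is a simplified illustration under stated assumptions: the toy labels and the rules with precomputed coverage vectors are hypothetical, and the sketch omits the default-class selection and the total-error cutoff that the full CBA procedure also performs.

```r
# Toy training labels and a hypothetical rule base, already sorted by precedence.
# 'covers' marks which training cases satisfy the rule's LHS.
y <- c("a", "a", "a", "b", "b", "c")
rules <- list(
  list(name = "r1", covers = c(TRUE,  TRUE,  FALSE, FALSE, FALSE, FALSE), rhs = "a"),
  list(name = "r2", covers = c(TRUE,  TRUE,  TRUE,  FALSE, FALSE, FALSE), rhs = "a"),
  list(name = "r3", covers = c(TRUE,  FALSE, FALSE, FALSE, FALSE, FALSE), rhs = "a"),
  list(name = "r4", covers = c(FALSE, FALSE, FALSE, TRUE,  TRUE,  TRUE),  rhs = "b")
)

remaining <- rep(TRUE, length(y))   # cases not yet covered by a kept rule
kept <- character(0)

for (r in rules) {
  covered <- r$covers & remaining
  # Keep the rule only if it correctly classifies at least one remaining case.
  if (any(covered & y == r$rhs)) {
    kept <- c(kept, r$name)
    remaining <- remaining & !r$covers   # remove every case the rule covers
  }
}

kept   # "r1" "r2" "r4"
```

Rule `r3` is dropped because the only case it covers was already removed by `r1`, which is how redundant lower-precedence rules disappear from the rule base.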

Build a CBA Model

CBA() discretizes the data itself, here using Fayyad and Irani’s MDL discretization method (disc.method = "mdlp").

cba_model <-
  CBA(Species ~ .,
    data = iris[train, ],
    supp = 0.03,
    conf = 0.5,
    disc.method = "mdlp",
    verbose = TRUE)

## 
## Mining CARs...
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE           FALSE       5    0.03      1
##  maxlen target  ext
##       5  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 4 
## 
## set item appearances ...[15 item(s)] done [0.00s].
## set transactions ...[15 item(s), 135 transaction(s)] done [0.00s].
## sorting and recoding items ... [15 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [98 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
## 
## Pruning CARs...
## CARs left: 7

cba_model

## CBA Classifier Object
## Formula: Species ~ .
## Number of rules: 7
## Default Class: NA
## Classification method: first  
## Description: CBA algorithm (Liu et al., 1998)

CBA chooses only these rules

inspectDT(rules(cba_model))

Make predictions for the test data

pr <- predict(cba_model, newdata = iris[-train, ])
pr

##  [1] setosa     setosa     setosa     setosa     setosa     setosa    
##  [7] setosa     virginica  versicolor versicolor virginica  versicolor
## [13] versicolor virginica  virginica 
## Levels: setosa versicolor virginica

confusionMatrix(reference = iris[-train, ]$Species, data = pr)

## Confusion Matrix and Statistics
## 
##             Reference
## Prediction   setosa versicolor virginica
##   setosa          7          0         0
##   versicolor      0          3         1
##   virginica       0          2         2
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8             
##                  95% CI : (0.5191, 0.9567)
##     No Information Rate : 0.4667          
##     P-Value [Acc > NIR] : 0.009047        
##                                           
##                   Kappa : 0.6875          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: setosa Class: versicolor Class: virginica
## Sensitivity                 1.0000            0.6000           0.6667
## Specificity                 1.0000            0.9000           0.8333
## Pos Pred Value              1.0000            0.7500           0.5000
## Neg Pred Value              1.0000            0.8182           0.9091
## Prevalence                  0.4667            0.3333           0.2000
## Detection Rate              0.4667            0.2000           0.1333
## Detection Prevalence        0.4667            0.2667           0.2667
## Balanced Accuracy           1.0000            0.7500           0.7500

Compare Methods on the Zoo Data Set

Prepare Data

data(Zoo, package = "mlbench")
head(Zoo)

##           hair feathers  eggs  milk airborne aquatic predator toothed backbone
## aardvark  TRUE    FALSE FALSE  TRUE    FALSE   FALSE     TRUE    TRUE     TRUE
## antelope  TRUE    FALSE FALSE  TRUE    FALSE   FALSE    FALSE    TRUE     TRUE
## bass     FALSE    FALSE  TRUE FALSE    FALSE    TRUE     TRUE    TRUE     TRUE
## bear      TRUE    FALSE FALSE  TRUE    FALSE   FALSE     TRUE    TRUE     TRUE
## boar      TRUE    FALSE FALSE  TRUE    FALSE   FALSE     TRUE    TRUE     TRUE
## buffalo   TRUE    FALSE FALSE  TRUE    FALSE   FALSE    FALSE    TRUE     TRUE
##          breathes venomous  fins legs  tail domestic catsize   type
## aardvark     TRUE    FALSE FALSE    4 FALSE    FALSE    TRUE mammal
## antelope     TRUE    FALSE FALSE    4  TRUE    FALSE    TRUE mammal
## bass        FALSE    FALSE  TRUE    0  TRUE    FALSE   FALSE   fish
## bear         TRUE    FALSE FALSE    4 FALSE    FALSE    TRUE mammal
## boar         TRUE    FALSE FALSE    4  TRUE    FALSE    TRUE mammal
## buffalo      TRUE    FALSE FALSE    4  TRUE    FALSE    TRUE mammal

Discretize legs manually by converting the number of legs into a logical value (has legs or not).

Zoo$legs <- Zoo$legs > 0

Train classifier on 90% of the data

train <- sample(1:nrow(Zoo), size = as.integer(nrow(Zoo) * .9))

Run Methods

Note: CBA could use a lower minimum support.

models <- c(CBA, C4.5_CBA, FOIL, RIPPER_CBA, PART_CBA)

accuracy <- list()
nrules <- list()

for (m in models) {
  cat("\n---------------------------------------------------------------------\n\n")
  
  fit <- m(type ~ ., data = Zoo[train, ])
  print(fit)
 
  cat("\nRule base:\n")
  inspect(rules(fit))
   
  pr <- predict(fit, newdata = Zoo[-train, ])
  
  cat("\n")
  cm <- confusionMatrix(reference = Zoo[-train, ]$type, data = pr)
  print(cm)
  
  accuracy[[fit$description]] <- cm$overall["Accuracy"]
  nrules[[fit$description]] <- length(rules(fit))
}

## 
## ---------------------------------------------------------------------
## 
## CBA Classifier Object
## Formula: type ~ .
## Number of rules: 4
## Default Class: NA
## Classification method: first  
## Description: CBA algorithm (Liu et al., 1998)
## 
## 
## Rule base:
##     lhs                            rhs                  support   confidence
## [1] {milk=TRUE}                 => {type=mammal}        0.4000000 1.0       
## [2] {feathers=TRUE}             => {type=bird}          0.2222222 1.0       
## [3] {breathes=FALSE, fins=TRUE} => {type=fish}          0.1111111 1.0       
## [4] {}                          => {type=mollusc.et.al} 0.1000000 0.1       
##     coverage  lift count size coveredTransactions totalErrors
## [1] 0.4000000 2.5  36    2    36                  34         
## [2] 0.2222222 4.5  20    2    20                  24         
## [3] 0.1111111 9.0  10    3    10                  15         
## [4] 1.0000000 1.0  90    1    24                  15         
## 
## Confusion Matrix and Statistics
## 
##                Reference
## Prediction      mammal bird reptile fish amphibian insect mollusc.et.al
##   mammal             5    0       0    0         0      0             0
##   bird               0    0       0    0         0      0             0
##   reptile            0    0       0    0         0      0             0
##   fish               0    0       0    3         0      0             0
##   amphibian          0    0       0    0         0      0             0
##   insect             0    0       0    0         0      0             0
##   mollusc.et.al      0    0       0    0         1      1             1
## 
## Overall Statistics
##                                           
##                Accuracy : 0.8182          
##                  95% CI : (0.4822, 0.9772)
##     No Information Rate : 0.4545          
##     P-Value [Acc > NIR] : 0.01598         
##                                           
##                   Kappa : 0.7381          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: mammal Class: bird Class: reptile Class: fish
## Sensitivity                 1.0000          NA             NA      1.0000
## Specificity                 1.0000           1              1      1.0000
## Pos Pred Value              1.0000          NA             NA      1.0000
## Neg Pred Value              1.0000          NA             NA      1.0000
## Prevalence                  0.4545           0              0      0.2727
## Detection Rate              0.4545           0              0      0.2727
## Detection Prevalence        0.4545           0              0      0.2727
## Balanced Accuracy           1.0000          NA             NA      1.0000
##                      Class: amphibian Class: insect Class: mollusc.et.al
## Sensitivity                   0.00000       0.00000              1.00000
## Specificity                   1.00000       1.00000              0.80000
## Pos Pred Value                    NaN           NaN              0.33333
## Neg Pred Value                0.90909       0.90909              1.00000
## Prevalence                    0.09091       0.09091              0.09091
## Detection Rate                0.00000       0.00000              0.09091
## Detection Prevalence          0.00000       0.00000              0.27273
## Balanced Accuracy             0.50000       0.50000              0.90000
## 
## ---------------------------------------------------------------------
## 
## CBA Classifier Object
## Formula: type ~ .
## Number of rules: 9
## Default Class: mollusc.et.al
## Classification method: first  
## Description: RWeka classifier weka/classifiers/trees/J48
## 
## 
## Rule base:
##     lhs                  rhs                     support confidence
## [1] {feathers=TRUE}   => {type=bird}          0.22222222  1.0000000
## [2] {feathers=FALSE,                                               
##      milk=TRUE}       => {type=mammal}        0.40000000  1.0000000
## [3] {feathers=FALSE,                                               
##      milk=FALSE,                                                   
##      backbone=TRUE,                                                
##      fins=TRUE}       => {type=fish}          0.11111111  1.0000000
## [4] {feathers=FALSE,                                               
##      milk=FALSE,                                                   
##      backbone=TRUE,                                                
##      fins=FALSE,                                                   
##      tail=TRUE}       => {type=reptile}       0.05555556  0.8333333
## [5] {feathers=FALSE,                                               
##      milk=FALSE,                                                   
##      backbone=TRUE,                                                
##      fins=FALSE,                                                   
##      tail=FALSE}      => {type=amphibian}     0.02222222  1.0000000
## [6] {feathers=FALSE,                                               
##      milk=FALSE,                                                   
##      airborne=TRUE,                                                
##      backbone=TRUE,                                                
##      backbone=FALSE,                                               
##      tail=FALSE}      => {type=insect}        0.00000000        NaN
## [7] {feathers=FALSE,                                               
##      milk=FALSE,                                                   
##      airborne=FALSE,                                               
##      predator=TRUE,                                                
##      backbone=TRUE,                                                
##      backbone=FALSE,                                               
##      fins=FALSE}      => {type=mollusc.et.al} 0.00000000        NaN
## [8] {feathers=FALSE,                                               
##      milk=FALSE,                                                   
##      airborne=FALSE,                                               
##      predator=FALSE,                                               
##      backbone=TRUE,                                                
##      backbone=FALSE,                                               
##      fins=FALSE,                                                   
##      legs=TRUE}       => {type=insect}        0.00000000        NaN
## [9] {feathers=FALSE,                                               
##      milk=FALSE,                                                   
##      airborne=FALSE,                                               
##      predator=FALSE,                                               
##      backbone=TRUE,                                                
##      backbone=FALSE,                                               
##      fins=FALSE,                                                   
##      legs=FALSE}      => {type=mollusc.et.al} 0.00000000        NaN
## 
## Confusion Matrix and Statistics
## 
##                Reference
## Prediction      mammal bird reptile fish amphibian insect mollusc.et.al
##   mammal             5    0       0    0         0      0             0
##   bird               0    0       0    0         0      0             0
##   reptile            0    0       0    0         0      0             0
##   fish               0    0       0    3         0      0             0
##   amphibian          0    0       0    0         1      0             0
##   insect             0    0       0    0         0      0             0
##   mollusc.et.al      0    0       0    0         0      1             1
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9091          
##                  95% CI : (0.5872, 0.9977)
##     No Information Rate : 0.4545          
##     P-Value [Acc > NIR] : 0.00243         
##                                           
##                   Kappa : 0.869           
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: mammal Class: bird Class: reptile Class: fish
## Sensitivity                 1.0000          NA             NA      1.0000
## Specificity                 1.0000           1              1      1.0000
## Pos Pred Value              1.0000          NA             NA      1.0000
## Neg Pred Value              1.0000          NA             NA      1.0000
## Prevalence                  0.4545           0              0      0.2727
## Detection Rate              0.4545           0              0      0.2727
## Detection Prevalence        0.4545           0              0      0.2727
## Balanced Accuracy           1.0000          NA             NA      1.0000
##                      Class: amphibian Class: insect Class: mollusc.et.al
## Sensitivity                   1.00000       0.00000              1.00000
## Specificity                   1.00000       1.00000              0.90000
## Pos Pred Value                1.00000           NaN              0.50000
## Neg Pred Value                1.00000       0.90909              1.00000
## Prevalence                    0.09091       0.09091              0.09091
## Detection Rate                0.09091       0.00000              0.09091
## Detection Prevalence          0.09091       0.00000              0.18182
## Balanced Accuracy             1.00000       0.50000              0.95000
## 
## ---------------------------------------------------------------------
## 
## CBA Classifier Object
## Formula: type ~ .
## Number of rules: 13
## Default Class: mammal
## Classification method: weighted  - using best 5 rules
## Description: FOIL-based classifier (Yin and Han, 2003)
## 
## 
## Rule base:
##      lhs                  rhs                     support confidence     lift   laplace
## [1]  {milk=TRUE}       => {type=mammal}        0.40000000  1.0000000  2.50000 0.8604651
## [2]  {feathers=TRUE}   => {type=bird}          0.22222222  1.0000000  4.50000 0.7777778
## [3]  {eggs=TRUE,                                                                       
##       fins=TRUE}       => {type=fish}          0.11111111  1.0000000  9.00000 0.6470588
## [4]  {predator=FALSE,                                                                  
##       backbone=FALSE,                                                                  
##       legs=TRUE}       => {type=insect}        0.06666667  1.0000000 12.85714 0.5384615
## [5]  {aquatic=TRUE,                                                                    
##       backbone=FALSE}  => {type=mollusc.et.al} 0.06666667  1.0000000 10.00000 0.5384615
## [6]  {airborne=TRUE,                                                                   
##       backbone=FALSE}  => {type=insect}        0.05555556  1.0000000 12.85714 0.5000000
## [7]  {hair=FALSE,                                                                      
##       aquatic=FALSE,                                                                   
##       toothed=TRUE}    => {type=reptile}       0.03333333  1.0000000 18.00000 0.4000000
## [8]  {backbone=FALSE,                                                                  
##       legs=FALSE}      => {type=mollusc.et.al} 0.03333333  1.0000000 10.00000 0.4000000
## [9]  {toothed=TRUE,                                                                    
##       venomous=TRUE,                                                                   
##       fins=FALSE}      => {type=reptile}       0.02222222  1.0000000 18.00000 0.3333333
## [10] {aquatic=TRUE,                                                                    
##       breathes=TRUE,                                                                   
##       tail=FALSE}      => {type=amphibian}     0.02222222  0.6666667 20.00000 0.3000000
## [11] {aquatic=TRUE,                                                                    
##       breathes=TRUE,                                                                   
##       catsize=FALSE}   => {type=amphibian}     0.03333333  0.4285714 12.85714 0.2857143
## [12] {feathers=FALSE,                                                                  
##       toothed=FALSE,                                                                   
##       catsize=TRUE}    => {type=reptile}       0.01111111  0.5000000  9.00000 0.2222222
## [13] {eggs=FALSE,                                                                      
##       venomous=TRUE}   => {type=mollusc.et.al} 0.01111111  0.5000000  5.00000 0.2222222
## 
## Confusion Matrix and Statistics
## 
##                Reference
## Prediction      mammal bird reptile fish amphibian insect mollusc.et.al
##   mammal             5    0       0    0         0      0             0
##   bird               0    0       0    0         0      0             0
##   reptile            0    0       0    0         1      0             0
##   fish               0    0       0    3         0      0             0
##   amphibian          0    0       0    0         0      0             0
##   insect             0    0       0    0         0      1             0
##   mollusc.et.al      0    0       0    0         0      0             1
## 
## Overall Statistics
##                                           
##                Accuracy : 0.9091          
##                  95% CI : (0.5872, 0.9977)
##     No Information Rate : 0.4545          
##     P-Value [Acc > NIR] : 0.00243         
##                                           
##                   Kappa : 0.8706          
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: mammal Class: bird Class: reptile Class: fish
## Sensitivity                 1.0000          NA             NA      1.0000
## Specificity                 1.0000           1        0.90909      1.0000
## Pos Pred Value              1.0000          NA             NA      1.0000
## Neg Pred Value              1.0000          NA             NA      1.0000
## Prevalence                  0.4545           0        0.00000      0.2727
## Detection Rate              0.4545           0        0.00000      0.2727
## Detection Prevalence        0.4545           0        0.09091      0.2727
## Balanced Accuracy           1.0000          NA             NA      1.0000
##                      Class: amphibian Class: insect Class: mollusc.et.al
## Sensitivity                   0.00000       1.00000              1.00000
## Specificity                   1.00000       1.00000              1.00000
## Pos Pred Value                    NaN       1.00000              1.00000
## Neg Pred Value                0.90909       1.00000              1.00000
## Prevalence                    0.09091       0.09091              0.09091
## Detection Rate                0.00000       0.09091              0.09091
## Detection Prevalence          0.00000       0.09091              0.09091
## Balanced Accuracy             0.50000       1.00000              1.00000
## 
## ---------------------------------------------------------------------
## 
## CBA Classifier Object
## Formula: type ~ .
## Number of rules: 7
## Default Class: mammal
## Classification method: first  
## Description: RWeka classifier weka/classifiers/rules/JRip
## 
## 
## Rule base:
##     lhs                  rhs                     support confidence
## [1] {feathers=FALSE,                                               
##      aquatic=TRUE,                                                 
##      breathes=TRUE,                                                
##      catsize=FALSE}   => {type=amphibian}     0.03333333  1.0000000
## [2] {hair=FALSE,                                                   
##      toothed=TRUE,                                                 
##      fins=FALSE}      => {type=reptile}       0.04444444  0.5714286
## [3] {predator=FALSE,                                               
##      backbone=FALSE,                                               
##      legs=TRUE}       => {type=insect}        0.06666667  1.0000000
## [4] {backbone=FALSE}  => {type=mollusc.et.al} 0.10000000  0.5625000
## [5] {breathes=FALSE}  => {type=fish}          0.11111111  0.5882353
## [6] {feathers=TRUE}   => {type=bird}          0.22222222  1.0000000
## [7] {}                => {type=mammal}        0.40000000  0.4000000
## 
## Confusion Matrix and Statistics
## 
##                Reference
## Prediction      mammal bird reptile fish amphibian insect mollusc.et.al
##   mammal             5    0       0    0         0      0             0
##   bird               0    0       0    0         0      0             0
##   reptile            0    0       0    0         0      0             0
##   fish               0    0       0    3         0      0             0
##   amphibian          0    0       0    0         1      0             0
##   insect             0    0       0    0         0      1             0
##   mollusc.et.al      0    0       0    0         0      0             1
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.7151, 1)
##     No Information Rate : 0.4545     
##     P-Value [Acc > NIR] : 0.0001711  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: mammal Class: bird Class: reptile Class: fish
## Sensitivity                 1.0000          NA             NA      1.0000
## Specificity                 1.0000           1              1      1.0000
## Pos Pred Value              1.0000          NA             NA      1.0000
## Neg Pred Value              1.0000          NA             NA      1.0000
## Prevalence                  0.4545           0              0      0.2727
## Detection Rate              0.4545           0              0      0.2727
## Detection Prevalence        0.4545           0              0      0.2727
## Balanced Accuracy           1.0000          NA             NA      1.0000
##                      Class: amphibian Class: insect Class: mollusc.et.al
## Sensitivity                   1.00000       1.00000              1.00000
## Specificity                   1.00000       1.00000              1.00000
## Pos Pred Value                1.00000       1.00000              1.00000
## Neg Pred Value                1.00000       1.00000              1.00000
## Prevalence                    0.09091       0.09091              0.09091
## Detection Rate                0.09091       0.09091              0.09091
## Detection Prevalence          0.09091       0.09091              0.09091
## Balanced Accuracy             1.00000       1.00000              1.00000
## 
## ---------------------------------------------------------------------
## 
## CBA Classifier Object
## Formula: type ~ .
## Number of rules: 8
## Default Class: mammal
## Classification method: first  
## Description: RWeka classifier weka/classifiers/rules/PART
## 
## 
## Rule base:
##     lhs                  rhs                     support confidence
## [1] {feathers=FALSE,                                               
##      milk=TRUE}       => {type=mammal}        0.40000000 1.00000000
## [2] {feathers=TRUE}   => {type=bird}          0.22222222 1.00000000
## [3] {airborne=FALSE,                                               
##      predator=TRUE,                                                
##      backbone=FALSE}  => {type=mollusc.et.al} 0.07777778 1.00000000
## [4] {backbone=FALSE,                                               
##      legs=TRUE}       => {type=insect}        0.07777778 0.53846154
## [5] {fins=TRUE}       => {type=fish}          0.11111111 0.71428571
## [6] {backbone=TRUE,                                                
##      tail=TRUE}       => {type=reptile}       0.05555556 0.07575758
## [7] {aquatic=TRUE}    => {type=amphibian}     0.03333333 0.09677419
## [8] {}                => {type=mollusc.et.al} 0.10000000 0.10000000
## 
## Confusion Matrix and Statistics
## 
##                Reference
## Prediction      mammal bird reptile fish amphibian insect mollusc.et.al
##   mammal             5    0       0    0         0      0             0
##   bird               0    0       0    0         0      0             0
##   reptile            0    0       0    0         0      0             0
##   fish               0    0       0    3         0      0             0
##   amphibian          0    0       0    0         1      0             0
##   insect             0    0       0    0         0      1             0
##   mollusc.et.al      0    0       0    0         0      0             1
## 
## Overall Statistics
##                                      
##                Accuracy : 1          
##                  95% CI : (0.7151, 1)
##     No Information Rate : 0.4545     
##     P-Value [Acc > NIR] : 0.0001711  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
## 
## Statistics by Class:
## 
##                      Class: mammal Class: bird Class: reptile Class: fish
## Sensitivity                 1.0000          NA             NA      1.0000
## Specificity                 1.0000           1              1      1.0000
## Pos Pred Value              1.0000          NA             NA      1.0000
## Neg Pred Value              1.0000          NA             NA      1.0000
## Prevalence                  0.4545           0              0      0.2727
## Detection Rate              0.4545           0              0      0.2727
## Detection Prevalence        0.4545           0              0      0.2727
## Balanced Accuracy           1.0000          NA             NA      1.0000
##                      Class: amphibian Class: insect Class: mollusc.et.al
## Sensitivity                   1.00000       1.00000              1.00000
## Specificity                   1.00000       1.00000              1.00000
## Pos Pred Value                1.00000       1.00000              1.00000
## Neg Pred Value                1.00000       1.00000              1.00000
## Prevalence                    0.09091       0.09091              0.09091
## Detection Rate                0.09091       0.09091              0.09091
## Detection Prevalence          0.09091       0.09091              0.09091
## Balanced Accuracy             1.00000       1.00000              1.00000

Compare the results

cbind(
  accuracy = unlist(accuracy),
  nrules = unlist(nrules)
)

##                                                        accuracy nrules
## CBA algorithm (Liu et al., 1998).Accuracy             0.8181818      4
## RWeka classifier weka/classifiers/trees/J48.Accuracy  0.9090909      9
## FOIL-based classifier (Yin and Han, 2003).Accuracy    0.9090909     13
## RWeka classifier weka/classifiers/rules/JRip.Accuracy 1.0000000      7
## RWeka classifier weka/classifiers/rules/PART.Accuracy 1.0000000      8
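The row names above carry an `.Accuracy` suffix because `unlist()` pastes the list names and the element names together. As a sketch, the same comparison can be built as a data frame with clean labels and sorted by accuracy. The short method labels are assumptions chosen for readability; the numbers are copied from the output above.

```r
# Values copied from the results above; the short labels are illustrative.
accuracy <- list(
  CBA    = c(Accuracy = 0.8181818),
  C4.5   = c(Accuracy = 0.9090909),
  FOIL   = c(Accuracy = 0.9090909),
  RIPPER = c(Accuracy = 1.0000000),
  PART   = c(Accuracy = 1.0000000)
)
nrules <- list(CBA = 4, C4.5 = 9, FOIL = 13, RIPPER = 7, PART = 8)

# use.names = FALSE avoids the pasted "<name>.Accuracy" labels.
results <- data.frame(
  method   = names(accuracy),
  accuracy = unlist(accuracy, use.names = FALSE),
  nrules   = unlist(nrules, use.names = FALSE)
)

# Most accurate first; ties broken by the smaller rule base.
results <- results[order(-results$accuracy, results$nrules), ]
results
```

With only 11 test animals, differences of one or two misclassifications should be read with caution.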

CS 8331: Code for Classification based on Association Rules (CBA)

Michael Hahsler

2021-11-20
