A Unified View on DBSCAN, JP Clustering and SNN Clustering

Code EMIS/CSE 8331 by Michael Hahsler

Common components are

  • Denoising using local KDE (typically a uniform kernel).
  • Calculating similarity as a graph.
  • Applying a threshold to the similarity graph and finding connected components.
library(dbscan)
library(igraph)
## 
## Attaching package: 'igraph'
## The following objects are masked from 'package:stats':
## 
##     decompose, spectrum
## The following object is masked from 'package:base':
## 
##     union

Remove noise (DBSCAN)

remove_noise <- function(x, eps, minPts)
  x[!sapply(frNN(x, eps)$id, length)+1L < minPts, ]

JP clustering and SNN clustering

nn_SNN <- function(x, k, kt) {
  nn <- sNN(x, k=k)
  id <- lapply(1:nrow(nn$id),
    FUN = function(i) nn$id[i, (nn$shared[i,]) >= kt])
}

Directly density-reachable (DBSCAN)

nn_reachability <- function(x, eps) frNN(x, eps)$id

# use igraph components to find components
connected <- function(nn)
  components(graph_from_adj_list(nn, mode = "total"))$membership

Create dataset

library(mlbench)
set.seed(1234)
x <- mlbench.spirals(500,1,0.05)
plot(x)

x <- x$x

DBSCAN*

cl <- dbscan(x, eps = .1, minPts = 4, borderPoints = FALSE)
cl
## DBSCAN clustering for 500 objects.
## Parameters: eps = 0.1, minPts = 4
## The clustering contains 2 cluster(s) and 16 noise points.
## 
##   0   1   2 
##  16 240 244 
## 
## Available fields: cluster, eps, minPts
plot(x, col = cl$cluster + 1L, xlim = c(-1,1), ylim = c(-1,1))