Journal of Statistical Software (Oct 2019)

dbscan: Fast Density-Based Clustering with R

  • Michael Hahsler
  • Matthew Piekenbrock
  • Derek Doran

DOI
https://doi.org/10.18637/jss.v091.i01
Journal volume & issue
Vol. 91, no. 1
pp. 1–30

Abstract

This article describes the implementation and use of the R package dbscan, which provides complete and fast implementations of the popular density-based clustering algorithm DBSCAN and the augmented ordering algorithm OPTICS. Package dbscan uses advanced open-source spatial indexing data structures implemented in C++ to speed up computation. An important advantage of this implementation is that it is up-to-date with several improvements that have been added since the original algorithms were published (e.g., artifact corrections and dendrogram extraction methods for OPTICS). We provide a consistent presentation of the DBSCAN and OPTICS algorithms, and compare dbscan's implementation with other popular libraries such as the R package fpc, ELKI, WEKA, PyClustering, SciKit-Learn, and SPMF, both in terms of available features and through an experimental comparison.

Keywords