Software
I am the lead developer and maintainer of several widely used extension packages for the R software environment for statistical computing and graphics. R and Python are the two most widely used platforms for Data Science, ML, and AI.
My R packages are also available on R-universe.
Development versions of our software are available on GitHub.
Python packages are published on the Python Package Index (PyPI).
Software Sections
- Association Rule Mining
- Bioinformatics
- Combinatorial Optimization
- Data Stream Mining
- Recommender Systems
- Automated Planning and Optimal Control
- Educational Software
Association Rule Mining
- arules: Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005) <doi:10.18637/jss.v014.i15>.
- arulesViz: Extends package arules with various visualization techniques for association rules and itemsets. The package also includes several interactive visualizations for rule exploration. Michael Hahsler (2017) <doi:10.32614/RJ-2017-047>.
- arulesSequences: Add-on package to handle and mine frequent sequences (lead developer: Christian Buchta).
- arulesCBA: Provides the infrastructure for association rule-based classification, including the algorithms CBA, CMAR, CPAR, C4.5, FOIL, PART, PRM, RCAR, and RIPPER, to build associative classifiers <doi:10.32614/RJ-2019-048>.
- arulesNBMiner: NBMiner is an implementation of the model-based mining algorithm for mining NB-frequent itemsets and NB-precise rules. Michael Hahsler (2006) <doi:10.1007/s10618-005-0026-2> (preprint).
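Apriori, one of the two mining algorithms that arules implements in C, relies on the fact that every subset of a frequent itemset is itself frequent. This lets it grow candidate itemsets one item at a time and prune them early. The following is a minimal pure-Python sketch of that level-wise search, for illustration only. The functions `apriori` and `confidence` are hypothetical and are not the arules API. Support is counted by naively scanning every transaction, which a production implementation would do far more efficiently.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of transactions that contain all items of the itemset.
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {frozenset([i]) for b in baskets for i in b}
    level = {s: support(s) for s in items}
    level = {s: v for s, v in level.items() if v >= min_support}
    frequent = dict(level)
    k = 1
    while level:
        k += 1
        # Join step: combine frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (downward closure): drop candidates with an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        level = {c: support(c) for c in candidates}
        level = {c: v for c, v in level.items() if v >= min_support}
        frequent.update(level)
    return frequent


def confidence(frequent, lhs, rhs):
    """Confidence of the rule lhs => rhs: supp(lhs | rhs) / supp(lhs)."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return frequent[lhs | rhs] / frequent[lhs]


transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=0.5)
for itemset, supp in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), supp)
print(confidence(frequent, {"bread"}, {"milk"}))
```

In this toy data, {milk, butter} occurs in only one of four transactions, so the candidate {bread, milk, butter} is pruned without ever being counted. In arules itself, mining is done with the R function `apriori()`, which takes minimum support and confidence through its `parameter` list.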
Bioinformatics
- rBLAST: Interfaces the Basic Local Alignment Search Tool (BLAST) to search genetic sequence databases with the Bioconductor infrastructure. [intro]
- rRDP: Seamlessly interfaces the Ribosomal Database Project (RDP) classifier (version 2.9), which implements a naive Bayesian classifier (NBC) for biological sequences. [intro]
- rMSA: Interfaces popular multiple sequence alignment tools such as ClustalW, MAFFT, MUSCLE, and Kalign. [intro]
- QuasiAlign: Efficient sequence alignment using alignment-free methods. [Prototype at R-Forge]
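One widely used family of alignment-free techniques compares sequences through their k-mer (length-k substring) frequency profiles instead of computing an alignment. The Python sketch below illustrates that generic idea. It is an assumption for illustration and does not reproduce QuasiAlign's actual method. The names `kmer_profile` and `kmer_distance` are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_profile(seq, k=3):
    """Count all overlapping k-mers (substrings of length k) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_distance(a, b, k=3):
    """Euclidean distance between the k-mer count profiles of two sequences.

    No alignment is computed; the sequences are compared only through
    the frequencies of their short subwords.
    """
    pa, pb = kmer_profile(a, k), kmer_profile(b, k)
    return sqrt(sum((pa[w] - pb[w]) ** 2 for w in set(pa) | set(pb)))


ref = "ACGTACGT"
print(kmer_distance(ref, "ACGTACGA"))  # one substitution
print(kmer_distance(ref, "TTTTTTTT"))  # unrelated sequence
```

Because each profile is computed once and compared in time proportional to the number of distinct k-mers, this kind of comparison avoids the quadratic cost of dynamic-programming alignment.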
Combinatorial Optimization
- dbscan: A fast reimplementation of several density-based algorithms of the DBSCAN family for spatial data. Includes the clustering algorithms DBSCAN (density-based spatial clustering of applications with noise), OPTICS (ordering points to identify the clustering structure), and HDBSCAN (hierarchical DBSCAN), as well as the LOF (local outlier factor) algorithm. The implementations use the kd-tree data structure (from library ANN) for faster k-nearest neighbor search. An R interface to fast kNN and fixed-radius NN search is also provided. See Hahsler, Piekenbrock and Doran (2019) <doi:10.18637/jss.v091.i01>.
- seriation: Infrastructure for ordering objects with an implementation of several seriation/sequencing/ordination techniques to reorder matrices, dissimilarity matrices, and dendrograms. Also provides (optimally) reordered heatmaps, color images and clustering visualizations like dissimilarity plots, and visual assessment of cluster tendency plots (VAT and iVAT). Hahsler, Hornik and Buchta (2008) <doi:10.18637/jss.v025.i03>.
- TSP: Basic infrastructure and some algorithms for the traveling salesperson problem (also traveling salesman problem; TSP). The package provides some simple algorithms and an interface to the Concorde TSP solver and its implementation of the Chained-Lin-Kernighan heuristic. The code for Concorde itself is not included in the package and has to be obtained separately. Hahsler and Hornik (2007) <doi:10.18637/jss.v023.i02>.
- qap: Implements heuristics for the Quadratic Assignment Problem (QAP). Currently, only the simulated annealing heuristic by Burkard and Rendl (1984) <doi:10.1016/0377-2217(84)90231-5> is available.
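The core of DBSCAN fits in a few lines:

* A point with at least `min_pts` points (itself included) within distance `eps` is a core point.
* A cluster grows by expanding from its core points.
* Points that are not reachable from any core point are labeled noise.

The pure-Python version below is an illustration only, not the dbscan package's code or API. It uses a brute-force O(n^2) neighbor search where the package uses a kd-tree.

```python
from math import dist

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) for each point, or NOISE (-1)."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # Fixed-radius neighborhood of point i, including i itself.
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of a cluster
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # j is also a core point: expand
                seeds.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))
```

The `region` function is the fixed-radius nearest-neighbor query. This is the step where a spatial index such as a kd-tree pays off on larger data sets.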
Data Stream Mining
- stream: Infrastructure for data stream mining (supported in part by NSF IIS-0948893 and NIH R21HG005912). Hahsler, Bolaños and Forrest (2017) <doi:10.18637/jss.v076.i14>.
- streamConnect: Adds functionality to connect stream mining components from package stream using sockets and Web services. The package can be used to create distributed workflows and plumber-based Web services that can be deployed on most common cloud services.
- streamMOA: Interface to the data stream clustering algorithms of MOA (Massive Online Analysis). [intro]
- rEMM: Implementation of the Extensible Markov Model (EMM), which adds a temporal component to data stream clustering by superimposing a dynamically adapting Markov chain. This research is supported by NSF Grant 0948893. Hahsler and Dunham (2010) <doi:10.18637/jss.v035.i05>.
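The EMM idea can be illustrated with a simplified model:

* Each incoming data point is assigned to the nearest existing state (cluster) if it lies within a distance threshold.
* Otherwise, a new state is created.
* Counts of transitions between the states of consecutive points form the Markov chain.

The class below is a hypothetical sketch under those assumptions (Euclidean distance, running-mean state centers). It is not the rEMM API and omits operations a full implementation may provide, such as forgetting or pruning states.

```python
from math import dist


class EMMSketch:
    """Simplified EMM: threshold-based clustering plus transition counts (hypothetical)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.centers = []      # one representative point per state
        self.counts = []       # number of points assigned to each state
        self.transitions = {}  # (from_state, to_state) -> count
        self.current = None    # state of the previous data point

    def update(self, x):
        # Find the closest existing state within the threshold.
        best, best_d = None, None
        for s, c in enumerate(self.centers):
            d = dist(x, c)
            if d <= self.threshold and (best_d is None or d < best_d):
                best, best_d = s, d
        if best is None:
            # No state is close enough: the model "extends" with a new state.
            self.centers.append(tuple(x))
            self.counts.append(1)
            best = len(self.centers) - 1
        else:
            # Move the center toward the new point (incremental running mean).
            n = self.counts[best] + 1
            self.centers[best] = tuple(
                c + (xi - c) / n for c, xi in zip(self.centers[best], x)
            )
            self.counts[best] = n
        # Record the transition from the previous point's state.
        if self.current is not None:
            key = (self.current, best)
            self.transitions[key] = self.transitions.get(key, 0) + 1
        self.current = best
        return best

    def transition_probs(self, state):
        """Maximum-likelihood estimate of P(next state | state) from the counts."""
        out = {t: c for (s, t), c in self.transitions.items() if s == state}
        total = sum(out.values())
        return {t: c / total for t, c in out.items()} if total else {}


emm = EMMSketch(threshold=1.0)
states = [emm.update(x) for x in [(0, 0), (5, 5), (0, 0.5), (5, 5.5), (0, 0), (10, 10)]]
print(states)
print(emm.transition_probs(0))
```

The clustering alone only says where the stream has been. The transition counts add the temporal component, estimating which state is likely to come next.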
Recommender Systems
- recommenderlab: Provides the infrastructure to test and develop recommender algorithms. Currently contains implementations of user-based and item-based collaborative filtering (UBCF, IBCF), FunkSVD, popular-item recommendations, and association rule-based recommendations, and supports a wide range of evaluation techniques. [intro, online example]
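User-based collaborative filtering (UBCF) predicts a user's rating for an item from the ratings that the most similar users gave that item. This minimal sketch makes three assumptions:

* Ratings are stored as nested dictionaries.
* Similarity is the cosine similarity computed over co-rated items.
* The prediction is a similarity-weighted average over the `k` most similar users who rated the item.

The function names are hypothetical, and this is not recommenderlab's API.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two users' rating dicts over the items both rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    nu = sqrt(sum(u[i] ** 2 for i in common))
    nv = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (nu * nv) if nu and nv else 0.0


def predict_ubcf(ratings, user, item, k=2):
    """Predict `user`'s rating of `item` from the k most similar users who rated it."""
    candidates = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbors = sorted(candidates, reverse=True)[:k]
    weight = sum(s for s, _ in neighbors)
    if weight <= 0:
        return None  # no usable neighbors for this item
    return sum(s * rating for s, rating in neighbors) / weight


ratings = {
    "ann": {"a": 5, "b": 3, "c": 4},
    "bob": {"a": 5, "b": 3, "c": 4, "d": 4},
    "cat": {"a": 1, "b": 5, "d": 2},
    "dan": {"b": 4},
}
print(predict_ubcf(ratings, "ann", "d", k=1))
print(predict_ubcf(ratings, "ann", "d", k=2))
```

With `k=1`, only bob, whose ratings match ann's exactly, contributes. With `k=2`, cat's lower rating pulls the prediction down in proportion to cat's smaller similarity.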
Automated Planning and Optimal Control
- pomdp: Provides the infrastructure to define, solve, and analyze the solutions of partially observable Markov decision process (POMDP) models.
- pomdpSolve: Provides the code of pomdp-solve by Anthony R. Cassandra to solve POMDPs using a variety of exact and approximate value iteration algorithms.
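In a POMDP, the agent cannot observe the state directly. Instead, it maintains a belief b, a probability distribution over states, and updates it after every action a and observation o:

b'(s') ∝ O(o | s', a) · Σ_s T(s' | s, a) · b(s)

The Python sketch below implements this Bayesian update for illustration; it is not part of the pomdp package. The toy example assumes the classic tiger problem with a listening accuracy of 0.85.

```python
def belief_update(belief, action, observation, T, O):
    """Return the updated belief b'(s') after taking `action` and seeing `observation`.

    belief:           dict state -> probability
    T[a][s][s_next]:  transition probability P(s_next | s, a)
    O[a][s_next][o]:  observation probability P(o | s_next, a)
    """
    unnormalized = {
        s_next: O[action][s_next][observation]
        * sum(T[action][s][s_next] * p for s, p in belief.items())
        for s_next in belief
    }
    total = sum(unnormalized.values())  # P(o | b, a), the normalizing constant
    if total == 0:
        raise ValueError("observation is impossible under this belief and action")
    return {s: p / total for s, p in unnormalized.items()}


# Toy tiger problem (assumed parameters): listening does not move the tiger
# and reports the correct side with probability 0.85.
states = ["tiger-left", "tiger-right"]
T = {"listen": {s: {s2: 1.0 if s == s2 else 0.0 for s2 in states} for s in states}}
O = {"listen": {
    "tiger-left": {"hear-left": 0.85, "hear-right": 0.15},
    "tiger-right": {"hear-left": 0.15, "hear-right": 0.85},
}}

b = {"tiger-left": 0.5, "tiger-right": 0.5}
for obs in ["hear-left", "hear-left"]:
    b = belief_update(b, "listen", obs, T, O)
    print(obs, b)
```

Value iteration algorithms for POMDPs, such as those in pomdp-solve, plan over this continuous belief space rather than over the hidden states themselves.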
Educational Software
- Data Mining: R code examples for the textbook Introduction to Data Mining (Tan/Steinbach/Kumar).
- Artificial Intelligence: Slides, simple Python code examples and exercises for an introductory AI course using the textbook Artificial Intelligence: A Modern Approach (Russell/Norvig).
- fit_dist: A simple R script to fit distributions to data.
- Gridhunt2: A simple game used to teach students basic OO concepts (encapsulation, composition, inheritance, and polymorphism) using C++.
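Fitting a distribution typically means estimating its parameters, for instance by maximum likelihood, and then comparing candidate distributions with a criterion such as AIC. The Python sketch below does this for a normal and an exponential distribution using their closed-form maximum-likelihood estimates. It is an illustration under those assumptions and not a port of the fit_dist script, whose actual method is not described here. All function names are hypothetical.

```python
from math import log, pi


def fit_normal(x):
    """Closed-form MLE for a normal distribution and its maximized log-likelihood."""
    n = len(x)
    mu = sum(x) / n
    var = sum((xi - mu) ** 2 for xi in x) / n  # the MLE divides by n, not n - 1
    loglik = -n / 2 * log(2 * pi * var) - n / 2
    return {"mean": mu, "sd": var ** 0.5, "loglik": loglik, "k": 2}


def fit_exponential(x):
    """Closed-form MLE for an exponential distribution (requires positive data)."""
    if min(x) <= 0:
        raise ValueError("exponential fit requires positive observations")
    n = len(x)
    rate = n / sum(x)  # 1 / sample mean
    loglik = n * log(rate) - n
    return {"rate": rate, "loglik": loglik, "k": 1}


def best_fit(x):
    """Fit the candidates and return (best name, all fits), using AIC = 2k - 2 logL."""
    fits = {"normal": fit_normal(x)}
    if min(x) > 0:
        fits["exponential"] = fit_exponential(x)
    aic = {name: 2 * f["k"] - 2 * f["loglik"] for name, f in fits.items()}
    return min(aic, key=aic.get), fits


data = [2, 4, 4, 4, 5, 5, 7, 9]
name, fits = best_fit(data)
print(name, fits[name])
```

AIC penalizes the normal distribution's extra parameter, so it is preferred here only because its likelihood gain outweighs that penalty.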