Instructor: Dr. Michael Hahsler
Syllabus and assignments: Canvas

All course material is provided under Creative Commons Attribution 4.0 International (CC BY 4.0)

Course Description

Analytics is based on collecting, managing, exploring and acting on large amounts of data and has become a source of competitive advantage for many organizations. This course provides an overview of descriptive analytics and introduces major data-mining techniques (classification, association analysis and cluster analysis) used in predictive analytics. All material covered will be reinforced through hands-on experience using state-of-the-art tools to design and execute data mining processes.

Prerequisites:

  • Programming skills (e.g., CS 1342)
  • Knowledge of basic probability theory and statistics (e.g., CS 4340/EMIS 3340/STAT 4340, CS 7370/EMIS 7370)

Outline

  1. Introduction
  2. Data and Exploration
  3. Clustering
  4. Classification
  5. Association Analysis
  6. Advanced Topics

Textbooks

  • Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Addison Wesley, 1st or 2nd edition, 2019. (Book web page) [required]
  • An R Companion for Introduction to Data Mining by Michael Hahsler, 2021. (free web book) [required]
  • R for Data Science by Garrett Grolemund and Hadley Wickham, 2017. (free web book)
  • Data Visualization: A practical introduction by Kieran Healy, 2019. (free web book)
  • Applied Predictive Modeling (with Examples using R and caret) by Max Kuhn and Kjell Johnson, Springer, 2013. (Book web page, full text via SpringerLink, free download on campus)
  • The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani and Jerome Friedman, 2nd edition, Springer, 2009. (Book web page, full text via SpringerLink, free download on campus)
  • Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman, Cambridge University Press, 2011. (free e-book)

Projects

Project assignments are available on Canvas.

Models, Code and Data Used in Class

  • Download data sets (iris, Fishery, etc.) for class here.
  • R code accompanying the textbook and additional R code used in class.
  • For a fast and simple way to copy tables from R to Word, use
    R> source("http://michael.hahsler.net/SMU/EMIS7332/R/copytable.R")
  • For tables in LaTeX, use the xtable package.
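
As a minimal sketch of the xtable workflow mentioned above, the following turns a small summary of the built-in iris data into LaTeX table code. The summary itself, the caption and the label are illustrative choices and not part of the course material; the sketch assumes the xtable package is already installed.

```r
# Assumes xtable is installed, e.g., via install.packages("xtable").
library(xtable)

# Illustrative summary: mean of each measurement per species in iris.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Convert the data frame to an xtable object.
# The caption and label below are hypothetical placeholders.
tab <- xtable(species_means,
              caption = "Mean measurements by species (iris)",
              label = "tab:iris-means",
              digits = 2)

# Print the LaTeX code to the console without the row-name column.
print(tab, include.rownames = FALSE)
```

The printed tabular environment can be pasted into a LaTeX document, or written to a file through the file argument of print() and pulled in with \input.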

Tool

Learning resources

Data Sets