EMIS/CSE 5/7331: Data Mining, Fall 2017
News

Some code to get you started with Project 1 (conversion of pay is now fixed).

Some code to get you started with Project 2.

Some code to get you started with Project 3.
Some more code for Project 3.

Some code to get you started with Project 4.
Course Description
Analytics is based on collecting, managing, exploring and acting on large amounts of data, and it has become a source of competitive advantage for many organizations. This course provides an overview of descriptive analytics and introduces the major data mining techniques (classification, association analysis and cluster analysis) used in predictive analytics. All material covered will be reinforced through hands-on experience using state-of-the-art tools to design and execute data mining processes.
Prerequisites:
Introduction to programming (CSE 1342 or experience with any programming language),
probability and statistics (CSE 4340/EMIS 3340/STAT 4340 or CSE 7370/EMIS 7370),
databases (EMIS 3309 or CSE 3330; required for undergraduates only).
Outline
Introduction
  Read Syllabus
  Introduction to Data Mining (Ch. 1)
Tools
Data and Exploration
  Data (Ch. 2)
  Exploration (Ch. 3)
Classification
  Classification, Classification Trees and Evaluation (Ch. 4)
  Alternative Classification Methods (Ch. 5)
Association Analysis
  Association Analysis (Ch. 6)
Clustering
Advanced Topics
Textbooks

Introduction to Data Mining by Pang-Ning Tan, Michael Steinbach and Vipin Kumar, Addison-Wesley, 2006 or 2014 edition.
Book Web Page, Code Examples [required]

Applied Predictive Modeling (with Examples using R and caret)
by Max Kuhn and Kjell Johnson, Springer, 2013.
Book Web Page,
full text via SpringerLink (free download on campus)

The Elements of Statistical Learning by Trevor Hastie, Robert Tibshirani
and Jerome Friedman, 2nd edition, Springer, 2009. Book Web Page,
full text via SpringerLink (free download on campus)

Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman, Cambridge University Press, 2011.
Book Web Page

R for Data Science by Garrett Grolemund and Hadley Wickham (free e-book).
Projects
Project assignments are available on Canvas.
Models, Code and Data Used in Class
Download data sets (iris, Fishery, etc.) for class here.
R code accompanying the textbook and additional R code used in class.
For a fast and simple way to copy tables from R to Word, use
R> source("http://michael.hahsler.net/SMU/EMIS7332/R/copytable.R")
For tables in LaTeX, use the package xtable.
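A minimal sketch of the xtable workflow, assuming the package is installed (install.packages("xtable")). The table itself (mean iris measurements per species) is only an illustration, not course material; xtable() turns a data frame into a table object, and print() emits LaTeX code you can paste into a .tex file.

```r
# Sketch: export an R data frame as a LaTeX table with the xtable package.
# Assumes xtable is installed; the iris summary below is illustrative only.
library(xtable)

# Mean of each measurement per species in the built-in iris data.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)

# Build the table object; caption and digits are standard xtable() arguments.
tab <- xtable(species_means,
              caption = "Mean iris measurements by species",
              digits = 2)

# Print LaTeX code to the console (row names suppressed).
print(tab, type = "latex", include.rownames = FALSE)
```

print() also accepts a file argument if you prefer to write the LaTeX code directly to a file instead of copying it from the console.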
Tool
Learning resources
Data Sets
Michael Hahsler, Lyle School of Engineering, SMU