CSE, Lyle School of Eng., SMU

CSE/EMIS 8331: Data Mining, Spring 2012

Dr. Michael Hahsler
MW 11-12:30pm, TBD

> Course Syllabus

Course Description

This is a second course in data mining. The prerequisite is successful completion of CSE 7331 or another introductory data mining course. Please contact Dr. Hahsler if you have concerns or questions about this prerequisite. It is assumed that every student is familiar with the basic data mining topics (clustering, classification, and association rules) and has some experience with programming and with one or more data mining tools (R, RapidMiner, Weka, XLMiner, etc.). The objective of this course is to give an overview of several advanced data mining techniques and an understanding of the research methods applied in the field.


Project Presentation

Tutorial Topics

Date | Topic | Presenter
1/23 | Data Stream Mining | Michael Hahsler
2/13 | Mining Time Series (slides) | Xiaodian Xie
2/20 | Text Mining (slides) | Anurag Nagar
2/27 | Data Stream Clustering (slides) | Hadil Shaiba
3/5 | Mining Music Data (slides) | Tyler Kendrick
3/26 | Social Network Mining (slides) | Aliasgar Lanewala
4/2 | Mining Large Data with Hadoop (slides) | Yaseen Qadah
4/9 | Web Usage Mining (video) | Ryan Zauber
4/16 | Data Mining in Biometrics (slides) | John Howard
4/18 (watch on your own) | Recommender Systems (video) | Akshaya Aradhya V
4/18 (watch on your own) | Data Mining for Wireless Communication (video, material) | John Widhalm

Papers for Review

Date | Paper | Additional Material
1/25 | MapReduce: Simplified Data Processing on Large Clusters, OSDI'04 | my Hadoop installation notes, Linux shell basics
2/6 | A density-based algorithm for discovering clusters in large spatial databases with noise, KDD'96 | clustering in R
2/15 | Clustering of Time Series Subsequences is Meaningless, ICDM'03 | subsequence clustering in R
2/22 | Concept Tree Based Ordering for Shaded Similarity Matrix, ICDM'02 | ordering in R
2/29 | Temporal structure learning for clustering massive data streams in real-time, SDM'11 | rEMM package, code example from paper
3/7 | ABACUS: Mining Arbitrary Shaped Clusters from Large Datasets based on Backbone Identification, SDM'11 |
3/28 | Dissimilarity plots: A visual exploration tool for partitional clustering, JCGS, 2011 | R package seriation, Dissimilarity plot in R
4/4 | Discussion of paper "Further Improvements to the Statistical Hurricane Intensity Prediction Scheme (SHIPS)" with Dr. Mark DeMaria, Chief of the NESDIS Regional and Mesoscale Meteorology Branch, NOAA | Original SHIPS paper, 1994
4/11 | Visualizing Association Rules in Hierarchical Groups, Interface 2011 | R package arulesViz
4/18 | No paper and no class, watch tutorial videos (see above) |

Reference Text (Recommended but not required)


Conferences:

Journals:

Digital Libraries:

Data Mining Competitions and Data Sets

High performance computing

Tools for distance students
