EMIS, Lyle School of Eng., SMU

EMIS/CSE 8331: Data Mining, Spring 2017

Dr. Michael Hahsler
TTh 2:00-3:20pm, Junkins 101

> Course syllabus
> Canvas

News

Course Description

This second course in data mining focuses on understanding the research methods used in the field. The course targets students who want to gain in-depth knowledge about a particular data mining topic (e.g., PhD students who plan to use a data mining component in their research).

Prerequisites: Successful completion of an introductory data mining course like EMIS/CSE 7331 (until Fall 2016: EMIS 7332). It is assumed that every student is familiar with the basic data mining topics (clustering, classification, and association rules) and has some experience with programming and with one or more data mining tools (R, RapidMiner, SAS, SPSS, Weka, XLMiner, etc.). Enrolling in this course requires department/instructor consent.

Assignments

Tutorial Topics

Date | Presenter | Tutorial | Abstract | Additional material (code, software, etc.)
2/9 | Michael Hahsler | Data Stream Mining | | R package stream, MOA, Apache Spark Streaming, Apache Storm, Apache Samza, Apache SAMOA, IBM Streams, MS Azure Stream Analytics
2/16 | Xinxiang Zhang | Deep Learning | Abstract |
2/23 | Anyu Zhang | Medical Image Mining | |
3/2 | Farzad Kamalzadeh | Markov Models in Health Care | |
3/9 | Michael Prappas | Natural Language Processing | |
3/23 | Scott Eisenhart | Recommender Systems | |
3/30 | Andrew Cranmer | Image Mining - Local Binary Patterns | |
4/6 | Harold Mitchell | Big Data Stream Mining | |
4/13 | Revant Reddy Katanguri | Big Data Technologies | |
4/20 | Ben Brock | Mining Applications for the Internet of Things | |

Papers for Review

Due Date | Paper | Additional Material (code, papers, etc.)
2/2 | Clustering Using a Similarity Measure Based on Shared Near Neighbors, IEEE Transactions on Computers 1973 | Jarvis-Patrick clustering in R, SNN clustering paper
2/7 | A density-based algorithm for discovering clusters in large spatial databases with noise, KDD'96 | clustering with dbscan in R, more dbscan examples, a unified view
2/14 | ABACUS: Mining Arbitrary Shaped Clusters from Large Datasets based on Backbone Identification, SDM'11 | Partial ABACUS implementation, clustering images
2/21 | Clustering of Time Series Subsequences is Meaningless, ICDM'03 | Other papers: An Alternate Measure for Comparing Time Series Subsequence Clusters, Making Subsequence Time Series Clustering Meaningful. Code: Using R for time series analysis, time series example code, subsequence clustering examples, subsequence clustering tests by Hala El-Ali
2/28 | Temporal structure learning for clustering massive data streams in real-time, SDM'11 | rEMM package, code example from paper
3/7 | Integrating Classification and Association Rule Mining, KDD'98 | LUCS-KDD Implementations, arulesCBA package

Reference Text (Recommended but not required)

Tutorials from Previous Semesters

Links

Conferences: Journals: Digital Libraries: Data Mining Competitions and Data Sets

High performance computing

Tools for distance students

