[1] Charlie Isaksson, Margaret H. Dunham, and Michael Hahsler. SOStream: Self organizing density-based clustering over data stream. In International Conference on Machine Learning and Data Mining (MLDM'2012). Springer, July 2012. [ bib ]

[2] Vladimir Jovanovic, Margaret H. Dunham, Michael Hahsler, and Yu Su. Evaluating hurricane intensity prediction techniques in real time. In Third IEEE ICDM Workshop on Knowledge Discovery from Climate Data, Proceedings of the 2011 IEEE International Conference on Data Mining Workshops (ICDMW 2011). IEEE, December 2011. [ bib | .pdf ]
While the accuracy of hurricane track prediction has been improving, predicting intensity, the maximum sustained wind speed, is still a very difficult challenge. This is problematic because the destructive power of a hurricane is directly related to its intensity. In this paper, we present the Prediction Intensity Interval model for Hurricanes (PIIH), which combines sophisticated data mining techniques to create an online, real-time model for accurate intensity predictions, and we present a web-based framework to dynamically compare PIIH to operational models used by the National Hurricane Center (NHC). The resulting dynamic website tracks, compares, and visualizes predictions to facilitate immediate comparisons of prediction techniques. This work-in-progress paper reports on both new features of the PIIH model and the online visualization of the accuracy of that model compared to other techniques.

[3] Michael Hahsler and Margaret H. Dunham. Temporal structure learning for clustering massive data streams in real-time. In SIAM Conference on Data Mining (SDM11), pages 664-675. SIAM, April 2011. [ bib | .pdf ]
This paper describes one of the first attempts to model the temporal structure of massive data streams in real time using data stream clustering. Recently, many data stream clustering algorithms have been developed which efficiently find a partition of the data points in a data stream. However, these algorithms disregard the information represented by the temporal order of the data points in the stream, which for many applications is an important part of the data stream. In this paper we propose a new framework called Temporal Relationships Among Clusters for Data Streams (TRACDS), which allows learning the temporal structure while clustering a data stream. We identify, organize and describe the clustering operations which are used by state-of-the-art data stream clustering algorithms. Then we show that by defining a set of new operations to dynamically transform Markov Chains with states representing clusters, we can efficiently capture temporal ordering information. This framework allows us to preserve temporal relationships among clusters for any state-of-the-art data stream clustering algorithm with only minimal overhead.

To investigate the usefulness of TRACDS, we evaluate the improvement of TRACDS over pure data stream clustering for anomaly detection using several synthetic and real-world data sets. The experiments show that TRACDS is able to considerably improve the results even if we introduce a high rate of incorrect time stamps, which is typical for real-world data streams.

[4] Yu Su, Sudheer Chelluboina, Michael Hahsler, and Margaret H. Dunham. A new data mining model for hurricane intensity prediction. In Second IEEE ICDM Workshop on Knowledge Discovery from Climate Data: Prediction, Extremes and Impacts, Proceedings of the 2010 IEEE International Conference on Data Mining Workshops (ICDMW 2010), pages 98-105. IEEE, December 2010. [ bib | at the publisher | .pdf ]
This paper proposes a new hurricane intensity prediction model, WFL-EMM, which is based on the data mining techniques of feature weight learning (WFL) and the Extensible Markov Model (EMM). The data features used are those employed by one of the most popular intensity prediction models, SHIPS. In our algorithm, the weights of the features are learned by a genetic algorithm (GA) using historical hurricane data. As the GA's fitness function we use the error of the intensity prediction by an EMM learned using the given feature weights. For fitness calculation we use a technique similar to k-fold cross-validation on the training data. The best weights obtained by the genetic algorithm are used to build an EMM with all training data. This EMM is then applied to predict the hurricane intensities and compute prediction errors for the test data.

Using historical data for the named Atlantic tropical cyclones from 1982 to 2003, experiments demonstrate that WFL-EMM provides significantly more accurate intensity predictions than SHIPS within 72 hours. Since these are first results, we also indicate how WFL-EMM can be improved in the future.

[5] Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou. Novel data stream pattern mining, Report on the StreamKDD'10 Workshop. SIGKDD Explorations, 12(2):54-55, 2010. [ bib | at the publisher ]
This report summarizes the First International Workshop on Novel Data Stream Pattern Mining held at the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, on July 25, 2010 in Washington, DC.

[6] Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou, editors. Proceedings of the First International Workshop on Novel Data Stream Pattern Mining Techniques (StreamKDD'10). ACM Press, New York, NY, USA, 2010. [ bib | at the publisher ]
Data stream mining has gained importance in recent years because it is indispensable for many real applications such as prediction and evolution of weather phenomena; security and anomaly detection in networks; evaluating satellite data; and mining health monitoring streams. Stream mining algorithms must take account of the unique properties of stream data: infinite data, temporal ordering, concept drifts and shifts, demand for scalability, etc. This workshop brings together scholars working in different areas of learning on streams, including sensor data and other forms of accumulating data. Most of the papers in the next pages are on unsupervised learning with clustering methods. Issues addressed include the detection of outliers and anomalies, evolutionary clustering and incremental clustering, learning in subspaces of the complete feature space and learning with exploitation of context, deriving models from text streams and visualizing them.

[7] Michael Hahsler and Margaret H. Dunham. rEMM: Extensible Markov model for data stream clustering in R. Journal of Statistical Software, 35(5):1-31, 2010. [ bib | at the publisher ]
Clustering streams of continuously arriving data has become an important application of data mining in recent years, and efficient algorithms have been proposed by several researchers. However, clustering alone neglects the fact that data in a data stream is not only characterized by the proximity of data points, which is used by clustering, but also by a temporal component. The Extensible Markov Model (EMM) adds the temporal component to data stream clustering by superimposing a dynamically adapting Markov Chain. In this paper we introduce the R extension package rEMM, which implements EMM, and we discuss some examples and applications.
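
The idea of superimposing a Markov Chain on a stream clustering can be illustrated with a minimal Python sketch. This is not the rEMM package or its API: the class `TinyEMM`, its `threshold` parameter and its methods are hypothetical names chosen for illustration, and the clustering step is a deliberately simple assumption (a point joins the nearest cluster if it lies within a fixed Euclidean distance, otherwise it starts a new cluster, and cluster centers are never updated).

```python
import math
from collections import defaultdict


class TinyEMM:
    """Toy sketch (hypothetical, not rEMM): threshold clustering plus a
    Markov chain whose states are the clusters."""

    def __init__(self, threshold):
        self.threshold = threshold      # max distance to join an existing cluster
        self.centers = []               # one fixed representative point per state
        self.counts = defaultdict(int)  # (from_state, to_state) -> transition count
        self.current = None             # state of the most recent data point

    def _nearest(self, point):
        """Return (state, distance) of the closest center, or (None, inf)."""
        best, best_dist = None, math.inf
        for state, center in enumerate(self.centers):
            dist = math.dist(point, center)
            if dist < best_dist:
                best, best_dist = state, dist
        return best, best_dist

    def update(self, point):
        """Assign the point to a state and record the transition into it."""
        state, dist = self._nearest(point)
        if state is None or dist > self.threshold:
            # No cluster is close enough: the chain grows by one state.
            self.centers.append(tuple(point))
            state = len(self.centers) - 1
        if self.current is not None:
            self.counts[(self.current, state)] += 1
        self.current = state
        return state

    def transition_probability(self, src, dst):
        """Estimate P(dst | src) from the observed transition counts."""
        total = sum(c for (s, _), c in self.counts.items() if s == src)
        return self.counts.get((src, dst), 0) / total if total else 0.0
```

Feeding the points of a stream to `update` in arrival order yields both a cluster assignment for each point and transition counts between clusters; a transition with a low estimated probability could then serve as a simple signal of unusual temporal behavior.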

