| [1] |
Charlie Isaksson, Margaret H. Dunham, and Michael Hahsler.
SOStream: Self organizing density-based clustering over data
stream.
In International Conference on Machine Learning and Data Mining
(MLDM'2012). Springer, July 2012.
|
| [2] |
Vladimir Jovanovic, Margaret H. Dunham, Michael Hahsler, and Yu Su.
Evaluating hurricane intensity prediction techniques in real time.
In Third IEEE ICDM Workshop on Knowledge Discovery from Climate
Data, Proceedings of the 2011 IEEE International Conference on Data
Mining Workshops (ICDMW 2011). IEEE, December 2011.
While the accuracy of hurricane track prediction has been improving, predicting intensity, the maximum sustained wind speed, is still a very difficult challenge. This is problematic because the destructive power of a hurricane is directly related to its intensity. In this paper, we present the Prediction Intensity Interval model for Hurricanes (PIIH), which combines sophisticated data mining techniques to create an online real-time model for accurate intensity predictions, and we present a web-based framework to dynamically compare PIIH to operational models used by the National Hurricane Center (NHC). The resulting dynamic website tracks, compares, and visualizes predictions to facilitate immediate comparisons of prediction techniques. This work-in-progress paper reports on both new features of the PIIH model and the online visualization of that model's accuracy compared to other techniques.
|
| [3] |
Michael Hahsler and Margaret H. Dunham.
Temporal structure learning for clustering massive data streams in
real-time.
In SIAM Conference on Data Mining (SDM11), pages 664-675.
SIAM, April 2011.
This paper describes one of the first attempts to model the temporal structure of massive data streams in real-time using data stream clustering. Recently, many data stream clustering algorithms have been developed which efficiently find a partition of the data points in a data stream. However, these algorithms disregard the information represented by the temporal order of the data points in the stream, which for many applications is an important part of the data stream. In this paper we propose a new framework called Temporal Relationships Among Clusters for Data Streams (TRACDS), which allows learning the temporal structure while clustering a data stream. We identify, organize and describe the clustering operations which are used by state-of-the-art data stream clustering algorithms. Then we show that by defining a set of new operations to dynamically transform Markov Chains whose states represent clusters, we can efficiently capture temporal ordering information. This framework allows us to preserve temporal relationships among clusters for any state-of-the-art data stream clustering algorithm with only minimal overhead.
|
| [4] |
Yu Su, Sudheer Chelluboina, Michael Hahsler, and Margaret H. Dunham.
A new data mining model for hurricane intensity prediction.
In Second IEEE ICDM Workshop on Knowledge Discovery from Climate
Data: Prediction, Extremes and Impacts, Proceedings of the 2010 IEEE
International Conference on Data Mining Workshops (ICDMW 2010), pages
98-105. IEEE, December 2010.
This paper proposes a new hurricane intensity prediction model, WFL-EMM, which is based on the data mining techniques of feature weight learning (WFL) and Extensible Markov Model (EMM). The data features used are those employed by one of the most popular intensity prediction models, SHIPS. In our algorithm, the weights of the features are learned by a genetic algorithm (GA) using historical hurricane data. As the GA's fitness function we use the error of the intensity prediction by an EMM learned using given feature weights. For fitness calculation we use a technique similar to k-fold cross validation on the training data. The best weights obtained by the genetic algorithm are used to build an EMM with all training data. This EMM is then applied to predict the hurricane intensities and compute prediction errors for the test data.
|
| [5] |
Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou.
Novel data stream pattern mining, Report on the StreamKDD'10
Workshop.
SIGKDD Explorations, 12(2):54-55, 2010.
This report summarizes the First International Workshop on Novel Data Stream Pattern Mining held at the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, on July 25, 2010, in Washington, DC.
|
| [6] |
Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou, editors.
Proceedings of the First International Workshop on Novel Data
Stream Pattern Mining Techniques (StreamKDD'10).
ACM Press, New York, NY, USA, 2010.
Data stream mining has gained importance in recent years because it is indispensable for many real applications such as prediction and evolution of weather phenomena; security and anomaly detection in networks; evaluating satellite data; and mining health monitoring streams. Stream mining algorithms must take account of the unique properties of stream data: infinite data, temporal ordering, concept drifts and shifts, demand for scalability, etc. This workshop brings together scholars working in different areas of learning on streams, including sensor data and other forms of accumulating data. Most of the papers in the next pages are on unsupervised learning with clustering methods. Issues addressed include the detection of outliers and anomalies, evolutionary clustering and incremental clustering, learning in subspaces of the complete feature space and learning with exploitation of context, and deriving models from text streams and visualizing them.
|
| [7] |
Michael Hahsler and Margaret H. Dunham.
rEMM: Extensible Markov model for data stream clustering in
R.
Journal of Statistical Software, 35(5):1-31, 2010.
Clustering streams of continuously arriving data has become an important application of data mining in recent years, and efficient algorithms have been proposed by several researchers. However, clustering alone neglects the fact that data in a data stream is characterized not only by the proximity of data points, which is used by clustering, but also by a temporal component. The Extensible Markov Model (EMM) adds the temporal component to data stream clustering by superimposing a dynamically adapting Markov Chain. In this paper we introduce the R extension package rEMM, which implements EMM, and we discuss some examples and applications.
|