Talks: Michael Hahsler


[1] Michael Hahsler. Dissimilarity plots: A visual exploration tool for partitional clustering, April 2009. CSE Colloquium, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas, April 3, 2009. [ .pdf ]
Cluster analysis tries to uncover structure in data by assigning each object in the data set to a group (called a cluster) so that objects from the same cluster are more similar to each other than to objects from other clusters. Exploring the cluster structure and assessing the quality of the cluster solution have been research topics since the invention of cluster analysis. This is especially important because all popular clustering algorithms produce a clustering even for data without a "cluster" structure. Many visualization techniques for judging the quality of a clustering and for exploring the cluster structure have been developed, but they all suffer from certain restrictions. For example, dendrograms cannot be used for non-hierarchical partitions, silhouette plots provide only a diagnostic tool without the ability to explore structure, high data dimensionality may render projection-based methods less useful, and graph-based representations hide the internal structure of clusters. In this talk we introduce a new visualization technique called dissimilarity plots, which uses dissimilarity matrix shading and seriation for (near-)optimal cluster and object placement. Dissimilarity plots are not affected by data dimensionality. They allow cluster quality to be judged directly and the micro-structure within clusters to be analyzed visually, and they make a misspecified number of clusters instantly apparent. Dissimilarity plots are implemented in the R extension package seriation.
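The ordering step behind a dissimilarity plot can be sketched in a few lines. The Python below is a minimal illustration, not the seriation package's implementation. It makes these assumptions:

- The input is a precomputed symmetric dissimilarity matrix plus a vector of cluster labels.
- Clusters are ordered by average inter-cluster dissimilarity.
- Objects within each cluster are ordered by a greedy nearest-neighbour chain, a simple stand-in for the seriation methods the package offers.
- Grey-level shading is replaced by a text rendering.

The function names (`greedy_chain`, `dissimilarity_plot_order`, `shade`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]


def greedy_chain(items: Sequence[int], dist: Callable[[int, int], float]) -> List[int]:
    """Order items with a greedy nearest-neighbour chain (a simple seriation heuristic)."""
    remaining = list(items)
    if not remaining:
        return []
    order = [remaining.pop(0)]  # start the chain at the first item
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda j: dist(last, j))
        remaining.remove(nxt)
        order.append(nxt)
    return order


def dissimilarity_plot_order(d: Matrix, labels: Sequence[int]) -> List[int]:
    """Objects of a cluster stay contiguous; clusters and objects are seriated."""
    clusters = sorted(set(labels))
    members: Dict[int, List[int]] = {
        c: [i for i, lab in enumerate(labels) if lab == c] for c in clusters
    }

    def cluster_dist(a: int, b: int) -> float:
        # average dissimilarity between the members of two clusters
        pairs = [d[i][j] for i in members[a] for j in members[b]]
        return sum(pairs) / len(pairs)

    order: List[int] = []
    for c in greedy_chain(clusters, cluster_dist):
        order.extend(greedy_chain(members[c], lambda i, j: d[i][j]))
    return order


def shade(d: Matrix, order: Sequence[int]) -> str:
    """Text stand-in for matrix shading: '#' marks small dissimilarities."""
    top = max(max(row) for row in d) or 1.0
    return "\n".join(
        "".join("#" if d[i][j] / top < 0.5 else "." for j in order) for i in order
    )


xs = [0, 10, 1, 11, 2]
labels = [0, 1, 0, 1, 0]
d = [[abs(a - b) for b in xs] for a in xs]
order = dissimilarity_plot_order(d, labels)  # [0, 2, 4, 1, 3]
print(shade(d, order))
```

Keeping each cluster contiguous is what makes a wrong number of clusters visible: too many clusters show up as dark blocks off the diagonal, and too few show up as light patches inside a diagonal block.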

[2] Michael Hahsler. A probabilistic approach to association rule mining, October 2008. CSE Colloquium, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas, October 10, 2008. [ .pdf ]
Mining association rules is an important and widely applied technique for discovering patterns in transaction databases. However, the support-confidence framework has some often overlooked weaknesses:

- The thresholds are typically chosen arbitrarily by the analyst, just to keep the set of found rules at a manageable size.
- Confidence and other measures of interestingness are systematically influenced by support.
- The risk of using "spurious rules" in an application is generally unknown and ignored.

This talk introduces a simple stochastic model and shows how to apply it in three ways: to simulate data for analyzing the behavior of confidence and other measures of interestingness (e.g., lift), to develop a new model-driven approach to mining rules based on the notion of NB-frequent itemsets, and to define a measure of interestingness that controls for spurious rules and has a strong foundation in statistical testing theory.
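To make the support-confidence definitions and the effect of support on confidence concrete, the following Python sketch computes support, confidence, and lift and applies them to simulated data. The independence model used here is a simplifying assumption standing in for the stochastic model of the talk: each item enters each transaction independently with a fixed probability. The helper names are hypothetical.

With no association between `x` and `y`, the confidence of x => y comes out near supp(y) = 0.8. It mainly reflects the support of the right-hand side, while lift stays near 1.

```python
import random
from typing import Dict, List, Set

Transaction = Set[str]


def support(db: List[Transaction], itemset: Set[str]) -> float:
    """Share of transactions that contain every item of the itemset."""
    return sum(itemset <= t for t in db) / len(db)


def confidence(db: List[Transaction], lhs: Set[str], rhs: Set[str]) -> float:
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(db, lhs | rhs) / support(db, lhs)


def lift(db: List[Transaction], lhs: Set[str], rhs: Set[str]) -> float:
    """lift(X => Y) = conf(X => Y) / supp(Y); about 1 under independence."""
    return confidence(db, lhs, rhs) / support(db, rhs)


def simulate_independent(probs: Dict[str, float], n: int, seed: int = 42) -> List[Transaction]:
    """Assumed model: each item enters each transaction independently."""
    rng = random.Random(seed)
    return [{item for item, p in probs.items() if rng.random() < p} for _ in range(n)]


db = simulate_independent({"x": 0.3, "y": 0.8}, n=5000)
print(round(confidence(db, {"x"}, {"y"}), 3), round(lift(db, {"x"}, {"y"}), 3))
```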

[3] Michael Hahsler. Generating top-N recommendations from binary profile data, July 2008. Appointment talk (Berufungsvortrag), Information Systems (Wirtschaftsinformatik), WU Wien, July 16, 2008. [ .pdf ]
Research on collaborative filtering recommender systems for binary data is limited. In this presentation we explain why concentrating on binary data is important, and we review how well-known collaborative filtering methods perform on different binary data sets.
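As one example of a collaborative filtering method that works directly on binary (0/1) profiles, the sketch below implements item-based top-N recommendation. It rests on our own assumptions: Jaccard similarity between items, summed similarity as the score, and alphabetical tie-breaking. It is not necessarily one of the methods reviewed in the talk, and the function names are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two sets of users."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def item_index(profiles: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert user -> items into item -> users (binary data only)."""
    index: Dict[str, Set[str]] = {}
    for user, items in profiles.items():
        for item in items:
            index.setdefault(item, set()).add(user)
    return index


def top_n(profiles: Dict[str, Set[str]], user: str, n: int = 3) -> List[str]:
    """Item-based top-N: score unseen items by similarity to the user's items."""
    index = item_index(profiles)
    owned = profiles[user]
    scores = {
        cand: sum(jaccard(users, index[i]) for i in owned)
        for cand, users in index.items()
        if cand not in owned
    }
    # highest score first; ties broken alphabetically for reproducibility
    return sorted(scores, key=lambda i: (-scores[i], i))[:n]


profiles = {
    "u1": {"a", "b", "c"},
    "u2": {"a", "b"},
    "u3": {"b", "c", "d"},
    "u4": {"a", "d"},
}
print(top_n(profiles, "u2"))  # ['c', 'd']
```

Only set membership is used, so no rating scale has to be invented for data that records only whether a user chose an item.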

[4] Michael Hahsler and Kurt Hornik. Two applications of the TSP for data analysis, March 2007. 31st Annual Conference of the German Classification Society (GfKl 2007), Freiburg, March 7-9, 2007. [ .pdf ]
The traveling salesperson problem (TSP) is a well-known and important combinatorial optimization problem. The goal is to find the shortest tour that visits each city in a given list exactly once. Even though it is simple to state, the TSP belongs to the class of NP-complete problems. The importance of the TSP arises from the variety of its applications. Beyond obvious ones such as vehicle routing and computer wiring, there are applications in data analysis. These make it important for state-of-the-art data analysis environments to provide capabilities for solving TSPs. In this talk, we introduce the recently developed package TSP, which provides a basic infrastructure for handling and solving the traveling salesperson problem within R, an open source environment for statistical computing and graphics. We demonstrate how the new package can be used for two data analysis applications: clustering and the reorganization of data matrices for visualization.
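One way to use a TSP tour for clustering is to treat the tour as an ordering of the objects and cut it where neighbouring objects are far apart. The Python sketch below is a deliberately simplified variant under stated assumptions. It builds the tour with a nearest-neighbour heuristic and cuts it at its k longest edges. The approach demonstrated with the TSP package may differ, for example by inserting dummy cities into the problem. Function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nn_tour(points: Sequence[Point]) -> List[int]:
    """Nearest-neighbour construction heuristic starting at city 0."""
    unvisited = set(range(1, len(points)))
    tour = [0]
    while unvisited:
        last = points[tour[-1]]
        # ties broken by the smaller index so the result is deterministic
        nxt = min(unvisited, key=lambda j: (math.dist(last, points[j]), j))
        unvisited.remove(nxt)
        tour.append(nxt)
    return tour


def tour_clusters(points: Sequence[Point], k: int) -> List[List[int]]:
    """Cut the closed tour at its k longest edges; each remaining path is a cluster."""
    tour = nn_tour(points)
    n = len(tour)
    # edge i joins tour[i] and tour[(i + 1) % n]
    edges = [(math.dist(points[tour[i]], points[tour[(i + 1) % n]]), i) for i in range(n)]
    cuts = sorted(i for _, i in sorted(edges, reverse=True)[:k])
    clusters = []
    for start, end in zip(cuts, cuts[1:] + [cuts[0] + n]):
        # cities between two consecutive cut edges form one cluster
        clusters.append(sorted(tour[j % n] for j in range(start + 1, end + 1)))
    return sorted(clusters)


pts = [(0, 0), (10, 10), (0, 1), (11, 10), (1, 0), (10, 11)]
print(tour_clusters(pts, 2))  # [[0, 2, 4], [1, 3, 5]]
```

The same tour order can also be used to permute the rows and columns of a data matrix, which corresponds to the second application named in the abstract.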

[5] Michael Hahsler. TSP - A R-package for the traveling salesperson problem, December 2006. Research Seminar Statistical Computation, Wirtschaftsuniversität Wien, December 1, 2006. [ .pdf ]
The traveling salesperson or salesman problem (TSP) is a well-known and important combinatorial optimization problem. The goal is to find the shortest tour that visits each city in a given list exactly once and then returns to the starting city. Despite this simple problem statement, the TSP belongs to the class of NP-complete problems. The importance of the TSP arises from the variety of its applications. Apart from the obvious application to vehicle routing, many other applications require solving a TSP, e.g., computer wiring, cutting wallpaper, job sequencing, and several data visualization techniques. In this talk we introduce the R package TSP, which provides a basic infrastructure for handling and solving the traveling salesperson problem. The package features informal S3 classes for specifying a TSP and its (possibly optimal) solution, as well as several heuristics to find good solutions. In addition, it provides an interface to Concorde, one of the best exact TSP solvers currently available.
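Tour-improvement heuristics are one family of heuristics for finding good TSP solutions. Here is a minimal 2-opt sketch in Python over a distance matrix. It is not the TSP package's code or API, and the function names are hypothetical. 2-opt repeatedly replaces two tour edges (a, b) and (c, e) by (a, c) and (b, e), which amounts to reversing the segment between them, as long as the exchange shortens the tour.

```python
import math
from typing import List, Sequence


def tour_length(d: Sequence[Sequence[float]], tour: Sequence[int]) -> float:
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))


def two_opt(d: Sequence[Sequence[float]], tour: Sequence[int]) -> List[int]:
    """Reverse tour segments while doing so shortens the tour (first improvement)."""
    tour = list(tour)
    n = len(tour)
    improved = True
    while improved:
        improved = False
        for i in range(n - 1):
            for j in range(i + 2, n):
                a, b = tour[i], tour[i + 1]
                c, e = tour[j], tour[(j + 1) % n]
                if a == e:
                    continue  # the two edges share a city; no valid exchange
                delta = d[a][c] + d[b][e] - d[a][b] - d[c][e]
                if delta < -1e-12:
                    tour[i + 1 : j + 1] = reversed(tour[i + 1 : j + 1])
                    improved = True
    return tour


corners = [(0, 0), (1, 0), (1, 1), (0, 1)]
d = [[math.dist(p, q) for q in corners] for p in corners]
crossed = [0, 2, 1, 3]  # a tour whose two diagonals cross
better = two_opt(d, crossed)
print(round(tour_length(d, crossed), 3), round(tour_length(d, better), 3))  # 4.828 4.0
```

In practice a construction heuristic such as nearest neighbour supplies the starting tour, and 2-opt then removes crossings like the one in this example.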

[6] Michael Hahsler. Warenkorbanalyse mit Hilfe der Statistiksoftware R [Market basket analysis with the statistical software R], October 2006. WU Competence Day, Wirtschaftsuniversität Wien, October 19, 2006. [ .pdf ]
Market basket analysis (also called assortment association analysis) refers to a set of methods for studying which products or categories from a retail assortment are demanded together in a single purchase. This contribution takes a closer look at exploratory market basket analysis, which aims to condense and compactly represent the cross-category relationships found in (usually very large) retail transaction data. With its enormous number of available extension packages, the freely available statistical software R is an ideal basis for carrying out such market basket analyses. The infrastructure for transaction data provided by the extension package arules offers a flexible basis for market basket analysis. It supports the efficient representation, manipulation, and analysis of market basket data together with arbitrary additional information on products (e.g., the assortment hierarchy) and on transactions (e.g., revenue or contribution margin). The package is seamlessly integrated into R. This allows existing state-of-the-art methods for sampling, clustering, and visualization to be applied directly to market basket data. In addition, arules provides common algorithms for finding association rules and the data structures needed to analyze patterns. A selection of the most important functions is demonstrated on a real transaction data set from grocery retailing.

[7] Michael Hahsler. Probabilistische Ansätze in der Assoziationsanalyse [Probabilistic approaches in association analysis], May 2006. Habilitation talk, Wirtschaftsuniversität Wien, May 19, 2006. [ .pdf ]
This talk briefly presents the foundations of association analysis with association rules (the support-confidence framework). After introducing the probabilistic interpretation of the support-confidence framework, it discusses known weaknesses and extensions (lift, chi-square test). Finally, a stochastic independence model is introduced, and three applications of this model are discussed.
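The probabilistic reading of the support-confidence framework can be summarized as follows. The notation is:

- m is the number of transactions, and X and Y are itemsets.
- \hat{P} denotes relative frequencies.
- supp(X ∪ Y) refers to the combined itemset, i.e. transactions containing both X and Y.
- n_{ij} are the counts of the 2-by-2 contingency table of X and Y.

These are the standard textbook definitions, given as a reference point rather than as the talk's own notation. Under independence, lift is about 1 and the chi-square statistic is about 0.

```latex
\begin{aligned}
\operatorname{supp}(X) &= \hat{P}(X), \\
\operatorname{conf}(X \Rightarrow Y) &= \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)} = \hat{P}(Y \mid X), \\
\operatorname{lift}(X \Rightarrow Y) &= \frac{\operatorname{conf}(X \Rightarrow Y)}{\operatorname{supp}(Y)} = \frac{\hat{P}(X \cap Y)}{\hat{P}(X)\,\hat{P}(Y)}, \\
\chi^2 &= \sum_{i \in \{X, \bar{X}\}} \sum_{j \in \{Y, \bar{Y}\}} \frac{(n_{ij} - E_{ij})^2}{E_{ij}},
\qquad E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{m}.
\end{aligned}
```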

[8] Michael Hahsler and Kurt Hornik. An association rule mining infrastructure for the R data analysis toolbox, March 2006. 30th Annual Conference of the German Classification Society (GfKl 2006), Berlin, March 8-10, 2006. [ .pdf ]
The free and extensible statistical computing environment R, with its enormous number of extension packages, already provides many state-of-the-art techniques for data analysis (advanced clustering, sampling, and visualization). However, R has so far lacked support for analyzing transaction data, e.g., mining association rules. This is a popular exploratory method that can be used, among other purposes, to uncover cross-selling opportunities in market baskets. In this presentation we introduce the R extension package arules, an infrastructure for handling and mining transaction data. After an introduction to the infrastructure provided by arules, we use a market basket data set from a typical retailer to demonstrate the advantages of the seamless integration of arules into the R infrastructure.

[9] Michael Hahsler. Optimizing web sites for customer retention, November 2005. 2005 International Workshop on Customer Relationship Management: Data Mining Meets Marketing, November 18-19, 2005, New York City, USA. [ .pdf ]
With customer relationship management (CRM), companies move away from a mainly product-centered view to a customer-centered view. As a result of this change, effectively managing how to keep in contact with customers across different channels is one of the key success factors in today's business world. In many industries, company Web sites have evolved into an extremely important channel through which customers can be attracted and retained. To analyze and optimize this channel, accurate models are crucial: models of how customers browse through the Web site and which information within the site they repeatedly view. Typically, data mining techniques are used for this purpose. However, marketing research has already developed numerous models for traditional channels, and these could also prove valuable for understanding this new channel. In this paper we propose applying an extension of the Logarithmic Series Distribution (LSD) model to the repeat usage of Web-based information. This makes it possible to analyze and optimize a Web site's capability to support one goal of CRM: retaining customers. As an example, we use the university's blended learning Web portal, with over a thousand learning resources, to demonstrate how the model can be used to evaluate and improve the Web site's effectiveness.
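For reference, the plain logarithmic series distribution (not the extension proposed in the talk) has probability mass function P(R = r) = -θ^r / (r ln(1 - θ)) for r = 1, 2, ... and 0 < θ < 1. The Python sketch below fits θ to observed repeat-usage counts by matching the sample mean, which for the LSD coincides with the maximum likelihood estimate. The counts could be, for instance, how often each user viewed a resource, counting only users with at least one view. The sketch then returns expected frequencies for comparison with the observed ones. The function names and the sample counts are hypothetical.

```python
import math
from typing import Dict, Sequence


def lsd_pmf(r: int, theta: float) -> float:
    """P(R = r) = -theta**r / (r * ln(1 - theta)) for r = 1, 2, ..."""
    return -theta ** r / (r * math.log(1.0 - theta))


def lsd_mean(theta: float) -> float:
    """E[R] = -theta / ((1 - theta) * ln(1 - theta))."""
    return -theta / ((1.0 - theta) * math.log(1.0 - theta))


def fit_lsd(counts: Sequence[int]) -> float:
    """Fit theta by matching the sample mean (for the LSD this is also the MLE)."""
    target = sum(counts) / len(counts)
    if target <= 1.0:
        raise ValueError("sample mean must exceed 1 (all counts are >= 1)")
    lo, hi = 1e-12, 1.0 - 1e-12
    for _ in range(200):  # bisection; the LSD mean increases with theta
        mid = (lo + hi) / 2.0
        if lsd_mean(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def expected_frequencies(counts: Sequence[int], max_r: int) -> Dict[int, float]:
    """Expected number of users with exactly r uses under the fitted LSD."""
    theta = fit_lsd(counts)
    return {r: len(counts) * lsd_pmf(r, theta) for r in range(1, max_r + 1)}


views = [1, 1, 1, 1, 1, 2, 2, 3, 4, 7]  # hypothetical repeat-view counts
observed = {r: views.count(r) for r in range(1, 5)}
print(observed, expected_frequencies(views, 4))
```

Large gaps between observed and expected frequencies point to resources whose repeat usage deviates from the model, which is the kind of signal the talk proposes to use for evaluating a site.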

[10] Michael Hahsler, Kurt Hornik, and Thomas Reutterer. Implications of probabilistic data modeling for rule mining, March 2005. 29th Annual Conference of the German Classification Society (GfKl 2005), March 9-11, 2005, Magdeburg, Germany. [ .pdf ]
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms for mining associations are discussed in great detail. In this talk we investigate properties of transaction data sets from a probabilistic point of view. We present a simple probabilistic framework for transaction data and its implementation using the R statistical computing environment. The framework can be used to simulate transaction data in which no associations are present. We use such data to explore how well confidence and lift, two popular interest measures used for rule mining, filter out noise. Based on the framework, we develop the measure hyperlift and compare it to lift using simulated data and a real-world grocery database.
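Suppose there are m transactions, c_X of which contain X and c_Y of which contain Y. Under the independence model, the number c_XY of transactions containing both follows a hypergeometric distribution. Hyperlift divides the observed c_XY by the δ-quantile of that distribution, whereas lift divides by its mean. The stdlib-only Python sketch below follows that description. δ = 0.99 is an illustrative default, and the function names are hypothetical.

```python
import math


def hyper_pmf(k: int, m: int, c_x: int, c_y: int) -> float:
    """P(C_XY = k) under independence: hypergeometric with m, c_x, c_y."""
    return math.comb(c_y, k) * math.comb(m - c_y, c_x - k) / math.comb(m, c_x)


def hyper_quantile(delta: float, m: int, c_x: int, c_y: int) -> int:
    """Smallest k with P(C_XY <= k) >= delta."""
    cdf = 0.0
    for k in range(min(c_x, c_y) + 1):
        cdf += hyper_pmf(k, m, c_x, c_y)
        if cdf >= delta:
            return k
    return min(c_x, c_y)  # guards against floating-point round-off in the cdf


def lift(c_xy: int, m: int, c_x: int, c_y: int) -> float:
    """Observed co-occurrences divided by their expectation c_x * c_y / m."""
    return c_xy * m / (c_x * c_y)


def hyperlift(c_xy: int, m: int, c_x: int, c_y: int, delta: float = 0.99) -> float:
    """Observed co-occurrences divided by the delta-quantile under independence."""
    q = hyper_quantile(delta, m, c_x, c_y)
    if q == 0:
        raise ValueError("the delta-quantile is 0; hyperlift is undefined here")
    return c_xy / q


# 4 transactions; X and Y each occur twice, and always together
print(lift(2, 4, 2, 2), hyperlift(2, 4, 2, 2))  # 2.0 1.0
```

In the tiny example, lift reports a twofold association. Hyperlift stays at 1 because, with counts this small, two co-occurrences are well within what independence produces.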

[11] Michael Hahsler and Stefan Koch. Discussion of a large-scale open source data collection methodology, January 2005. 38th Hawaii International Conference on System Sciences (HICSS-38), January 3-6, 2005, Hilton Waikoloa Village, Big Island, Hawaii. [ .pdf ]
In this talk we discuss in detail a possible methodology for collecting repository data on a large number of open source software projects from a single project hosting and community site. We describe the data retrieval process, together with the metrics that can be computed from the data and used in further analyses. We then outline example research areas that the data can address and present first results. Finally, we discuss both advantages and disadvantages of the proposed methodology, together with implications for future approaches.

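[12] Georg Fessler, Michael Hahsler, and Michaela Putz. ePubWU - Erfahrungen mit einer Volltextplattform an der Wirtschaftsuniversität Wien [ePubWU - Experiences with a full-text platform at the Wirtschaftsuniversität Wien], September 2004. 28. Österreichischer Bibliothekartag 2004 [28th Austrian Librarians' Day], Linz, Austria. [ .pdf ]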
ePubWU is an electronic platform for scholarly publications of the Wirtschaftsuniversität Wien, through which the WU's research-related publications are made accessible in full text via the WWW. ePubWU is operated as a joint project of the University Library of the Wirtschaftsuniversität Wien and the Department of Information Business (Abteilung für Informationswirtschaft). ePubWU currently collects two types of publications: working papers and dissertations. This talk presents experiences from more than two years of the project, including the areas of acquisition, workflows, cataloguing, and dissemination.

[13] Michael Hahsler. Association rules and the negative binomial model, April 2004. Research Seminar Statistical Learning, Wirtschaftsuniversität Wien, Austria. [ .pdf ]
In this talk we apply a simple stochastic model, the negative binomial model, to mining rules that can be used in applications such as recommender systems. We show how several publicly available data sets can be fitted by the model and how the model is extended to mine rules. Finally, we present a mining algorithm and discuss evaluation and open research questions.
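The negative binomial distribution (NBD) with shape k and scale a has P(R = r) = Γ(k + r) / (Γ(r + 1) Γ(k)) (1 + a)^(-k) (a / (1 + a))^r, with mean ka and variance ka(1 + a). The sketch below fits this distribution by the method of moments to a vector of counts, for instance how often each item occurs in a data set. The estimator choice, sample counts, and function names are assumptions for illustration. The talk's fitting procedure and its extension to rule mining are not reproduced.

```python
import math
from typing import Sequence, Tuple


def fit_nbd_moments(counts: Sequence[int]) -> Tuple[float, float]:
    """Method-of-moments NBD fit: solve mean = k*a and variance = k*a*(1 + a)."""
    n = len(counts)
    mean = sum(counts) / n
    var = sum((x - mean) ** 2 for x in counts) / (n - 1)  # sample variance
    if var <= mean:
        raise ValueError("counts are not overdispersed; the NBD fit needs variance > mean")
    a = var / mean - 1.0
    k = mean / a
    return k, a


def nbd_pmf(r: int, k: float, a: float) -> float:
    """P(R = r), computed on the log scale for numerical stability."""
    log_p = (
        math.lgamma(k + r) - math.lgamma(r + 1) - math.lgamma(k)
        - k * math.log1p(a)
        + r * (math.log(a) - math.log1p(a))
    )
    return math.exp(log_p)


counts = [0, 0, 1, 2, 5, 0, 3, 1, 0, 8]  # hypothetical per-item counts
k, a = fit_nbd_moments(counts)
expected_zeros = len(counts) * nbd_pmf(0, k, a)
print(round(k, 3), round(a, 3), round(expected_zeros, 2))
```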

[14] Michael Hahsler. Generating synthetic transaction data for tuning usage mining algorithms, March 2003. 27th Annual GfKl-Conference, Cottbus, Germany. [ .pdf ]
The Internet was rapidly adopted as a channel for advertising and selling products. It is especially useful for selling information-based goods (documents, software, music, ...), which can be delivered instantly via the net. Competition is fierce, and sellers have to provide additional services to keep their customers and attract new ones. A common approach is to improve the user interface by adding recommender services, such as the book recommender used by the successful Internet retailer Amazon.com. Market basket analysis and association rule algorithms for transaction data are frequently used to generate recommendations. However, tuning the performance and testing different parameters of these algorithms is crucial for the success of such approaches. Unfortunately, the large amount of high-quality historical data needed is often not available. Generating synthetic data with characteristics similar to real-world data is often the only solution to this problem. In this talk we analyze the Quest synthetic data generation code for associations (see: http://www.almaden.ibm.com/cs/quest). The transaction data generated by this program is used in several papers to evaluate performance increases of association rule algorithms. However, the characteristics of the generated data do not seem to be in line with real-world data, especially data concerning the Web and information goods. As an alternative, we present the first version of a generator based on Ehrenberg's repeat-buying theory to generate synthetic transaction data. The repeat-buying theory has a solid empirical basis and models the micro-structure of the purchase process. We conclude with a comparison of data generated by the two generators with real-world data. We believe that objective evaluation and tuning of algorithms should always use real data together with several synthetic data sets from generators based on different models.
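Ehrenberg's repeat-buying theory assumes that each customer buys an item according to a Poisson process. The long-run purchase rate varies across customers following a gamma distribution, so the number of purchases in a period follows an NBD. The Python sketch below generates per-customer purchase counts under exactly that gamma-Poisson mixture, treating items as independent. It is a minimal illustration, not the generator presented in the talk, and it does not assemble counts into timed transactions. The parameter layout (item -> (k, a)) and the function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple


def poisson(rng: random.Random, lam: float) -> int:
    """Knuth's multiplication method; fine for the small rates used here."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def generate_purchases(
    n_customers: int,
    item_params: Dict[str, Tuple[float, float]],
    periods: float = 1.0,
    seed: int = 0,
) -> List[Dict[str, int]]:
    """Per-customer purchase counts; item_params maps item -> (gamma shape k, scale a)."""
    rng = random.Random(seed)
    histories: List[Dict[str, int]] = []
    for _ in range(n_customers):
        history: Dict[str, int] = {}
        for item, (k, a) in item_params.items():
            rate = rng.gammavariate(k, a)  # customer's long-run purchase rate
            count = poisson(rng, rate * periods)  # purchases in the period
            if count:
                history[item] = count
        histories.append(history)
    return histories


data = generate_purchases(20000, {"x": (0.5, 2.0)}, seed=1)
xs = [h.get("x", 0) for h in data]
mean = sum(xs) / len(xs)
var = sum((v - mean) ** 2 for v in xs) / (len(xs) - 1)
print(round(mean, 2), round(var, 2))  # NBD theory: mean k*a = 1, variance k*a*(1+a) = 3
```

The clearly overdispersed counts (variance well above the mean) are the NBD signature that, according to the abstract, uniform-style generators fail to reproduce for Web and information-goods data.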

[15] Michael Hahsler. Software reuse with analysis patterns, August 2002. AMCIS 2002, August 9-11, Dallas, Texas. [ .pdf ]
The purpose of this talk is to promote reuse of domain knowledge by introducing patterns as early as the analysis phase of the software life cycle. We propose an outline template for analysis patterns that strongly supports the whole analysis process: from requirements analysis to the analysis model, and on to its transformation into a flexible and reusable design and implementation. As an example, we develop a family of analysis patterns that deal with a series of pressing problems in cooperative work, collaborative information filtering and sharing, and knowledge management. We evaluate the reuse potential of these patterns by analyzing several components of an information system that was developed for the Virtual University project of the Vienna University of Economics and Business Administration. The findings suggest that using patterns in the analysis phase can significantly reduce development time. Reuse starts at the analysis stage, and the interface between the analysis and design phases improves.

[16] Michael Hahsler. Evaluation of recommender algorithms for an internet information broker based on simple association rules and on the repeat-buying theory, July 2002. WEBKDD 2002, Edmonton, Alberta, Canada. [ .pdf ]
Association rules are a widely used technique to generate recommendations in commercial and research recommender systems. More and more Web sites, especially those of retailers, offer automatic recommender services based on Web usage mining, so evaluating recommender algorithms becomes increasingly important. In this talk we first present a framework for evaluating different aspects of recommender systems, based on the knowledge discovery in databases (KDD) process of Fayyad et al. We then compare the performance of two recommender algorithms based on frequent itemsets. The first recommender algorithm uses association rules, and the other is based on the repeat-buying theory known from marketing research. For the evaluation we concentrated on how well the patterns extracted from usage data match users' concept of useful recommendations. We use 6 months of usage data from an educational Internet information broker and compare useful recommendations identified by users from the broker's target group with the results of the recommender algorithms. The results suggest that frequent itemsets from purchase histories match the concept of useful recommendations expressed by users with satisfactory accuracy (higher than 70%) and precision (between 60% and 90%). The evaluation also suggests that both algorithms perform similarly on real-world data if they are tuned properly.
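To make the two reported measures concrete, the sketch below computes accuracy and precision of a recommendation list against user judgements. It assumes that users labelled each candidate item as useful or not. Accuracy is taken as the share of candidates the algorithm classifies correctly: recommended if useful, not recommended otherwise. The exact evaluation protocol of the talk may differ, and the function name and sample data are hypothetical.

```python
from typing import Dict, Set, Tuple


def evaluate(judged: Dict[str, bool], recommended: Set[str]) -> Tuple[float, float]:
    """Return (accuracy, precision) of `recommended` against user judgements.

    `judged` maps each candidate item to True if users found it useful.
    """
    tp = sum(1 for item, ok in judged.items() if ok and item in recommended)
    fp = sum(1 for item, ok in judged.items() if not ok and item in recommended)
    tn = sum(1 for item, ok in judged.items() if not ok and item not in recommended)
    accuracy = (tp + tn) / len(judged)
    precision = tp / (tp + fp) if tp + fp else 0.0
    return accuracy, precision


judged = {"a": True, "b": True, "c": False, "d": False, "e": True}
print(evaluate(judged, {"a", "b", "c"}))  # (0.6, 0.6666666666666666)
```

Precision looks only at what was recommended, while accuracy also rewards correctly withholding items that users judged not useful.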

[17] Michael Hahsler. Data warehouses and data mining, December 2001. Guest Lecture at Webster University, Vienna. [ http | .pdf ]
This presentation gives a very short introduction to the concepts of Data Warehouses and Data Mining with some examples from the information system called the WU Virtual University.

[18] Michael Hahsler. Patterns im Softwareentwicklungsprozeß [Patterns in the software development process], September 2001. ADV Arbeitsgemeinschaft für Datenverarbeitung, Vienna. [ http | .pdf ]
This presentation discusses how patterns can be integrated into the software development process. First, the object-oriented (OO) life cycle is presented. The presentation then shows where and how design and analysis patterns can be used, pointing out their benefits and problems. Finally, examples of using analysis patterns are presented in the context of a project developing an information system for research and teaching at a university.

[19] Michael Hahsler. A customer purchase incidence model applied to recommender services, August 2001. WEBKDD 2001, San Francisco, CA. [ http | .pdf ]
In this presentation we transfer a customer purchase incidence model for consumer products, based on Ehrenberg's repeat-buying theory, to Web-based information products. Ehrenberg's repeat-buying theory successfully describes regularities in a large number of consumer product markets. We show that these regularities also exist in electronic markets for information goods, and that purchase incidence models provide a well-founded theoretical basis for recommender and alert systems. The talk consists of three parts. First, we present the architecture of an information market and its instrumentation for collecting data on customer behavior. In the second part, Ehrenberg's repeat-buying theory and its assumptions are reviewed and adapted for Web-based information markets. Finally, we present the empirical validation of the model based on data collected from the information market of the Virtual University of the Vienna University of Economics and Business Administration at http://vu.wu-wien.ac.at

[20] Michael Hahsler. User-centered navigation re-design for web-based information systems, August 2000. AMCIS 2000, Long Beach, CA. [ http ]
[21] Michael Hahsler. Vorstellung der Virtuellen Universität im Rahmen des Jungassistententrainings [Presentation of the Virtual University as part of the junior assistant training], March 2000. Wirtschaftsuniversität Wien. [ http ]
[22] Michael Hahsler. Living Lectures - WU Virtual Library: Ein Lernportal [A learning portal], March 2000. Lecture series "Lernen per Internet" [Learning via the Internet], Technische Universität Wien. [ http ]
[23] Andreas Geyer-Schulz, Michael Hahsler, and Georg Schneider. The living lectures - virtual university project, June 1999. Global Bangemann Challenge, Stockholm, Sweden. [ .html ]
[24] Michael Hahsler. Das Living Lectures - Virtual University Projekt: Projektbericht [The Living Lectures - Virtual University project: project report], June 1999. Institut für Betriebswirtschaftslehre, Universität Wien. [ http ]
[25] Michael Hahsler. Das Living Lectures - Virtual University Projekt: Informationstechnologie im universitären Bildungsbereich [The Living Lectures - Virtual University project: information technology in university education], June 1999. Global Village 99. [ .txt ]
[26] Andreas Geyer-Schulz and Michael Hahsler. Automatic labelling of references for internet information systems, March 1999. 23rd Annual GfKl-Conference, Bielefeld, Germany. [ http ]
[27] Andreas Geyer-Schulz, Michael Hahsler, and Georg Schneider. Workshop Living Lectures - Virtual University, September 1998. Wirtschaftsuniversität Wien.


© Michael Hahsler,
