Publications: Michael Hahsler


Fields

Data Stream Mining

[1] Margaret H. Dunham, Michael Hahsler, and Myra Spiliopoulou, editors. Proceedings of the First International Workshop on Novel Data Stream Pattern Mining Techniques (StreamKDD'10). ACM Press, New York, NY, USA, 2010. [ bib | at the publisher ]
Data stream mining gained in importance over the last years because it is indispensable for many real applications such as prediction and evolution of weather phenomena; security and anomaly detection in networks; evaluating satellite data; and mining health monitoring streams. Stream mining algorithms must take account of the unique properties of stream data: infinite data, temporal ordering, concept drifts and shifts, demand for scalability etc. This workshop brings together scholars working in different areas of learning on streams, including sensor data and other forms of accumulating data. Most of the papers in the next pages are on unsupervised learning with clustering methods. Issues addressed include the detection of outliers and anomalies, evolutionary clustering and incremental clustering, learning in subspaces of the complete feature space and learning with exploitation of context, deriving models from text streams and visualizing them.
[2] Rao M. Kotamarti, Michael Hahsler, Douglas Raiford, Monnie McGee, and Margaret H. Dunham. Analyzing Taxonomic Classification Using Extensible Markov Models. Bioinformatics, 2010. Advance Access published July 12, 2010. [ bib | DOI | at the publisher ]
Motivation: As next generation sequencing is rapidly adding new genomes, their correct placement in the taxonomy needs verification. However, the current methods for confirming classification of a taxon or suggesting revision for a potential misplacement rely on computationally intense multi-sequence alignment followed by an iterative adjustment of the distance matrix. Due to intra-heterogeneity issues with the 16S rRNA marker, no classifier is available for sub-genus level that could readily suggest a classification for a novel 16S rRNA sequence. Metagenomics further complicates the issue by generating fragmented 16S rRNA sequences. This paper proposes a novel alignment-free method for representing the microbial profiles using Extensible Markov Models (EMM) with an extended Karlin-Altschul statistical framework similar to the classic alignment paradigm. We propose a Log Odds (LOD) score classifier based on Gumbel difference distribution that confirms correct classifications with statistical significance qualifications and suggests revisions where necessary. Results: We tested our method by generating a sub-genus level classifier with which we re-evaluated classifications of 676 microbial organisms using the NCBI FTP database for the 16S rRNA. The results confirm current classification for all genera while ascertaining significance at 95%. Furthermore, this novel classifier isolates heterogeneity issues to a mere 12 strains while confirming classifications with significance qualification for the remaining 98%. The models require less memory than that needed by multi-sequence alignments and have better time complexity than the current methods. The classifier operates at sub-genus level and thus outperforms the naive Bayes classifier of the RNA Database Project where much of the taxonomic analysis is available online. Finally, using information redundancy in model building, we show that the method applies to metagenomic fragment classification of 19 E.coli strains.
[3] Michael Hahsler and Margaret H. Dunham. rEMM: Extensible markov model for data stream clustering in R. Journal of Statistical Software, 35(5):1-31, 2010. [ bib | at the publisher ]
Clustering streams of continuously arriving data has become an important application of data mining in recent years and efficient algorithms have been proposed by several researchers. However, clustering alone neglects the fact that data in a data stream is not only characterized by the proximity of data points which is used by clustering, but also by a temporal component. The Extensible Markov Model (EMM) adds the temporal component to data stream clustering by superimposing a dynamically adapting Markov Chain. In this paper we introduce the implementation of the R extension package rEMM which implements EMM and we discuss some examples and applications.
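The idea of superimposing a Markov chain on stream clusters can be illustrated with a minimal Python sketch. It is not the rEMM implementation: the class name `TinyEMM`, the fixed distance `threshold`, and the rule of opening a new cluster whenever no center is close enough are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class TinyEMM:
    """Toy stream clusterer with a Markov chain over clusters (hypothetical, illustrative only)."""

    def __init__(self, threshold):
        self.threshold = threshold            # assumed: max distance to join an existing cluster
        self.centers = []                     # one fixed representative point per cluster/state
        self.transitions = defaultdict(int)   # (from_state, to_state) -> observed count
        self.current = None                   # state assigned to the previous data point

    def update(self, point):
        """Assign the point to a state and record the transition from the previous state."""
        best, best_dist = None, math.inf
        for i, center in enumerate(self.centers):
            d = math.dist(point, center)
            if d < best_dist:
                best, best_dist = i, d
        # Open a new state when no existing center is close enough.
        if best is None or best_dist > self.threshold:
            self.centers.append(tuple(point))
            best = len(self.centers) - 1
        # The temporal component: count the move from the previous state to this one.
        if self.current is not None:
            self.transitions[(self.current, best)] += 1
        self.current = best
        return best

    def transition_prob(self, i, j):
        """Estimated probability of moving from state i to state j."""
        total = sum(n for (a, _), n in self.transitions.items() if a == i)
        return self.transitions.get((i, j), 0) / total if total else 0.0


emm = TinyEMM(threshold=1.0)
stream = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.1), (5, 5.2)]
states = [emm.update(p) for p in stream]
print(states)                      # state assigned to each arriving point
print(emm.transition_prob(0, 1))   # estimated probability of moving from state 0 to state 1
```

The clustering step alone would only report which state each point belongs to; the transition counts additionally capture the order in which states are visited.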
[4] Rao M Kotamarti, Michael Hahsler, Douglas W Raiford, and Margaret H Dunham. Sequence transformation to a complex signature form for consistent phylogenetic tree using extensible markov model. In Proceedings of the 2010 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (IEEE CIBCB 2010). IEEE, 2010. [ bib | .pdf ]
Phylogenetic tree analysis using molecular sequences continues to expand beyond the 16S rRNA marker. By addressing the multi-copy issue known as the intra-heterogeneity, this paper restores the focus in using the 16S rRNA marker. Through use of a novel learning and model building algorithm, the multiple gene copies are integrated into a compact complex signature using the Extensible Markov Model (EMM). The method clusters related sequence segments while preserving their inherent order to create an EMM signature for a microbial organism. A library of EMM signatures is generated from which samples are drawn for phylogenetic analysis. By matching the components of two signatures, referred to as quasi-alignment, the differences are highlighted and scored. Scoring quasi-alignments is done using adapted Karlin-Altschul statistics to compute a novel distance metric. The metric satisfies conditions of identity, symmetry, triangular inequality and the four point rule required for a valid evolution distance metric. The resulting distance matrix is input to PHYLogeny Inference Package (PHYLIP) to generate phylogenies using neighbor joining algorithms. Through control of clustering in signature creation, the diversity of similar organisms and their placement in the phylogeny is explained. The experiments include analysis of genus Burkholderia, a random microbial sample spanning several phyla and a diverse sample that includes RNA of Eukaryotic origin. The NCBI sequence data for 16S rRNA is used for validation.
[5] Rao Mallik Kotamarti, Douglas W. Raiford, Michael Hahsler, Yuhang Wang, Monnie McGee, and Maggie Dunham. Targeted genomic signature profiling with quasi-alignment statistics. Article 63, COBRA Preprint Series, November 2009. [ bib | at the publisher ]
Genome databases continue to expand with no change in the basic format of sequence data. The prevalent use of the classic alignment based search tools like BLAST has significantly pushed the limits of genome isolate research. The relatively new frontier of Metagenomic research deals with thousands of diverse genomes with newer demands beyond the current homologue search and analysis. Compressing sequence data into a complex form could facilitate a broader range of sequence analyses. To this end, this research explores reorganizing sequence data as complex Markov signatures also known as Extensible Markov Models. Markov models have found successful application in biological sequence analysis applications through small, but important extensions to the original theory of Markov Chains. Extensible Markov Model (EMM) offers a novel Quasi-alignment complement to the classic alignment based homologous sequence search methods like BLAST. EMM based bioinformatic analysis (EMMBA) incorporates automatic learning which allows the Markov chain creation dynamically. Oligonucleotide or genomic word frequencies form the core sequence data in alignment free methods. EMMBA extends the Karlin-Altschul statistics to bring forth an analogous E-Score statistical significance to the quasi-alignment domain. By consolidating a community of sequences into a single searchable profile, EMM methodology further reduces the search space for classification. Through dynamic generation of the score matrix for each community profile, EMMBA fine tunes the score assignments. Each evaluation iteratively adjusts the profile score matrix to account for point probabilities of the query to ensure Karlin-Altschul assumptions are satisfied to derive meaningful statistical significance. The presence of multiple quasi-alignments resembles multiple local alignments of BLAST. Quasi-alignments are scored based on a difference distribution of Gumbel scores. Species signature profiles allow for statistical validation of novel species identification. Working in EMM transformation space speeds up classification and generates distance matrix for differentiation. The techniques and metrics presented are validated using the microbial 16s rRNA sequence data from NCBI.

Visualization Based on Seriation

[1] Michael Hahsler and Kurt Hornik. Dissimilarity plots: A visual exploration tool for partitional clustering. Report 89, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, September 2009. [ bib | at the publisher ]
For hierarchical clustering, dendrograms provide convenient and powerful visualization. Although many visualization methods have been suggested for partitional clustering, their usefulness deteriorates quickly with increasing dimensionality of the data and/or they fail to represent structure between and within clusters simultaneously. In this paper we extend (dissimilarity) matrix shading with several reordering steps based on seriation. Both methods, matrix shading and seriation, have been well-known for a long time. However, only recent algorithmic improvements allow seriation to be used for larger problems. Furthermore, seriation is used in a novel stepwise process (within each cluster and between clusters) which leads to a visualization technique that is independent of the dimensionality of the data. A big advantage is that it presents the structure between clusters and the micro-structure within clusters in one concise plot. This not only allows for judging cluster quality but also makes mis-specification of the number of clusters apparent. We give a detailed discussion of the construction of dissimilarity plots and demonstrate their usefulness with several examples.
[2] Michael Hahsler, Kurt Hornik, and Christian Buchta. Getting things in order: An introduction to the R package seriation. Journal of Statistical Software, 25(3):1-34, March 2008. [ bib | at the publisher ]
Seriation, i.e., finding a linear order for a set of objects given data and a loss or merit function, is a basic problem in data analysis. Caused by the problem's combinatorial nature, it is hard to solve for all but very small sets. Nevertheless, both exact solution methods and heuristics are available. In this paper we present the package seriation which provides the infrastructure for seriation with R. The infrastructure comprises data structures to represent linear orders as permutation vectors, a wide array of seriation methods using a consistent interface, a method to calculate the value of various loss and merit functions, and several visualization techniques which build on seriation. To illustrate how easily the package can be applied for a variety of applications, a comprehensive collection of examples is presented.
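The two ingredients named above, a permutation vector and a loss function, can be made concrete with a small Python sketch. This is not the seriation package interface: the function names are hypothetical, the loss used here (the summed dissimilarity between neighbors in the order) is just one possible choice, and the exhaustive search is only feasible for very small sets, which is exactly the combinatorial difficulty the abstract mentions.

```python
from itertools import permutations


def path_length(order, dist):
    """Loss of a linear order: sum of dissimilarities between adjacent objects."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))


def best_order(dist):
    """Exhaustive search over all n! orders; practical only for tiny n."""
    n = len(dist)
    return min(permutations(range(n)), key=lambda order: path_length(order, dist))


# Four objects located at positions 0, 3, 1, 2 on a line (assumed toy data).
positions = [0, 3, 1, 2]
dist = [[abs(p - q) for q in positions] for p in positions]

order = best_order(dist)
print(order)                      # (0, 2, 3, 1): objects sorted by position
print(path_length(order, dist))   # 3
```

For realistic problem sizes the exhaustive search would be replaced by an exact method such as branch-and-bound or by a heuristic.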
[3] Michael Hahsler and Kurt Hornik. TSP - Infrastructure for the traveling salesperson problem. Journal of Statistical Software, 23(2):1-21, December 2007. [ bib | at the publisher ]
The traveling salesperson (or, salesman) problem (TSP) is a well known and important combinatorial optimization problem. The goal is to find the shortest tour that visits each city in a given list exactly once and then returns to the starting city. Despite this simple problem statement, solving the TSP is difficult since it belongs to the class of NP-complete problems. The importance of the TSP arises not only from its theoretical appeal but also from the variety of its applications. Typical applications in operations research include vehicle routing, computer wiring, cutting wallpaper and job sequencing. The main application in statistics is combinatorial data analysis, e.g., reordering rows and columns of data matrices or identifying clusters. In this paper we introduce the R package TSP which provides a basic infrastructure for handling and solving the traveling salesperson problem. The package features S3 classes for specifying a TSP and its (possibly optimal) solution as well as several heuristics to find good solutions. In addition, it provides an interface to Concorde, one of the best exact TSP solvers currently available.
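As an illustration of what a tour construction heuristic looks like, the following Python sketch implements the nearest-neighbor rule: always travel to the closest city not yet visited. It is a generic textbook heuristic written for this page, not code from the TSP package, and the function names and the toy city coordinates are assumptions.

```python
import math


def tour_length(tour, dist):
    """Length of the closed tour, including the return to the starting city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))


def nearest_neighbor_tour(dist, start=0):
    """Greedy construction: repeatedly move to the closest unvisited city."""
    unvisited = set(range(len(dist))) - {start}
    tour = [start]
    while unvisited:
        last = tour[-1]
        # Ties are broken by the smaller city index to keep the result deterministic.
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


# Four cities on the corners of a unit square (assumed toy data).
cities = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(a, b) for b in cities] for a in cities]

tour = nearest_neighbor_tour(dist)
print(tour)                     # [0, 1, 2, 3]
print(tour_length(tour, dist))  # 4.0
```

The heuristic is fast but gives no optimality guarantee; on this toy instance it happens to find the shortest tour.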
[4] Michael Hahsler, Kurt Hornik, and Christian Buchta. Getting things in order: An introduction to the R package seriation. Report 58, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, August 2007. [ bib | at the publisher ]
Seriation, i.e., finding a linear order for a set of objects given data and a loss or merit function, is a basic problem in data analysis. Caused by the problem's combinatorial nature, it is hard to solve for all but very small sets. Nevertheless, both exact solution methods and heuristics are available. In this paper we present the package seriation which provides the infrastructure for seriation with R. The infrastructure comprises data structures to represent linear orders as permutation vectors, a wide array of seriation methods using a consistent interface, a method to calculate the value of various loss and merit functions, and several visualization techniques which build on seriation. To illustrate how easily the package can be applied for a variety of applications, a comprehensive collection of examples is presented.
[5] Michael Hahsler and Kurt Hornik. TSP - Infrastructure for the traveling salesperson problem. Report 45, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, December 2006. [ bib | at the publisher ]
The traveling salesperson or salesman problem (TSP) is a well known and important combinatorial optimization problem. The goal is to find the shortest tour that visits each city in a given list exactly once and then returns to the starting city. Despite this simple problem statement, solving the TSP is difficult since it belongs to the class of NP-complete problems. The importance of the TSP arises not only from its theoretical appeal but also from the variety of its applications. In addition to vehicle routing, many other applications, e.g., computer wiring, cutting wallpaper, job sequencing or several data visualization techniques, require the solution of a TSP. In this paper we introduce the R package TSP which provides a basic infrastructure for handling and solving the traveling salesperson problem. The package features S3 classes for specifying a TSP and its (possibly optimal) solution as well as several heuristics to find good solutions. In addition, it provides an interface to Concorde, one of the best exact TSP solvers currently available.

Association Rule Mining

[1] Michael Hahsler, Christian Buchta, and Kurt Hornik. Selective association rule generation. Computational Statistics, 23(2):303-315, April 2008. [ bib | DOI | at the publisher | .pdf ]
Mining association rules is a popular and well researched method for discovering interesting relations between variables in large databases. A practical problem is that at medium to low support values often a large number of frequent itemsets and an even larger number of association rules are found in a database. A widely used approach is to gradually increase minimum support and minimum confidence or to filter the found rules using increasingly strict constraints on additional measures of interestingness until the set of rules found is reduced to a manageable size. In this paper we describe a different approach which is based on the idea to first define a set of “interesting” itemsets (e.g., by a mixture of mining and expert knowledge) and then, in a second step to selectively generate rules for only these itemsets. The main advantage of this approach over increasing thresholds or filtering rules is that the number of rules found is significantly reduced while at the same time it is not necessary to increase the support and confidence thresholds which might lead to missing important information in the database.
[2] Michael Hahsler and Kurt Hornik. Building on the arules infrastructure for analyzing transaction data with R. In R. Decker and H.-J. Lenz, editors, Advances in Data Analysis, Proceedings of the 30th Annual Conference of the Gesellschaft für Klassifikation e.V., Freie Universität Berlin, March 8-10, 2006, Studies in Classification, Data Analysis, and Knowledge Organization, pages 449-456. Springer-Verlag, 2007. [ bib | at the publisher | .pdf ]
The free and extensible statistical computing environment R with its enormous number of extension packages already provides many state-of-the-art techniques for data analysis. Support for association rule mining, a popular exploratory method which can be used, among other purposes, for uncovering cross-selling opportunities in market baskets, has become available recently with the R extension package arules. After a brief introduction to transaction data and association rules, we present the formal framework implemented in arules and demonstrate how clustering and association rule mining can be applied together using a market basket data set from a typical retailer. This paper shows that implementing a basic infrastructure with formal classes in R provides an extensible basis which can very efficiently be employed for developing new applications (such as clustering transactions) in addition to association rule mining.
[3] Michael Hahsler and Kurt Hornik. New probabilistic interest measures for association rules. Intelligent Data Analysis, 11(5):437-455, 2007. [ bib | at the publisher | .pdf ]
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start with presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left hand side of rules and that lift performs poorly to filter random noise in transaction data. Based on the probabilistic framework we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
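For readers unfamiliar with the two established measures discussed above, a minimal Python sketch of their standard definitions follows: confidence of X -> Y is supp(X and Y) / supp(X), and lift is confidence divided by supp(Y). The function names and the tiny grocery data set are made up for illustration; the new measures hyper-lift and hyper-confidence are not reproduced here.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """Estimated P(rhs | lhs): support of the whole rule over support of the left-hand side."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence relative to the baseline frequency of the right-hand side."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


# Hypothetical toy transactions, not data from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

print(confidence({"bread"}, {"milk"}, transactions))  # about 0.75
print(lift({"bread"}, {"milk"}, transactions))        # about 0.94, i.e. slightly below 1
```

A lift near 1 indicates that the two sides of the rule co-occur about as often as expected under independence, which is why lift is commonly used to screen rules.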
[4] Michael Hahsler. A model-based frequency constraint for mining associations from transaction data. Data Mining and Knowledge Discovery, 13(2):137-166, September 2006. [ bib | DOI | at the publisher | .pdf ]
Mining frequent itemsets is a popular method for finding associated items in databases. For this method, support, the co-occurrence frequency of the items which form an association, is used as the primary indicator of the association's significance. A single user-specified support threshold is used to decide if associations should be further investigated. Support has some known problems with rare items, favors shorter itemsets and sometimes produces misleading associations. In this paper we develop a novel model-based frequency constraint as an alternative to a single, user-specified minimum support. The constraint utilizes knowledge of the process generating transaction data by applying a simple stochastic mixture model (the NB model) which allows for transaction data's typically highly skewed item frequency distribution. A user-specified precision threshold is used together with the model to find local frequency thresholds for groups of itemsets. Based on the constraint we develop the notion of NB-frequent itemsets and adapt a mining algorithm to find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint provides improvements over a single minimum support threshold and that the precision threshold is more robust and easier to set and interpret by the user.
[5] Michael Hahsler and Kurt Hornik. New probabilistic interest measures for association rules. Report 38, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, August 2006. [ bib | at the publisher ]
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. In this paper, we start with presenting a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world database from a grocery outlet to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left hand side of rules and that lift performs poorly to filter random noise in transaction data. Based on the probabilistic framework we develop two new interest measures, hyper-lift and hyper-confidence, which can be used to filter or order mined association rules. The new measures show significantly better performance than lift for applications where spurious rules are problematic.
[6] Michael Hahsler, Kurt Hornik, and Thomas Reutterer. Implications of probabilistic data modeling for mining association rules. In M. Spiliopoulou, R. Kruse, C. Borgelt, A. Nürnberger, and W. Gaul, editors, From Data and Information Analysis to Knowledge Engineering, Proceedings of the 29th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Magdeburg, March 9-11, 2005, Studies in Classification, Data Analysis, and Knowledge Organization, pages 598-605. Springer-Verlag, 2006. [ bib | at the publisher | .pdf ]
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine association rules are discussed in great detail. We present a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a real-world grocery database to explore the behavior of confidence and lift, two popular interest measures used for rule mining. The results show that confidence is systematically influenced by the frequency of the items in the left-hand-side of rules and that lift performs poorly to filter random noise in transaction data. The probabilistic data modeling approach presented in this paper not only is a valuable framework to analyze interest measures but also provides a starting point for further research to develop new interest measures which are based on statistical tests and geared towards the specific properties of transaction data.
[7] Michael Hahsler, Bettina Grün, and Kurt Hornik. arules - A computational environment for mining association rules and frequent item sets. Journal of Statistical Software, 14(15):1-25, October 2005. [ bib | at the publisher | .pdf ]
Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
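The level-wise idea behind Apriori can be sketched in a few lines of Python. This is a deliberately naive illustration written for this page, not Borgelt's C implementation and not the arules interface; the function name, the toy transactions, and the threshold value are assumptions.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search: size-(k+1) candidates are built only from frequent size-k itemsets."""
    n = len(transactions)

    def sup(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if sup(frozenset([i])) >= min_support]
    result = {}
    k = 1
    while level:
        for itemset in level:
            result[itemset] = sup(itemset)
        frequent_k = set(level)
        # Join step: unions of two frequent k-itemsets that have exactly k+1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        level = [
            c for c in candidates
            if all(frozenset(s) in frequent_k for s in combinations(c, k))
            and sup(c) >= min_support
        ]
        k += 1
    return result


# Hypothetical toy transactions, not data from the paper.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

for itemset, s in sorted(frequent_itemsets(transactions, 0.3).items(),
                         key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With the threshold 0.3 the pair {butter, milk} is infrequent, so the only three-item candidate is pruned without counting its support.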
[8] Michael Hahsler, Bettina Grün, and Kurt Hornik. A computational environment for mining association rules and frequent item sets. Report 15, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, April 2005. [ bib | at the publisher ]
Mining frequent itemsets and association rules is a popular and well researched approach to discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
[9] Michael Hahsler, Kurt Hornik, and Thomas Reutterer. Implications of probabilistic data modeling for rule mining. Report 14, Research Report Series, Department of Statistics and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, March 2005. [ bib | at the publisher ]
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine associations are discussed in great detail. In this paper we investigate properties of transaction data sets from a probabilistic point of view. We present a simple probabilistic framework for transaction data and its implementation using the R statistical computing environment. The framework can be used to simulate transaction data when no associations are present. We use such data to explore the ability to filter noise of confidence and lift, two popular interest measures used for rule mining. Based on the framework we develop the measure hyperlift and we compare this new measure to lift using simulated data and a real-world grocery database.
[10] Michael Hahsler. A model-based frequency constraint for mining associations from transaction data. Working Paper 07/2004, Working Papers on Information Processing and Information Management, Institut für Informationsverarbeitung und -wirtschaft, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, November 2004. [ bib | at the publisher ]
In this paper we develop an alternative to minimum support which utilizes knowledge of the process which generates transaction data and allows for highly skewed frequency distributions. We apply a simple stochastic model (the NB model), which is known for its usefulness to describe item occurrences in transaction data, to develop a frequency constraint. This model-based frequency constraint is used together with a precision threshold to find individual support thresholds for groups of associations. We develop the notion of NB-frequent itemsets and present two mining algorithms which find all NB-frequent itemsets in a database. In experiments with publicly available transaction databases we show that the new constraint can provide significant improvements over a single minimum support threshold and that the precision threshold is easier to use.
[11] Andreas Geyer-Schulz, Michael Hahsler, and Anke Thede. Comparing association-rules and repeat-buying based recommender systems in a B2B environment. In M. Schader, W. Gaul, and M. Vichi, editors, Between Data Science and Applied Data Analysis, Proceedings of the 26th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Mannheim, July 22-24, 2002, Studies in Classification, Data Analysis, and Knowledge Organization, pages 421-429. Springer-Verlag, July 2003. [ bib | at the publisher ]
In this contribution we present a systematic evaluation and comparison of recommender systems based on simple association rules and on repeat-buying theory. Both recommender services are based on the customer purchase histories of a medium-sized B2B-merchant for computer accessories. With the help of product managers an evaluation set for recommendations was generated. With regard to this evaluation set, recommendations produced by both methods are evaluated and several error measures are computed. This provides an empirical test whether frequent item sets or outliers of a stochastic purchase incidence model are suitable concepts for automatically generating recommendations. Furthermore, the loss functions (performance measures) of the two models are compared and the sensitivity with regard to a misspecification of the model parameters is discussed.
[12] Andreas Geyer-Schulz and Michael Hahsler. Comparing two recommender algorithms with the help of recommendations by peers. In O.R. Zaiane, J. Srivastava, M. Spiliopoulou, and B. Masand, editors, WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles 4th International Workshop, Edmonton, Canada, July 2002, Revised Papers, Lecture Notes in Computer Science LNAI 2703, pages 137-158. Springer-Verlag, 2003. (Revised version of the WEBKDD 2002 paper “Evaluation of Recommender Algorithms for an Internet Information Broker based on Simple Association Rules and on the Repeat-Buying Theory”). [ bib | at the publisher | .pdf ]
Since more and more Web sites, especially sites of retailers, offer automatic recommendation services using Web usage mining, evaluation of recommender algorithms has become increasingly important. In this paper we present a framework for the evaluation of different aspects of recommender systems based on the process of discovering knowledge in databases introduced by Fayyad et al. and we summarize research already done in this area. One aspect identified in the presented evaluation framework is widely neglected when dealing with recommender algorithms. This aspect is to evaluate how useful patterns extracted by recommender algorithms are to support the social process of recommending products to others, a process normally driven by recommendations by peers or experts. To fill this gap for recommender algorithms based on frequent itemsets extracted from usage data we evaluate the usefulness of two algorithms. The first recommender algorithm uses association rules, and the other algorithm is based on the repeat-buying theory known from marketing research. We use 6 months of usage data from an educational Internet information broker and compare useful recommendations identified by users from the target group of the broker (peers) with the recommendations produced by the algorithms. The results of the evaluation presented in this paper suggest that frequent itemsets from usage histories match the concept of useful recommendations expressed by peers with satisfactory accuracy (higher than 70%) and precision (between 60% and 90%). Also the evaluation suggests that both algorithms studied in the paper perform similarly on real-world data if they are tuned properly.
[13] Andreas Geyer-Schulz and Michael Hahsler. Evaluation of recommender algorithms for an internet information broker based on simple association rules and on the repeat-buying theory. In Brij Masand, Myra Spiliopoulou, Jaideep Srivastava, and Osmar R. Zaiane, editors, Fourth WEBKDD Workshop: Web Mining for Usage Patterns & User Profiles, pages 100-114, Edmonton, Canada, July 2002. [ bib | .pdf ]
Association rules are a widely used technique to generate recommendations in commercial and research recommender systems. Since more and more Web sites, especially of retailers, offer automatic recommender services using Web usage mining, evaluation of recommender algorithms becomes increasingly important. In this paper we first present a framework for the evaluation of different aspects of recommender systems based on the process of discovering knowledge in databases of Fayyad et al. and then we focus on the comparison of the performance of two recommender algorithms based on frequent itemsets. The first recommender algorithm uses association rules, and the other recommender algorithm is based on the repeat-buying theory known from marketing research. For the evaluation we concentrated on how well the patterns extracted from usage data match the concept of useful recommendations of users. We use 6 months of usage data from an educational Internet information broker and compare useful recommendations identified by users from the target group of the broker with the results of the recommender algorithms. The results of the evaluation presented in this paper suggest that frequent itemsets from purchase histories match the concept of useful recommendations expressed by users with satisfactory accuracy (higher than 70%) and precision (between 60% and 90%). Also the evaluation suggests that both algorithms studied in the paper perform similarly on real-world data if they are tuned properly.

Recommender Systems

[1] Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, and Anke Thede. Behavior-based recommender systems as value-added services for scientific libraries. In Hamparsum Bozdogan, editor, Statistical Data Mining & Knowledge Discovery, pages 433-454. Chapman & Hall / CRC, July 2003. [ bib | at the publisher ]
Amazon.com paved the way for several large-scale, behavior-based recommendation services as an important value-added expert advice service for online book shops. In this contribution we discuss the effects (and possible reductions of transaction costs) of such services and investigate how such value-added services can be implemented in the context of scientific libraries. For this purpose we present a new, recently developed recommender system based on a stochastic purchase incidence model, present the underlying stochastic model from repeat-buying theory, and analyze whether the underlying assumptions on consumer behavior hold for users of scientific libraries, too. We analyzed the logfiles with approximately 85 million HTTP transactions of the web-based online public access catalog (OPAC) of the library of the Universität Karlsruhe (TH) since January 2001 and performed some diagnostic checks. The recommender service has been fully operational within the library system of the Universität Karlsruhe (TH) since 2002/06/22.
[2] Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, and Anke Thede. An integration strategy for distributed recommender services in legacy library systems. In M. Schader, W. Gaul, and M. Vichi, editors, Between Data Science and Applied Data Analysis, Proceedings of the 26th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Mannheim, July 22-24, 2002, Studies in Classification, Data Analysis, and Knowledge Organization, pages 412-420. Springer-Verlag, July 2003. [ bib | at the publisher ]
Scientific library systems are a very promising application area for recommender services. Scientific libraries could easily develop customer-oriented service portals in the style of amazon.com. Students, university teachers and researchers can reduce their transaction cost (i.e. search and evaluation cost of information products). For librarians, the advantage is an improvement of the customer support by recommendations and the additional support in marketing research, product evaluation, and book selection. In this contribution we present a strategy for integrating a behavior-based distributed recommender service in legacy library systems with minimal changes in the legacy system.
[3] Andreas Geyer-Schulz, Michael Hahsler, and Anke Thede. Comparing association-rules and repeat-buying based recommender systems in a B2B environment. In M. Schader, W. Gaul, and M. Vichi, editors, Between Data Science and Applied Data Analysis, Proceedings of the 26th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Mannheim, July 22-24, 2002, Studies in Classification, Data Analysis, and Knowledge Organization, pages 421-429. Springer-Verlag, July 2003. [ bib | at the publisher ]
In this contribution we present a systematic evaluation and comparison of recommender systems based on simple association rules and on repeat-buying theory. Both recommender services are based on the customer purchase histories of a medium-sized B2B merchant for computer accessories. With the help of product managers, an evaluation set for recommendations was generated. With regard to this evaluation set, recommendations produced by both methods are evaluated and several error measures are computed. This provides an empirical test of whether frequent item sets or outliers of a stochastic purchase incidence model are suitable concepts for automatically generating recommendations. Furthermore, the loss functions (performance measures) of the two models are compared and the sensitivity with regard to a misspecification of the model parameters is discussed.
[4] Andreas Geyer-Schulz and Michael Hahsler. Comparing two recommender algorithms with the help of recommendations by peers. In O.R. Zaiane, J. Srivastava, M. Spiliopoulou, and B. Masand, editors, WEBKDD 2002 - Mining Web Data for Discovering Usage Patterns and Profiles 4th International Workshop, Edmonton, Canada, July 2002, Revised Papers, Lecture Notes in Computer Science LNAI 2703, pages 137-158. Springer-Verlag, 2003. (Revised version of the WEBKDD 2002 paper “Evaluation of Recommender Algorithms for an Internet Information Broker based on Simple Association Rules and on the Repeat-Buying Theory”). [ bib | at the publisher | .pdf ]
Since more and more Web sites, especially sites of retailers, offer automatic recommendation services using Web usage mining, evaluation of recommender algorithms has become increasingly important. In this paper we present a framework for the evaluation of different aspects of recommender systems based on the process of discovering knowledge in databases introduced by Fayyad et al. and we summarize research already done in this area. One aspect identified in the presented evaluation framework is widely neglected when dealing with recommender algorithms. This aspect is to evaluate how useful patterns extracted by recommender algorithms are to support the social process of recommending products to others, a process normally driven by recommendations by peers or experts. To fill this gap for recommender algorithms based on frequent itemsets extracted from usage data, we evaluate the usefulness of two algorithms. The first recommender algorithm uses association rules, and the other algorithm is based on the repeat-buying theory known from marketing research. We use 6 months of usage data from an educational Internet information broker and compare useful recommendations identified by users from the target group of the broker (peers) with the recommendations produced by the algorithms. The results of the evaluation presented in this paper suggest that frequent itemsets from usage histories match the concept of useful recommendations expressed by peers with satisfactory accuracy (higher than 70%) and precision (between 60% and 90%). The evaluation also suggests that both algorithms studied in the paper perform similarly on real-world data if they are tuned properly.
[5] Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, and Anke Thede. Recommenderdienste für wissenschaftliche Bibliotheken und Bibliotheksverbünde. In Andreas Geyer-Schulz and Alfred Taudes, editors, Informationswirtschaft: Ein Sektor mit Zukunft, Symposium 4.-5. September 2003, Wien, Österreich, Lecture Notes in Informatics (LNI) P-33, pages 43-58. Gesellschaft für Informatik, 2003. [ bib | at the publisher ]
Scientific libraries are a promising application area for recommender services. Scientific libraries can easily develop customer-centered service portals in the style of amazon.com. Students, university teachers, and researchers can reduce their share of the transaction costs (e.g., search and evaluation costs for information products). For librarians, the advantage lies in improved customer advice through recommendations and in additional support for market research, product evaluation, and collection management. In this contribution we present a strategy for integrating behavior-based, distributed recommender services into existing library systems with minimal effort, and we report on our experience with introducing such a service at the university library of the Universität Karlsruhe (TH).
[6] Andreas Geyer-Schulz and Michael Hahsler. Evaluation of recommender algorithms for an internet information broker based on simple association rules and on the repeat-buying theory. In Brij Masand, Myra Spiliopoulou, Jaideep Srivastava, and Osmar R. Zaiane, editors, Fourth WEBKDD Workshop: Web Mining for Usage Patterns & User Profiles, pages 100-114, Edmonton, Canada, July 2002. [ bib | .pdf ]
Association rules are a widely used technique to generate recommendations in commercial and research recommender systems. Since more and more Web sites, especially of retailers, offer automatic recommender services using Web usage mining, evaluation of recommender algorithms becomes increasingly important. In this paper we first present a framework for the evaluation of different aspects of recommender systems based on the process of discovering knowledge in databases of Fayyad et al. and then we focus on the comparison of the performance of two recommender algorithms based on frequent itemsets. The first recommender algorithm uses association rules, and the other recommender algorithm is based on the repeat-buying theory known from marketing research. For the evaluation we concentrated on how well the patterns extracted from usage data match the concept of useful recommendations of users. We use 6 months of usage data from an educational Internet information broker and compare useful recommendations identified by users from the target group of the broker with the results of the recommender algorithms. The results of the evaluation presented in this paper suggest that frequent itemsets from purchase histories match the concept of useful recommendations expressed by users with satisfactory accuracy (higher than 70%) and precision (between 60% and 90%). The evaluation also suggests that both algorithms studied in the paper perform similarly on real-world data if they are tuned properly.
[7] Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. A customer purchase incidence model applied to recommender systems. In R. Kohavi, B.M. Masand, M. Spiliopoulou, and J. Srivastava, editors, WEBKDD 2001 - Mining Log Data Across All Customer Touch Points, Third International Workshop, San Francisco, CA, USA, August 26, 2001, Revised Papers, Lecture Notes in Computer Science LNAI 2356, pages 25-47. Springer-Verlag, July 2002. (Revised version of the WEBKDD 2001 paper “A Customer Purchase Incidence Model Applied to Recommender Systems”). [ bib | at the publisher | .pdf ]
In this contribution we transfer a customer purchase incidence model for consumer products which is based on Ehrenberg's repeat-buying theory to Web-based information products. Ehrenberg's repeat-buying theory successfully describes regularities on a large number of consumer product markets. We show that these regularities exist in electronic markets for information goods, too, and that purchase incidence models provide a well founded theoretical base for recommender and alert services. The article consists of two parts. In the first part Ehrenberg's repeat-buying theory and its assumptions are reviewed and adapted for web-based information markets. Second, we present the empirical validation of the model based on data collected from the information market of the Virtual University of the Vienna University of Economics and Business Administration from September 1999 to May 2001.
[8] Walter Böhm, Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. Repeat buying theory and its application for recommender services. In O. Opitz and M. Schwaiger, editors, Exploratory Data Analysis in Empirical Research, Proceedings of the 25th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Munich, March 14-16, 2001, pages 229-239. Springer-Verlag, 2002. [ bib | at the publisher | .pdf ]
In the context of a virtual university's information broker we study the consumption patterns for information goods and we investigate if Ehrenberg's repeat-buying theory which successfully models regularities in a large number of consumer product markets can be applied in electronic markets for information goods too. First results indicate that Ehrenberg's repeat-buying theory succeeds in describing the consumption patterns of bundles of complementary information goods reasonably well and that this can be exploited for automatically generating anonymous recommendation services based on such information bundles. An experimental anonymous recommender service has been implemented and is currently evaluated in the Virtual University of the Vienna University of Economics and Business Administration at http://vu.wu-wien.ac.at.
[9] Wolfgang Gaul, Andreas Geyer-Schulz, Michael Hahsler, and Lars Schmidt-Thieme. eMarketing mittels Recommendersystemen. Marketing ZFP, 24:47-55, 2002. [ bib | at the publisher ]
Recommender systems make an important contribution to the design of eMarketing activities. Starting from a discussion of input/output characteristics for describing such systems, which already allow a suitable distinction between the forms found in practice, we motivate why such a characterization has to be enriched by including methodological aspects from marketing research. A recommender system based on repeat-buying theory and a system that generates recommendations by analyzing the navigation behavior of site visitors are presented. The marketing possibilities of recommender systems are illustrated using the Amazon site as an example. Finally, to round off the discussion, further literature related to recommender systems is addressed. An outlook indicates the directions in which further developments are planned.
[10] Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. Wissenschaftliche Recommendersysteme in Virtuellen Universitäten. In H.-J. Appelrath, R. Beyer, U. Marquardt, H.C. Mayr, and C. Steinberger, editors, Unternehmen Hochschule, Wien, Österreich, September 2001. Symposium UH2001, GI Lecture Notes in Informatics (LNI). [ bib | at the publisher | .pdf ]
This contribution investigates the role of recommender systems and their potential in the teaching, learning, and research environment of a Virtual University. The main idea of this contribution is to exploit the information aggregation capabilities of recommender systems in a Virtual University in order to automatically improve its tutoring and advisory services, and thereby to personalize the supervision and advising of students and make them available to a larger number of participants while at the same time relieving the teaching staff. The second part of this contribution describes the recommender services of myVU, the collection of personalized services of the Virtual University (VU) of the Vienna University of Economics and Business Administration, and their non-personalized variants, which are essentially based on observed user behavior and, in the personalized variant, additionally on self-selection through self-assessment of experience in a subject area. Finally, the innovative use of such systems is discussed and described by means of several scenarios.
[11] Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. A customer purchase incidence model applied to recommender systems. In WEBKDD2001 Workshop: Mining Log Data Across All Customer TouchPoints, pages 35-45, San Francisco, CA, August 2001. [ bib | .pdf ]
In this contribution we transfer a customer purchase incidence model for consumer products which is based on Ehrenberg's repeat-buying theory to Web-based information products. Ehrenberg's repeat-buying theory successfully describes regularities in a large number of consumer product markets. We show that these regularities exist in electronic markets for information goods too, and that purchase incidence models provide a well founded theoretical foundation for recommender and alert systems. The article consists of three parts. First, we present the architecture of an information market and its instrumentation for collecting data on customer behavior. In the second part Ehrenberg's repeat-buying theory and its assumptions are reviewed and adapted for Web-based information markets. Finally, we present the empirical validation of the model based on data collected from the information market of the Virtual University of the Vienna University of Economics and Business Administration at http://vu.wu-wien.ac.at
[12] Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. Educational and scientific recommender systems: Designing the information channels of the virtual university. International Journal of Engineering Education, 17(2):153-163, 2001. [ bib | at the publisher | .pdf ]
In this article we investigate the role of recommender systems and their potential in the educational and scientific environment of a Virtual University. The key idea is to use the information aggregation capabilities of a recommender system to improve the tutoring and consulting services of a Virtual University in an automated way and thus scale tutoring and consulting in a personalized way to a mass audience. We describe the recommender services of myVU, the collection of the personalized services of the Virtual University (VU) of the Vienna University of Economics and Business Administration, which are based on observed user behavior and self-assignment of experience and are currently being field-tested. We show how the usual mechanism design problems inherent to recommender systems are addressed in this prototype.
[13] Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. myvu: A next generation recommender system based on observed consumer behavior and interactive evolutionary algorithms. In Wolfgang Gaul, Otto Opitz, and Martin Schader, editors, Data Analysis: Scientific Modeling and Practical Applications, Studies in Classification, Data Analysis, and Knowledge Organization, pages 447-457. Springer Verlag, Heidelberg, Germany, 2000. [ bib | at the publisher | .pdf ]
myVU is a next generation recommender system based on observed consumer behavior and interactive evolutionary algorithms implementing customer relationship management and one-to-one marketing in the educational and scientific broker system of a virtual university. myVU provides a personalized, adaptive WWW-based user interface for all members of a virtual university and it delivers routine recommendations for frequently used scientific and educational Web-sites.

Software Engineering with Patterns

[1] Michael Hahsler. A quantitative study of the adoption of design patterns by open source software developers. In S. Koch, editor, Free/Open Source Software Development, pages 103-123. Idea Group Publishing, 2005. [ bib | at the publisher | .pdf ]
Several successful projects (Linux, FreeBSD, BIND, Apache, etc.) showed that the collaborative and self-organizing process of developing open source software produces reliable, high quality software. Without doubt, the open source software development process differs in many ways from the traditional development process in a commercial environment. An interesting research question is how these differences influence the adoption of traditional software engineering practices. In this chapter we investigate how design patterns, a widely accepted software engineering practice, are adopted by open source developers for documenting changes. We analyze the development process of almost 1,000 open source software projects using version control information and explore differences in pattern adoption using characteristics of projects and developers. By analyzing these differences we provide evidence that design patterns are an important practice in open source projects and that there exist significant differences between developers who use design patterns and those who do not.
[2] Michael Hahsler. A quantitative study of the application of design patterns in java. Working Paper 01/2003, Working Papers on Information Processing and Information Management, Institut für Informationsverarbeitung und -wirtschaft, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, January 2003. [ bib | html version | at the publisher ]
Using design patterns is a widely accepted method to improve software development. Many benefits of the application of patterns are claimed in the literature. The most cited claim is that design patterns can provide a common design vocabulary and therefore greatly improve communication between software designers. Most of the claims are supported by experience reports of practitioners, but there is a lack of quantitative research concerning the actual application of design patterns and the realization of the claimed benefits. In this paper we analyze the development process of over 1000 open source software projects using version control information. We explore this information to gain an insight into the differences of software development with and without design patterns. By analyzing these differences we provide evidence that design patterns are used for communication and that there is a significant difference between developers who use design patterns and those who do not.
[3] Andreas Geyer-Schulz and Michael Hahsler. Software reuse with analysis patterns. In Proceedings of the 8th AMCIS, pages 1156-1165, Dallas, TX, August 2002. Association for Information Systems. [ bib | at the publisher | .pdf ]
The purpose of this article is to promote reuse of domain knowledge by introducing patterns already in the analysis phase of the software life-cycle. We propose an outline template for analysis patterns that strongly supports the whole analysis process from the requirements analysis to the analysis model and further on to its transformation into a flexible and reusable design and implementation. As an example we develop a family of analysis patterns in this paper that deal with a series of pressing problems in cooperative work, collaborative information filtering and sharing, and knowledge management. We evaluate the reuse potential of these patterns by analyzing several components of an information system that was developed for the Virtual University project of the Vienna University of Economics and Business Administration. The findings of this analysis suggest that using patterns in the analysis phase has the potential to reduce development time significantly by introducing reuse already at the analysis stage and by improving the interface between the analysis and design phases.
[4] Andreas Geyer-Schulz and Michael Hahsler. Software engineering with analysis patterns. Working Paper 01/2001, Working Papers on Information Processing and Information Management, Institut für Informationsverarbeitung und -wirtschaft, Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, November 2001. [ bib | html version | at the publisher ]
The purpose of this article is twofold. First, we promote the use of patterns in the analysis phase of the software life-cycle by proposing an outline template for analysis patterns that strongly supports the whole analysis process from the requirements analysis to the analysis model and further on to its transformation into a flexible design. Second, we present, as an example, a family of analysis patterns that deal with a series of pressing problems in cooperative work, collaborative information filtering and sharing, and knowledge management. We present the step-by-step evolution of the analysis pattern virtual library with active agents, starting with a simple pinboard.
[5] Michael Hahsler. Analyse Patterns im Softwareentwicklungsprozeß mit Beispielen für Informationsmanagement und deren Anwendungen für die Virtuellen Universität der Wirtschaftsuniversität Wien. Dissertation, Wirtschaftsuniversität Wien, Augasse 2-6, A 1090 Wien, Österreich, January 2001. [ bib | html version | at the publisher | .ps | .pdf ]
This thesis deals with analysis patterns, the application of patterns in the analysis phase of software development. In the design phase, patterns have been used for several years to bring expert knowledge and reusability into the design process. A wealth of such design patterns already exists. The analysis phase is a new application area for patterns that has not yet been sufficiently covered in the literature. In this thesis, the application of the pattern approach in the analysis phase is worked out and made concrete. Analysis patterns support the entire software development process and help to solve known problems during the analysis phase. In this way, time and costs can be saved in the development of new software systems. These properties of analysis patterns are demonstrated with concrete examples in a case study. The case study describes the use of analysis patterns for information management developed in this thesis in the Virtual University project of the Vienna University of Economics and Business Administration, in which an Internet information broker supporting teaching and research is being implemented. The experience from this project is examined, and the effects of analysis patterns on reuse in software development and on the acceptance of the resulting system are presented.

Digital Libraries

[1] Georg Fessler, Michael Hahsler, and Michaela Putz. ePubWU - Erfahrungen mit einer Volltext an der Wirtschaftsuniversität Wien. In Christian Enichlmayr, editor, Bibliotheken - Fundament der Bildung, 28. Österreichischer Bibliothekartag 2004, Schriftenreihe der Oö. Landesbibliothek, pages 190-193, 2005. [ bib ]
ePubWU is an electronic platform for scholarly publications of the Vienna University of Economics and Business Administration (WU), where research-related publications of the WU are made available in full text via the WWW. ePubWU is run as a joint project of the university library of the Vienna University of Economics and Business Administration and the Abteilung für Informationswirtschaft. Currently, two types of publications are collected in ePubWU: working papers and dissertations. This contribution presents experience from the more than two years the project has been running, including in the areas of acquisition, workflows, cataloging, and dissemination.
[2] Georg Fessler, Michael Hahsler, Michaela Putz, Judith Schwarz, and Brigitta Wiebogen. Projektbericht ePubWU 2001-2003. Augasse 2-6, 1090 Wien, Wirtschaftsuniversität Wien, January 2004. [ bib | .pdf ]
ePubWU is an electronic platform for scholarly publications of the Vienna University of Economics and Business Administration (WU), where research-related publications of the WU are made available in full text via the WWW. ePubWU has been in production operation since January 2002 and is run as a joint project of the university library of the Vienna University of Economics and Business Administration and the Abteilung für Informationswirtschaft. This report contains the experience gained during the two-year pilot phase of the project.
[3] Michael Hahsler. Integrating digital document acquisition into a university library: A case study of social and organizational challenges. Journal of Digital Information Management, 1(4):162-171, December 2003. [ bib | at the publisher | .pdf ]
In this article we report on the effort of the university library of the Vienna University of Economics and Business Administration to integrate a digital library component for research documents authored at the university into the existing library infrastructure. Setting up a digital library has become a relatively easy task using current database technology and the components and tools freely available. However, integrating such a digital library into existing library systems and adapting existing document acquisition workflows in the organization are non-trivial tasks. We use a research framework to identify the key players in this change process and to analyze their incentive structures. Then we describe the light-weight integration approach employed by our university and show how it provides incentives to the key players and at the same time requires only minimal adaptation of the organization in terms of changing existing workflows. Our experience suggests that this light-weight integration offers a cost-efficient and low-risk intermediate step towards switching to exclusive digital document acquisition.
[4] Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, and Anke Thede. An integration strategy for distributed recommender services in legacy library systems. In M. Schader, W. Gaul, and M. Vichi, editors, Between Data Science and Applied Data Analysis, Proceedings of the 26th Annual Conference of the Gesellschaft für Klassifikation e.V., University of Mannheim, July 22-24, 2002, Studies in Classification, Data Analysis, and Knowledge Organization, pages 412-420. Springer-Verlag, July 2003. [ bib | at the publisher ]
Scientific library systems are a very promising application area for recommender services. Scientific libraries could easily develop customer-oriented service portals in the style of amazon.com. Students, university teachers and researchers can reduce their transaction cost (i.e. search and evaluation cost of information products). For librarians, the advantage is an improvement of the customer support by recommendations and the additional support in marketing research, product evaluation, and book selection. In this contribution we present a strategy for integrating a behavior-based distributed recommender service in legacy library systems with minimal changes in the legacy system.

© Michael Hahsler
