| [1] |
Michael Hahsler, Christian Buchta, and Kurt Hornik. Selective
association rule generation. Computational Statistics,
23(2):303-315, April 2008. [ bib | DOI |
at the
publisher |
.pdf ]
Mining association rules is a popular and well
researched method for discovering interesting relations between
variables in large databases. A practical problem is that at medium
to low support values often a large number of frequent itemsets and
an even larger number of association rules are found in a database.
A widely used approach is to gradually increase minimum support and
minimum confidence or to filter the found rules using increasingly
strict constraints on additional measures of interestingness until
the set of rules found is reduced to a manageable size. In this
paper we describe a different approach which is based on the idea
to first define a set of “interesting” itemsets (e.g., by a mixture
of mining and expert knowledge) and then, in a second step, to
selectively generate rules for only these itemsets. The main
advantage of this approach over increasing thresholds or filtering
rules is that the number of rules found is significantly reduced
while at the same time it is not necessary to increase the support
and confidence thresholds which might lead to missing important
information in the database.
|
| [2] |
Michael Hahsler, Kurt Hornik, and Christian Buchta. Getting things
in order: An introduction to the R package seriation. Journal
of Statistical Software, 25(3):1-34, March 2008.
[ bib | at the publisher |
.pdf ]
Seriation, i.e., finding a linear order for a set
of objects given data and a loss or merit function, is a basic
problem in data analysis. Because of the problem's combinatorial
nature, it is hard to solve for all but very small sets.
Nevertheless, both exact solution methods and heuristics are
available. In this paper we present the package seriation
which provides the infrastructure for seriation with R. The
infrastructure comprises data structures to represent linear orders
as permutation vectors, a wide array of seriation methods using a
consistent interface, a method to calculate the value of various
loss and merit functions, and several visualization techniques
which build on seriation. To illustrate how easily the package can
be applied for a variety of applications, a comprehensive
collection of examples is presented.
|
| [3] |
Michael Hahsler and Kurt Hornik. TSP - Infrastructure for the
traveling salesperson problem. Journal of Statistical
Software, 23(2):1-21, December 2007. [ bib | at the publisher |
.pdf ]
The traveling salesperson (or, salesman) problem
(TSP) is a well known and important combinatorial optimization
problem. The goal is to find the shortest tour that visits each
city in a given list exactly once and then returns to the starting
city. Despite this simple problem statement, solving the TSP is
difficult since it belongs to the class of NP-complete problems.
Besides its theoretical appeal, the importance of the TSP arises
from the variety of its applications. Typical applications
in operations research include vehicle routing, computer wiring,
cutting wallpaper and job sequencing. The main application in
statistics is combinatorial data analysis, e.g., reordering rows
and columns of data matrices or identifying clusters. In this paper
we introduce the R package TSP which provides a basic
infrastructure for handling and solving the traveling salesperson
problem. The package features S3 classes for specifying a TSP and
its (possibly optimal) solution as well as several heuristics to
find good solutions. In addition, it provides an interface to
Concorde, one of the best exact TSP solvers currently
available.
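To make the problem statement above concrete, the following is a minimal
Python sketch of a tour-length function and a nearest-neighbor heuristic.
It is an illustration only and not the code of the R package TSP; the
names tour_length and nearest_neighbor_tour and the four example points
are assumptions introduced here.

```python
import math

def tour_length(tour, dist):
    """Length of the closed tour, including the return to the start city."""
    n = len(tour)
    return sum(dist[tour[i]][tour[(i + 1) % n]] for i in range(n))

def nearest_neighbor_tour(dist, start=0):
    """Greedy heuristic: always move to the closest city not yet visited."""
    tour = [start]
    unvisited = set(range(len(dist))) - {start}
    while unvisited:
        last = tour[-1]
        # Ties are broken by the smaller city index to keep the result deterministic.
        nxt = min(unvisited, key=lambda j: (dist[last][j], j))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

# Hypothetical example: four cities on the corners of a unit square.
points = [(0, 0), (0, 1), (1, 1), (1, 0)]
dist = [[math.dist(p, q) for q in points] for p in points]
tour = nearest_neighbor_tour(dist)
length = tour_length(tour, dist)
```

A heuristic of this kind yields a valid tour quickly but gives no
guarantee of optimality, which is why exact solvers remain relevant.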
|
| [4] |
Michael Hahsler and Kurt Hornik. New probabilistic interest
measures for association rules. Intelligent Data Analysis,
11(5):437-455, 2007. [ bib |
at the publisher |
.pdf ]
Mining association rules is an important technique
for discovering meaningful patterns in transaction databases. Many
different measures of interestingness have been proposed for
association rules. However, these measures fail to take the
probabilistic properties of the mined data into account. In this
paper, we start with presenting a simple probabilistic framework
for transaction data which can be used to simulate transaction data
when no associations are present. We use such data and a real-world
database from a grocery outlet to explore the behavior of
confidence and lift, two popular interest measures used for rule
mining. The results show that confidence is systematically
influenced by the frequency of the items in the left hand side of
rules and that lift performs poorly to filter random noise in
transaction data. Based on the probabilistic framework we develop
two new interest measures, hyper-lift and hyper-confidence, which
can be used to filter or order mined association rules. The new
measures show significantly better performance than lift for
applications where spurious rules are problematic.
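For readers unfamiliar with the two measures examined above, the
following is a minimal Python sketch of the standard definitions of
support, confidence and lift. It does not implement hyper-lift or
hyper-confidence, and it is not the arules implementation; the function
names and the toy transactions are assumptions introduced here.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimate of P(rhs | lhs): support(lhs and rhs) / support(lhs)."""
    both = frozenset(lhs) | frozenset(rhs)
    return support(both, transactions) / support(lhs, transactions)

def lift(lhs, rhs, transactions):
    """Confidence divided by the support of the right-hand side."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)

# Hypothetical toy data: four market baskets.
transactions = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"milk", "butter"}),
    frozenset({"bread", "milk", "butter"}),
]

conf_bread_milk = confidence({"bread"}, {"milk"}, transactions)
lift_bread_milk = lift({"bread"}, {"milk"}, transactions)
```

A lift value near 1 indicates that the two sides of the rule co-occur
about as often as expected under independence.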
|
| [5] |
Thomas Reutterer, Michael Hahsler, and Kurt Hornik. Data Mining und
Marketing am Beispiel der explorativen Warenkorbanalyse.
Marketing ZFP, 29(3):165-181, 2007. [ bib | at the
publisher ]
Data mining techniques are an increasingly important
addition to the conventional toolkit of marketing research and
practice. These primarily data-driven analysis tools are used to
extract marketing-relevant information “intelligently” from large
databases (so-called data warehouses) and to prepare it in a
suitable form for subsequent decision support. This article
discusses points of contact between data mining and marketing and
demonstrates the concrete use of selected data mining methods for
exploratory market basket analysis (cross-category assortment
analysis) on a transaction data set from food retailing. The
techniques applied are drawn from classical affinity analysis, a
k-medoid clustering method, and tools for generating and
subsequently evaluating association rules between the product
groups contained in the assortment. The procedure is illustrated
using the freely available extension package arules for the
statistical software R.
|
| [6] |
Michael Hahsler. A model-based frequency constraint for mining
associations from transaction data. Data Mining and Knowledge
Discovery, 13(2):137-166, September 2006. [ bib | DOI |
at the
publisher |
.pdf ]
Mining frequent itemsets is a popular method for
finding associated items in databases. For this method, support,
the co-occurrence frequency of the items which form an association,
is used as the primary indicator of the association's
significance. A single user-specified support threshold is used to
decide if associations should be further investigated. Support has
some known problems with rare items, favors shorter itemsets and
sometimes produces misleading associations. In this paper we
develop a novel model-based frequency constraint as an alternative
to a single, user-specified minimum support. The constraint
utilizes knowledge of the process generating transaction data by
applying a simple stochastic mixture model (the NB model) which
allows for transaction data's typically highly skewed item
frequency distribution. A user-specified precision threshold is
used together with the model to find local frequency thresholds for
groups of itemsets. Based on the constraint we develop the notion
of NB-frequent itemsets and adapt a mining algorithm to find all
NB-frequent itemsets in a database. In experiments with publicly
available transaction databases we show that the new constraint
provides improvements over a single minimum support threshold and
that the precision threshold is more robust and easier to set and
interpret by the user.
|
| [7] |
Christoph Breidert, Michael Hahsler, and Thomas Reutterer. A review
of methods for measuring willingness-to-pay. Innovative
Marketing, 2(4):8-32, 2006. [ bib | at the
publisher |
.pdf ]
Knowledge about a product's willingness-to-pay on
behalf of its (potential) customers plays a crucial role in many
areas of marketing management like pricing decisions or new product
development. Numerous approaches to measure willingness-to-pay with
differential conceptual foundations and methodological implications
have been presented in the relevant literature so far. This article
provides the reader with a systematic overview of the relevant
literature on these competing approaches and associated schools of
thought, recognizes their respective merits and discusses obstacles
and issues regarding their adoption to measuring
willingness-to-pay. Because of its practical relevance, special
focus will be put on indirect surveying techniques and, in
particular, conjoint-based applications will be discussed in more
detail. The strengths and limitations of the individual approaches
are discussed and evaluated from a managerial point of view.
|
| [8] |
Michael Hahsler, Bettina Grün, and Kurt Hornik. arules - A
computational environment for mining association rules and frequent
item sets. Journal of Statistical Software, 14(15):1-25,
October 2005. [ bib | at the publisher |
.pdf ]
Mining frequent itemsets and association rules is a
popular and well researched approach for discovering interesting
relationships between variables in large databases. The R package
arules presented in this paper provides a basic infrastructure for
creating and manipulating input data sets and for analyzing the
resulting itemsets and rules. The package also includes interfaces
to two fast mining algorithms, the popular C implementations of
Apriori and Eclat by Christian Borgelt. These algorithms can be
used to mine frequent itemsets, maximal frequent itemsets, closed
frequent itemsets and association rules.
|
| [9] |
Michael Hahsler. Integrating digital document acquisition into a
university library: A case study of social and organizational
challenges. Journal of Digital Information Management,
1(4):162-171, December 2003. [ bib | at the publisher |
.pdf ]
In this article we report on the effort of the
university library of the Vienna University of Economics and
Business Administration to integrate a digital library component
for research documents authored at the university into the existing
library infrastructure. Setting up a digital library has become a
relatively easy task using the current database technology and the
components and tools freely available. However, to integrate such a
digital library into existing library systems and to adapt existing
document acquisition work-flows in the organization are non-trivial
tasks. We use a research framework to identify the key players in
this change process and to analyze their incentive structures. Then
we describe the light-weight integration approach employed by our
university and show how it provides incentives to the key players
and at the same time requires only minimal adaptation of the
organization in terms of changing existing work-flows. Our
experience suggests that this light-weight integration offers a
cost efficient and low risk intermediate step towards switching to
exclusive digital document acquisition.
|
| [10] |
Wolfgang Gaul, Andreas Geyer-Schulz, Michael Hahsler, and Lars
Schmidt-Thieme. eMarketing mittels Recommendersystemen.
Marketing ZFP, 24:47-55, 2002. [ bib |
at the
publisher ]
Recommender systems make an important contribution
to the design of eMarketing activities. Starting from a discussion
of input/output characteristics for describing such systems, which
already allow a suitable distinction between forms that are
relevant in practice, we motivate why such a characterization has
to be enriched by including methodological aspects from marketing
research. A recommender system based on repeat-buying theory and a
system that generates recommendations by analyzing the navigation
behavior of site visitors are presented. The marketing
possibilities of recommender systems are illustrated using the
Amazon site as an example. Finally, for completeness, further
literature related to recommender systems is discussed. An outlook
indicates the directions in which further developments are
planned.
|
| [11] |
Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn.
Educational and scientific recommender systems: Designing the
information channels of the virtual university. International
Journal of Engineering Education, 17(2):153-163, 2001.
[ bib |
at the
publisher | .pdf ]
In this article we investigate the role of
recommender systems and their potential in the educational and
scientific environment of a Virtual University. The key idea is to
use the information aggregation capabilities of a recommender
system to improve the tutoring and consulting services of a Virtual
University in an automated way and thus scale tutoring and
consulting in a personalized way to a mass audience. We describe
the recommender services of myVU, the collection of the
personalized services of the Virtual University (VU) of the Vienna
University of Economics and Business Administration which are based
on observed user behavior and self assignment of experience which
are currently field-tested. We show how the usual mechanism design
problems inherent to recommender systems are addressed in this
prototype.
|
| [12] |
Andreas Geyer-Schulz, Michael Hahsler, and Georg Schneider. The
virtual university and its embedded agents. ÖGAI Journal,
18(1):14-19, 1999. [ bib ]
In this article we present the current state of
usage of (intelligent) Internet agents in the Virtual University
(VU) of the Vienna University of Economics and BA. We discuss
opportunities and challenges for the development of several classes
of agents and their sensor systems. More specifically, agents of
the following classes embedded in the virtual university system
will be presented: (1) robots which support navigation services and
(2) robots which support communication and collaboration.
|
| [13] |
Peter Bruhn, Andreas Geyer-Schulz, Michael Hahsler, and Markus
Mottel. Genetic machine learning and intelligent internet agents.
ÖGAI Journal, 17(1):18-25, 1998. [ bib ]
In this paper we report on the status quo of the
current machine learning research projects at the Department of
Applied Computer Science of the Institute of Information Processing
and Information Economics of the Vienna University of Economics and
Business Administration. The current research activities can be
categorized as follows: (1) Development of a theoretic framework of
genetic programming. (2) Application of genetic programming for
managerial and economic decision-making and for breeding agents'
strategies in organizational learning. (3) Development, adaptation,
and integration of (intelligent) Internet agents for support of the
virtual organizations. (4) Development of an infrastructure for
intelligent Internet agents in the “Living Lectures - Virtual
University” project. (5) Cost-benefit analysis of agents, analysis
of tactical and strategic consequences of agents and the analysis
of their economic applications.
|
| [1] |
Michael Hahsler, Kurt Hornik, and Thomas Reutterer.
Warenkorbanalyse mit Hilfe der Statistik-Software R. In Peter
Schnedlitz, Renate Buber, Thomas Reutterer, Arnold Schuh, and
Christoph Teller, editors, Innovationen in Marketing,
pages 144-163. Linde-Verlag, 2006. [ bib | at the
publisher |
.pdf ]
Market basket analysis (or cross-category assortment
analysis) denotes a set of methods for studying the products or
categories from a retail assortment that are purchased together in
one shopping trip. This chapter takes a closer look at exploratory
market basket analysis, which aims to condense and compactly
represent the cross-category relationships that can be found in
(usually very large) retail transaction data. With its enormous
number of available extension packages, the freely available
statistical software R is an ideal basis for carrying out such
market basket analyses. The infrastructure for transaction data in
the extension package arules provides a flexible basis for market
basket analysis. It supports the efficient representation,
manipulation and analysis of market basket data together with
arbitrary additional information on products (for example,
assortment hierarchy) and on transactions (for example, revenue or
contribution margin). The package is seamlessly integrated into R
and thereby allows the direct application of existing
state-of-the-art methods for sampling, clustering and
visualization of market basket data. In addition, arules contains
common algorithms for finding association rules and the data
structures needed for analyzing patterns. A selection of the most
important functions is demonstrated using a real transaction data
set from food retailing.
|
| [2] |
Michael Hahsler. A quantitative study of the adoption of design
patterns by open source software developers. In S. Koch,
editor, Free/Open Source Software Development, pages
103-123. Idea Group Publishing, 2005. [ bib | at the
publisher |
.pdf ]
Several successful projects (Linux, Free-BSD, BIND,
Apache, etc.) showed that the collaborative and self-organizing
process of developing open source software produces reliable, high
quality software. Without doubt, the open source software
development process differs in many ways from the traditional
development process in a commercial environment. An interesting
research question is how these differences influence the adoption
of traditional software engineering practices. In this chapter we
investigate how design patterns, a widely accepted software
engineering practice, are adopted by open source developers for
documenting changes. We analyze the development process of almost
1,000 open source software projects using version control
information and explore differences in pattern adoption using
characteristics of projects and developers. By analyzing these
differences we provide evidence that design patterns are an
important practice in open source projects and that there exist
significant differences between developers who use design patterns
and those who do not.
|
| [3] |
Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, and Anke
Thede. Behavior-based recommender systems as value-added services
for scientific libraries. In Hamparsum Bozdogan, editor,
Statistical Data Mining & Knowledge Discovery, pages
433-454. Chapman & Hall / CRC, July 2003. [ bib |
at the publisher ]
Amazon.com paved the way for several large-scale,
behavior-based recommendation services as an important value-added
expert advice service for online book shops. In this contribution
we discuss the effects (and possible reductions of transaction
costs) for such services and investigate how such value-added
services can be implemented in context of scientific libraries. For
this purpose we present a new, recently developed recommender
system based on a stochastic purchase incidence model, present the
underlying stochastic model from repeat-buying theory and analyze
whether the underlying assumptions on consumer behavior hold for
users of scientific libraries, too. We analyzed the logfiles with
approximately 85 million HTTP-transactions of the web-based online
public access catalog (OPAC) of the library of the Universität
Karlsruhe (TH) since January 2001 and performed some diagnostic
checks. The recommender service is fully operational within the
library system of the Universität Karlsruhe (TH) since
2002/06/22.
|
| [4] |
Andreas Geyer-Schulz and Michael Hahsler. Comparing two recommender
algorithms with the help of recommendations by peers. In O.R.
Zaiane, J. Srivastava, M. Spiliopoulou, and
B. Masand, editors, WEBKDD 2002 - Mining Web Data for
Discovering Usage Patterns and Profiles 4th International Workshop,
Edmonton, Canada, July 2002, Revised Papers, Lecture Notes in
Computer Science LNAI 2703, pages 137-158. Springer-Verlag, 2003.
(Revised version of the WEBKDD 2002 paper “Evaluation of
Recommender Algorithms for an Internet Information Broker based on
Simple Association Rules and on the Repeat-Buying Theory”).
[ bib |
at the publisher | .pdf ]
Since more and more Web sites, especially sites of
retailers, offer automatic recommendation services using Web usage
mining, evaluation of recommender algorithms has become
increasingly important. In this paper we present a framework for
the evaluation of different aspects of recommender systems based on
the process of discovering knowledge in databases introduced by
Fayyad et al. and we summarize research already done in this area.
One aspect identified in the presented evaluation framework is
widely neglected when dealing with recommender algorithms. This
aspect is to evaluate how useful patterns extracted by recommender
algorithms are to support the social process of recommending
products to others, a process normally driven by recommendations by
peers or experts. To fill this gap for recommender algorithms based
on frequent itemsets extracted from usage data we evaluate the
usefulness of two algorithms. The first recommender algorithm uses
association rules, and the other algorithm is based on the
repeat-buying theory known from marketing research. We use 6 months
of usage data from an educational Internet information broker and
compare useful recommendations identified by users from the target
group of the broker (peers) with the recommendations produced by
the algorithms. The results of the evaluation presented in this
paper suggest that frequent itemsets from usage histories match the
concept of useful recommendations expressed by peers with
satisfactory accuracy (higher than 70%) and precision (between 60%
and 90%). Also the evaluation suggests that both algorithms studied
in the paper perform similarly on real-world data if they are tuned
properly.
|
| [5] |
Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. A
customer purchase incidence model applied to recommender systems.
In R. Kohavi, B.M. Masand, M. Spiliopoulou, and
J. Srivastava, editors, WEBKDD 2001 - Mining Log Data
Across All Customer Touch Points, Third International Workshop, San
Francisco, CA, USA, August 26, 2001, Revised Papers, Lecture
Notes in Computer Science LNAI 2356, pages 25-47. Springer-Verlag,
July 2002. (Revised version of the WEBKDD 2001 paper “A Customer
Purchase Incidence Model Applied to Recommender Systems”).
[ bib |
at the
publisher |
.pdf ]
In this contribution we transfer a customer
purchase incidence model for consumer products which is based on
Ehrenberg's repeat-buying theory to Web-based information products.
Ehrenberg's repeat-buying theory successfully describes
regularities on a large number of consumer product markets. We show
that these regularities exist in electronic markets for information
goods, too, and that purchase incidence models provide a well
founded theoretical base for recommender and alert services. The
article consists of two parts. In the first part Ehrenberg's
repeat-buying theory and its assumptions are reviewed and adapted
for web-based information markets. Second, we present the empirical
validation of the model based on data collected from the
information market of the Virtual University of the Vienna
University of Economics and Business Administration from September
1999 to May 2001.
|
| [6] |
Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. myVU:
A next generation recommender system based on observed consumer
behavior and interactive evolutionary algorithms. In Wolfgang Gaul,
Otto Opitz, and Martin Schader, editors, Data Analysis:
Scientific Modeling and Practical Applications, Studies in
Classification, Data Analysis, and Knowledge Organization, pages
447-457. Springer Verlag, Heidelberg, Germany, 2000.
[ bib |
at the publisher | .pdf ]
myVU is a next generation recommender system based
on observed consumer behavior and interactive evolutionary
algorithms implementing customer relationship management and
one-to-one marketing in the educational and scientific broker
system of a virtual university. myVU provides a personalized,
adaptive WWW-based user interface for all members of a virtual
university and it delivers routine recommendations for frequently
used scientific and educational Web-sites.
|
| [1] |
Christoph Breidert and Michael Hahsler. Adaptive conjoint analysis
for pricing music downloads. In R. Decker and H.-J. Lenz,
editors, Advances in Data Analysis, Proceedings of the 30th
Annual Conference of the Gesellschaft für Klassifikation e.V.,
Freie Universität Berlin, March 8-10, 2006, Studies in
Classification, Data Analysis, and Knowledge Organization, pages
409-416. Springer-Verlag, 2007. [ bib | at the
publisher |
.pdf ]
Finding the right pricing for music downloads is of
ample importance to the recording industry and music download
service providers. For the recently introduced music downloads,
reference prices are still developing and to find a revenue
maximizing pricing scheme is a challenging task. The most commonly
used approach is to employ linear pricing (e.g., iTunes,
musicload). Lately, subscription models have emerged, offering
their customers unlimited access to streaming music for a monthly
fee (e.g., Napster, RealNetworks). However, other pricing
strategies could also be used, such as quantity rebates starting at
certain download volumes. Research has been done in this field and
Buxmann et al. (2005) have shown that price cuts can improve
revenue. In this paper we apply different approaches to estimate
consumers' willingness to pay (WTP) for music downloads and compare
our findings with the pricing strategies currently used in the
market. To make informed decisions about pricing, knowledge about
the consumer's WTP is essential. Three approaches based on adaptive
conjoint analysis to estimate the WTP for bundles of music
downloads are compared. Two of the approaches are based on a
status-quo product (at market price and alternatively at an
individually self-stated price), the third approach uses a linear
model assuming a fixed utility per title. All three methods seem to
be robust and deliver reasonable estimations of the respondents'
WTPs. However, all but the linear model need an externally set
price for the status-quo product which can introduce a bias.
|
| [2] |
Michael Hahsler and Kurt Hornik. Building on the arules
infrastructure for analyzing transaction data with R. In
R. Decker and H.-J. Lenz, editors, Advances in Data
Analysis, Proceedings of the 30th Annual Conference of the
Gesellschaft für Klassifikation e.V., Freie Universität Berlin,
March 8-10, 2006, Studies in Classification, Data Analysis,
and Knowledge Organization, pages 449-456. Springer-Verlag, 2007.
[ bib | at the
publisher |
.pdf ]
The free and extensible statistical computing
environment R with its enormous number of extension packages
already provides many state-of-the-art techniques for data
analysis. Support for association rule mining, a popular
exploratory method which can be used, among other purposes, for
uncovering cross-selling opportunities in market baskets,
has become available recently with the R extension
package arules. After a brief introduction to transaction data
and association rules, we present the formal framework implemented
in arules and demonstrate how clustering and association rule
mining can be applied together using a market basket data set from
a typical retailer. This paper shows that implementing a basic
infrastructure with formal classes in R provides an extensible
basis which can very efficiently be employed for developing new
applications (such as clustering transactions) in addition to
association rule mining.
|
| [3] |
Michael Hahsler, Kurt Hornik, and Thomas Reutterer. Implications of
probabilistic data modeling for mining association rules. In
M. Spiliopoulou, R. Kruse, C. Borgelt,
A. Nürnberger, and W. Gaul, editors, From Data and
Information Analysis to Knowledge Engineering, Proceedings of the
29th Annual Conference of the Gesellschaft für Klassifikation e.V.,
University of Magdeburg, March 9-11, 2005, Studies in
Classification, Data Analysis, and Knowledge Organization, pages
598-605. Springer-Verlag, 2006. [ bib | at the
publisher |
.pdf ]
Mining association rules is an important technique
for discovering meaningful patterns in transaction databases. In
the current literature, the properties of algorithms to mine
association rules are discussed in great detail. We present a
simple probabilistic framework for transaction data which can be
used to simulate transaction data when no associations are present.
We use such data and a real-world grocery database to explore the
behavior of confidence and lift, two popular interest measures used
for rule mining. The results show that confidence is systematically
influenced by the frequency of the items in the left-hand-side of
rules and that lift performs poorly to filter random noise in
transaction data. The probabilistic data modeling approach
presented in this paper not only is a valuable framework to analyze
interest measures but also provides a starting point for further
research to develop new interest measures which are based on
statistical tests and geared towards the specific properties of
transaction data.
|
| [4] |
Christoph Breidert, Michael Hahsler, and Lars Schmidt-Thieme.
Reservation price estimation by adaptive conjoint analysis. In
Claus Weihs and Wolfgang Gaul, editors, Classification - the
Ubiquitous Challenge, Proceedings of the 28th Annual Conference of
the Gesellschaft für Klassifikation e.V., University of Dortmund,
March 9-11, 2004, Studies in Classification, Data Analysis,
and Knowledge Organization, pages 577-584. Springer-Verlag, 2005.
[ bib | at the
publisher |
.pdf ]
Though reservation prices are needed for many
business decision processes, e.g., pricing new products, it often
turns out to be difficult to measure them. Many researchers reuse
conjoint analysis data with price as an attribute for this task
(e.g., Kohli and Mahajan (1991)). In this setting the information
if a consumer buys a product at all is not elicited which makes
reservation price estimation impossible. We propose an additional
interview scene at the end of the adaptive conjoint analysis
(Johnson (1987)) to estimate reservation prices for all product
configurations. This will be achieved by the usage of product
stimuli as well as price scales that are adapted for each proband
to reflect individual choice behavior. We present preliminary
results from an ongoing large-sample conjoint interview of
customers of a major mobile phone retailer in Germany.
|
| [5] |
Georg Fessler, Michael Hahsler, and Michaela Putz. ePubWU -
Erfahrungen mit einer Volltext an der Wirtschaftsuniversität Wien.
In Christian Enichlmayr, editor, Bibliotheken - Fundament der
Bildung, 28. Österreichischer Bibliothekartag 2004,
Schriftenreihe der Oö. Landesbibliothek, pages 190-193, 2005.
[ bib ]
ePubWU is an electronic platform for scholarly
publications of the Wirtschaftsuniversität Wien (WU), where
research-related publications of the WU are made accessible in
full text via the WWW. ePubWU is operated as a joint project of the
university library of the Wirtschaftsuniversität Wien and the
Department of Information Business (Abteilung für
Informationswirtschaft). ePubWU currently collects two types of
publications: working papers and dissertations. The contribution
presents experiences from the project's more than two years of
operation, including in the areas of acquisition, workflows,
cataloging, and dissemination.
|
| [6] |
Michael Hahsler. Optimizing web sites for customer retention. In
Bing Liu, Myra Spiliopoulou, Jaideep Srivastava, and Alex Tuzhilin,
editors, Proceedings of the 2005 International Workshop on
Customer Relationship Management: Data Mining Meets Marketing,
November 18-19, 2005, New York City, USA, 2005.
[ bib |
.pdf ]
With customer relationship management (CRM),
companies move away from a mainly product-centered view to a
customer-centered view. Resulting from this change, the effective
management of how to keep contact with customers throughout
different channels is one of the key success factors in today's
business world. Company Web sites have evolved in many industries
into an extremely important channel through which customers can be
attracted and retained. To analyze and optimize this channel,
accurate models of how customers browse through the Web site and
what information within the site they repeatedly view are crucial.
Typically, data mining techniques are used for this purpose.
However, there already exist numerous models developed in marketing
research for traditional channels which could also prove valuable
to understanding this new channel. In this paper we propose
applying an extension of the Logarithmic Series Distribution
(LSD) model to repeat-usage of Web-based information in order to
analyze and optimize a Web site's capability to support one goal of
CRM, to retain customers. As an example, we use the university's
blended learning web portal with over a thousand learning resources
to demonstrate how the model can be used to evaluate and improve
the Web site's effectiveness.
|
| [7] |
Michael Hahsler and Stefan Koch. Discussion of a large-scale open
source data collection methodology. In 38th Annual Hawaii
International Conference on System Sciences (HICSS'05), January
3-6, 2005 Hilton Waikoloa Village, Big Island, Hawaii. IEEE
Computer Society Press, 2005. [ bib |
at the publisher |
.pdf ]
In this paper we discuss in detail a possible
methodology for collecting repository data on a large number of
open source software projects from a single project hosting and
community site. The process of data retrieval is described along
with the possible metrics that can be computed and which can be
used for further analyses. Example research areas to be addressed
with the available data and first results are given. Then, both
advantages and disadvantages of the proposed methodology are
discussed together with implications for future approaches.
|
| [8] |
Michael Hahsler and Stefan Koch. Cooperation and disruptive
behaviour - learning from a multi-player internet gaming community.
In Piet Kommers, Pedro Isaias, and Miguel Baptista Nunes,
editors, IADIS International Conference Web Based Communities
2004, Lisbon, Portugal, 24-26 March 2004, pages 35-42.
International Association for Development of the Information
Society (IADIS), 2004. [ bib | at the
publisher |
.pdf ]
In this paper we report possibilities and
experiences from employing Counter-Strike, a popular multi-player
Internet computer game and its resulting online community in
research on cooperative behaviour. Advantages from using this game
include easy availability of rich data, the emphasis on
team-playing, as well as numerous possibilities to change the
experiment settings. We use descriptive game theory and statistical
methods to explore cooperation within the game as well as the way
the player community deals with disruptive behaviour. After a quick
introduction to the basic rules of Counter-Strike, we describe the
setup of the Internet game server used. We then present empirical
results from the game server logs where cooperation within the game
is analyzed from a game theoretic perspective. Finally we discuss
the applications of our results to other online communities,
including cooperation and self-regulation in open source
teams.
|
| [9] |
Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, and Anke
Thede. An integration strategy for distributed recommender services
in legacy library systems. In M. Schader, W. Gaul, and
M. Vichi, editors, Between Data Science and Applied Data
Analysis, Proceedings of the 26th Annual Conference of the
Gesellschaft für Klassifikation e.V., University of Mannheim, July
22-24, 2002, Studies in Classification, Data Analysis, and
Knowledge Organization, pages 412-420. Springer-Verlag, July 2003.
[ bib |
at the publisher ]
Scientific library systems are a very promising
application area for recommender services. Scientific libraries
could easily develop customer-oriented service portals in the style
of amazon.com. Students, university teachers and researchers can
reduce their transaction cost (i.e. search and evaluation cost of
information products). For librarians, the advantage is an
improvement of the customer support by recommendations and the
additional support in marketing research, product evaluation, and
book selection. In this contribution we present a strategy for
integrating a behavior-based distributed recommender service in
legacy library systems with minimal changes in the legacy
system.
|
| [10] |
Andreas Geyer-Schulz, Michael Hahsler, and Anke Thede. Comparing
association-rules and repeat-buying based recommender systems in a
B2B environment. In M. Schader, W. Gaul, and
M. Vichi, editors, Between Data Science and Applied Data
Analysis, Proceedings of the 26th Annual Conference of the
Gesellschaft für Klassifikation e.V., University of Mannheim, July
22-24, 2002, Studies in Classification, Data Analysis, and
Knowledge Organization, pages 421-429. Springer-Verlag, July 2003.
[ bib |
at
the publisher ]
In this contribution we present a systematic
evaluation and comparison of recommender systems based on simple
association rules and on repeat-buying theory. Both recommender
services are based on the customer purchase histories of a
medium-sized B2B-merchant for computer accessories. With the help
of product managers an evaluation set for recommendations was
generated. With regard to this evaluation set, recommendations
produced by both methods are evaluated and several error measures
are computed. This provides an empirical test of whether frequent
item sets or outliers of a stochastic purchase incidence model are
suitable concepts for automatically generating recommendations.
Furthermore, the loss functions (performance measures) of the two
models are compared and the sensitivity with regard to a
misspecification of the model parameters is discussed.
|
| [11] |
Edward Bernroider, Michael Hahsler, Stefan Koch, and Volker Stix.
Data Envelopment Analysis zur Unterstützung der Auswahl und
Einführung von ERP-Systemen. In Andreas Geyer-Schulz and Alfred
Taudes, editors, Informationswirtschaft: Ein Sektor mit
Zukunft, Symposium 4.-5. September 2003, Wien, Österreich,
Lecture Notes in Informatics (LNI) P-33, pages 11-26. Gesellschaft
für Informatik, 2003. [ bib |
at the publisher ]
More and more companies are deploying standard
business software packages such as SAP R/3 or BaaN. For most
companies, selecting and implementing such systems is a
strategically important IT project that carries massive risks.
Selecting the most suitable system requires supporting a group
decision process. The subsequent implementation project must be
carried out efficiently and in accordance with “best practices”.
In this paper we use examples to show how both processes,
selection and implementation, can be supported by Data Envelopment
Analysis.
|
| [12] |
Andreas Geyer-Schulz, Michael Hahsler, Andreas Neumann, and Anke
Thede. Recommenderdienste für wissenschaftliche Bibliotheken und
Bibliotheksverbünde. In Andreas Geyer-Schulz and Alfred Taudes,
editors, Informationswirtschaft: Ein Sektor mit Zukunft,
Symposium 4.-5. September 2003, Wien, Österreich, Lecture
Notes in Informatics (LNI) P-33, pages 43-58. Gesellschaft für
Informatik, 2003. [ bib |
at the publisher ]
Scientific libraries are a promising application
area for recommender services. Scientific libraries can easily
develop customer-oriented service portals in the style of
amazon.com. Students, university teachers and researchers can
reduce their share of the transaction costs (e.g., search and
evaluation costs for information products). For librarians, the
advantage lies in improved customer advice through recommendations
and in additional support for market research, product evaluation,
and collection management. In this contribution we present a
strategy for integrating behavior-based, distributed recommender
services into existing library systems with minimal effort, and we
report on our experiences with introducing such a service at the
Universitätsbibliothek der Universität Karlsruhe (TH).
|
| [13] |
Andreas Geyer-Schulz and Michael Hahsler. Software reuse with
analysis patterns. In Proceedings of the 8th AMCIS, pages
1156-1165, Dallas, TX, August 2002. Association for Information
Systems. [ bib |
at
the publisher |
.pdf ]
The purpose of this article is to promote reuse of
domain knowledge by introducing patterns already in the analysis
phase of the software life-cycle. We propose an outline template
for analysis patterns that strongly supports the whole analysis
process from the requirements analysis to the analysis model and
further on to its transformation into a flexible and reusable
design and implementation. As an example we develop a family of
analysis patterns in this paper that deal with a series of pressing
problems in cooperative work, collaborative information filtering
and sharing, and knowledge management. We evaluate the reuse
potential of these patterns by analyzing several components of an
information system that was developed for the Virtual University
project of the Vienna University of Economics and Business
Administration. The findings of this analysis suggest that using
patterns in the analysis phase has the potential to reduce
development time significantly by introducing reuse already at the
analysis stage and by improving the interface between the analysis
and design phases.
|
| [14] |
Andreas Geyer-Schulz and Michael Hahsler. Evaluation of recommender
algorithms for an internet information broker based on simple
association rules and on the repeat-buying theory. In Brij Masand,
Myra Spiliopoulou, Jaideep Srivastava, and Osmar R. Zaiane,
editors, Fourth WEBKDD Workshop: Web Mining for Usage Patterns
& User Profiles, pages 100-114, Edmonton, Canada, July
2002. [ bib |
.pdf ]
Association rules are a widely used technique to
generate recommendations in commercial and research recommender
systems. Since more and more Web sites, especially of retailers,
offer automatic recommender services using Web usage mining,
evaluation of recommender algorithms becomes increasingly
important. In this paper we first present a framework for the
evaluation of different aspects of recommender systems based on the
process of discovering knowledge in databases of Fayyad et al. and
then we focus on the comparison of the performance of two
recommender algorithms based on frequent itemsets. The first
recommender algorithm uses association rules, and the other
recommender algorithm is based on the repeat-buying theory known
from marketing research. For the evaluation we concentrated on how
well the patterns extracted from usage data match the concept of
useful recommendations of users. We use 6 months of usage data from
an educational Internet information broker and compare useful
recommendations identified by users from the target group of the
broker with the results of the recommender algorithms. The results
of the evaluation presented in this paper suggest that frequent
itemsets from purchase histories match the concept of useful
recommendations expressed by users with satisfactory accuracy
(higher than 70%) and precision (between 60% and 90%). Also the
evaluation suggests that both algorithms studied in the paper
perform similarly on real-world data if they are tuned
properly.
|
| [15] |
Walter Böhm, Andreas Geyer-Schulz, Michael Hahsler, and Maximillian
Jahn. Repeat buying theory and its application for recommender
services. In O. Opitz and M. Schwaiger, editors,
Exploratory Data Analysis in Empirical Research, Proceedings of
the 25th Annual Conference of the Gesellschaft für Klassifikation
e.V., University of Munich, March 14-16, 2001, pages 229-239.
Springer-Verlag, 2002. [ bib |
at the publisher | .pdf ]
In the context of a virtual university's
information broker we study the consumption patterns for
information goods and we investigate if Ehrenberg's repeat-buying
theory which successfully models regularities in a large number of
consumer product markets can be applied in electronic markets for
information goods too. First results indicate that Ehrenberg's
repeat-buying theory succeeds in describing the consumption
patterns of bundles of complementary information goods reasonably
well and that this can be exploited for automatically generating
anonymous recommendation services based on such information
bundles. An experimental anonymous recommender service has been
implemented and is currently being evaluated in the Virtual University of
the Vienna University of Economics and Business Administration at
http://vu.wu-wien.ac.at.
|
| [16] |
Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn.
Recommendations for virtual universities from observed user
behavior. In W. Gaul and G. Ritter, editors,
Classification, Automation, and New Media, Proceedings of the
24th Annual Conference of the Gesellschaft für Klassifikation e.V.,
University of Passau, March 15-17, 2000, pages 273-280.
Springer-Verlag, 2002. [ bib |
at the publisher | .pdf ]
Recently, recommender systems have started to gain ground
in commercial Web applications. For example, the online bookseller
amazon.com recommends to its customers books similar to the
ones they bought, based on an analysis of the observed purchase
behavior of consumers. In this article we describe a generic architecture
for recommender services for information markets which has been
implemented in the setting of the Virtual University of the Vienna
University of Economics and Business Administration
(http://vu.wu-wien.ac.at). The architecture of a recommender
service is defined as an agency of interacting software agents. It
consists of three layers, namely the meta-data management system,
the broker management system and the business-to-customer
interface.
|
| [17] |
Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn.
Wissenschaftliche Recommendersysteme in Virtuellen Universitäten.
In H.-J. Appelrath, R. Beyer, U. Marquardt, H.C. Mayr,
and C. Steinberger, editors, Unternehmen Hochschule,
Wien, Österreich, September 2001. Symposium UH2001, GI Lecture
Notes in Informatics (LNI). [ bib |
at the publisher |
.pdf ]
This contribution examines the role of
recommender systems and their potential in the teaching, learning
and research environment of a virtual university. The main idea of
this contribution is to exploit the information aggregation
capabilities of recommender systems in a virtual university in
order to automatically improve tutoring and advisory services, and
thereby to personalize the supervision and advising of students and
make it available to a larger number of participants while at the
same time relieving the teaching staff. The second part of this
contribution describes the recommender services of myVU, the
collection of personalized services of the Virtual University (VU)
of the Wirtschaftsuniversität Wien, and their non-personalized
variants, which are essentially based on observed user behavior
and, in the personalized variant, additionally on self-selection
through self-assessment of experience in a subject area. Finally,
the innovative use of such systems is discussed and described with
several scenarios.
|
| [18] |
Andreas Geyer-Schulz, Michael Hahsler, and Maximillian Jahn. A
customer purchase incidence model applied to recommender systems.
In WEBKDD2001 Workshop: Mining Log Data Across All Customer
TouchPoints, pages 35-45, San Francisco, CA, August 2001.
[ bib |
.pdf ]
In this contribution we transfer a customer
purchase incidence model for consumer products which is based on
Ehrenberg's repeat-buying theory to Web-based information products.
Ehrenberg's repeat-buying theory successfully describes
regularities in a large number of consumer product markets. We show
that these regularities exist in electronic markets for information
goods too, and that purchase incidence models provide a
well-founded theoretical basis for recommender and alert systems.
The article consists of three parts. First, we present the
architecture of an information market and its instrumentation for
collecting data on customer behavior. In the second part
Ehrenberg's repeat-buying theory and its assumptions are reviewed
and adapted for Web-based information markets. Finally, we present
the empirical validation of the model based on data collected from
the information market of the Virtual University of the Vienna
University of Economics and Business Administration at
http://vu.wu-wien.ac.at
|
| [19] |
Andreas Geyer-Schulz and Michael Hahsler. Automatic labelling of
references for information systems. In Reinhold Decker and Wolfgang
Gaul, editors, Classification and Information Processing at the
Turn of the Millennium, Proceedings of the 23rd Annual Conference
of the Gesellschaft für Klassifikation e.V., University of
Bielefeld, March 10-12, 1999, Studies in Classification, Data
Analysis, and Knowledge Organization, pages 451-459.
Springer-Verlag, 2000. [ bib |
at the publisher |
.pdf ]
Today, users of Internet information services such as
Yahoo! or AltaVista often experience high search costs. One
important reason for this is the necessity to browse long reference
lists manually, because of the well-known problems of relevance
ranking. A possible remedy is to complement the references with
automatically generated labels which provide valuable information
about the referenced information source. Presenting suitably
labelled lists of references to users aims at improving the clarity
and thus comprehensibility of the information offered and at
reducing the search cost. In the following we survey several
dimensions for labelling (time, frequency of usage, region,
language, subject, industry, and preferences) and the corresponding
classification problems. To solve these problems automatically we
sketch for each problem a pragmatic mix of machine learning methods
and report selected results.
|
| [20] |
Andreas Geyer-Schulz and Michael Hahsler. Lebenslanges
virtuelles Lernen. In Franciszek Grucza, editor, Europas
Arbeitswelt von Morgen, pages 51-54, Wien, 2000. Wiener
Zentrum der Polnischen Akademie der Wissenschaften. [ bib |
at the
publisher ] |
| [21] |
Michael Hahsler and Bernd Simon. User-centered navigation re-design
for web-based information systems. In H. Michael Chung,
editor, Proceedings of the Sixth Americas Conference on
Information Systems (AMCIS 2000), pages 192-198, Long Beach,
CA, 2000. Association for Information Systems. [ bib | at
the publisher |
.pdf ]
Navigation design for web-based information systems
(e.g. e-commerce sites, intranet solutions) that ignores
user-participation reduces the system's value and can even lead to
system failure. In this paper we introduce a user-centered,
explorative approach to re-designing navigation structures of
web-based information systems, and describe how it can be
implemented in order to provide flexibility and reduce maintenance
costs. We conclude with lessons learned from the navigation
re-design project at the Vienna University of Economics and
Business Administration.
|
| [22] |
Andreas Geyer-Schulz, Michael Hahsler, and Georg Schneider. The
virtual university as a network economy. In Heinrich C. Mayr,
Claudia Steinberger, Hans-Jürgen Appelrath, and Uwe Marquardt,
editors, Informatik '99, Unternehmen Hochschule '99,
Workshop-Unterlagen, pages 75-86, Bielefeld, Germany, October
1999. [ bib ] |
| [1] |
Rao Mallik Kotamarti, Douglas W. Raiford, Michael
Hahsler, Yuhang Wang, Monnie McGee, and Maggie Dunham. Targeted
genomic signature profiling with quasi-alignment statistics.
Article 63, COBRA Preprint Series, November 2009.
[ bib |
at the
publisher ]
Genome databases continue to expand with no change
in the basic format of sequence data. The prevalent use of
classic alignment-based search tools like BLAST has significantly
pushed the limits of genome isolate research. The relatively new
frontier of Metagenomic research deals with thousands of diverse
genomes with newer demands beyond the current homologue search and
analysis. Compressing sequence data into a complex form could
facilitate a broader range of sequence analyses. To this end, this
research explores reorganizing sequence data as complex Markov
signatures also known as Extensible Markov Models. Markov models
have found successful application in biological sequence analysis
applications through small, but important extensions to the
original theory of Markov Chains. Extensible Markov Model (EMM)
offers a novel Quasi-alignment complement to the classic alignment
based homologous sequence search methods like BLAST. EMM based
bioinformatic analysis (EMMBA) incorporates automatic learning
which allows the Markov chain to be created dynamically.
Oligonucleotide or genomic word frequencies form the core sequence data in
alignment free methods. EMMBA extends the Karlin-Altschul
statistics to bring forth an analogous E-Score statistical
significance to the quasi-alignment domain. By consolidating a
community of sequences into a single searchable profile, EMM
methodology further reduces the search space for classification.
Through dynamic generation of the score matrix for each community
profile, EMMBA fine tunes the score assignments. Each evaluation
iteratively adjusts the profile score matrix to account for point
probabilities of the query to ensure Karlin-Altschul assumptions
are satisfied to derive meaningful statistical significance. The
presence of multiple quasi-alignments resembles multiple local
alignments of BLAST. Quasi-alignments are scored based on a
difference distribution of Gumbel scores. Species signature
profiles allow for statistical validation of novel species
identification. Working in EMM transformation space speeds up
classification and generates a distance matrix for differentiation.
The techniques and metrics presented are validated using the
microbial 16S rRNA sequence data from NCBI.
|
| [2] |
Michael Hahsler and Kurt Hornik. Dissimilarity plots: A visual
exploration tool for partitional clustering. Report 89,
Research Report Series, Department of Statistics and Mathematics,
Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria,
September 2009. [ bib |
at the publisher ]
For hierarchical clustering, dendrograms provide
convenient and powerful visualization. Although many visualization
methods have been suggested for partitional clustering, their
usefulness deteriorates quickly with increasing dimensionality of
the data and/or they fail to represent structure between and within
clusters simultaneously. In this paper we extend (dissimilarity)
matrix shading with several reordering steps based on seriation.
Both methods, matrix shading and seriation, have been well-known
for a long time. However, only recent algorithmic improvements
allow seriation to be used for larger problems. Furthermore, seriation
is used in a novel stepwise process (within each cluster and
between clusters) which leads to a visualization technique that is
independent of the dimensionality of the data. A big advantage is
that it presents the structure between clusters and the
micro-structure within clusters in one concise plot. This not only
allows for judging cluster quality but also makes mis-specification
of the number of clusters apparent. We give a detailed discussion
of the construction of dissimilarity plots and demonstrate their
usefulness with several examples.
|
| [3] |
Michael Hahsler, Kurt Hornik, and Christian Buchta. Getting things
in order: An introduction to the R package seriation.
Report 58, Research Report Series, Department of Statistics
and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090
Wien, Austria, August 2007. [ bib |
at the publisher ]
Seriation, i.e., finding a linear order for a set
of objects given data and a loss or merit function, is a basic
problem in data analysis. Caused by the problem's combinatorial
nature, it is hard to solve for all but very small sets.
Nevertheless, both exact solution methods and heuristics are
available. In this paper we present the package seriation which
provides the infrastructure for seriation with R. The
infrastructure comprises data structures to represent linear orders
as permutation vectors, a wide array of seriation methods using a
consistent interface, a method to calculate the value of various
loss and merit functions, and several visualization techniques
which build on seriation. To illustrate how easily the package can
be applied for a variety of applications, a comprehensive
collection of examples is presented.
|
| [4] |
Michael Hahsler and Kurt Hornik. TSP - Infrastructure for the
traveling salesperson problem. Report 45, Research Report
Series, Department of Statistics and Mathematics,
Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria,
December 2006. [ bib |
at the publisher ]
The traveling salesperson or salesman problem (TSP)
is a well known and important combinatorial optimization problem.
The goal is to find the shortest tour that visits each city in a
given list exactly once and then returns to the starting city.
Despite this simple problem statement, solving the TSP is difficult
since it belongs to the class of NP-complete problems. Besides
its theoretical appeal, the importance of the TSP arises
from the variety of its applications. In addition to vehicle
routing, many other applications, e.g., computer wiring, cutting
wallpaper, job sequencing or several data visualization techniques,
require the solution of a TSP. In this paper we introduce the R
package TSP which provides a basic infrastructure for handling and
solving the traveling salesperson problem. The package features S3
classes for specifying a TSP and its (possibly optimal) solution as
well as several heuristics to find good solutions. In addition, it
provides an interface to Concorde, one of the best exact TSP
solvers currently available.
|
| [5] |
Michael Hahsler and Kurt Hornik. New probabilistic interest
measures for association rules. Report 38, Research Report
Series, Department of Statistics and Mathematics,
Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria,
August 2006. [ bib |
at the publisher ]
Mining association rules is an important technique
for discovering meaningful patterns in transaction databases. Many
different measures of interestingness have been proposed for
association rules. However, these measures fail to take the
probabilistic properties of the mined data into account. In this
paper, we start with presenting a simple probabilistic framework
for transaction data which can be used to simulate transaction data
when no associations are present. We use such data and a real-world
database from a grocery outlet to explore the behavior of
confidence and lift, two popular interest measures used for rule
mining. The results show that confidence is systematically
influenced by the frequency of the items in the left hand side of
rules and that lift performs poorly to filter random noise in
transaction data. Based on the probabilistic framework we develop
two new interest measures, hyper-lift and hyper-confidence, which
can be used to filter or order mined association rules. The new
measures show significantly better performance than lift for
applications where spurious rules are problematic.
|
| [6] |
Michael Hahsler, Bettina Grün, and Kurt Hornik. A computational
environment for mining association rules and frequent item sets.
Report 15, Research Report Series, Department of Statistics
and Mathematics, Wirtschaftsuniversität Wien, Augasse 2-6, 1090
Wien, Austria, April 2005. [ bib |
at the publisher ]
Mining frequent itemsets and association rules is a
popular and well researched approach to discovering interesting
relationships between variables in large databases. The R package
arules presented in this paper provides a basic infrastructure for
creating and manipulating input data sets and for analyzing the
resulting itemsets and rules. The package also includes interfaces
to two fast mining algorithms, the popular C implementations of
Apriori and Eclat by Christian Borgelt. These algorithms can be
used to mine frequent itemsets, maximal frequent itemsets, closed
frequent itemsets and association rules.
|
| [7] |
Michael Hahsler, Kurt Hornik, and Thomas Reutterer. Implications of
probabilistic data modeling for rule mining. Report 14,
Research Report Series, Department of Statistics and Mathematics,
Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria, March
2005. [ bib |
at the publisher ]
Mining association rules is an important technique
for discovering meaningful patterns in transaction databases. In
the current literature, the properties of algorithms to mine
associations are discussed in great detail. In this paper we
investigate properties of transaction data sets from a
probabilistic point of view. We present a simple probabilistic
framework for transaction data and its implementation using the R
statistical computing environment. The framework can be used to
simulate transaction data when no associations are present. We use
such data to explore the ability of confidence and lift, two popular
interest measures used for rule mining, to filter noise. Based on
the framework we develop the measure hyperlift and we compare this
new measure to lift using simulated data and a real-world grocery
database.
|
| [8] |
Michael Hahsler. A model-based frequency constraint for mining
associations from transaction data. Working Paper 07/2004, Working
Papers on Information Processing and Information Management,
Institut für Informationsverarbeitung und -wirtschaft,
Wirtschaftsuniversität Wien, Augasse 2-6, 1090 Wien, Austria,
November 2004. [ bib |
at the publisher ]
In this paper we develop an alternative to minimum
support which utilizes knowledge of the process which generates
transaction data and allows for highly skewed frequency
distributions. We apply a simple stochastic model (the NB model),
which is known for its usefulness to describe item occurrences in
transaction data, to develop a frequency constraint. This
model-based frequency constraint is used together with a precision
threshold to find individual support thresholds for groups of
associations. We develop the notion of NB-frequent itemsets and
present two mining algorithms which find all NB-frequent itemsets
in a database. In experiments with publicly available transaction
databases we show that the new constraint can provide significant
improvements over a single minimum support threshold and that the
precision threshold is easier to use.
|
| [9] |
Susanne Hafner and Michael Hahsler. Preisvergleich zwischen
Online-Shops und traditionellen Geschäften: Fallstudie
Spieleeinzelhandel. Working Paper 04/2004, Working Papers on
Information Processing and Information Management, Institut für
Informationsverarbeitung und -wirtschaft, Wirtschaftsuniversität
Wien, Augasse 2-6, 1090 Wien, Austria, August 2004. [ bib |
at the publisher ]
This paper deals with price comparison between
online shops and traditional stores. Several studies have so far
tried to demonstrate price differences between online and
traditional stores in order to confirm the hypothesis that online
markets are more efficient owing to higher transparency and lower
transaction costs. Studies to date have examined product groups
such as CDs and books. In this study we look at game retailing,
which has not been examined so far, and concentrate on the Austrian
market. We investigate whether the Austrian market yields results
similar to or different from those of the markets studied so far
(mainly in North America). The investigation shows the following:
prices for games are about 20 percent lower in the electronic
market than in the traditional market. Price dispersion in the
electronic and traditional markets does not differ significantly.
Both results agree with the results of other studies. Thus,
Austrian online board game retailing is developed to a similar
degree as online retailing in other countries and for other
product groups.
|
| [10] |
Georg Fessler, Michael Hahsler, Michaela Putz, Judith Schwarz, and
Brigitta Wiebogen. Projektbericht ePubWU 2001-2003. Augasse 2-6,
1090 Wien, Wirtschaftsuniversität Wien, January 2004.
[ bib |
.pdf ]
ePubWU is an electronic platform for scholarly publications of the
Wirtschaftsuniversität Wien, on which research-related publications
of the WU are made available in full text via the WWW. ePubWU has
been in production since January 2002 and is run as a joint project
of the University Library of the Wirtschaftsuniversität Wien and
the Abteilung für Informationswirtschaft. This report presents the
experience gained during the project's two-year pilot phase.
|
| [11] |
Michael Hahsler. A quantitative study of the application of design
patterns in Java. Working Paper 01/2003, Working Papers on
Information Processing and Information Management, Institut für
Informationsverarbeitung und -wirtschaft, Wirtschaftsuniversität
Wien, Augasse 2-6, 1090 Wien, Austria, January 2003.
[ bib |
html version |
at the publisher ]
Using design patterns is a widely accepted method
to improve software development. The literature claims many
benefits for the application of patterns. The most cited claim is
that design patterns can provide a common design vocabulary and
therefore greatly improve communication between software designers.
Most of the claims are supported by experience reports of
practitioners, but there is a lack of quantitative research on the
actual application of design patterns and on the realization of the
claimed benefits. In this paper we
analyze the development process of over 1000 open source software
projects using version control information. We explore this
information to gain insight into the differences between software
development with and without design patterns. By analyzing these
differences we provide evidence that design patterns are used for
communication and that there is a significant difference between
developers who use design patterns and those who do not.
|
| [12] |
Andreas Geyer-Schulz and Michael Hahsler. Software engineering with
analysis patterns. Working Paper 01/2001, Working Papers on
Information Processing and Information Management, Institut für
Informationsverarbeitung und -wirtschaft, Wirtschaftsuniversität
Wien, Augasse 2-6, 1090 Wien, Austria, November 2001.
[ bib |
html version |
at the publisher ]
The purpose of this article is twofold. First, it promotes the use
of patterns in the analysis phase of the software life-cycle by
proposing an outline template for analysis patterns that strongly
supports the whole analysis process, from the requirements analysis
to the analysis model and further on to its transformation into a
flexible design. Second, we present, as an
example, a family of analysis patterns that deal with a series of
pressing problems in cooperative work, collaborative information
filtering and sharing, and knowledge management. We present the
step-by-step evolution of the analysis pattern virtual library with
active agents starting with a simple pinboard.
|
| [13] |
Michael Hahsler. Analyse Patterns im Softwareentwicklungsprozeß
mit Beispielen für Informationsmanagement und deren Anwendungen für
die Virtuellen Universität der Wirtschaftsuniversität Wien.
Dissertation, Wirtschaftsuniversität Wien, Augasse 2-6, A 1090
Wien, Austria, January 2001. [ bib | html
version |
at the publisher | .ps |
.pdf ]
This thesis deals with analysis patterns, the application of
patterns in the analysis phase of software development. Patterns
have been used in the design phase for several years to bring
expert knowledge and reusability into the design process, and a
wealth of such design patterns already exists. The analysis phase
is a new area of application for patterns that has not yet been
treated sufficiently in the literature. This thesis works out the
application of the pattern approach in the analysis phase and makes
it concrete. Analysis patterns support the entire software
development process and help to solve known problems during the
analysis phase. This can save time and costs in the development of
new software systems. These properties of analysis patterns are
demonstrated with concrete examples in a case study. The case study
describes the use of analysis patterns for information management,
developed in this thesis, in the project Virtuelle Universität der
Wirtschaftsuniversität Wien, in which an Internet information
broker supporting teaching and research is being implemented. The
experience from this project is examined, and the effects of
analysis patterns on reuse in software development and on the
acceptance of the resulting system are presented.
|
| [14] |
Michael Hahsler. Software Patterns: Pinwände. Diploma thesis,
Wirtschaftsuniversität Wien, Augasse 2-6, A 1090 Wien, Austria,
November 1997. [ bib | html
version |
.pdf ]
This thesis deals with the pattern approach to software
architecture. After a brief presentation of the approach, the
pinboard (Pinwand) pattern and its variants are described.
Pinboards are used to collect information and make it available to
interested parties. Among other areas, they are used in groupware
applications, conferencing systems, discussion forums, and virtual
libraries.
|