Publication Date
8-2019
Date of Final Oral Examination (Defense)
5-31-2019
Type of Culminating Activity
Thesis
Degree Title
Master of Science in Computer Science
Department
Computer Science
Supervisory Committee Chair
Michael D. Ekstrand, Ph.D.
Supervisory Committee Member
Maria Soledad Pera, Ph.D.
Supervisory Committee Member
Hoda Mehrpouyan, Ph.D.
Abstract
Recommender systems are software applications deployed on the Internet to help people find useful items (e.g., movies, books, music, products) by providing recommendation lists. Before deploying recommender systems online, researchers and practitioners generally conduct offline evaluations that use users’ historical consumption data to compare the accuracy of the top-N recommendation lists produced by candidate algorithms. These offline evaluations typically use metrics and methodologies borrowed from machine learning and information retrieval, and they have several well-known biases that affect the validity of their results, including popularity bias and other biases arising from the missing-not-at-random nature of the data used. The existence of these biases is well established, but their extent and impact are not as well studied. In this work, we employ controlled simulations with varying assumptions about the distribution and structure of users’ preferences and the rating process to estimate the distributions of the errors in recommender experiment outcomes that result from these biases. We calibrate our simulated datasets to mimic key statistics of existing public datasets in different domains and use the simulated data to assess the error in estimating true accuracy from observable rating data. We find that evaluation metric scores, and the order in which they rank recommendation algorithms, are inconsistent between the synthetic true-preference data and the observed data. Simulation results show that intrinsic effects in the data generation process sometimes fool offline evaluations into ranking algorithms incorrectly. The extent of this effect is sensitive to the simulation assumptions.
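The comparison described in the abstract can be illustrated with a minimal sketch. This is not the simulation model used in the thesis: the preference model, the observation probabilities, the two recommenders, and every function name below are assumptions chosen only to show how a metric computed on observed ratings can be set against the same metric computed on simulated true preferences.

```python
import random


def simulate(n_users=200, n_items=100, seed=0):
    """Return (truth, observed): per-user sets of liked items and rated items."""
    rng = random.Random(seed)
    # Assumed toy model: item i is liked with probability like_p[i]; a liked
    # item is then rated with probability obs_p[i], which decays with the item
    # index, so ratings are missing not at random.
    like_p = [rng.uniform(0.05, 0.5) for _ in range(n_items)]
    obs_p = [(1 + i) ** -0.5 for i in range(n_items)]
    truth, observed = [], []
    for _ in range(n_users):
        liked = {i for i in range(n_items) if rng.random() < like_p[i]}
        truth.append(liked)
        observed.append({i for i in liked if rng.random() < obs_p[i]})
    return truth, observed


def split(observed, seed=1):
    """Randomly split each user's rated items into train and test halves."""
    rng = random.Random(seed)
    train, test = [], []
    for items in observed:
        shuffled = sorted(items)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        train.append(set(shuffled[:half]))
        test.append(set(shuffled[half:]))
    return train, test


def popular_recommender(train, n_items):
    """Rank items by how many users rated them in the training data."""
    counts = [0] * n_items
    for items in train:
        for i in items:
            counts[i] += 1
    ranking = sorted(range(n_items), key=lambda i: -counts[i])
    return lambda seen, n: [i for i in ranking if i not in seen][:n]


def random_recommender(n_items, seed=2):
    """Recommend n unseen items uniformly at random."""
    rng = random.Random(seed)

    def recommend(seen, n):
        pool = [i for i in range(n_items) if i not in seen]
        return rng.sample(pool, min(n, len(pool)))

    return recommend


def mean_precision(recommend, train, relevant, n=10):
    """Mean precision@n of the recommender against the given relevant sets."""
    total = 0.0
    for seen, rel in zip(train, relevant):
        recs = recommend(seen, n)  # recommendations never include seen items
        total += sum(1 for i in recs if i in rel) / n
    return total / len(train)


if __name__ == "__main__":
    n_items = 100
    truth, observed = simulate(n_items=n_items)
    train, test = split(observed)
    for name, rec in [("popular", popular_recommender(train, n_items)),
                      ("random", random_recommender(n_items))]:
        print(name,
              "observed:", round(mean_precision(rec, train, test), 3),
              "true:", round(mean_precision(rec, train, truth), 3))
```

Running the script prints, for each toy recommender, precision@10 measured against held-out observed ratings and against the simulated true preferences. The gap between the two numbers, and any change in which recommender scores higher, is the kind of evaluation error the abstract refers to; its size here depends entirely on the assumed probabilities.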
DOI
10.18122/td/1581/boisestate
Recommended Citation
Tian, Mucun, "Estimating Error and Bias of Offline Recommender System Evaluation Results" (2019). Boise State University Theses and Dissertations. 1581.