Comparative Analysis of Techniques for Data Minimization for Recommender System algorithms

Author: Krishnaraj, Manoj (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Larson, M.A. (mentor)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Data Science and Technology
Date: 2019-11-25

Abstract
Recommender systems (RS) often use a large amount of data for a marginal gain in performance. This thesis investigates data minimization in recommender systems, a topic that is not well studied in the literature. It extends the data minimization principles advocated in the GDPR and studies their effects on recommender systems. Minimizing data not only reduces storage and transmission requirements but also has the potential to improve privacy and increase training and prediction speeds. This thesis investigates the effects of reducing the amount of data used to model a recommender system. It evaluates the accuracy of the Biased Matrix Factorization (BMF) algorithm while varying the training data drawn from the MovieLens 10 million ratings (ML-10M) dataset. Four data minimization techniques were used: we reproduced one technique from previous work and proposed three new ones. In the first technique, we confirmed previous work on training data analysis, in which the data outside a selected temporal window were dropped. The second technique, user profile truncation, retained the most recent N ratings of each user and truncated the historical ratings. The third technique improved on user profile truncation by selectively truncating a percentage of users' historical ratings. In the fourth technique, a long user profile was split into smaller pseudo-user profiles. We then analyze the results of all four techniques.
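The second and third techniques described above can be illustrated with a minimal Python sketch. It is not the thesis implementation. The tuple layout (user_id, item_id, rating, timestamp), the function names, the `long_threshold` parameter that defines a "long" profile, and the choice to measure recent activity by the timestamp of a user's newest rating are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed layout of one rating: (user_id, item_id, rating, timestamp).


def group_by_user(ratings):
    """Group ratings by user, each profile sorted from oldest to newest."""
    profiles = defaultdict(list)
    for r in ratings:
        profiles[r[0]].append(r)
    for profile in profiles.values():
        profile.sort(key=lambda r: r[3])
    return dict(profiles)


def _most_recent(profile, n):
    """Return the n newest ratings of a time-sorted profile (n may be 0)."""
    return profile[max(0, len(profile) - n):]


def truncate_profiles(ratings, n):
    """Sketch of user profile truncation: keep the n newest ratings per user."""
    kept = []
    for profile in group_by_user(ratings).values():
        kept.extend(_most_recent(profile, n))
    return kept


def truncate_fraction_of_long_profiles(ratings, n, fraction, long_threshold):
    """Sketch of selective truncation.

    Only profiles longer than long_threshold (hypothetical parameter) are
    candidates. The given fraction of them, taken from the least recently
    active end, is truncated to n ratings; all other profiles stay intact.
    """
    profiles = group_by_user(ratings)
    long_users = [u for u, p in profiles.items() if len(p) > long_threshold]
    # Least recently active first: order by the timestamp of the newest rating.
    long_users.sort(key=lambda u: profiles[u][-1][3])
    to_truncate = set(long_users[:int(len(long_users) * fraction)])
    kept = []
    for user, profile in profiles.items():
        if user in to_truncate:
            kept.extend(_most_recent(profile, n))
        else:
            kept.extend(profile)
    return kept
```

Under these assumptions, a call such as `truncate_fraction_of_long_profiles(ratings, n=20, fraction=0.6, long_threshold=20)` would produce a reduced training set in the spirit of the third technique. The value used for `long_threshold` here is a placeholder, not a figure taken from the thesis.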
The most interesting results come from the third data minimization technique. Here, we show that truncating a percentage of the least recently active long user profiles does not damage performance and may slightly help. For 60% of the long users, profiles can be truncated to 20 ratings with minimal impact on performance. Based on the results, we conclude that a substantial amount of data can be dropped without a large impact on performance. The results hold for the ML-10M dataset, and we expect them to hold for other datasets. The privacy implications of data minimization warrant future work. The proposed techniques serve as a guide for future research in data minimization of recommender systems.

Subject: Recommender Systems, Collaborative Filtering, Data Reduction
To reference this document use: http://resolver.tudelft.nl/uuid:35e19f20-6161-4755-b9a5-7714af15a840
Part of collection: Student theses
Document type: master thesis
Rights: © 2019 Manoj Krishnaraj
Files: PDF Manoj_Krishnaraj_thesis_final.pdf 1.25 MB