Title Investigating Inverse Reinforcement Learning from Human Behavior: Effect of Demonstrations with Temporal Biases on Learning Rewards using Inverse Reinforcement Learning
Author Zatezalo, Mateja (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor Cavalcante Siebert, L. (graduation committee); Caregnato Neto, A. (mentor)
Degree granting institution Delft University of Technology
Programme Computer Science and Engineering
Project CSE3000 Research Project
Date 2023-06-25
Abstract Inverse Reinforcement Learning (IRL) is a machine learning technique for learning rewards from the behavior of an expert agent. With complex agents, such as humans, the reward being maximized may not be easy to recover, because humans are prone to cognitive biases. Cognitive biases are deviations from rationality that affect everyday human decision-making. Time-inconsistent decision-making is a type of temporal cognitive bias in which plans for future actions may change depending on the point in time at which they are made. Existing research in this field explores the use of IRL algorithms in numerous real-life situations. However, few works examine the effects of temporal biases on the recovered reward function. Hence, in this research, we propose a methodology for generating synthetic demonstrations that emulate human data with this bias. An existing method, the Maximum Entropy IRL (MEIRL) algorithm, is used to recover reward functions from expert models that exhibit these biases, and the results are compared with those obtained from unbiased models. The demonstrations are modeled as a Markov Decision Process (MDP) implemented in a Grid-World environment. Temporal biases are implemented in the expert demonstrations as different types of agents, each portraying a specific behavior.
Our findings show that all biases affect reward learning to a considerable extent, with the magnitude of the effect varying with the comparison being made.
Subject Inverse Reinforcement Learning, Cognitive Bias, Time Inconsistency, Maximum Entropy, Markov Decision Process, Temporal
To reference this document use: http://resolver.tudelft.nl/uuid:f253fb16-3aec-48f0-82dc-321fd501a665
Embargo date 2023-07-05
Part of collection Student theses
Document type bachelor thesis
Rights © 2023 Mateja Zatezalo
Files PDF RP_Research_Paper_Mateja_ ... tezalo.pdf 1.05 MB