Memory-based Modeling and Prioritized Sweeping in Reinforcement Learning

Title: Memory-based Modeling and Prioritized Sweeping in Reinforcement Learning
Author: Ramakers, M.J.G.
Contributors: Babuska, R. (mentor); Lopes, G. (mentor)
Faculty: Mechanical, Maritime and Materials Engineering
Department: Delft Center for Systems and Control
Date: 2010-08-20

Abstract:
Reinforcement Learning (RL) is a popular method in machine learning. In RL, an agent learns a policy by observing state transitions and receiving feedback in the form of a reward signal. The learning problem can be solved through interaction with the system alone, without prior knowledge of that system. However, learning in real time from interaction only is slow, because each time interval yields just a single observed state transition. Learning can be accelerated with a Dyna-style algorithm, which learns simultaneously from interaction with the real system and from a model of that system. Our research investigates two aspects of this method: building a model during learning and incorporating that model into the learning algorithm. We use a memory-based modeling method called Local Linear Regression (LLR) to build a state-transition model during the learning process. The quality of the model is expected to improve as the number of observed state transitions increases. To assess the quality of the modeled state transitions we introduce prediction intervals. We show that LLR is able to model various systems, including a complex humanoid robot. The LLR model was added to the learning algorithm to generate additional state transitions for the agent to learn from. We show that an increasing number of experiences leads to faster learning. We introduce Prioritized Sweeping (PS) and Look-Ahead (LA) Dyna as ways to use the model more efficiently, and show how prediction intervals can be used to increase the performance of the various algorithms. The learning algorithms were compared on an inverted pendulum simulation that had to learn a swing-up control task.

To reference this document use: http://resolver.tudelft.nl/uuid:043ef2af-ddc3-49bd-9590-d7fd8927c169
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2010 Ramakers, M.J.G.
Files: Ramakers-openbaar.pdf (PDF, 2.32 MB)
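
As an illustration of the memory-based modeling mentioned in the abstract, the following is a minimal sketch of next-state prediction with Local Linear Regression: a local linear model is fit on the nearest stored samples around a query. It is not the thesis implementation; the memory layout, function name, and the choice of k are assumptions for illustration only.

import numpy as np

def llr_predict(memory_in, memory_out, query, k=10):
    """Predict the next state for `query` = [state, action] by fitting a
    linear model on the k nearest stored samples (least squares).
    memory_in:  (N, d) array of observed [state, action] inputs
    memory_out: (N, m) array of the corresponding next states
    (illustrative sketch, not the thesis code)"""
    # Find the k nearest neighbours of the query in the input memory.
    dists = np.linalg.norm(memory_in - query, axis=1)
    idx = np.argsort(dists)[:k]
    X, Y = memory_in[idx], memory_out[idx]
    # Append a bias column and solve the local least-squares problem.
    Xb = np.hstack([X, np.ones((k, 1))])
    beta, *_ = np.linalg.lstsq(Xb, Y, rcond=None)
    return np.append(query, 1.0) @ beta

During learning, each observed transition would simply be appended to memory_in and memory_out, so the local model (and hence the quality of the predictions) improves as more transitions are observed.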
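
The Prioritized Sweeping idea referenced in the abstract can be sketched for the tabular case as follows: planning updates are ordered by the magnitude of the expected value change, and predecessors of updated states are re-queued. This sketch assumes a deterministic learned model stored as model[(s, a)] = (reward, next_state) and a predecessor map; all names and parameters are illustrative, not taken from the thesis.

import heapq

def prioritized_sweeping(Q, model, predecessors, s, a, r, s2,
                         alpha=0.1, gamma=0.95, theta=1e-3, n_updates=20):
    """One real transition (s, a, r, s2) followed by n_updates planning steps.
    Q is a dict of dicts: Q[state][action] -> value. (Illustrative sketch.)"""
    model[(s, a)] = (r, s2)
    pq = []  # (negative priority, state, action): heapq then pops the largest error
    p = abs(r + gamma * max(Q[s2].values()) - Q[s][a])
    if p > theta:
        heapq.heappush(pq, (-p, s, a))
    for _ in range(n_updates):
        if not pq:
            break
        _, s_, a_ = heapq.heappop(pq)
        r_, s2_ = model[(s_, a_)]
        # Backup from the model, not from a new real interaction.
        Q[s_][a_] += alpha * (r_ + gamma * max(Q[s2_].values()) - Q[s_][a_])
        # Re-queue predecessors whose backed-up values may now have changed.
        for (sp, ap) in predecessors.get(s_, ()):
            rp, _ = model[(sp, ap)]
            p = abs(rp + gamma * max(Q[s_].values()) - Q[sp][ap])
            if p > theta:
                heapq.heappush(pq, (-p, sp, ap))
    return Q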