Multi-Agent Actor-Critic Reinforcement Learning for Cooperative Tasks

Title: Multi-Agent Actor-Critic Reinforcement Learning for Cooperative Tasks
Author: Bayiz, Y.E.
Contributor: Babuška, R. (mentor)
Faculty: Mechanical, Maritime and Materials Engineering
Department: Delft Center for Systems and Control
Date: 2014-08-04

Abstract:
For single-agent problems, Reinforcement Learning (RL) algorithms have proved useful for learning optimal control laws for nonlinear dynamic systems without relying on a mathematical model of the system to be controlled. With their ability to work on continuous action and state spaces, actor-critic RL algorithms are especially advantageous in that respect. So far, actor-critic methods have been applied to several single-agent control problems, often with impressive results. A Multi-Agent System (MAS) distributes computational resources and capabilities across a network of interconnected agents. The main advantage of such an approach is that it decomposes a globally complex problem into simpler sub-problems, which is a more natural way to address resource allocation and team planning. The application of MAS to domains such as robotics, distributed control and telecommunications has gained popularity in the last two decades. From the control point of view, cooperative MAS are of special importance, since agents in control problems frequently seek to achieve a joint goal. So far, a significant amount of research has been dedicated to Multi-Agent Reinforcement Learning (MARL) for both cooperative and non-cooperative tasks. Yet, actor-critic methods in the MARL context have not been examined in detail. The aim of this project is to apply actor-critic RL methods to cooperative MAS, combining the advantages of the two approaches, and to apply the resulting methods to a real-life control problem as a proof of concept.
To achieve this, the Model Learning Actor-Critic (MLAC) algorithm is extended to two Independent Learners (IL) based methods: optimistic learners and lenient learners. The resulting algorithms are tested on a 2-link manipulator problem. The results indicate that the initial learning speed of the proposed multi-agent MLAC algorithms is similar to or faster than that of the centralized MLAC at the start of the learning experiments, and that the end performance is acceptable compared to the centralized MLAC.

Subject: Reinforcement Learning; Multi-Agent Systems
To reference this document use: http://resolver.tudelft.nl/uuid:033c4238-62e8-4ef7-b43d-b43f3b97856f
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2014 Bayiz, Y.E.
Files: Efe_mscThesis_v3.pdf (PDF, 1.08 MB)
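The two IL variants named above can be illustrated with a minimal tabular Q-learning sketch. This is only an assumption-laden illustration of the general optimistic/lenient update idea, not the thesis's method: the thesis works with continuous-space actor-critic learners (MLAC), and all names and constants below (`ALPHA`, `GAMMA`, `leniency`) are invented for the example.

```python
import random
from collections import defaultdict

# Illustrative constants, not taken from the thesis.
ALPHA, GAMMA = 0.1, 0.95


def optimistic_update(Q, s, a, r, s_next, actions):
    """Optimistic IL: only accept updates that raise the value estimate,
    implicitly assuming teammates were acting optimally."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    delta = target - Q[(s, a)]
    if delta > 0:  # apparent punishments are ignored entirely
        Q[(s, a)] += ALPHA * delta
    return Q[(s, a)]


def lenient_update(Q, s, a, r, s_next, actions, leniency=0.7):
    """Lenient IL: ignore negative updates only with some probability.
    In full implementations the leniency decays as a state-action pair
    is visited more often; here it is a fixed parameter for clarity."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    delta = target - Q[(s, a)]
    if delta > 0 or random.random() > leniency:
        Q[(s, a)] += ALPHA * delta
    return Q[(s, a)]


# Tiny usage example: a positive experience is learned, a later
# contradictory negative experience is ignored by the optimist.
Q = defaultdict(float)
actions = [0, 1]
optimistic_update(Q, "s0", 0, 1.0, "s1", actions)   # Q grows toward target
optimistic_update(Q, "s0", 0, -1.0, "s1", actions)  # negative delta ignored
```

Both variants keep one independent Q-table per agent, which is what makes them "independent learners": each agent treats its teammates as part of the environment rather than modelling the joint action space.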