Multi-Agent Actor-Critic Reinforcement Learning for Cooperative Tasks

Title: Multi-Agent Actor-Critic Reinforcement Learning for Cooperative Tasks
Author: Bayiz, Y.E.
Contributor: Babuška, R. (mentor)
Faculty: Mechanical, Maritime and Materials Engineering
Department: Delft Center for Systems and Control
Date: 2014-08-04

Abstract:
For single-agent problems, Reinforcement Learning (RL) algorithms have proved useful for learning optimal control laws for nonlinear dynamic systems without relying on a mathematical model of the system to be controlled. With their ability to work on continuous action and state spaces, actor-critic RL algorithms are especially advantageous in that respect. So far, actor-critic methods have been applied to several single-agent control problems, often with impressive results. A Multi-Agent System (MAS) distributes computational resources and capabilities across a network of interconnected agents. The main advantage of such an approach is that it decomposes a globally complex problem into simpler sub-problems, which is a more natural way to address resource allocation and team planning. The application of MAS to domains such as robotics, distributed control and telecommunications has gained popularity in the last two decades. From the control point of view, cooperative MAS are of special importance, since agents in control problems frequently seek to achieve a joint goal. So far, a significant amount of research has been dedicated to Multi-Agent Reinforcement Learning (MARL) for both cooperative and non-cooperative tasks. Yet, actor-critic methods in the MARL context have not been examined in detail. The aim of this project is to apply actor-critic RL methods to cooperative MAS, combining the advantages of the two approaches, and to apply the resulting methods to a real-life control problem as a proof of concept.
To achieve this, the Model Learning Actor-Critic (MLAC) algorithm is extended to two Independent Learners (IL) based methods: optimistic learners and lenient learners. The resulting algorithms are tested on a 2-link manipulator problem. The results indicate that the initial learning speed of the proposed multi-agent MLAC algorithms is similar to or faster than that of the centralized MLAC at the start of the learning experiments, and that the end performance is acceptable compared to the centralized MLAC.

Subject: Reinforcement Learning; Multi-Agent Systems
To reference this document use: http://resolver.tudelft.nl/uuid:033c4238-62e8-4ef7-b43d-b43f3b97856f
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2014 Bayiz, Y.E.
Files: Efe_mscThesis_v3.pdf (PDF, 1.08 MB)
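The two IL variants named above can be illustrated with a minimal tabular Q-learning sketch. This is only an assumption-laden illustration of the general optimistic/lenient update idea, not the thesis's method: the thesis works with continuous-space actor-critic learners (MLAC), and all names and constants below (`ALPHA`, `GAMMA`, `leniency`) are invented for the example.

```python
import random
from collections import defaultdict

# Illustrative constants, not taken from the thesis.
ALPHA, GAMMA = 0.1, 0.95


def optimistic_update(Q, s, a, r, s_next, actions):
    """Optimistic IL: only accept updates that raise the value estimate,
    implicitly assuming teammates were acting optimally."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    delta = target - Q[(s, a)]
    if delta > 0:  # apparent punishments are ignored entirely
        Q[(s, a)] += ALPHA * delta
    return Q[(s, a)]


def lenient_update(Q, s, a, r, s_next, actions, leniency=0.7):
    """Lenient IL: ignore negative updates only with some probability.
    In full implementations the leniency decays as a state-action pair
    is visited more often; here it is a fixed parameter for clarity."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in actions)
    delta = target - Q[(s, a)]
    if delta > 0 or random.random() > leniency:
        Q[(s, a)] += ALPHA * delta
    return Q[(s, a)]


# Tiny usage example: a positive experience is learned, a later
# contradictory negative experience is ignored by the optimist.
Q = defaultdict(float)
actions = [0, 1]
optimistic_update(Q, "s0", 0, 1.0, "s1", actions)   # Q grows toward target
optimistic_update(Q, "s0", 0, -1.0, "s1", actions)  # negative delta ignored
```

Both variants keep one independent Q-table per agent, which is what makes them "independent learners": each agent treats its teammates as part of the environment rather than modelling the joint action space.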