Title: Reinforcement learning with domain-specific relational inductive biases: Using Graph Neural Networks and domain knowledge
Author: Vester, Erik (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Spaan, M.T.J. (mentor); Böhmer, J.W. (graduation committee); Cavalcante Siebert, L. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2021-10-25

Abstract: Reinforcement Learning (RL) has been used to successfully train agents for many tasks, but generalizing to a different task - or even to unseen examples of the same task - remains difficult. In this thesis, Deep Reinforcement Learning (DRL) is combined with Graph Neural Networks (GNNs) and domain knowledge, with the aim of improving the generalization capabilities of RL agents. In classical DRL setups, Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs) are often used as the neural network architectures for an agent's policy and/or value network. In this thesis, however, GNNs represent the agent's policy and value networks, which allows for relational inductive biases that are more domain-specific than those of MLPs and CNNs. Observations that the agent receives from a simple navigation task - one that requires some relational reasoning - are encoded as graphs consisting of entities and the relations between them, based on domain knowledge. These graphs then serve as structured input for the agent's GNN-based architecture. This approach is inspired by human relational reasoning, which is argued to be an important factor in human generalization capabilities. Several GNN-based architectures are proposed and compared, from which two main architectures are distilled: R-GCN-domain and R-GCN-GAN.
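The graph encoding of observations described above can be illustrated with a minimal NumPy sketch. The entity names (`agent`, `key`, `door`), relation types, and feature layout below are hypothetical stand-ins for the thesis's domain knowledge, chosen only to show the entities-plus-typed-relations idea:

```python
import numpy as np

# Illustrative entity and relation vocabularies (hypothetical, not from the thesis).
ENTITY_TYPES = ["agent", "key", "door"]
RELATION_TYPES = ["left_of", "above", "adjacent"]

def encode_observation(entities):
    """entities: list of (type, (row, col)) tuples from a grid observation.
    Returns node features and edge lists grouped by relation type."""
    # Node features: one-hot entity type plus (roughly) normalized position.
    nodes = []
    for etype, (r, c) in entities:
        one_hot = [1.0 if t == etype else 0.0 for t in ENTITY_TYPES]
        nodes.append(one_hot + [r / 10.0, c / 10.0])
    x = np.array(nodes)

    # Typed edges derived from the spatial layout - a stand-in for the
    # domain knowledge that defines relations between entities.
    edges = {rel: [] for rel in RELATION_TYPES}
    for i, (_, (ri, ci)) in enumerate(entities):
        for j, (_, (rj, cj)) in enumerate(entities):
            if i == j:
                continue
            if ri == rj and ci < cj:
                edges["left_of"].append((i, j))
            if ci == cj and ri < rj:
                edges["above"].append((i, j))
            if abs(ri - rj) + abs(ci - cj) == 1:
                edges["adjacent"].append((i, j))
    return x, edges

# Tiny example observation: agent, key, and door on a grid.
obs = [("agent", (0, 0)), ("key", (0, 2)), ("door", (1, 2))]
x, edges = encode_observation(obs)
```

The resulting node-feature matrix and typed edge lists form the structured input that a GNN-based policy/value network can consume.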
In the R-GCN-domain architecture, the graph encoding of observations is based on domain knowledge, whereas R-GCN-GAN aims to combine the relational encoding of a CNN with additional, learned relations, allowing for an end-to-end solution that does not require domain knowledge. Sample efficiency and both in- and out-of-distribution generalization performance of our architectures are tested on a new grid-world environment called 'Key-Corridors'. We find that adding domain-specific relational inductive biases with the R-GCN-domain architecture significantly improves sample efficiency and out-of-distribution generalization compared to MLPs and CNNs. However, we did not succeed in learning these domain-specific relational inductive biases with R-GCN-GAN, which does not significantly outperform a CNN. Overall, the results indicate that applying relational reasoning in RL - through the use of GNNs and domain knowledge - can be an important tool for improving sample efficiency and generalization performance.

Subjects: Reinforcement Learning (RL); Graph Neural Networks; Domain Knowledge; Generalization
To reference this document use: http://resolver.tudelft.nl/uuid:b5bcf5e9-53d4-4a80-bb4d-92c04df804f3
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Erik Vester
Files: Thesis_Erik_Vester_Final.pdf (3.68 MB)
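The R-GCN architectures named in the abstract build on relational graph convolutions, where each relation type has its own weight matrix. A minimal NumPy sketch of one such layer, in the style of Schlichtkrull et al.'s R-GCN, is shown below; the dimensions, edge lists, and weight initialization are hypothetical and not taken from the thesis:

```python
import numpy as np

def rgcn_layer(x, edges, rel_weights, w_self):
    """One relational graph convolution layer:
    h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_r(i)} (1/c_{i,r}) W_r h_j),
    where edges[rel] holds (source, target) pairs for relation rel."""
    out = x @ w_self  # self-connection term W_0 h_i
    for rel, w in rel_weights.items():
        msgs = np.zeros_like(out)
        counts = np.zeros(len(x))
        for src, dst in edges.get(rel, []):
            msgs[dst] += x[src] @ w  # relation-specific message W_r h_j
            counts[dst] += 1
        counts[counts == 0] = 1.0    # avoid division by zero for isolated nodes
        out += msgs / counts[:, None]  # mean-normalize per target node
    return np.maximum(out, 0.0)      # ReLU

# Illustrative usage: 3 nodes with 5 input features, 8 output features.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
edges = {"left_of": [(0, 1)], "adjacent": [(1, 2), (2, 1)]}
rel_weights = {r: rng.normal(size=(5, 8)) * 0.1 for r in edges}
w_self = rng.normal(size=(5, 8)) * 0.1
h = rgcn_layer(x, edges, rel_weights, w_self)
```

Because each relation type has its own weight matrix, the layer can treat, say, spatial adjacency differently from an "is left of" relation, which is one way domain-specific relational inductive biases enter the architecture.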