Title: Reinforcement learning with domain-specific relational inductive biases: Using Graph Neural Networks and domain knowledge
Author: Vester, Erik (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Spaan, M.T.J. (mentor); Böhmer, J.W. (graduation committee); Cavalcante Siebert, L. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2021-10-25

Abstract: Reinforcement Learning (RL) has been used to successfully train agents for many tasks, but generalizing to a different task - or even to unseen examples of the same task - remains difficult. In this thesis, Deep Reinforcement Learning (DRL) is combined with Graph Neural Networks (GNNs) and domain knowledge, with the aim of improving the generalization capabilities of RL agents. In classical DRL setups, Convolutional Neural Networks (CNNs) and Multilayer Perceptrons (MLPs) are often used as the neural network architectures for an agent's policy and/or value network. In this thesis, however, GNNs represent the agent's policy and value networks, which allows for relational inductive biases that are more domain-specific than those of MLPs and CNNs. Observations that the agent receives from a simple navigation task - one that requires some relational reasoning - are encoded as graphs consisting of entities and the relations between them, based on domain knowledge. These graphs then serve as structured input for the agent's GNN-based architecture. This approach is inspired by human relational reasoning, which is argued to be an important factor in human generalization capabilities. Several GNN-based architectures are proposed and compared, from which two main architectures are distilled: R-GCN-domain and R-GCN-GAN.
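The graph encoding of observations described above can be illustrated with a minimal NumPy sketch. The entity names (`agent`, `key`, `door`), relation types, and feature layout below are hypothetical stand-ins for the thesis's domain knowledge, chosen only to show the entities-plus-typed-relations idea:

```python
import numpy as np

# Illustrative entity and relation vocabularies (hypothetical, not from the thesis).
ENTITY_TYPES = ["agent", "key", "door"]
RELATION_TYPES = ["left_of", "above", "adjacent"]

def encode_observation(entities):
    """entities: list of (type, (row, col)) tuples from a grid observation.
    Returns node features and edge lists grouped by relation type."""
    # Node features: one-hot entity type plus (roughly) normalized position.
    nodes = []
    for etype, (r, c) in entities:
        one_hot = [1.0 if t == etype else 0.0 for t in ENTITY_TYPES]
        nodes.append(one_hot + [r / 10.0, c / 10.0])
    x = np.array(nodes)

    # Typed edges derived from the spatial layout - a stand-in for the
    # domain knowledge that defines relations between entities.
    edges = {rel: [] for rel in RELATION_TYPES}
    for i, (_, (ri, ci)) in enumerate(entities):
        for j, (_, (rj, cj)) in enumerate(entities):
            if i == j:
                continue
            if ri == rj and ci < cj:
                edges["left_of"].append((i, j))
            if ci == cj and ri < rj:
                edges["above"].append((i, j))
            if abs(ri - rj) + abs(ci - cj) == 1:
                edges["adjacent"].append((i, j))
    return x, edges

# Tiny example observation: agent, key, and door on a grid.
obs = [("agent", (0, 0)), ("key", (0, 2)), ("door", (1, 2))]
x, edges = encode_observation(obs)
```

The resulting node-feature matrix and typed edge lists form the structured input that a GNN-based policy/value network can consume.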
In the R-GCN-domain architecture, the graph encoding of observations is based on domain knowledge, whereas R-GCN-GAN aims to combine the relational encoding of a CNN with additional, learned relations, allowing for an end-to-end solution that does not require domain knowledge. Sample efficiency and both in- and out-of-distribution generalization performance of our architectures are tested on a new grid-world environment called 'Key-Corridors'. We find that adding domain-specific relational inductive biases with the R-GCN-domain architecture significantly improves sample efficiency and out-of-distribution generalization compared to MLPs and CNNs. However, we did not succeed in learning these domain-specific relational inductive biases with R-GCN-GAN, which does not significantly outperform a CNN. Overall, the results indicate that applying relational reasoning in RL - through the use of GNNs and domain knowledge - can be an important tool for improving sample efficiency and generalization performance.

Subjects: Reinforcement Learning (RL); Graph Neural Networks; Domain Knowledge; Generalization
To reference this document use: http://resolver.tudelft.nl/uuid:b5bcf5e9-53d4-4a80-bb4d-92c04df804f3
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Erik Vester
Files: Thesis_Erik_Vester_Final.pdf (3.68 MB)
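The R-GCN architectures named in the abstract build on relational graph convolutions, where each relation type has its own weight matrix. A minimal NumPy sketch of one such layer, in the style of Schlichtkrull et al.'s R-GCN, is shown below; the dimensions, edge lists, and weight initialization are hypothetical and not taken from the thesis:

```python
import numpy as np

def rgcn_layer(x, edges, rel_weights, w_self):
    """One relational graph convolution layer:
    h_i' = ReLU(W_0 h_i + sum_r sum_{j in N_r(i)} (1/c_{i,r}) W_r h_j),
    where edges[rel] holds (source, target) pairs for relation rel."""
    out = x @ w_self  # self-connection term W_0 h_i
    for rel, w in rel_weights.items():
        msgs = np.zeros_like(out)
        counts = np.zeros(len(x))
        for src, dst in edges.get(rel, []):
            msgs[dst] += x[src] @ w  # relation-specific message W_r h_j
            counts[dst] += 1
        counts[counts == 0] = 1.0    # avoid division by zero for isolated nodes
        out += msgs / counts[:, None]  # mean-normalize per target node
    return np.maximum(out, 0.0)      # ReLU

# Illustrative usage: 3 nodes with 5 input features, 8 output features.
rng = np.random.default_rng(0)
x = rng.normal(size=(3, 5))
edges = {"left_of": [(0, 1)], "adjacent": [(1, 2), (2, 1)]}
rel_weights = {r: rng.normal(size=(5, 8)) * 0.1 for r in edges}
w_self = rng.normal(size=(5, 8)) * 0.1
h = rgcn_layer(x, edges, rel_weights, w_self)
```

Because each relation type has its own weight matrix, the layer can treat, say, spatial adjacency differently from an "is left of" relation, which is one way domain-specific relational inductive biases enter the architecture.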