Reinforcement Learning in Block Markov Chains

Lagerweij, Pascal

Reinforcement Learning in Block Markov Chains

Title

Reinforcement Learning in Block Markov Chains

Author

Lagerweij, Pascal (TU Delft Electrical Engineering, Mathematics and Computer Science)

Contributor

Sanders, Jaron (mentor)
Van Mieghem, Piet (graduation committee)
Spaan, Matthijs (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Electrical Engineering

Date

2019-03-29

Abstract

Nowadays, reinforcement learning algorithms on Markov decision processes (MDPs) face computational issues when the state space is large. To reduce this state space of a MDP several state aggregation, or clustering, methodologies have been applied. Recently, a new clustering algorithm has been proposed that is able to cluster states from a single block Markov chain. A block Markov chain is a Markov chain with blocks in its transition matrix that correspond to clusters. Our aim was to investigate the possible combination of state aggregation in reinforcement learning on MDPs with clustering of states on a block Markov chain. First, we investigated the clustering algorithm and its properties to see its performance with different parameters and trajectory length. We compared the statistical properties of a pure Markov chain and the mixed Markov chain generated by a MDP. Afterwards, we verified the performance of the clustering algorithm on this mixed Markov chain. We proposed the BMC-MDP model that is able to model cluster based MDPs. We proposed C-PSRL, an algorithm, that consists of a single clustering step, on this newly introduced model. We compared its performance with a naïve approach and concluded that this new combined approach of clustering and MDP solving on a reduced space is a viable approach that reduces the computational complexity significantly. This research opened up the possibilities of more complex algorithms with, for example, multiple clustering steps. Moreover, if we can extend this clustering algorithm to clustering based on a state and action trajectory, this may results in an increased clustering performance and thereby enhance the performance of this general approach of optimizing on a cluster based MDP.

Subject

Markov Decision Processes
Block Markov Chain
Clustering algorithms
Optimization Algorithm

To reference this document use:

http://resolver.tudelft.nl/uuid:0f7d7ff1-d32e-4a9e-b202-73c64809873b

Part of collection

Student theses

Document type

master thesis

Rights

Files

PDF

MScThesis_PLagerweij.pdf

3.64 MB

Close viewer