Print Email Facebook Twitter Reinforcement Learning in Block Markov Chains Title Reinforcement Learning in Block Markov Chains Author Lagerweij, Pascal (TU Delft Electrical Engineering, Mathematics and Computer Science) Contributor Sanders, Jaron (mentor) Van Mieghem, Piet (graduation committee) Spaan, Matthijs (graduation committee) Degree granting institution Delft University of Technology Programme Electrical Engineering Date 2019-03-29 Abstract Nowadays, reinforcement learning algorithms on Markov decision processes (MDPs) face computational issues when the state space is large. To reduce this state space of a MDP several state aggregation, or clustering, methodologies have been applied. Recently, a new clustering algorithm has been proposed that is able to cluster states from a single block Markov chain. A block Markov chain is a Markov chain with blocks in its transition matrix that correspond to clusters. Our aim was to investigate the possible combination of state aggregation in reinforcement learning on MDPs with clustering of states on a block Markov chain. First, we investigated the clustering algorithm and its properties to see its performance with different parameters and trajectory length. We compared the statistical properties of a pure Markov chain and the mixed Markov chain generated by a MDP. Afterwards, we verified the performance of the clustering algorithm on this mixed Markov chain. We proposed the BMC-MDP model that is able to model cluster based MDPs. We proposed C-PSRL, an algorithm, that consists of a single clustering step, on this newly introduced model. We compared its performance with a naïve approach and concluded that this new combined approach of clustering and MDP solving on a reduced space is a viable approach that reduces the computational complexity significantly. This research opened up the possibilities of more complex algorithms with, for example, multiple clustering steps. Moreover, if we can extend this clustering algorithm to clustering based on a state and action trajectory, this may results in an increased clustering performance and thereby enhance the performance of this general approach of optimizing on a cluster based MDP. Subject Markov Decision ProcessesBlock Markov ChainClustering algorithmsOptimization Algorithm To reference this document use: http://resolver.tudelft.nl/uuid:0f7d7ff1-d32e-4a9e-b202-73c64809873b Part of collection Student theses Document type master thesis Rights © 2019 Pascal Lagerweij Files PDF MScThesis_PLagerweij.pdf 3.64 MB Close viewer /islandora/object/uuid:0f7d7ff1-d32e-4a9e-b202-73c64809873b/datastream/OBJ/view