Title: Flexible Heuristic Dynamic Programming for Reinforcement Learning in Quadrotors
Author: Helmer, Alexander; de Visser, C.C. (TU Delft Control & Simulation); van Kampen, E. (TU Delft Control & Simulation)
Date: 2018

Abstract: Reinforcement learning is a paradigm for learning decision-making tasks from interaction with the environment. Function approximators solve a part of the curse of dimensionality when learning in high-dimensional state and/or action spaces. It can be a time-consuming process to learn a good policy in a high-dimensional state space directly. A method is proposed for initially limiting the state and action space to a subset of the variables of the Markov Decision Process. As a result, the agent initially learns a coarse policy. It is then gradually exposed to new state and action variables, increasing the dimensionality of the state and action space to that posed by the control problem. A local function approximator has been developed that supports the expansion of the state and action space. The concept is applied to the Model-Learning Actor-Critic, a model-based Heuristic Dynamic Programming algorithm. Its functioning is demonstrated by training a reinforcement learning agent for 2-dimensional hover control of a Parrot AR 2.0 quad-rotor. It is shown that the agent learns faster and achieves a better policy when exposed to the action and state variables gradually rather than all at once from the start.

To reference this document use: http://resolver.tudelft.nl/uuid:6e6d7d4d-3a5f-4228-ac91-ae31875bde8a
DOI: https://doi.org/10.2514/6.2018-2134
Publisher: American Institute of Aeronautics and Astronautics Inc. (AIAA)
Embargo date: 2019-01-31
ISBN: 978-1-62410-527-2
Source: Proceedings of the 2018 AIAA Information Systems-AIAA Infotech @ Aerospace
Event: AIAA Information Systems-AIAA Infotech at Aerospace, 2018, 2018-01-08 → 2018-01-12, Kissimmee, United States
Part of collection: Institutional Repository
Document type: conference paper
Rights: © 2018 Alexander Helmer, C.C. de Visser, E. van Kampen
Files: PDF Flexible_Heuristic_Dynami ... Qua....pdf (1.16 MB)