End-to-End Hierarchical Reinforcement Learning for Adaptive Flight Control
Title: End-to-End Hierarchical Reinforcement Learning for Adaptive Flight Control: A method for model-independent control through Proximal Policy Optimization with learned Options
Author: Ge, Zhouxin (TU Delft Aerospace Engineering)
Contributor: van Kampen, E. (mentor); de Croon, G.C.H.E. (graduation committee); Mitici, M.A. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Aerospace Engineering
Date: 2021-08-27
Abstract: Aircraft with disruptive designs lack accurate, high-fidelity flight models. At the same time, developing models of stochastic phenomena for traditional aircraft configurations is costly, and classical control methods cannot operate beyond their predefined operating points or adapt to unexpected changes to the aircraft. The Proximal Policy Option Critic (PPOC) is an end-to-end hierarchical reinforcement learning method that alleviates the need for a high-fidelity flight model and allows for adaptive flight control. This research contributes to the development and analysis of online adaptive flight control by comparing PPOC against a non-hierarchical method, Proximal Policy Optimization (PPO), and against PPOC with a single Option (PPOC-1). The methods are tested on an extendable mass-spring-damper system and on an aircraft model. The agents are then evaluated on their sample efficiency, reference tracking capability, and adaptivity. The results show, unexpectedly, that PPO and PPOC-1 are more sample efficient than PPOC. Furthermore, both PPOC agents successfully track the height profile, although the policies they learn produce noisy actuator inputs. Finally, PPOC with multiple learned Options has the best adaptivity: it adapts to structural failure of the horizontal tailplane and to a sign change of pitch damping, and it generalizes to different aircraft.
Subject: Reinforcement Learning; Hierarchical Reinforcement Learning; Flight Control Systems; Policy Gradient; Proximal Policy Optimization; Option-Critic architecture
To reference this document use: http://resolver.tudelft.nl/uuid:d3baec43-71d4-4f7f-ae27-2fdfdae7fea3
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Zhouxin Ge
Files: PDF master_thesis_Zhouxin_Ge_2021.pdf (46.56 MB)