End-to-End Hierarchical Reinforcement Learning for Adaptive Flight Control: A method for model-independent control through Proximal Policy Optimization with learned Options

Ge, Zhouxin

Title

End-to-End Hierarchical Reinforcement Learning for Adaptive Flight Control: A method for model-independent control through Proximal Policy Optimization with learned Options

Author

Ge, Zhouxin (TU Delft Aerospace Engineering)

Contributor

van Kampen, E. (mentor)
de Croon, G.C.H.E. (graduation committee)
Mitici, M.A. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Aerospace Engineering

Date

2021-08-27

Abstract

Aircraft with disruptive designs lack accurate, high-fidelity flight models. At the same time, developing models of stochastic phenomena for traditional aircraft configurations is costly, and classical control methods cannot operate beyond their predefined operating points or adapt to unexpected changes to the aircraft. The Proximal Policy Option Critic (PPOC) is an end-to-end hierarchical reinforcement learning method that alleviates the need for a high-fidelity flight model and allows for adaptive flight control. This research contributes to the development and analysis of online adaptive flight control by comparing PPOC against a non-hierarchical method, Proximal Policy Optimization (PPO), and against PPOC with a single Option (PPOC-1). The methods are tested on an extendable mass-spring-damper system and an aircraft model. The agents are then evaluated on their sample efficiency, reference tracking capability, and adaptivity. The results show, unexpectedly, that PPO and PPOC-1 are more sample efficient than PPOC. Furthermore, both PPOC agents successfully track the height profile, although the policies they learn produce noisy actuator inputs. Finally, PPOC with multiple learned Options has the best adaptivity: it adapts to structural failure of the horizontal tailplane and to a sign change of pitch damping, and it generalizes to different aircraft.

Subject

Reinforcement Learning
Hierarchical Reinforcement Learning
Flight Control Systems
Policy Gradient
Proximal Policy Optimization
Option-Critic architecture

To reference this document use:

http://resolver.tudelft.nl/uuid:d3baec43-71d4-4f7f-ae27-2fdfdae7fea3

Part of collection

Student theses

Document type

Master's thesis

Files

PDF

master_thesis_Zhouxin_Ge_2021.pdf

46.56 MB
