Learning Optimal Controllers for Linear Systems with Multiplicative Noise via Policy Gradient

Author: Gravell, Benjamin (University of Texas at Dallas); Mohajerin Esfahani, P. (TU Delft, Team Bart De Schutter); Summers, Tyler H. (University of Texas at Dallas)
Date: 2021
Abstract: The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement-learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve the robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance the robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimal control policy with polynomial dependence on problem parameters. Results are provided in both the model-known and model-unknown settings, where samples of system trajectories are used to estimate policy gradients.
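As a concrete illustration of the setting the abstract describes (this sketch is not taken from the paper; all system parameters, noise scales, and step sizes below are assumed values chosen for demonstration), the following runs a zeroth-order policy gradient method on a scalar LQR problem whose dynamics coefficients are perturbed by multiplicative noise, corresponding to the model-unknown setting in which gradients are estimated from sampled trajectories:

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm or parameters):
# zeroth-order policy gradient on a scalar LQR problem whose dynamics
# have multiplicative noise on both the state and input coefficients.

rng = np.random.default_rng(0)
a, b = 0.9, 0.5            # nominal dynamics: x' = a*x + b*u
sig_a, sig_b = 0.1, 0.1    # multiplicative noise scales (assumed values)
q, r = 1.0, 1.0            # quadratic cost weights
T, n_rollouts = 50, 200    # horizon and Monte Carlo rollouts per estimate

def cost(k):
    """Monte Carlo estimate of the finite-horizon quadratic cost of the
    linear policy u = -k*x under multiplicative noise."""
    x = np.ones(n_rollouts)  # identical deterministic initial states
    total = 0.0
    for _ in range(T):
        u = -k * x
        total += np.mean(q * x**2 + r * u**2)
        # noise enters multiplicatively through the coefficients a and b,
        # so uncertainty scales with the state and input themselves
        x = (a + sig_a * rng.standard_normal(n_rollouts)) * x \
            + (b + sig_b * rng.standard_normal(n_rollouts)) * u
    return total

# Model-unknown setting: estimate the policy gradient from sampled
# trajectories via a two-point finite difference, then descend.
k, step, delta = 0.5, 0.01, 0.05
for _ in range(100):
    g = (cost(k + delta) - cost(k - delta)) / (2 * delta)
    k -= step * g

print(f"learned gain k = {k:.3f}")
```

Because the multiplicative noise scales `sig_a` and `sig_b` penalize gains that amplify the state or input, the learned gain differs from the noiseless LQR gain, which is the robustness effect the abstract attributes to multiplicative noise models.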
Subject: Additive noise; Convergence; Covariance matrices; gradient methods; noise; optimal control; Reinforcement learning; Robustness; Stability analysis; Stochastic processes; stochastic systems; uncertain systems; Uncertainty
To reference this document use: http://resolver.tudelft.nl/uuid:aecd5a3d-d429-4433-9fe3-13080316fd05
DOI: https://doi.org/10.1109/TAC.2020.3037046
Embargo date: 2020-05-10
ISSN: 0018-9286
Source: IEEE Transactions on Automatic Control, 66 (11), 5283-5298
Bibliographical note: Green Open Access added to TU Delft Institutional Repository. 'You share, we take care!' - Taverne project, https://www.openaccess.nl/en/you-share-we-take-care. Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Part of collection: Institutional Repository
Document type: journal article
Rights: © 2021 Benjamin Gravell, P. Mohajerin Esfahani, Tyler H. Summers
Files: Learning_Optimal_Controll ... adient.pdf (PDF, 1.18 MB)