Learning Optimal Controllers for Linear Systems with Multiplicative Noise via Policy Gradient

Author: Gravell, Benjamin (University of Texas at Dallas); Mohajerin Esfahani, P. (TU Delft, Team Bart De Schutter); Summers, Tyler H. (University of Texas at Dallas)
Date: 2021
Abstract: The linear quadratic regulator (LQR) problem has reemerged as an important theoretical benchmark for reinforcement-learning-based control of complex dynamical systems with continuous state and action spaces. In contrast with nearly all recent work in this area, we consider multiplicative noise models, which are increasingly relevant because they explicitly incorporate inherent uncertainty and variation in the system dynamics and thereby improve the robustness properties of the controller. Robustness is a critical and poorly understood issue in reinforcement learning; existing methods which do not account for uncertainty can converge to fragile policies or fail to converge at all. Additionally, intentional injection of multiplicative noise into learning algorithms can enhance the robustness of policies, as observed in ad hoc work on domain randomization. Although policy gradient algorithms require optimization of a non-convex cost function, we show that the multiplicative noise LQR cost has a special property called gradient domination, which is exploited to prove global convergence of policy gradient algorithms to the globally optimal control policy with polynomial dependence on problem parameters. Results are provided in both the model-known and model-unknown settings, where samples of system trajectories are used to estimate policy gradients.
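As a concrete illustration of the setting the abstract describes (this sketch is not taken from the paper; all system parameters, noise scales, and step sizes below are assumed values chosen for demonstration), the following runs a zeroth-order policy gradient method on a scalar LQR problem whose dynamics coefficients are perturbed by multiplicative noise, corresponding to the model-unknown setting in which gradients are estimated from sampled trajectories:

```python
import numpy as np

# Illustrative sketch only (not the paper's algorithm or parameters):
# zeroth-order policy gradient on a scalar LQR problem whose dynamics
# have multiplicative noise on both the state and input coefficients.

rng = np.random.default_rng(0)
a, b = 0.9, 0.5            # nominal dynamics: x' = a*x + b*u
sig_a, sig_b = 0.1, 0.1    # multiplicative noise scales (assumed values)
q, r = 1.0, 1.0            # quadratic cost weights
T, n_rollouts = 50, 200    # horizon and Monte Carlo rollouts per estimate

def cost(k):
    """Monte Carlo estimate of the finite-horizon quadratic cost of the
    linear policy u = -k*x under multiplicative noise."""
    x = np.ones(n_rollouts)  # identical deterministic initial states
    total = 0.0
    for _ in range(T):
        u = -k * x
        total += np.mean(q * x**2 + r * u**2)
        # noise enters multiplicatively through the coefficients a and b,
        # so uncertainty scales with the state and input themselves
        x = (a + sig_a * rng.standard_normal(n_rollouts)) * x \
            + (b + sig_b * rng.standard_normal(n_rollouts)) * u
    return total

# Model-unknown setting: estimate the policy gradient from sampled
# trajectories via a two-point finite difference, then descend.
k, step, delta = 0.5, 0.01, 0.05
for _ in range(100):
    g = (cost(k + delta) - cost(k - delta)) / (2 * delta)
    k -= step * g

print(f"learned gain k = {k:.3f}")
```

Because the multiplicative noise scales `sig_a` and `sig_b` penalize gains that amplify the state or input, the learned gain differs from the noiseless LQR gain, which is the robustness effect the abstract attributes to multiplicative noise models.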
Subject: Additive noise; Convergence; Covariance matrices; gradient methods; noise; optimal control; Reinforcement learning; Robustness; Stability analysis; Stochastic processes; stochastic systems; uncertain systems; Uncertainty
To reference this document use: http://resolver.tudelft.nl/uuid:aecd5a3d-d429-4433-9fe3-13080316fd05
DOI: https://doi.org/10.1109/TAC.2020.3037046
Embargo date: 2020-05-10
ISSN: 0018-9286
Source: IEEE Transactions on Automatic Control, 66 (11), 5283-5298
Bibliographical note: Green Open Access added to TU Delft Institutional Repository. 'You share, we take care!' - Taverne project, https://www.openaccess.nl/en/you-share-we-take-care. Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Part of collection: Institutional Repository
Document type: journal article
Rights: © 2021 Benjamin Gravell, P. Mohajerin Esfahani, Tyler H. Summers
Files: Learning_Optimal_Controll ... adient.pdf (PDF, 1.18 MB)