Title: Deep Reinforcement Learning for Bipedal Robots
Author: Rastogi, Divyam (TU Delft Mechanical, Maritime and Materials Engineering)
Contributors: Kober, Jens (mentor); Koryakovskiy, Ivan (mentor); Wisse, Martijn (graduation committee); van Kampen, Erik-Jan (graduation committee); Bharatheesha, Mukunda (graduation committee)
Degree granting institution: Delft University of Technology
Date: 2017-08

Abstract:
Reinforcement Learning (RL) is a general-purpose framework for designing controllers for non-linear systems. It learns a controller (policy) by trial and error, which makes it highly suitable for systems that are difficult to control with conventional control methodologies, such as walking robots. Traditionally, RL has only been applicable to problems with low-dimensional state spaces, but the use of deep neural networks as function approximators in RL has shown impressive results for the control of high-dimensional systems. This approach is known as Deep Reinforcement Learning (DRL). A major drawback of DRL algorithms is that they generally require a large number of samples and long training times, which becomes a challenge when working with real robots. Therefore, most applications of DRL methods have been limited to simulation platforms. Moreover, due to model uncertainties such as friction and inaccuracies in masses, lengths, etc., a policy trained on a simulation model might not work directly on a real robot. The objective of this thesis is to apply a DRL algorithm, Deep Deterministic Policy Gradient (DDPG), to a 2D bipedal robot. The bipedal robot used for the analysis, known as LEO, was developed by the Delft BioRobotics Lab for reinforcement learning purposes. The DDPG method is applied to a simulated model of LEO and compared with traditional RL methods such as SARSA.
To overcome the high sample requirement of learning a policy on the real system, an iterative approach is developed in this thesis which learns a difference model and then learns a new policy with this difference model. The difference model captures the mismatch between the real robot and the simulated model. The approach is tested on two experimental setups in simulation: an inverted pendulum problem and LEO. With the difference model, a policy can be learned that is almost optimal compared to one trained on the real system from scratch, while requiring only 10% of the samples.

Subjects: Reinforcement Learning; Bipedal Walking; Deep neural networks; Model learning
To reference this document use: http://resolver.tudelft.nl/uuid:0fac495f-f87a-4a61-a80f-5f901323379a
Part of collection: Student theses
Document type: master thesis
Rights: © 2017 Divyam Rastogi
Files: Final_MSc_Report_Divyam_Rastogi.pdf (PDF, 3.29 MB)
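The difference-model idea summarised in the abstract can be sketched in a few lines. This is a minimal illustrative sketch, not the thesis code: the toy 1-D dynamics, the polynomial feature set, and all function names here are assumptions chosen only to show how a residual ("difference") model fitted from a small batch of real transitions can correct a mismatched simulator.

```python
import numpy as np

def real_step(s, a):
    # "Real" system: includes a cubic friction-like term unknown to the simulator.
    return 0.9 * s + 0.5 * a - 0.1 * s**3

def sim_step(s, a):
    # Simulator: same linear dynamics, but missing the cubic term (model mismatch).
    return 0.9 * s + 0.5 * a

# Collect a small batch of real transitions (the expensive resource on a robot).
rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(200, 1))
A = rng.uniform(-1, 1, size=(200, 1))
residual = real_step(S, A) - sim_step(S, A)  # what the difference model must capture

# Fit the difference model: ridge regression on polynomial features of (s, a).
X = np.hstack([S, A, S**2, S**3, A**2])
w = np.linalg.solve(X.T @ X + 1e-6 * np.eye(X.shape[1]), X.T @ residual).ravel()

def corrected_step(s, a):
    # Simulator output plus the learned residual: a cheap proxy for the real system,
    # against which a new policy could then be trained entirely in simulation.
    feats = np.array([s, a, s**2, s**3, a**2])
    return sim_step(s, a) + feats @ w

# The corrected model tracks the real system far better than the raw simulator.
s, a = 0.8, -0.3
err_sim = abs(sim_step(s, a) - real_step(s, a))
err_corr = abs(corrected_step(s, a) - real_step(s, a))
```

Because the residual here lies exactly in the feature span, the fit is nearly perfect; on a real robot the difference model would only approximate the mismatch, which is why the thesis iterates between fitting it and retraining the policy.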