Title: Sample-Efficient Reinforcement Learning for Walking Robots
Author: Vennemann, B.
Contributors: Jonker, P.P. (mentor); Caarls, W. (mentor)
Faculty: Mechanical, Maritime and Materials Engineering
Department: BioMechanical Engineering
Programme: BMD
Date: 2013-09-16

Abstract: By learning to walk, robots should be able to traverse many types of terrain. An important learning paradigm for robots is Reinforcement Learning (RL). Learning to walk through RL on real robots remains a difficult challenge, however. To meet this challenge, a robot called LEO has been developed in the Delft BioRobotics Lab. LEO is a 2D bipedal robot built specifically to learn to walk through RL. Unfortunately, when learning with Sarsa(λ) the robot breaks down before it has learned a successful gait. A possible solution is to minimize the number of interactions with the environment (samples) needed to learn a satisfactory policy. A promising technique for reducing sample complexity in RL is to re-use samples instead of discarding them after a single update. One of the contributions of this thesis is a theoretical comparison of sample re-use techniques in the form of a novel unified framework. With the help of this framework, Experience Replay (ER) is selected for the evaluation and analysis of sample re-use on walking robots. An empirical comparison of ER with Sarsa(λ) is carried out on three benchmark problems: simulations of the inverted pendulum, the simplest walker, and LEO. In initial experiments we observed slow and unpredictable learning with ER on the walking problems. We show that this is mainly caused by two issues. The first issue involves failing back-propagation due to optimism in the face of uncertainty. To deal with this, we develop a new algorithm called ER-?, which makes the attitude towards uncertainty a function of the state instead of the initialization of the value function.
The second issue concerns local maxima emerging in the value function due to self-effecting states. For this, we propose a residual-gradient variant of ER. We find that the new algorithms perform well on the walking problems. In particular, (residual) ER-? gives very encouraging results compared with Sarsa(λ) and vanilla ER. From the results, we can see that the attitude towards uncertainty during replay is of particular importance for walking problems. We conclude that while ER is a promising technique, it gives no guarantee of good learning performance. We showed that by exploiting the available data and knowledge of the representation, the performance of ER can be significantly improved.

Subject: reinforcement learning; walking robots
To reference this document use: http://resolver.tudelft.nl/uuid:dfe99bde-0a42-487b-8142-f1215744db31
Embargo date: 2013-09-27
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2013 Vennemann, B.
Files: Thesis_Bas_Vennemann.pdf (PDF, 3.13 MB)
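For readers unfamiliar with the sample re-use technique named in the abstract, a minimal sketch of Experience Replay follows. It is illustrative only: the buffer class, update function, and all parameter names are assumptions for this sketch, not the implementation from the thesis. The key idea it shows is that stored transitions can be updated repeatedly instead of being discarded after one use.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Draw a random mini-batch; re-sampling the same data is the point of ER.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def replay_update(q, buffer, batch_size=8, alpha=0.1, gamma=0.95):
    """Re-apply a tabular TD(0)-style update to a batch of stored samples.

    q is a dict mapping state -> {action: value}.
    """
    for s, a, r, s_next in buffer.sample(batch_size):
        target = r + gamma * max(q[s_next].values())
        q[s][a] += alpha * (target - q[s][a])
```

A single stored transition can thus drive several value updates: calling `replay_update` twice on the same one-element buffer moves the value estimate toward the target twice, which is exactly the sample re-use the abstract contrasts with discarding each sample after one update.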