Title: Sample-Efficient Reinforcement Learning for Walking Robots
Author: Vennemann, B.
Contributors: Jonker, P.P. (mentor); Caarls, W. (mentor)
Faculty: Mechanical, Maritime and Materials Engineering
Department: BioMechanical Engineering
Programme: BMD
Date: 2013-09-16

Abstract: By learning to walk, robots should be able to traverse many types of terrain. An important learning paradigm for robots is Reinforcement Learning (RL). Learning to walk through RL on real robots remains a difficult challenge, however. To meet this challenge, a robot called LEO has been developed in the Delft BioRobotics Lab. LEO is a 2D bipedal robot built specifically to learn to walk through RL. Unfortunately, when learning with Sarsa(λ) the robot breaks down before it has learned a successful gait. A possible solution is to minimize the number of interactions with the environment (samples) needed to learn a satisfactory policy. A promising technique for reducing sample complexity in RL is to re-use samples instead of discarding them after a single update. One of the contributions of this thesis is a theoretical comparison of sample re-use techniques in the form of a novel unified framework. With the help of this framework, Experience Replay (ER) is selected for the evaluation and analysis of sample re-use on walking robots. An empirical comparison of ER with Sarsa(λ) is carried out on three benchmark problems: simulations of the inverted pendulum, the simplest walker, and LEO. In initial experiments we observed slow and unpredictable learning with ER on the walking problems. We show that this is mainly caused by two issues. The first issue involves failing back-propagation due to optimism in the face of uncertainty. To deal with this, we develop a new algorithm called ER-?, which makes the attitude towards uncertainty a function of the state instead of the initialization of the value function.
The second issue concerns local maxima emerging in the value function due to self-effecting states. For this, we propose a residual-gradient variant of ER. We find that the new algorithms perform well on the walking problems. In particular, (residual) ER-? gives very encouraging results compared with Sarsa(λ) and vanilla ER. From the results, we can see that the attitude towards uncertainty during replay is of particular importance for walking problems. We conclude that while ER is a promising technique, it gives no guarantee of good learning performance. We showed that by exploiting the available data and knowledge of the representation, the performance of ER can be significantly improved.

Subject: reinforcement learning; walking robots
To reference this document use: http://resolver.tudelft.nl/uuid:dfe99bde-0a42-487b-8142-f1215744db31
Embargo date: 2013-09-27
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2013 Vennemann, B.
Files: Thesis_Bas_Vennemann.pdf (PDF, 3.13 MB)
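For readers unfamiliar with the sample re-use technique named in the abstract, a minimal sketch of Experience Replay follows. It is illustrative only: the buffer class, update function, and all parameter names are assumptions for this sketch, not the implementation from the thesis. The key idea it shows is that stored transitions can be updated repeatedly instead of being discarded after one use.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=1000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        # Oldest transitions are evicted automatically once capacity is reached.
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Draw a random mini-batch; re-sampling the same data is the point of ER.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def replay_update(q, buffer, batch_size=8, alpha=0.1, gamma=0.95):
    """Re-apply a tabular TD(0)-style update to a batch of stored samples.

    q is a dict mapping state -> {action: value}.
    """
    for s, a, r, s_next in buffer.sample(batch_size):
        target = r + gamma * max(q[s_next].values())
        q[s][a] += alpha * (target - q[s][a])
```

A single stored transition can thus drive several value updates: calling `replay_update` twice on the same one-element buffer moves the value estimate toward the target twice, which is exactly the sample re-use the abstract contrasts with discarding each sample after one update.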