Learning to walk using minimum prior knowledge: And a small hexapod robot

Title: Learning to walk using minimum prior knowledge: And a small hexapod robot
Author: Moerdijk, M.M.
Contributor: Jonker, P.P. (mentor); Caarls, W. (mentor)
Faculty: Mechanical, Maritime and Materials Engineering
Department: BioMechanical Engineering
Programme: BMD
Date: 2013-11-18

Abstract:
Future robotic systems are expected to be so complex that the information -- or prior knowledge -- required to program them is not always available and thus has to be discovered in an alternative way. This can be done using Reinforcement Learning (RL) algorithms, which learn from experience. This way of programming can also benefit current robotic systems, for example robots where small changes in the hardware due to design evolutions, repairs, or production tolerances require updates to the control algorithms. An algorithm that learns to complete a task without prior knowledge could do so regardless of such changes, making frequent, time-consuming reprogramming and/or tuning unnecessary. The robot used in this thesis is a small, ant-like hexapod robot that is still in development. It is developed to take advantage of the high volume-to-power ratio at low weights of Shape Memory Alloy actuators and to show collaborative behaviour. The task is to find a controller that lets this robot walk as fast as possible from A to B while using minimum prior knowledge. Three approaches to achieve this are proposed in this thesis; in descending amount of required prior knowledge, these are Min-Max, Cost Function Search, and the Flat approach. All use a hierarchical division in which low-level actions -- like forward or left -- and the combination of these actions to reach the goal are learned separately. The approaches differ in the way they learn the low-level actions. Min-Max uses fully defined sub-goals that are assumed to be sufficient to solve the problem.
The Cost Function Search tries to find relevant sub-goals to solve the problem, and the Flat approach tries to find the best low-level actions by evaluating the performance of the found controller. In general, it is expected that using more prior knowledge makes the algorithm learn faster and/or find better results. The algorithms are compared in a simulator, where, surprisingly, the Cost Function Search method is found to produce controllers performing up to a factor of two better than the Min-Max approach and a factor of four better than the Flat approach. This indicates that care should be taken in how prior knowledge is used, as it does not always lead to better results. Tests on the robot show that the influence of repairs is significant and that the robot is able to learn to walk in a straight line. It is believed that, due to the back of the robot dragging on the floor, it is unable to turn sharply and thus cannot reach goals situated next to it. This, in combination with frequent breakdowns, made it impossible to test the complete algorithms on this version of the robot. Further tests using an improved version of the robot are needed to obtain conclusive results.

Subject: learning; walking; robot; hexapod; reinforcement; prior knowledge
To reference this document use: http://resolver.tudelft.nl/uuid:fafd5dd4-653c-4b39-965f-c48707a2c1ca
Embargo date: 2015-06-01
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2013 Moerdijk, M.M.
Files: Thesis_Mart_Moerdijk.pdf (PDF, 11.1 MB)