Real-Time Optimistic Planning for the Control of Nonlinear Systems

Optimistic Planning is a model-based online planning algorithm that guarantees near-optimal actions for the control of arbitrarily nonlinear systems. Planning algorithms aim to find optimal actions by starting from the current state and building a tree of action sequences and the resulting states, using a model to simulate state transitions. Typically, online planning algorithms return a sequence of actions, apply the first action (or the first several actions of the sequence), and then start planning again from the new state, resembling the receding horizon principle used in Model Predictive Control. Several optimistic planning algorithms exist; this work considers only Optimistic Planning for Deterministic systems (OPD). OPD works for large, possibly infinite state spaces, but only for finite, discrete action spaces. Although OPD has good theoretical near-optimality guarantees, there is no record yet of OPD being applied to control nonlinear physical systems in real time, because of the long computation time it requires. This work analyzes two methods for making OPD suitable for real-time applications. The first approach is to speed up the planning process by parallelizing the algorithm. Although parallelization has been shown to increase computational speed in classical planning, the experiments in this work found no improvement for OPD. However, a potential benefit from a parallel version of OPD is not ruled out, and it is expected that further research and more efficient implementations could still increase the computational speed. The second approach is to apply sequences of actions instead of single actions, which increases the time available for the planning process.
Re-planning starts immediately after a sequence is returned, using as its initial state a prediction of the state at the end of the previous sequence. The resulting algorithm is called Real-Time Optimistic Planning with Action Sequences (RT-OPS). An extensive analysis yields restrictions on the parameters of the algorithm that, when met, can guarantee real-time applicability. Additionally, the effect of using sequences of actions on the performance of the algorithm is investigated, and the maximum performance loss is bounded. RT-OPS has been tested in experiments on three problems: a cart-pole simulation, an acrobot simulation, and a real inverted pendulum. Different settings are compared and, overall, RT-OPS performs well without violating real-time constraints. The experiments show that RT-OPS enables the use of optimistic planning for real-time control of physical nonlinear systems. Future work should focus on applying the ideas behind RT-OPS to other optimistic planning algorithms, such as those that allow for continuous actions or stochastic systems. Furthermore, a parallel version of RT-OPS could be developed to increase its computational speed.
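
The two ideas summarized above, optimistic tree expansion and re-planning from a predicted state while a sequence of actions is applied, can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes rewards in [0, 1] and a discount factor gamma in (0, 1), so that gamma^d / (1 - gamma) bounds the reward still obtainable below a node at depth d. All names (opd_plan, run_with_sequences, toy_step, toy_reward, budget, seq_len) are hypothetical, returning the sequence with the best lower bound is one possible choice, and the loop runs sequentially where a real-time version would plan concurrently with actuation.

```python
import heapq
import itertools


def opd_plan(state, actions, step, reward, gamma, budget):
    """Expand `budget` nodes optimistically; return the best sequence found.

    Assumes reward(x, u) lies in [0, 1] and 0 < gamma < 1.
    """
    tie = itertools.count()  # tie-breaker so the heap never compares states
    # Heap entries: (-b_value, tie, state, depth, nu, sequence), where nu is
    # the discounted reward collected so far (a lower bound on the value).
    frontier = [(-1.0 / (1.0 - gamma), next(tie), state, 0, 0.0, ())]
    best_nu, best_seq = -1.0, ()
    for _ in range(budget):
        # Pop the leaf with the largest upper bound (b-value).
        _, _, x, depth, nu, seq = heapq.heappop(frontier)
        for u in actions:
            child_nu = nu + gamma ** depth * reward(x, u)
            child_seq = seq + (u,)
            if child_nu > best_nu:
                best_nu, best_seq = child_nu, child_seq
            # Upper bound: collected reward plus the most that can follow.
            b = child_nu + gamma ** (depth + 1) / (1.0 - gamma)
            heapq.heappush(
                frontier,
                (-b, next(tie), step(x, u), depth + 1, child_nu, child_seq),
            )
    return best_seq


def run_with_sequences(x0, actions, step, reward, gamma, budget, seq_len, rounds):
    """Apply sequences of up to `seq_len` actions, re-planning from a prediction."""
    x = x0
    current = opd_plan(x0, actions, step, reward, gamma, budget)[:seq_len]
    applied = []
    for _ in range(rounds):
        # Predict the state at the end of the sequence about to be applied.
        predicted = x
        for u in current:
            predicted = step(predicted, u)
        # Plan the next sequence from that prediction (sequential here; a
        # real-time version would do this while `current` is being applied).
        nxt = opd_plan(predicted, actions, step, reward, gamma, budget)[:seq_len]
        for u in current:
            x = step(x, u)
            applied.append(u)
        current = nxt
    return x, applied


# Hypothetical toy problem: move along the integers and stay at position 3.
def toy_step(x, u):
    return x + u


def toy_reward(x, u):
    return 1.0 if x + u == 3 else 0.0
```

With the toy problem, a call such as run_with_sequences(0, (-1, 0, 1), toy_step, toy_reward, 0.9, 50, 2, 3) plans two actions at a time. Because the model is deterministic and exact in this sketch, the predicted state always matches the state actually reached; on a physical system, model error would make the prediction only approximate.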

Subject

optimistic
planning
nonlinear
control
optimal
real-time
parallel

To reference this document use:

http://resolver.tudelft.nl/uuid:8934c9d2-02c0-4d6e-8517-492e41be7ff8

Embargo date

2015-03-07

Part of collection

Student theses

Document type

master thesis

Rights

Files

PDF

mscThesis_Thijs_Wensveen_final.pdf

5.58 MB
