Title: Embedding machine learning into passivity theory: A port-Hamiltonian approach
Author: Sprangers, O.R.
Contributor: Babuska, R. (mentor); Lopes, G. (mentor)
Faculty: Mechanical, Maritime and Materials Engineering
Department: Delft Center for Systems and Control
Date: 2012-03-21

Abstract

Passivity-based control (PBC) is a control methodology that achieves its control objective by rendering a system passive with respect to a desired storage function. A key feature of PBC is that it exploits structural properties of the system. This thesis investigates PBC for systems endowed with a special structure, called port-Hamiltonian (PH) systems. The geometric structure of PH systems allows the PBC problem to be reformulated as the solution of a partial differential equation (PDE), which is generally complex. Reinforcement learning (RL), on the other hand, is a learning control method that can solve complex nonlinear (stochastic) control problems without the need for a process model and without explicitly solving a set of equations. In RL, the controller receives an immediate numerical reward as a function of the process state and possibly the control action. The goal is to find an optimal control policy that maximizes the cumulative long-term reward, which corresponds to maximizing a value function. This thesis uses actor-critic techniques, a class of RL methods in which a separate actor (the control law) and critic (a "memory") function are learned. A disadvantage of RL is that, without a process model, learning can be slow and computationally expensive. The goal of this thesis was to design a theoretical framework that uses PBC techniques subject to control saturation, incorporates knowledge about the PH system, and learns (optimal) control policies using actor-critic reinforcement learning.
Therefore, actor-critic reinforcement learning methods have been combined with different types of PBC, e.g. energy-balancing PBC (EB-PBC) and interconnection and damping assignment PBC (IDA-PBC). The combination of EB-PBC with an actor-critic method, energy-balancing actor-critic (EBAC), proved effective on the pendulum swing-up problem, which was used as a benchmark. From a PBC perspective, the advantages of the method are that no PDE has to be solved explicitly, control saturation can be incorporated, the geometric structure of the PH system is preserved, (numerical) stability can be assessed using passivity theory, and the learned controllers can be interpreted in terms of energy-shaping strategies. From an RL perspective, learning is sped up because model knowledge is available.

To reference this document use: http://resolver.tudelft.nl/uuid:c47ce23e-86ae-420b-ac8f-3f00195e3f0b
Part of collection: Student theses
Document type: master thesis
Rights: (c) 2012 Sprangers, O.R.
Files: PDF, Sprangers_O.R._-_Embeddin ... proach.pdf, 2.44 MB
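
The abstract's notion of a system being passive with respect to a storage function can be illustrated with a minimal sketch. It is not taken from the thesis: it assumes a damped pendulum written in port-Hamiltonian form, with hypothetical parameters `M`, `L`, `G`, `B` and hypothetical helper names (`hamiltonian`, `dynamics`, `rk4_step`, `simulate`). With zero input, the stored energy H should not grow over time, which the simulation checks numerically.

```python
import math

# Hypothetical pendulum parameters (assumptions, not values from the thesis).
M, L, G, B = 1.0, 1.0, 9.81, 0.1  # mass, length, gravity, damping coefficient


def hamiltonian(q, p):
    """Stored energy H(q, p): kinetic plus potential, zero at the hanging rest position."""
    return p * p / (2.0 * M * L * L) + M * G * L * (1.0 - math.cos(q))


def dynamics(q, p, u):
    """Port-Hamiltonian form x_dot = (J - R) grad H + g u, with state x = (q, p)."""
    dH_dq = M * G * L * math.sin(q)
    dH_dp = p / (M * L * L)
    q_dot = dH_dp                    # interconnection J = [[0, 1], [-1, 0]]
    p_dot = -dH_dq - B * dH_dp + u   # damping R = diag(0, B), input map g = (0, 1)
    return q_dot, p_dot


def rk4_step(q, p, u, dt):
    """Advance the state by one classical Runge-Kutta step with constant input u."""
    k1q, k1p = dynamics(q, p, u)
    k2q, k2p = dynamics(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p, u)
    k3q, k3p = dynamics(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p, u)
    k4q, k4p = dynamics(q + dt * k3q, p + dt * k3p, u)
    q_next = q + dt * (k1q + 2.0 * k2q + 2.0 * k3q + k4q) / 6.0
    p_next = p + dt * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
    return q_next, p_next


def simulate(q0, p0, steps, dt):
    """Simulate the unforced system (u = 0) and return the energy at every step."""
    q, p = q0, p0
    energies = [hamiltonian(q, p)]
    for _ in range(steps):
        q, p = rk4_step(q, p, 0.0, dt)
        energies.append(hamiltonian(q, p))
    return energies


if __name__ == "__main__":
    energies = simulate(q0=1.0, p0=0.0, steps=1000, dt=0.01)
    print("initial energy:", energies[0])
    print("final energy:  ", energies[-1])
```

In this sketch the damping term removes energy whenever the pendulum moves, so the final energy ends below the initial one. An energy-balancing controller in the sense of the abstract would choose `u` to reshape this energy function; how the thesis parameterizes and learns that control law is not specified here.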