Guaranteed globally optimal continuous reinforcement learning