Policy derivation methods for critic-only reinforcement learning in continuous spaces