Self-tuning gains of a quadrotor using a simple model for policy gradient reinforcement learning