Controlling the estimation bias in deep reinforcement learning problems with sparse rewards