Optimizing deep reinforcement learning policies for deteriorating systems considering ordered action structuring and value of information