A large-scale study of agents learning from human reward (Extended abstract)