Correspondence Between Perplexity Scores and Human Evaluation of Generated TV-Show Scripts