Title: Correspondence Between Perplexity Scores and Human Evaluation of Generated TV-Show Scripts
Author: Keukeleire, Pia (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Makrodimitris, S. (graduation committee); Naseri Jahfari, A. (graduation committee); Viering, T.J. (graduation committee); Loog, M. (mentor); Tax, D.M.J. (mentor)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2020-06-22
Abstract: In recent years, many new text generation models have been developed, while evaluating generated text remains a considerable challenge. Currently, the only metric able to fully capture the quality of a generated text is human evaluation, which is expensive and time-consuming. One of the most widely used intrinsic evaluation metrics is perplexity. This paper investigates the correspondence between perplexity scores and human evaluation of scripts for the TV show Friends generated using OpenAI's GPT-2 model. To this end, a survey was conducted in which 226 participants evaluated selected scripts on creativity, realism and coherence. The survey results revealed that generations with a perplexity value close to that of an actual Friends script perform best on creativity, but score low on realism and coherence. The most realistic and coherent generations were those with a lower perplexity value, while the generations with the highest perplexity value scored worst on all three criteria. The research shows that perplexity is not an adequate measure of the quality of generated TV-show scripts.
Subject: Natural Language Processing; Natural Language Generation; Perplexity; Human evaluation
To reference this document use: http://resolver.tudelft.nl/uuid:ab543db3-f285-477c-b4ce-b6ac57507554
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2020 Pia Keukeleire
Files: Research_paper_final_version.pdf (PDF, 313.45 KB)