Using Skip-Gram Model to Predict from which Show a Given Line is

Author: Chen, Dina (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Viering, T.J. (mentor); Naseri Jahfari, A. (mentor); Makrodimitris, S. (mentor)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2020-06-22

Abstract: Text classification has a wide range of uses, such as extracting the sentiment of a product review, identifying the topic of a document, and detecting spam. In this research, the classification task is to predict which TV show a given line comes from. The skip-gram model, originally used to train Word2Vec word embeddings [Mikolov et al., 2013], is adapted to estimate the likelihood that a sentence occurs in a given show. Based on this feature, a classifier is built to perform the task. Cross-validation shows that the classifier reaches an accuracy of 58% on the transcript data of 3 shows and 43% on 4 shows, against random-guessing baselines of 33% and 25%. The gap between the neural networks and the skip-gram model narrows as more shows are added to the evaluation. In each 5-fold cross-validation of the two models, the best results appear in the middle iterations.

Subject: Natural Language Processing; Text Classification; Skip-Gram Model
To reference this document use: http://resolver.tudelft.nl/uuid:82350585-b0ba-4664-a6f8-b77a7340114f
Bibliographical note: https://github.com/DinaChen/NLP_RP
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2020 Dina Chen
Files: Using_Skip_Gram_Dina.pdf (PDF, 236.18 KB)
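The abstract's core idea can be sketched in code. The thesis adapts the neural skip-gram model; the minimal sketch below substitutes a count-based analogue (an assumption, not the author's implementation): for each show, collect (center, context) skip-gram pair counts from its transcript lines, score a new line by the smoothed log-likelihood of its pairs under each show, and predict the show with the highest score. The class name `ShowClassifier` and the toy corpora are hypothetical.

```python
from collections import defaultdict
import math


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within a symmetric window,
    as in the skip-gram formulation."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]


class ShowClassifier:
    """Count-based stand-in for the skip-gram likelihood feature:
    one pair-count table per show, classification by highest
    smoothed log-likelihood of a line's skip-gram pairs."""

    def __init__(self, window=2):
        self.window = window
        self.pair_counts = {}  # show -> {(center, context): count}
        self.totals = {}       # show -> total number of pairs seen
        self.vocab = set()

    def fit(self, corpora):
        # corpora: show name -> list of tokenized lines
        for show, lines in corpora.items():
            counts, total = defaultdict(int), 0
            for tokens in lines:
                self.vocab.update(tokens)
                for pair in skipgram_pairs(tokens, self.window):
                    counts[pair] += 1
                    total += 1
            self.pair_counts[show], self.totals[show] = counts, total

    def log_likelihood(self, tokens, show):
        # Add-one smoothing over the (center, context) pair space.
        counts, total = self.pair_counts[show], self.totals[show]
        space = max(1, len(self.vocab)) ** 2
        return sum(
            math.log((counts.get(pair, 0) + 1) / (total + space))
            for pair in skipgram_pairs(tokens, self.window)
        )

    def predict(self, tokens):
        # Assign the line to the show under which it is most likely.
        return max(self.pair_counts, key=lambda s: self.log_likelihood(tokens, s))


# Toy illustration (hypothetical data, not the thesis transcripts):
corpora = {
    "show_a": [["winter", "is", "coming"], ["the", "north", "remembers"]],
    "show_b": [["how", "you", "doing"], ["we", "were", "on", "a", "break"]],
}
clf = ShowClassifier()
clf.fit(corpora)
print(clf.predict(["winter", "is", "here"]))  # → show_a
```

The thesis instead trains the likelihood model with the neural skip-gram objective, but the decision rule sketched here, score a line under each show's model and take the argmax, matches the classifier described in the abstract.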