Using Skip-Gram Model to Predict from which Show a Given Line is

Author: Chen, Dina (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Viering, T.J. (mentor); Naseri Jahfari, A. (mentor); Makrodimitris, S. (mentor)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2020-06-22

Abstract: Text classification has a wide range of uses, such as extracting the sentiment of a product review, identifying the topic of a document, and detecting spam. In this research, the classification task is to predict which TV show a given line comes from. The skip-gram model, originally used to train Word2Vec word embeddings [Mikolov et al., 2013], is adapted to estimate the likelihood that a sentence occurs in a given show. Based on this feature, a classifier is built to perform the task. Cross-validation shows that the classifier reaches an accuracy of 58% on the transcript data of 3 shows and 43% on 4 shows, against random-guessing baselines of 33% and 25%. The gap between the neural networks and the skip-gram model narrows as more shows are added to the evaluation. In each 5-fold cross-validation of the two models, the best results appear in the middle iterations.

Subject: Natural Language Processing; Text Classification; Skip-Gram Model
To reference this document use: http://resolver.tudelft.nl/uuid:82350585-b0ba-4664-a6f8-b77a7340114f
Bibliographical note: https://github.com/DinaChen/NLP_RP
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2020 Dina Chen
Files: Using_Skip_Gram_Dina.pdf (PDF, 236.18 KB)
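The abstract's core idea can be sketched in code. The thesis adapts the neural skip-gram model; the minimal sketch below substitutes a count-based analogue (an assumption, not the author's implementation): for each show, collect (center, context) skip-gram pair counts from its transcript lines, score a new line by the smoothed log-likelihood of its pairs under each show, and predict the show with the highest score. The class name `ShowClassifier` and the toy corpora are hypothetical.

```python
from collections import defaultdict
import math


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within a symmetric window,
    as in the skip-gram formulation."""
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                yield center, tokens[j]


class ShowClassifier:
    """Count-based stand-in for the skip-gram likelihood feature:
    one pair-count table per show, classification by highest
    smoothed log-likelihood of a line's skip-gram pairs."""

    def __init__(self, window=2):
        self.window = window
        self.pair_counts = {}  # show -> {(center, context): count}
        self.totals = {}       # show -> total number of pairs seen
        self.vocab = set()

    def fit(self, corpora):
        # corpora: show name -> list of tokenized lines
        for show, lines in corpora.items():
            counts, total = defaultdict(int), 0
            for tokens in lines:
                self.vocab.update(tokens)
                for pair in skipgram_pairs(tokens, self.window):
                    counts[pair] += 1
                    total += 1
            self.pair_counts[show], self.totals[show] = counts, total

    def log_likelihood(self, tokens, show):
        # Add-one smoothing over the (center, context) pair space.
        counts, total = self.pair_counts[show], self.totals[show]
        space = max(1, len(self.vocab)) ** 2
        return sum(
            math.log((counts.get(pair, 0) + 1) / (total + space))
            for pair in skipgram_pairs(tokens, self.window)
        )

    def predict(self, tokens):
        # Assign the line to the show under which it is most likely.
        return max(self.pair_counts, key=lambda s: self.log_likelihood(tokens, s))


# Toy illustration (hypothetical data, not the thesis transcripts):
corpora = {
    "show_a": [["winter", "is", "coming"], ["the", "north", "remembers"]],
    "show_b": [["how", "you", "doing"], ["we", "were", "on", "a", "break"]],
}
clf = ShowClassifier()
clf.fit(corpora)
print(clf.predict(["winter", "is", "here"]))  # → show_a
```

The thesis instead trains the likelihood model with the neural skip-gram objective, but the decision rule sketched here, score a line under each show's model and take the argmax, matches the classifier described in the abstract.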