Distance Based Source Domain Selection for Automated Sentiment Classification

Author: Razoux Schultz, Lex (TU Delft Mechanical, Maritime and Materials Engineering)
Contributors: Mohajerin Esfahani, P. (mentor); Loog, M. (mentor); Keviczky, T. (mentor)
Degree granting institution: Delft University of Technology
Programme: Mechanical Engineering | Systems and Control
Date: 2018-04-04

Abstract:
Automated Sentiment Classification (SC) on short text fragments is an emerging field of research. Several machine learning techniques and word representation models have proven successful in classifying the sentiment of opinion expressions in various domains, i.e. different topics or source media. However, when training on a source domain that differs from the target domain of interest, we encounter a large domain shift, which results in poor cross-domain classification performance. In this report, we first introduce the key principles of SC, starting with the SC pipeline and the domain shift it encounters. We then present a novel method for selecting a source domain using four unsupervised distance measures: Chi-squared distance, Maximum Mean Discrepancy (MMD), Earth Mover’s Distance (EMD) and Kullback-Leibler Divergence (KLD). We evaluate how effectively these unsupervised measures, used individually and in a linear combination, identify one or more suitable source domains for an SC task across various target domains. We propose this linear combination as the CMEK model, an acronym of the four measures it uses. Results show that using the proposed CMEK model for source domain selection reduces the adaptation loss by 7 percentage points compared to training on a randomly selected source domain. When selecting multiple domains, our proposed selection method is competitive with training on all data.
In light of its overall performance, we recommend the CMEK model for source domain selection in an SC task. The CMEK model shows strong performance and stable behavior when selecting multiple source domains, and solid performance when selecting the single best domain.

Subject: sentiment analysis; sentiment classification; domain adaptation; source selection; domain selection
To reference this document use: http://resolver.tudelft.nl/uuid:bc430a45-3377-40de-9408-428b39b4f196
Embargo date: 2018-04-30
Part of collection: Student theses
Document type: master thesis
Rights: © 2018 Lex Razoux Schultz
Files: PDF 20180319LRS_Final_report.pdf (1.68 MB)
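The abstract names four unsupervised distance measures and a linear combination of them (the CMEK model) but gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the thesis's implementation. It assumes each domain is a list of tokenized documents. Chi-squared distance and KLD are computed between add-one smoothed unigram distributions, MMD is a biased Gaussian-kernel estimate over per-document bag-of-words vectors, and, purely to keep the sketch short, EMD is the one-dimensional Wasserstein-1 distance between document-length distributions (the thesis may use a different ground space). All function names, the kernel parameter `gamma`, the direction KL(target || source), the min-max normalization across candidates and the equal weights are hypothetical choices; the actual CMEK weights are not stated in this record.

```python
import math
from collections import Counter


def word_distribution(docs, vocab, alpha=1.0):
    """Add-alpha smoothed unigram distribution of a domain over a shared vocabulary."""
    counts = Counter(tok for doc in docs for tok in doc)
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return [(counts[w] + alpha) / total for w in vocab]


def chi_squared(p, q):
    """Symmetric chi-squared distance between two histograms."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)


def kl_divergence(p, q):
    """KL(p || q); smoothing keeps every q_i > 0."""
    return sum(a * math.log(a / b) for a, b in zip(p, q) if a > 0)


def bow_vector(doc, vocab):
    """Length-normalized bag-of-words vector for one document."""
    counts = Counter(doc)
    n = max(len(doc), 1)
    return [counts[w] / n for w in vocab]


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of squared MMD with a Gaussian (RBF) kernel."""
    def k(a, b):
        return math.exp(-gamma * sum((u - v) ** 2 for u, v in zip(a, b)))

    def mean_k(us, vs):
        return sum(k(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


def emd_1d(xs, ys):
    """Wasserstein-1 (EMD) between two 1-D samples: area between empirical CDFs."""
    points = sorted(set(xs) | set(ys))

    def cdf(sample, t):
        return sum(v <= t for v in sample) / len(sample)

    return sum(abs(cdf(xs, a) - cdf(ys, a)) * (b - a)
               for a, b in zip(points, points[1:]))


def domain_distances(source, target):
    """All four measures between one candidate source domain and the target."""
    vocab = sorted({tok for doc in source + target for tok in doc})
    p = word_distribution(target, vocab)
    q = word_distribution(source, vocab)
    return {
        "chi2": chi_squared(p, q),
        "mmd": mmd_squared([bow_vector(d, vocab) for d in target],
                           [bow_vector(d, vocab) for d in source]),
        # Assumption: EMD on document length, only to keep the sketch short.
        "emd": emd_1d([len(d) for d in target], [len(d) for d in source]),
        "kld": kl_divergence(p, q),  # Assumption: KL(target || source)
    }


def cmek_rank(sources, target, weights):
    """Rank candidate sources by a weighted sum of min-max normalized distances."""
    dists = {name: domain_distances(docs, target) for name, docs in sources.items()}
    scores = dict.fromkeys(sources, 0.0)
    for measure, w in weights.items():
        vals = [d[measure] for d in dists.values()]
        lo, hi = min(vals), max(vals)
        for name, d in dists.items():
            if hi > lo:
                scores[name] += w * (d[measure] - lo) / (hi - lo)
    return sorted(scores, key=scores.get)  # closest source first


if __name__ == "__main__":
    target = [["great", "battery", "life"], ["poor", "battery"]]
    sources = {
        "electronics": [["good", "battery"], ["bad", "screen", "life"]],
        "books": [["boring", "plot"], ["great", "novel", "plot", "twist"]],
    }
    # Hypothetical equal weights; the thesis's CMEK weights are not given here.
    weights = {"chi2": 0.25, "mmd": 0.25, "emd": 0.25, "kld": 0.25}
    print(cmek_rank(sources, target, weights))  # ['electronics', 'books']
```

A lower combined score means a closer source, so the first entries of the returned list are the candidates to train on, and taking the top k gives a multi-source selection. Normalizing each measure across candidates before weighting keeps a measure with a large numeric range from dominating the sum.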