Title: Investigating the case of weak baselines in Ad-hoc Retrieval and Question Answering
Author: Morales Martinez, Francisco (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Hauff, C. (mentor); Liem, C.C.S. (graduation committee); Verwer, S.E. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Data Science and Technology
Date: 2020-02-19
Abstract
Weak baselines have been present in Information Retrieval (IR) for decades. They have been associated with stagnating IR progress, with biased baseline selection that makes results easier to publish, and with reproducibility problems that hinder the validation of model effectiveness by independent research teams. The IR community has studied weak baselines; however, the focus has been almost exclusively on ad-hoc retrieval, the most popular IR task, leaving out other IR tasks and recently developed datasets. Current deep neural IR research is particularly vulnerable to the issues with weak baselines due to the hype surrounding deep learning. In this thesis we investigate the cases of weak baselines in ad-hoc retrieval and question answering (QA), two representative IR tasks among 13 cases of weak baselines we found in current deep neural IR research from the EMNLP 2018 conference.
In particular, we study whether the recently introduced deep neural IR models are actually significantly more effective than the reported IR baselines or than LambdaMART, the Learning to Rank (LTR) model we propose, combined with hyperparameter optimization (HPO). We also benchmark two HPO methods, RS and BOHB, to determine which method is more efficient at retrieving a good hyperparameter configuration. Throughout our experiments we show that the effectiveness of the novel deep neural IR models can be difficult to replicate, that it might be lower than reported, and that it is not necessarily significantly higher than that of the baselines. Furthermore, we demonstrate that BOHB is more efficient than RS, but that the HPO process does not always improve the effectiveness of LambdaMART significantly.
Subject: Information Retrieval; Baselines; Learning to Rank; Deep Neural IR; Ad-hoc Retrieval; Question Answering
To reference this document use: http://resolver.tudelft.nl/uuid:c6c33089-3715-421c-ba24-48302ca708b3
Part of collection: Student theses
Document type: master thesis
Rights: © 2020 Francisco Morales Martinez
Files: PDF thesis_report_fmorales.pdf 2.41 MB