Investigating the case of weak baselines in Ad-hoc Retrieval and Question Answering

Morales Martinez, Francisco

Title

Investigating the case of weak baselines in Ad-hoc Retrieval and Question Answering

Author

Morales Martinez, Francisco (TU Delft Electrical Engineering, Mathematics and Computer Science)

Contributor

Hauff, C. (mentor)
Liem, C.C.S. (graduation committee)
Verwer, S.E. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Computer Science | Data Science and Technology

Date

2020-02-19

Abstract

Weak baselines have been present in Information Retrieval (IR) for
decades. They have been associated with stagnating progress in IR, biased
baseline selection that makes results easier to publish, and problems
reproducing model effectiveness that hinder the validation of results by
independent research teams. Weak baselines have been studied by the IR
community; however, the focus has been almost exclusively on ad-hoc
retrieval, the most popular IR task, leaving out other IR tasks and
recently developed datasets. Current deep neural IR research is particularly
vulnerable to the issues with weak baselines due to the hype surrounding
deep learning.
In this thesis we investigate the cases of weak baselines in ad-hoc
retrieval and question answering (QA), two representative IR tasks among
the 13 cases of weak baselines we found in current deep neural IR research
from the EMNLP 2018 conference. In particular, we study whether the recently
introduced deep neural IR models are actually significantly more effective
than the reported IR baselines or than LambdaMART, the Learning to
Rank (LTR) model we propose, combined with hyperparameter optimization (HPO).
We also benchmark two HPO methods, random search (RS) and BOHB (Bayesian
Optimization and Hyperband), to determine which method finds a good
hyperparameter configuration more efficiently.
Throughout our experiments we show that the effectiveness of the
novel deep neural IR models can be difficult to replicate, that it might be
lower than reported, and that it is not necessarily significantly higher than
that of the baselines. Furthermore, we demonstrate that BOHB is more efficient
than RS, but that the HPO process does not always improve the effectiveness
of LambdaMART significantly.

Subject

Information Retrieval
Baselines
Learning to Rank
Deep Neural IR
Ad-hoc Retrieval
Question Answering

To reference this document use:

http://resolver.tudelft.nl/uuid:c6c33089-3715-421c-ba24-48302ca708b3

Part of collection

Student theses

Document type

master thesis

Files

PDF

thesis_report_fmorales.pdf

2.41 MB
