Title: Coner: A Collaborative Approach for Long-Tail Named Entity Recognition in Scientific Publications
Author: Vliegenthart, Daniël (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Software Technology)
Contributor: Lofi, Christoph (mentor); Houben, Geert-Jan (graduation committee); Erdweg, Sebastian (graduation committee)
Degree granting institution: Delft University of Technology
Date: 2018-08-30

Abstract

Named Entity Recognition (NER) for rare long-tail entities, such as those often found in domain-specific scientific publications, is a challenging task, because the extensive training and test data needed for fine-tuning NER algorithms are typically lacking. Recent approaches presented promising solutions that train NER algorithms in an iterative, distantly-supervised fashion, thus limiting human interaction to providing only a small set of seed terms. Such approaches rely heavily on heuristics to cope with the limited training data size. As these heuristics are prone to failure, the overall achievable performance is limited. In this thesis we introduce Coner: a collaborative approach to incrementally incorporate human feedback on the relevance of extracted entities into the training cycle of such iterative NER algorithms. Coner still allows new domain-specific long-tail NER extractors to be trained at low cost, but with ever-increasing performance while the algorithm is actively used. We do so by employing an entity selection mechanism that selects and visualises only those extracted entities with the highest potential knowledge gain; users interact with these entities and provide feedback on their facet relevance. Additionally, users can add new typed entities they deem relevant.
Our Coner collaborative human feedback pipeline consists of three novel modules: a document analyser that extracts deep metadata from documents and selects a representative set of publications from a corpus to receive human feedback on; an interactive document viewer that allows users to give feedback on entities and to add new typed entities simply by selecting the relevant text with their mouse; and an explicit entity feedback analyser that calculates a facet relevance score for each recognised entity through a majority vote of users. The resulting Coner entity facet relevance scores are then incorporated into the TSE-NER training cycle to boost the expansion and filtering heuristic steps. Remarkably, we found that even with limited availability of human resources we were able to boost TSE-NER's performance by up to 23.1% in terms of recall, up to 5.7% in terms of precision, and the F-score by 13.1%, depending on the setup of our smart entity selection mechanism and the instructions given to evaluators.

Subject: Information Extraction; Named Entity Recognition; Document Metadata; Long-Tail Entity Types; Human Feedback; Crowdsourcing
To reference this document use: http://resolver.tudelft.nl/uuid:2dbe055a-449d-45a0-875b-5e0321c33113
Part of collection: Student theses
Document type: master thesis
Rights: © 2018 Daniël Vliegenthart
Files: PDF Master_Thesis_Daniel_Cone ... ations.pdf 6.29 MB
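
The abstract describes a feedback analyser that derives a facet relevance score for each recognised entity from a majority vote of users. The following is a minimal sketch of how such a score could be computed, not the thesis implementation. The function names, the `(entity, facet, is_relevant)` vote format, the 0.5 threshold, and the example entities and facet are all assumptions made for illustration.

```python
from collections import defaultdict


def facet_relevance_scores(votes):
    """Return the fraction of 'relevant' votes per (entity, facet) pair.

    `votes` is an iterable of (entity, facet, is_relevant) tuples, one per
    user judgment. This input format is hypothetical, not taken from Coner.
    """
    tally = defaultdict(lambda: [0, 0])  # [relevant votes, total votes]
    for entity, facet, is_relevant in votes:
        counts = tally[(entity, facet)]
        if is_relevant:
            counts[0] += 1
        counts[1] += 1
    return {key: relevant / total for key, (relevant, total) in tally.items()}


def majority_relevant(scores, threshold=0.5):
    """Keep the pairs that a strict majority of voters marked as relevant.

    The 0.5 threshold is an assumed default; ties are treated as not relevant.
    """
    return {key for key, score in scores.items() if score > threshold}


# Hypothetical judgments from three users on two extracted entities.
votes = [
    ("ImageNet", "dataset", True),
    ("ImageNet", "dataset", True),
    ("ImageNet", "dataset", False),
    ("the results", "dataset", False),
    ("the results", "dataset", True),
    ("the results", "dataset", False),
]
scores = facet_relevance_scores(votes)
accepted = majority_relevant(scores)
```

In a setup like the one the abstract outlines, pairs in `accepted` could serve as trusted positives for the expansion step, while low-scoring pairs could be excluded in the filtering step. How the scores are actually weighted inside TSE-NER is not specified in this record.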