A knowledge base approach for semantic interpretation and decomposition in concept based video retrieval

Typically, video retrieval systems apply a text-based search approach to find videos that match a search query. This approach relies on textual metadata attached to the videos, such as the video title, a short textual description and tags. Although this approach has proven effective, a textual search cannot always be applied, since textual metadata describing what happens in the video is not always available. An alternative is a concept-based video search approach, which annotates the videos with concepts that can be recognized by detectors built using computer vision techniques. Examples of recognizable concepts are entities, such as a person, a face or a building, and some types of actions, such as walking. However, only a limited number of concepts can actually be recognized, compared to the potentially unlimited number of concepts that a user can use to express a query. Therefore, the concepts in the user's query have to be decomposed into those (fewer) concepts that can be visually recognized in the videos.

The main contribution of this work is a pipeline that can be used in a concept-based video retrieval setting. This pipeline follows an approach that uses a knowledge base for semantic interpretation and decomposition of the concepts in a user query. In particular, the pipeline takes a user query as input and parses it to detect the concepts it contains. A semantic interpretation is added to the concepts by mapping them to a knowledge base. Decomposition strategies are then examined and evaluated; these decompose concepts for which there are no detectors into concepts for which there are detectors.

Two knowledge bases are used as sources for semantic interpretation and concept decomposition, namely YAGO2s and ConceptNet. YAGO2 contains 10 million concepts and 120 million relations extracted from several sources, such as Wikipedia, WordNet and GeoNames. ConceptNet contains concepts and semantic relations constructed by combining multiple sources, such as WordNet, DBpedia and the English Wiktionary. This work is evaluated against relevant work carried out in the context of TRECVID, the catalyst conference in the field of video retrieval, in which teams from all over the world contribute to several video retrieval related tasks.
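
The query-to-detector flow described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the toy knowledge base, the detector set, the function names and the breadth-first strategy are all assumptions made for illustration, and a real system would query YAGO2s or ConceptNet instead of a hand-written dictionary.

```python
from typing import Dict, List, Set

# Hypothetical toy knowledge base: concept -> related concepts.
# It stands in for a real knowledge base such as YAGO2s or ConceptNet.
TOY_KB: Dict[str, List[str]] = {
    "wedding": ["bride", "church", "person"],
    "bride": ["person", "face"],
    "church": ["building"],
}

# Hypothetical set of concepts for which visual detectors are available.
DETECTORS: Set[str] = {"person", "face", "building", "walking"}


def parse_query(query: str) -> List[str]:
    """Naive concept detection: keep tokens known to the KB or the detectors."""
    tokens = [t.strip(".,!?").lower() for t in query.split()]
    return [t for t in tokens if t in TOY_KB or t in DETECTORS]


def decompose(concept: str, max_depth: int = 3) -> List[str]:
    """Expand a concept breadth-first through KB relations (assumed strategy)
    and collect the related concepts that have a detector."""
    if concept in DETECTORS:
        return [concept]
    found: List[str] = []
    seen = {concept}
    frontier = [concept]
    for _ in range(max_depth):
        next_frontier: List[str] = []
        for current in frontier:
            for related in TOY_KB.get(current, []):
                if related in seen:
                    continue
                seen.add(related)
                if related in DETECTORS:
                    found.append(related)
                else:
                    next_frontier.append(related)
        frontier = next_frontier
    return found


def interpret(query: str) -> Dict[str, List[str]]:
    """Map each concept detected in the query to detectable concepts."""
    return {concept: decompose(concept) for concept in parse_query(query)}


if __name__ == "__main__":
    print(interpret("A wedding with people walking"))
```

In this sketch, "wedding" has no detector and is expanded through the toy relations until detectable concepts are reached, while "walking" already has a detector and maps to itself. Tokens unknown to both the knowledge base and the detector set are dropped.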

To reference this document use:

http://resolver.tudelft.nl/uuid:cd500e1a-54b6-4764-9460-d8e737bb90cb

Part of collection

Student theses

Document type

master thesis

Files

PDF

thesis.pdf

4.44 MB
