Does text matter?

Title: Does text matter?: Extending CLIP with OCR and NLP for image classification and retrieval
Author: Sassoon, Jordan (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Zhao, Zilong (mentor); Chen, Lydia Y. (mentor); Lukina, A. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2023-06-27

Abstract: Contrastive Language-Image Pretraining (CLIP) has gained vast interest due to its impressive performance on a variety of computer vision tasks: image classification, image retrieval, action recognition, feature extraction, and more. The model learns to associate images with their descriptions, a powerful method that allows it to perform well on unseen domains. Often, however, the descriptions fail to capture text contained within the image, a source of information that could prove useful for a handful of computer vision tasks. This limitation requires finetuning in domains where contained text is important. In fact, CLIP has mixed performance on Optical Character Recognition (OCR). This paper proposes a novel architecture, OSBC (OCR Sentence BERT CLIP), which combines CLIP with a custom text extraction pipeline composed of an OCR model and a Natural Language Processing (NLP) model. OSBC uses the text contained within images as an additional feature when performing image classification and retrieval. We tested the model on multiple datasets for each task. It occasionally outperformed CLIP when images contained text, while remaining finetunable and improving the model's robustness. In addition, OSBC was designed to be generalizable, meaning it was expected to perform well on unseen domains without finetuning, though this was not achieved in practice.
Subject: zero-shot learning; Deep Learning; Machine Learning; Computer Vision; CLIP; Transformers
To reference this document use: http://resolver.tudelft.nl/uuid:55a159a7-461a-490e-bc73-5194c0ed3b4e
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2023 Jordan Sassoon
Files: PDF, OSBC_Jordan_Sassoon_2023.pdf, 4.02 MB
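
The abstract describes OSBC as using text found inside an image as an additional feature alongside CLIP, but this record does not say how the two signals are combined. The sketch below is only an illustration of one plausible scheme, a weighted sum of two cosine similarities, and is not taken from the thesis. The function names, the `alpha` weight, and the two-dimensional toy vectors (stand-ins for real CLIP and Sentence-BERT embeddings) are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def fused_scores(image_emb, ocr_text_emb, clip_labels, sbert_labels, alpha=0.5):
    """Score each label from a visual and a textual similarity.

    ASSUMPTION: a weighted sum with weight `alpha` on the text branch.
    The fusion rule actually used in the thesis is not stated in the abstract.
    If no text was extracted (ocr_text_emb is None), only the visual score is used.
    """
    scores = {}
    for label, label_clip_emb in clip_labels.items():
        visual = cosine(image_emb, label_clip_emb)
        if ocr_text_emb is None:
            scores[label] = visual
        else:
            textual = cosine(ocr_text_emb, sbert_labels[label])
            scores[label] = (1.0 - alpha) * visual + alpha * textual
    return scores


# Toy data: a storefront photo that looks equally like both classes,
# while the sign text in the image points towards "pharmacy".
CLIP_LABELS = {"cafe": [0.7, 0.7], "pharmacy": [0.7, 0.7]}
SBERT_LABELS = {"cafe": [1.0, 0.0], "pharmacy": [0.0, 1.0]}
IMAGE_EMB = [1.0, 0.0]
OCR_TEXT_EMB = [0.0, 1.0]

with_text = fused_scores(IMAGE_EMB, OCR_TEXT_EMB, CLIP_LABELS, SBERT_LABELS)
without_text = fused_scores(IMAGE_EMB, None, CLIP_LABELS, SBERT_LABELS)
print(max(with_text, key=with_text.get))
print(without_text)
```

In this toy case the visual scores alone are tied, and the text branch breaks the tie. In a real pipeline the vectors would come from a CLIP image encoder, an OCR model, and a sentence encoder, and `alpha` would need tuning per dataset.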