Title: Analyzing Components of a Transformer under Different Data Scales in 3D Prostate CT Segmentation
Author: Tan, Yicong (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Pattern Recognition and Bioinformatics)
Contributors: van Gemert, J.C. (mentor); Staring, Marius (graduation committee); Mody, Prerak (graduation committee); van der Valk, Viktor (graduation committee); Yang, J. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2022-09-30
Abstract: Literature on medical image segmentation claims that self-attention-based Transformer blocks perform better than convolution in UNet-based architectures. This recently touted success of Transformers warrants an investigation into which of their components contribute to their performance. Moreover, previous work is limited to analysis at fixed data scales, as well as to unfair comparisons with other models whose parameter counts are not equivalent. This work investigates the performance of the window-based Transformer for prostate CT Organ-at-Risk (OAR) segmentation at different data scales, in the context of replacing its various components. To compare with previous literature, the first experiment replaces the window-based Transformer block with convolution. Results show that convolution prevails as the data scale increases. In the second experiment, to reduce complexity, the self-attention mechanism is replaced with an equivalent albeit simpler spatial mixing operation, i.e. max-pooling. We observe improved performance for max-pooling at smaller data scales, indicating that the window-based Transformer may not be the best choice at either small or large data scales.
Finally, since convolution has an inherent local inductive bias for positional information, we conduct a third experiment to instill such a property in the Transformer by exploring two kinds of positional encodings. The results show insignificant improvements after adding positional encoding, indicating the Transformer's deficiency in capturing positional information at our data scales. We hope that our approach can serve as a framework for others evaluating the utility of Transformers for their tasks.
Subject: Medical Segmentation; Transformer; Convolution
To reference this document use: http://resolver.tudelft.nl/uuid:f280e796-175c-4b74-8950-92b8fa4f5652
Part of collection: Student theses
Document type: master thesis
Rights: © 2022 Yicong Tan
Files: PDF, master_thesis_report_fina ... ongtan.pdf (36.01 MB)
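The second experiment described in the abstract, replacing self-attention with max-pooling as a spatial mixing operation, can be sketched as follows. This is a minimal, hypothetical 2D illustration (not the thesis code, which operates on 3D CT volumes): each spatial position is treated as a token, and instead of computing attention weights within a window, every token simply takes the channel-wise maximum over its local window. The function name, window size, and tensor layout are illustrative assumptions.

```python
import numpy as np

def maxpool_token_mixer(x, window=3):
    """Spatial token mixing via max-pooling, a simpler stand-in for
    window-based self-attention (illustrative sketch only).

    x: feature map of shape (H, W, C); each spatial position is a token.
    Returns a map of the same shape where every token becomes the
    channel-wise max over its (window x window) neighborhood,
    using edge padding so the spatial size is preserved.
    """
    pad = window // 2
    # Pad spatial dims only, replicating border values.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    H, W, _ = x.shape
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            # Max over the local spatial window, independently per channel.
            out[i, j] = xp[i:i + window, j:j + window].max(axis=(0, 1))
    return out
```

Unlike self-attention, this mixer has no learnable parameters and no pairwise token interactions to compute, which is the complexity reduction the abstract refers to; the comparison isolates how much of the Transformer block's performance actually comes from attention itself rather than from the surrounding block structure.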