Title: Analyzing Components of a Transformer under Different Data Scales in 3D Prostate CT Segmentation
Author: Tan, Yicong (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Pattern Recognition and Bioinformatics)
Contributors: van Gemert, J.C. (mentor); Staring, Marius (graduation committee); Mody, Prerak (graduation committee); van der Valk, Viktor (graduation committee); Yang, J. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2022-09-30
Abstract: Literature on medical image segmentation claims that self-attention-based Transformer blocks perform better than convolution in UNet-based architectures. This recently touted success of Transformers warrants an investigation into which of their components contribute to their performance. Moreover, previous work is limited to analysis at fixed data scales, as well as to unfair comparisons with other models whose parameter counts are not equivalent. This work investigates the performance of the window-based Transformer for prostate CT Organ-at-Risk (OAR) segmentation at different data scales, in the context of replacing its various components. To compare with previous literature, the first experiment replaces the window-based Transformer block with convolution. Results show that convolution prevails as the data scale increases. In the second experiment, to reduce complexity, the self-attention mechanism is replaced with an equivalent albeit simpler spatial mixing operation, i.e. max-pooling. We observe improved performance for max-pooling at smaller data scales, indicating that the window-based Transformer may not be the best choice at either small or large data scales.
Finally, since convolution has an inherent local inductive bias for positional information, we conduct a third experiment to instill such a property in the Transformer by exploring two kinds of positional encodings. The results show insignificant improvements after adding positional encoding, indicating the Transformer's deficiency in capturing positional information at our data scales. We hope that our approach can serve as a framework for others evaluating the utility of Transformers for their tasks.
Subject: Medical Segmentation; Transformer; Convolution
To reference this document use: http://resolver.tudelft.nl/uuid:f280e796-175c-4b74-8950-92b8fa4f5652
Part of collection: Student theses
Document type: master thesis
Rights: © 2022 Yicong Tan
Files: PDF, master_thesis_report_fina ... ongtan.pdf (36.01 MB)
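The second experiment described in the abstract, replacing self-attention with max-pooling as a spatial mixing operation, can be sketched as follows. This is a minimal, hypothetical 2D illustration (not the thesis code, which operates on 3D CT volumes): each spatial position is treated as a token, and instead of computing attention weights within a window, every token simply takes the channel-wise maximum over its local window. The function name, window size, and tensor layout are illustrative assumptions.

```python
import numpy as np

def maxpool_token_mixer(x, window=3):
    """Spatial token mixing via max-pooling, a simpler stand-in for
    window-based self-attention (illustrative sketch only).

    x: feature map of shape (H, W, C); each spatial position is a token.
    Returns a map of the same shape where every token becomes the
    channel-wise max over its (window x window) neighborhood,
    using edge padding so the spatial size is preserved.
    """
    pad = window // 2
    # Pad spatial dims only, replicating border values.
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    H, W, _ = x.shape
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            # Max over the local spatial window, independently per channel.
            out[i, j] = xp[i:i + window, j:j + window].max(axis=(0, 1))
    return out
```

Unlike self-attention, this mixer has no learnable parameters and no pairwise token interactions to compute, which is the complexity reduction the abstract refers to; the comparison isolates how much of the Transformer block's performance actually comes from attention itself rather than from the surrounding block structure.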