A Survey on Accelerating Sparse CNN Inference on GPUs

Title: A Survey on Accelerating Sparse CNN Inference on GPUs
Author: Chen, Qilin (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Mohamed, Hasan (mentor); Liu, Shih-Chii (mentor); Tömen, N. (mentor); Zuniga, Marco (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2022-06-24

Abstract: Convolutional neural networks (CNNs) are often pruned to speed up training and inference while also reducing memory usage. Nevertheless, most modern GPUs cannot exploit this sparsity automatically during computation, especially in networks with unstructured sparsity. Many libraries that exploit sparsity have therefore been proposed for accelerating CNN inference on GPUs, yet there is little research systematically comparing them. In this paper, several state-of-the-art libraries for accelerating sparse CNN inference on GPUs are reviewed and benchmarked. Most of these libraries speed up the convolution and/or pooling operations by skipping calculations involving zeros, and are therefore able to perform sparse matrix calculations faster. However, many of them have hardware and software restrictions and are hard to integrate into a new model for end-to-end inference.

Subjects: Convolutional Neural Networks (CNNs); Sparsity; Accelerators; Inference
To reference this document use: http://resolver.tudelft.nl/uuid:615a9965-3685-439e-8599-9c913b9902da
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2022 Qilin Chen
File: A_Survey_on_Accelerating_ ... n_GPUs.pdf (PDF, 2.33 MB)
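The abstract states that most of the surveyed libraries gain speed by skipping calculations involving zeros in pruned weight tensors. As a generic illustration of that idea (a minimal sketch, not code from the thesis or from any of the surveyed libraries), the snippet below stores only the nonzero weights of a pruned layer in a CSR-like format and performs a matrix-vector product over those nonzeros, so multiplications with zero weights never execute:

```python
# Illustrative sketch: skipping zero calculations for a pruned layer
# lowered to a matrix-vector product y = W @ x. Only the nonzero
# entries of W are stored and multiplied.

def to_csr(W):
    """Convert a dense matrix (list of lists) to CSR (row_ptr, cols, vals)."""
    row_ptr, cols, vals = [0], [], []
    for row in W:
        for j, w in enumerate(row):
            if w != 0.0:              # keep nonzero weights only
                cols.append(j)
                vals.append(w)
        row_ptr.append(len(vals))     # end of this row's nonzeros
    return row_ptr, cols, vals

def spmv(row_ptr, cols, vals, x):
    """Sparse matrix-vector product that visits only nonzero weights."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += vals[k] * x[cols[k]]
        y.append(acc)
    return y

# A 75%-sparse weight matrix, e.g. after unstructured magnitude pruning.
W = [[0.0, 2.0, 0.0, 0.0],
     [0.0, 0.0, 0.0, 1.5],
     [3.0, 0.0, 0.0, 0.0]]
x = [1.0, 2.0, 3.0, 4.0]
print(spmv(*to_csr(W), x))  # 3 multiplications instead of 12
```

A dense kernel would perform all 12 multiplications regardless of the zeros; the sparse version performs only 3. Real GPU libraries apply the same principle with formats and kernels tuned for parallel hardware, which is why unstructured sparsity, whose nonzeros fall in irregular positions, is harder to exploit efficiently than structured sparsity.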