Compressing code generation language models on CPUs: Using Group Lasso pruning and post-training quantization

Sochirca, Dan

Title

Compressing code generation language models on CPUs: Using Group Lasso pruning and post-training quantization

Author

Sochirca, Dan (TU Delft Electrical Engineering, Mathematics and Computer Science)

Contributor

Al-Kaswan, A. (mentor)
Izadi, M. (mentor)
van Deursen, A. (mentor)
Anand, A. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Computer Science and Engineering

Project

CSE3000 Research Project

Date

2023-06-28

Abstract

Code generation models have recently grown in popularity because they help developers write code more productively. While these large models deliver impressive performance, they require significant computational resources and memory, which makes them difficult to deploy and expensive to train. Their large carbon footprint also raises environmental concerns. Addressing these challenges requires techniques that compress the models while maintaining their performance.
In this work, we study the effectiveness of Group Lasso pruning and post-training quantization on CPUs, applied to the code generation model CodeGPT. We evaluate the compressed model using the Exact Match (EM) and Edit Similarity (ES) metrics, and we measure its size on disk, memory footprint, and CPU inference speed. Compared with the original CodeGPT model, our solution offers a 48% relative reduction in disk size with only a mild drop in accuracy: an absolute drop of 8.51% in ES and 5.5% in EM. Using ONNX Runtime on a regular laptop, we deliver a 2x inference speedup at a 32.6% reduction in size. Our code is publicly available at https://github.com/AISE-TUDelft/LLM4CodeCompression/tree/main/CodeGPT-on-Intel.

Subject

Code generation
Transformers
Compression
CodeGPT
Group Lasso Pruning
Post-Training Quantization

To reference this document use:

http://resolver.tudelft.nl/uuid:47817baa-9c64-4cca-b206-09544ac5a75b

Part of collection

Student theses

Document type

bachelor thesis

Files

PDF

Final_paper_Dan.pdf

1.66 MB