Distil-CodeGPT: Distilling Code-Generation Models for Local Use

Author: Malmsten, Emil (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Al-Kaswan, A. (mentor); Izadi, M. (mentor); van Deursen, A. (mentor); Anand, A. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2023-06-28

Abstract: The application of large language models (LLMs) to programming tasks, such as automatic code completion, has seen a significant upswing in recent years. However, due to their computational demands, these models have to run on remote servers, which both requires users to have a steady internet connection and raises potential privacy concerns. This study therefore explores the feasibility of compressing LLMs for code using knowledge distillation (KD), thereby enabling local use of these models. Existing research has primarily focused on the efficacy of using KD to compress BERT models for natural-language tasks; its application to GPT models for coding tasks, and the impact of applying KD during training as opposed to during pre-training, remain largely unexplored. To address these gaps, we adapted DistilBERT, a pre-training KD algorithm for distilling BERT models for language tasks. Our adapted model, Distil-CodeGPT, uses in-training KD to compress LLMs for Python code. The findings of this study suggest that a substantial reduction in model size is achievable, albeit at some cost in predictive accuracy. Specifically, using 8 layers instead of the original 12 resulted in a 24% reduction in disk size and a 28% speed increase, with an accompanying accuracy decrease of 11%. These results show that this approach has potential and is a solid first step toward smaller code models.
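To make the distillation idea in the abstract concrete, the sketch below shows the soft-target objective used in Hinton-style KD and DistilBERT: the student is trained to match the teacher's temperature-softened output distribution via a KL-divergence term. This is a minimal illustration, not the thesis's actual implementation; the temperature value and the three-class logits are made up for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher T yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened
    output distributions, scaled by T^2 so gradient magnitudes stay
    comparable across temperatures (as in Hinton et al. / DistilBERT)."""
    p = softmax(teacher_logits, temperature)  # teacher soft targets
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl

# Identical logits give zero loss; diverging logits give a positive loss.
print(distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1]))
print(distillation_loss([3.0, 0.0, 0.0], [0.0, 0.0, 3.0]))
```

In practice this KD term is combined with the ordinary next-token cross-entropy loss, and the student (e.g. the 8-layer model mentioned in the abstract) is typically initialized from a subset of the teacher's 12 layers.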
Subject: GPT; LLM; BERT; Knowledge Distillation; NLP; CodeGPT; Copilot
To reference this document use: http://resolver.tudelft.nl/uuid:22217e2b-0db8-4c56-8808-9713dd678425
Bibliographical note: The GitHub repository of the project: https://github.com/AISE-TUDelft/LLM4CodeCompression/tree/main/distill-CodeGPT
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2023 Emil Malmsten
Files: CSE3000_DistilBERT_Emil.pdf (PDF, 201.24 KB)