Code Smells & Software Quality in Machine Learning Projects

van Oort, Bart

Title

Code Smells & Software Quality in Machine Learning Projects

Author

van Oort, Bart (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Software Engineering; ING AI for FinTech Research)

Contributor

Cruz, Luís (mentor)
van Deursen, A. (mentor)
Loni, B. (graduation committee)
Liem, C.C.S. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Computer Science | Software Technology

Date

2021-10-18

Abstract

Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the current computer science landscape. Yet the field still lacks Software Engineering (SE) experience and best practices. One such best practice, static code analysis, can be used to find code smells, i.e., (potential) defects in the source code, refactoring opportunities, and violations of common coding standards. This research first set out to measure the prevalence of code smells in ML application projects. The results of that first study additionally revealed deficiencies in the dependency management of these projects, which pose a major threat to their maintainability and reproducibility. Static code analysis practices were also found to be lacking. These issues inspired the novel concept of project smells introduced in this research, which consider the ML project as a whole: not just the code, but also the data, tools and technologies surrounding it and its development. To help ML practitioners detect and mitigate these project smells, and to help educate them on SE principles, techniques and tools, I developed an open-source static analysis tool, mllint, using input from experienced ML engineers at the global bank and data-driven organisation ING. In a second study, this tool was used to evaluate the concept of project smells and how well they fit the industrial context of ING. This second study also investigated obstructions to implementing the best practices recommended by mllint, perceptions of static analysis tools, and how ML practitioners perceive the difference in importance of mllint's linting rules (and, by extension, project smells) for proof-of-concept versus production-ready projects. The results indicate a need for context-aware static analysis tools that fit the needs of the project at its current stage of development while requiring minimal configuration effort from the user.

Subject

code smells
software quality
machine learning
artificial intelligence
project smells
mllint
se4ml
software engineering
context-aware static analysis

To reference this document use:

http://resolver.tudelft.nl/uuid:b20883f8-a921-487a-8a65-89374a1f3867

Part of collection

Student theses

Document type

master thesis

Files

PDF

MSc_Thesis_Final.pdf

2.94 MB
