Title: Code Smells & Software Quality in Machine Learning Projects
Author: van Oort, Bart (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Software Engineering; ING AI for FinTech Research)
Contributor: Cruz, Luís (mentor); van Deursen, A. (mentor); Loni, B. (graduation committee); Liem, C.C.S. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Software Technology
Date: 2021-10-18

Abstract
Artificial Intelligence (AI) and Machine Learning (ML) are pervasive in the current computer science landscape. Yet, there still exists a lack of Software Engineering (SE) experience and best practices in this field. One such best practice, static code analysis, can be used to find code smells, i.e., (potential) defects in the source code, refactoring opportunities, and violations of common coding standards. This research first set out to measure the prevalence of code smells in ML application projects. However, the results from this study additionally showed deficiencies in the dependency management of these projects, presenting a major threat to their maintainability and reproducibility. Static code analysis practices were also found to be lacking. These issues inspired the novel concept of project smells introduced in this research, which consider the ML project as a whole, including not just the code, but also the data, tools and technologies surrounding it and its development. To help ML practitioners in detecting and mitigating these project smells, as well as to help educate on SE principles, techniques and tools, I developed an open-source static analysis tool mllint using input from experienced ML engineers at the global bank and data-driven organisation ING. This tool was then used to evaluate the concept of project smells and how they fit the industrial context of ING in a second study.
This second study also investigated obstructions to implementing best practices recommended by mllint, perceptions on static analysis tools and how ML practitioners perceive the difference in importance of mllint's linting rules (by extension, project smells) for proof-of-concept versus production-ready projects. The results indicate a need for context-aware static analysis tools, that fit the needs of the project at its current stage of development, while requiring minimal configuration effort from the user.

Subject: code smells; software quality; machine learning; artificial intelligence; project smells; mllint; se4ml; software engineering; context-aware static analysis
To reference this document use: http://resolver.tudelft.nl/uuid:b20883f8-a921-487a-8a65-89374a1f3867
Part of collection: Student theses
Document type: master thesis
Rights: © 2021 Bart van Oort
Files: MSc_Thesis_Final.pdf (PDF, 2.94 MB)