Title: Studying the Machine Learning Lifecycle and Improving Code Quality of Machine Learning Applications
Author: Haakman, M.P.A. (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: van Deursen, A. (mentor); Finavaro Aniche, M. (graduation committee); Liem, C.C.S. (graduation committee); Miranda da Cruz, L. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science
Date: 2020-08-07
Abstract: As organizations start to adopt machine learning in critical business scenarios, their development processes change and the reliability of the resulting applications becomes more important. To investigate these changes and improve the reliability of those applications, we conducted two studies in this thesis. The first study aims to understand how the processes by which machine learning applications are developed have evolved, and how well state-of-the-art lifecycle models fit the current needs of the fintech industry. To this end, we conducted a case study with seventeen machine learning practitioners at the fintech company ING. The results indicate that the existing lifecycle models CRISP-DM and TDSP largely reflect the current development processes of machine learning applications, but that crucial steps are missing, including a feasibility study, documentation, model evaluation, and model monitoring. Our second study aims to reduce bugs and improve the code quality of machine learning applications. We developed a static code analysis tool consisting of six checkers that find probable bugs and enforce best practices, specifically in Python code used for processing large amounts of data and for modeling in the machine learning lifecycle.
The evaluation of the tool on 1000 notebooks collected from Kaggle shows that static code analysis can detect, and thus help prevent, probable bugs in data science code. Our work shows that the real challenges of applying machine learning go well beyond sophisticated learning algorithms: more focus is needed on the entire lifecycle.
Subject: Machine Learning Lifecycle; FinTech; Static Code Analysis
To reference this document use: http://resolver.tudelft.nl/uuid:38ff4e9a-222a-4987-998c-ac9d87880907
Part of collection: Student theses
Document type: master thesis
Rights: © 2020 M.P.A. Haakman
Files: Thesis_Mark_Haakman.pdf (PDF, 910.36 KB)