Title: Discovering the topics of Continuous Integration Projects on GitHub
Author: Ostrovskis, Lukas (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Software Engineering)
Contributor: Huang, S. (mentor); Proksch, S. (graduation committee); Aivaloglou, E.A. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2023-06-28
Abstract: Continuous Integration (CI) is a software development technique that enhances software quality and development efficiency, but its implementation usually depends on the project's context. This creates an opportunity for studying real-world CI projects on GitHub, focusing on their CI metrics and best practices. In this paper, we explore various methods to extract the topics from CI software projects on GitHub. This data can then be used to group projects and facilitate an in-depth analysis within specific contexts and application domains, such as CI build success rates in machine learning or React Native projects. We explore the definition of a software topic, as it shows significant granularity variations in related studies. We examine existing tools and other potential topic modeling approaches, compare varying types of textual data from GitHub that could be used as inputs for these tools, and report on interesting insights from initial trials with the developed tool. Our research led us to use GitHub topic labels as topic definitions due to their relevance and prior research focus. We also evaluated three topic extraction tools - LASCAD, a Multi-label Linear Regression classifier, and ChatGPT - incorporating the last two into our CI project mining tool.
Additionally, we included two tool-independent approaches: GitHub's search function with the ability to filter repositories by topics and existing project topic labels. Lastly, to test the practicality of the tool, we mined 4899 public repositories and briefly investigated workflow metrics of projects grouped into six arbitrary topics.
Subject: Continuous Integration; topic modelling; GitHub; Mining
To reference this document use: http://resolver.tudelft.nl/uuid:a9738367-cd4e-476c-ad4f-ec3061df2a80
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2023 Lukas Ostrovskis
Files: PDF Lukas_Ostrovskis_Research_Paper.pdf 218.02 KB