Title: From Feature Selection to Data Augmentation: the ADA Algorithm
Author: Cruset Pla, Eduard (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Hai, R. (mentor); Ionescu, A. (mentor); Epema, D.H.J. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2022-06-22
Abstract: The democratization of data science, and in particular of the machine learning pipeline, has focused on the automation of model selection, feature processing, and hyperparameter tuning. Nevertheless, the need for high-quality data to improve performance has sparked interest in including data augmentation in these automatic machine learning techniques. This research approaches the topic by examining different feature selection techniques in order to determine what makes a feature desirable. We introduce an automatic data augmentation process, tailored for support vector machines, that employs sample joins. This approach is evaluated across different setups, datasets, and other machine learning models: CART, random forests, and XGBoost. The results are mixed: the algorithm identifies the features containing the signal, resulting in accuracy scores close to those of models trained with all the data. However, the computational time is higher. A theoretical analysis suggests that the methodology might be helpful in particular cases where data is structured in specific ways.
Subject: Data Augmentation; Feature selection; Support Vector Machines
To reference this document use: http://resolver.tudelft.nl/uuid:ece35d68-e261-4c8b-9ae5-a497715d1059
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2022 Eduard Cruset Pla
Files: PDF, CSE3000ResearchProject.pdf, 274.86 KB