From Feature Selection to Data Augmentation: the ADA Algorithm

Cruset Pla, Eduard

Title

From Feature Selection to Data Augmentation: the ADA Algorithm

Author

Cruset Pla, Eduard (TU Delft Electrical Engineering, Mathematics and Computer Science)

Contributor

Hai, R. (mentor)
Ionescu, A. (mentor)
Epema, D.H.J. (graduation committee)

Degree granting institution

Delft University of Technology

Programme

Computer Science and Engineering

Project

CSE3000 Research Project

Date

2022-06-22

Abstract

The democratization of data science, and in particular of the machine learning pipeline, has focused on the automation of model selection, feature processing, and hyperparameter tuning. Nevertheless, the need for high-quality data to improve performance has sparked interest in including data augmentation in these automated machine learning techniques. This research approaches the topic by examining different feature selection techniques in order to determine what makes a feature desirable. We introduce an automatic data augmentation process, tailored for support vector machines, that employs sample joins. This approach is evaluated across different setups and datasets, and with other machine learning models: CART, random forests, and XGBoost. The results are mixed: the algorithm identifies the features containing the signal, yielding accuracy scores close to those of models trained with all the data. However, the computational time is higher. A theoretical analysis suggests that the methodology might be helpful in particular cases where data is structured in specific ways.

Subject

Data Augmentation
Feature Selection
Support Vector Machines

To reference this document use:

http://resolver.tudelft.nl/uuid:ece35d68-e261-4c8b-9ae5-a497715d1059

Part of collection

Student theses

Document type

bachelor thesis

Files

PDF

CSE3000ResearchProject.pdf

274.86 KB
