Data-Driven Empirical Analysis of Correlation-Based Feature Selection Techniques

Author: Buşe, Florena (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributors: Ionescu, A. (mentor); Katsifodimos, A. (mentor); Isufi, E. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2023-06-26

Abstract

So far, the democratization of machine learning, which gave rise to the field of AutoML, has focused on automating model selection and hyperparameter optimization. At the same time, the need for high-quality databases to increase performance has sparked interest in correlation-based feature selection, a simple and fast yet effective approach to removing noise and redundancy from relational data. However, little to no attention has been paid to which correlation metric to choose in order to maximize the performance of ML systems. Our research investigates the effectiveness and efficiency of four widely known correlation measures, namely Pearson, Spearman, Cramér's V, and Symmetric Uncertainty, in a setting that simulates AutoML. We show that the exact theoretical assumptions of these methods do not always hold in practice, and we shed light on the main aspects to consider when integrating correlation-based feature selection into ML systems. Notably, the results indicate that the performance obtained by correlation-based methods depends more on the types and number of features in the underlying database than on the choice of ML algorithm. We draw conclusions that can further the advancement of AutoML systems by making feature selection fully automatic and computationally tractable.
Subject: Feature selection; Correlation; Machine learning; AutoML; Feature Engineering; Pearson correlation; Symmetric Uncertainty; Cramér's V; Spearman correlation; Data-driven activities
To reference this document use: http://resolver.tudelft.nl/uuid:ea4b4691-bf10-4f93-b8d0-200ff2a12dec
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2023 Florena Buşe
Files: PDF, Data_Driven_Empirical_Ana ... niques.pdf (906.96 KB)