Title: Encoding methods for categorical data: A comparative analysis for linear models, decision trees, and support vector machines
Author: Udilă, Andrei (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Web Information Systems)
Contributor: Ionescu, A. (mentor); Katsifodimos, A. (mentor); Isufi, E. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science and Engineering
Project: CSE3000 Research Project
Date: 2023-06-28

Abstract

This paper presents a comprehensive evaluation and comparison of encoding methods for categorical data in the context of machine learning. The study focuses on five popular encoding techniques: one-hot, ordinal, target, catboost, and count encoders. These methods are evaluated using linear models, decision trees, and support vector machines (SVMs).

The results demonstrate that one-hot encoding consistently achieves the highest accuracy across all evaluated machine learning algorithms. However, it also incurs a higher runtime, especially when feature cardinality is high. Catboost encoding emerges as a promising alternative, striking a balance between accuracy and runtime efficiency. The ordinal, target, and catboost encoders perform similarly, with small variations depending on the specific machine learning algorithm used.

Based on the findings, practitioners are advised to select one-hot encoding when accuracy is of utmost importance and computational resources are sufficient. For scenarios where runtime efficiency is critical, the catboost encoder offers competitive accuracy while minimizing training time. The ordinal encoder can be a suitable alternative when dealing with high feature cardinality.
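
For readers unfamiliar with the five encoders named in the abstract, the following is a minimal pure-Python sketch of what each one computes on a single categorical column. It is an illustration only, not the thesis's implementation: the function names, the toy `colors`/`labels` data, and the `prior_weight` parameter are all assumptions, and the catboost variant is a simplified ordered target statistic rather than any library's exact algorithm.

```python
from collections import Counter, defaultdict


def one_hot_encode(values):
    # One binary column per distinct category (columns in sorted order).
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


def ordinal_encode(values):
    # Map each category to an integer index (sorted order, an arbitrary choice).
    index = {c: i for i, c in enumerate(sorted(set(values)))}
    return [index[v] for v in values]


def count_encode(values):
    # Replace each category with its frequency in the column.
    counts = Counter(values)
    return [counts[v] for v in values]


def target_encode(values, targets):
    # Replace each category with the mean target over all rows of that category.
    sums, counts = defaultdict(float), Counter(values)
    for v, t in zip(values, targets):
        sums[v] += t
    return [sums[v] / counts[v] for v in values]


def catboost_encode(values, targets, prior_weight=1.0):
    # Simplified ordered target statistic: each row only sees the targets of
    # earlier rows with the same category, smoothed towards the global mean.
    prior = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    encoded = []
    for v, t in zip(values, targets):
        encoded.append((sums[v] + prior * prior_weight) / (counts[v] + prior_weight))
        sums[v] += t
        counts[v] += 1
    return encoded


# Hypothetical toy data: one categorical feature and a binary target.
colors = ["red", "blue", "red", "green"]
labels = [1, 0, 0, 1]

print(one_hot_encode(colors))
print(ordinal_encode(colors))
print(count_encode(colors))
print(target_encode(colors, labels))
print(catboost_encode(colors, labels))
```

Note that one-hot encoding produces one column per category, which is consistent with the abstract's remark that its runtime grows with feature cardinality, while the other four encoders always produce a single numeric column.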

Subject: Categorical Data; Encoding; Feature Engineering; One-Hot Encoding; Ordinal Encoding; Catboost Encoding; Target Encoding; Count Encoding
To reference this document use: http://resolver.tudelft.nl/uuid:10b91b99-2685-4a45-b44e-48fbbf808ce2
Part of collection: Student theses
Document type: bachelor thesis
Rights: © 2023 Andrei Udilă
Files: Encoding_Methods_for_Cate ... l_Data.pdf (PDF, 273.99 KB)