Classifying Candida species using Mixed Integer Optimization based optimal classification trees
Author: van Dijk, Mick (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: van Iersel, L.J.J. (mentor); Stougie, Prof. dr. L. (mentor); Kelk, Ir. S. (mentor); Boekhout, Prof. dr. T. (graduation committee); Aardal, K.I. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Applied Mathematics
Date: 2019-01-28
Abstract: Global medical use of azole antifungals and echinocandins has led to an enormous increase in resistant Candida species, which are most commonly associated with fungal infections. One possible mechanism of resistance is single or simultaneous point mutations in the genes that encode antifungal target enzymes. The aim of this thesis is to apply and compare several classification algorithms, in particular decision tree algorithms, on Candida data sets received from the Westerdijk Fungal Biodiversity Institute. Bertsimas and Dunn recently introduced a novel formulation based on Mixed Integer Optimization (MIO) to generate optimal classification trees. We implemented this method and applied it to C. albicans and C. glabrata data sets to construct univariate and multivariate classification trees. We were able to correctly classify 68-72% of the C. albicans isolates and 76.5-82.5% of the C. glabrata isolates. Moreover, by changing the objective function and adding constraints to the original MIO formulation, we constructed trees that take false negative errors into consideration, decreasing this type of error by 64-80% for C. albicans and 56-66% for C. glabrata. To deal with ambiguous nucleotides in the C. albicans data set, we introduced a novel formulation to construct non-binary classification trees.
Ternary trees turned out to represent the C. albicans data set well, performing strongly in terms of out-of-sample accuracy. Finally, we identified combinations of amino acid substitutions and nucleotide mutations possibly related to resistance in C. albicans and C. glabrata.
Subject: Optimization, Machine Learning, Bioinformatics
To reference this document use: http://resolver.tudelft.nl/uuid:068ff836-099b-4abe-8d9e-cf96706169df
Part of collection: Student theses
Document type: master thesis
Rights: © 2019 Mick van Dijk
Files: PDF Thesis_Mick_van_Dijk_TU_D ... lft_2_.pdf (575.58 KB)