Title: Improving the robustness of decision trees in security-sensitive setting
Author: Buijs, Cas (TU Delft Electrical Engineering, Mathematics and Computer Science; TU Delft Cyber Security)
Contributors: Verwer, S.E. (mentor); Lagendijk, R.L. (graduation committee); Tax, D.M.J. (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Cyber Security
Date: 2020-08-13

Abstract: Machine learning is used for security purposes to distinguish between the benign and the malicious. While decision trees can yield understandable and explainable classifications, an adversary can manipulate the model input to evade detection, e.g. by having malicious samples classified as benign. State-of-the-art techniques improve robustness by taking such adversarial attacks into account when building the model. In this work, I identify three factors that contribute to the robustness of a decision tree: feature frequency, the shortest distance between malicious leaves and the benign prediction space, and the impurity of the benign prediction space. I propose two splitting criteria that improve these factors, and suggest combining them with two trade-off approaches that balance them against a common splitting criterion, Gini impurity, in order to trade off accuracy and robustness. These combinations allow building more robust models against adversaries manipulating the malicious data, without considering adversarial attacks during training. The approaches are evaluated in a white-box setting against a decision tree and a random forest, considering an unbounded adversary, where robustness is measured using the L1-distance norm and the false negative rate. All combinations lead to more robust models at different costs in terms of accuracy, showing that adversarial attacks do not need to be taken into account to improve robustness.
Compared to state-of-the-art work, the best approach achieves on average 3.17% better accuracy, with on average 5.5% lower robustness, on the datasets used for a single decision tree. In a random forest, the best approach achieves on average 2.87% better robustness with 2.37% better accuracy on the datasets used, compared to the state-of-the-art work. The state-of-the-art work does not seem to affect all of the identified factors, which leaves room for models even more robust than those currently existing.

Subjects: Adversarial Machine Learning; Decision Trees; Robust learning
To reference this document use: http://resolver.tudelft.nl/uuid:7fa236fe-686a-423b-ad56-5ac81e07d129
Part of collection: Student theses
Document type: master thesis
Rights: © 2020 Cas Buijs
Files: Thesis_Cas.pdf (PDF, 7.92 MB)
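As an aside, the two quantities the abstract relies on can be made concrete with a short sketch. This is not code from the thesis itself; function names and inputs are illustrative assumptions. It shows Gini impurity, the common splitting criterion the proposed criteria are traded off against, and a minimal L1-distance evasion measure: the smallest L1 perturbation moving a malicious sample to a point the model treats as benign (larger means more robust).

```python
def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum over classes of p_c^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def l1_evasion_distance(malicious_sample, benign_points):
    """Smallest L1 distance from a malicious sample to any benign-classified
    point -- a proxy for the robustness measure described in the abstract."""
    return min(
        sum(abs(a - b) for a, b in zip(malicious_sample, benign))
        for benign in benign_points
    )


# A pure node has impurity 0; a 50/50 split has the maximum binary impurity 0.5.
print(gini_impurity([0, 0, 0, 0]))  # 0.0
print(gini_impurity([0, 0, 1, 1]))  # 0.5
# Nearest benign point differs by 0.5 in one feature, so the adversary
# needs an L1 perturbation of at least 0.5 to evade.
print(l1_evasion_distance([1.0, 2.0], [[1.0, 2.5], [3.0, 2.0]]))  # 0.5
```

A robustness-aware splitting criterion, in this framing, prefers splits that keep benign prediction regions pure while pushing malicious leaves farther (in L1 distance) from the benign region.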