Scalable machine learning algorithms on a big data infrastructure
Author Folkers, C.
Contributor Al-Ars, Z. (mentor)
Faculty Electrical Engineering, Mathematics and Computer Science
Department Computer Engineering
Date 2016-01-22
Abstract Two currently popular topics in computer science are machine learning and big data. The two are often combined to build, among other things, machines with learning capabilities or high-throughput data analysis programs. This research analyses which machine learning techniques can be implemented efficiently on a scalable big data infrastructure. Several machine learning algorithms are analyzed and modified to scale on a multi-processor machine. Furthermore, this thesis investigates the scalability potential of an existing image segmentation pipeline, used for cancer diagnostics, that contains an artificial neural network. The neural network is implemented according to one of the proposed scalable algorithms on a PowerPC-7 cluster with 64 CPUs capable of running 256 threads. While suffering from a large overhead penalty, the pipeline's run time is still reduced greatly and shows excellent scalability. This scalability allows larger input sets to be processed in the same execution time by expanding the platform's resources. This provides an opening for future research into improving the pipeline's diagnostic capability.
Subject Big Data, Machine Learning, Apache Spark, Cancer diagnostics, Image Segmentation
To reference this document use: http://resolver.tudelft.nl/uuid:d4d2230a-a034-43fa-bd29-0eb6b954410b
Part of collection Student theses
Document type master thesis
Rights (c) 2016 Folkers, C.
Files PDF thesis.pdf 7.71 MB