Workload Characterization and Modeling, and the Design and Evaluation of Cache Policies for Big Data Storage Workloads in the Cloud

Author: Talluri, Sacheendra (TU Delft Electrical Engineering, Mathematics and Computer Science)
Contributor: Iosup, Alexandru (mentor); Rellermeyer, Jan (graduation committee); Kuipers, Fernando (graduation committee)
Degree granting institution: Delft University of Technology
Programme: Computer Science | Software Technology
Date: 2018-12-07

Abstract
The proliferation of big-data processing platforms has already led to radically different system designs, such as MapReduce and the newer Spark. Understanding the workloads of such systems enables tuning and could foster new designs. However, whereas MapReduce workloads have been characterized extensively, relatively little public knowledge exists about the characteristics of Spark workloads in representative environments. In this work, we focus on understanding the behavior and cache performance of the storage sub-system used for Spark workloads in the cloud. First, we statistically characterize its usage. Second, we design a generative model to tackle the scarcity of workload traces. Third, we design a cache policy that puts the insights from our characterization to work. Finally, we evaluate the performance of different cache policies for big data workloads via simulation.

Subject: Big Data; Storage; Cloud; Modeling; Performance Evaluation; Cache Policies; Characterization
To reference this document use: http://resolver.tudelft.nl/uuid:29f066b2-1e7c-4ab4-8ba8-3516032a8237
Part of collection: Student theses
Document type: master thesis
Rights: © 2018 Sacheendra Talluri
Files: report.pdf (PDF, 13.57 MB)