This Is Auburn Electronic Theses and Dissertations

Analyzing the Benefits of Graphics Processing Units for Computation-Intensive Applications on Hadoop

Date

2015-07-21

Author

Vasko, Kevin

Type of Degree

Master's Thesis

Department

Computer Science

Abstract

The ever-expanding volume of data generated in the “Big Data era” makes processing that data increasingly challenging. This work aims to improve the performance of processing unstructured data by combining two technologies: General-Purpose Graphics Processing Units (GPGPUs) and Apache Hadoop. Many researchers have focused on improving either GPGPU computing or Apache Hadoop individually, but very little evaluation data is available for the combination of the two, which is what we study in this work. JCuda, JCublas and JCuFFT were used in conjunction with the CUDA library to incorporate GPGPU computation within the Apache Hadoop framework. We used the Nvidia Tesla M2050 and up to 16 compute nodes to evaluate the integration of GPGPUs with the Apache Hadoop framework in three use cases: two synthetic benchmarks, matrix multiplication and the fast Fourier transform (FFT), and one real-world application, image processing with a Gaussian blur filter. We achieved overall performance improvements of up to 6.91X, 2.48X and 1.49X for matrix multiplication, FFT and Gaussian blur, respectively. We expect this work to help readers gauge the performance they could achieve for their own workloads when combining GPGPUs with Apache Hadoop.