This Is Auburn Electronic Theses and Dissertations

Secdoop: A Confidentiality Service for Hadoop Clusters




Majors, James

Type of Degree



Computer Science


The MapReduce programming model, introduced by Google in 2004, has proven to be an effective vehicle for academic research on distributed file systems and a good solution for large data sets that require intensive processing. Hadoop, an open-source Java implementation of MapReduce, was created by Yahoo in 2007. Industries that handle sensitive data at large scale are hesitant to embrace a processing solution that distributes that data across many machines. Cryptography is often used to protect sensitive data, but it is computationally intensive, which often makes it undesirable as a solution. Distributing the cryptographic processing over a trusted cluster improves the overall runtime, which makes this approach well suited to large data sets that are sensitive in nature. In this paper, we describe two applications that distribute the cryptographic process over a trusted cluster. The first application encrypts an input file that will be placed inside the Distributed File System (DFS). The second application decrypts an input file that is located on the DFS. Together, these two applications demonstrate the effect of performing cryptography in parallel on a Hadoop cluster of commodity machines. Our experimental results show a performance increase when processing large data sets.
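The idea behind the two applications, encrypting and decrypting a file in independent pieces so the work can be spread across workers, can be illustrated with a minimal single-machine sketch. This is not the Secdoop implementation: the class name `ChunkCrypto`, the choice of AES-GCM, the chunk size, and the use of Java parallel streams in place of Hadoop map tasks are all assumptions made for illustration only.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.io.ByteArrayOutputStream;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical illustration only; names and cipher choice are assumptions,
// not taken from the thesis.
final class ChunkCrypto {
    private static final int NONCE_BYTES = 12; // usual nonce length for GCM
    private static final int TAG_BITS = 128;   // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts one chunk; the output is the fresh nonce followed by the ciphertext.
    static byte[] encryptChunk(SecretKey key, byte[] plain) {
        try {
            byte[] nonce = new byte[NONCE_BYTES];
            RANDOM.nextBytes(nonce);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, nonce));
            byte[] body = cipher.doFinal(plain);
            byte[] out = Arrays.copyOf(nonce, NONCE_BYTES + body.length);
            System.arraycopy(body, 0, out, NONCE_BYTES, body.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the nonce from the front of the sealed chunk, then decrypts the rest.
    static byte[] decryptChunk(SecretKey key, byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, sealed, 0, NONCE_BYTES));
            return cipher.doFinal(sealed, NONCE_BYTES, sealed.length - NONCE_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Splits the input into fixed-size chunks and encrypts them in parallel.
    // Each chunk is independent, which is what lets the work be distributed.
    static List<byte[]> encryptAll(SecretKey key, byte[] data, int chunkSize) {
        int chunks = (data.length + chunkSize - 1) / chunkSize;
        return IntStream.range(0, chunks).parallel()
                .mapToObj(i -> encryptChunk(key, Arrays.copyOfRange(
                        data, i * chunkSize, Math.min(data.length, (i + 1) * chunkSize))))
                .collect(Collectors.toList());
    }

    // Decrypts the chunks in parallel and concatenates them in their original order.
    static byte[] decryptAll(SecretKey key, List<byte[]> sealedChunks) {
        List<byte[]> plain = sealedChunks.parallelStream()
                .map(c -> decryptChunk(key, c))
                .collect(Collectors.toList());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] p : plain) {
            out.write(p, 0, p.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] data = new byte[1 << 20];
        new SecureRandom().nextBytes(data);
        List<byte[]> sealed = encryptAll(key, data, 64 * 1024);
        System.out.println("round trip ok: " + Arrays.equals(data, decryptAll(key, sealed)));
    }
}
```

In a Hadoop setting, each chunk would presumably correspond to an input split handled by a separate task on the trusted cluster. Note that this sketch authenticates each chunk on its own and does not protect the order of the chunks.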