TCP/IP Implementation of Hadoop Acceleration
Type of Degreethesis
MetadataShow full item record
Cloud Computing is a booming technology in computer science. Since Google released the design details of the MapReduce technique in 2004 , cloud computing has been more and more popular. Hadoop  has been developed as an open-source implementation of MapReduce. A new network-levitated merge mechanism (Hadoop-A)  improves the existing Hadoop framework to solve many problems in the original framework. Hadoop-A avoids repetitive merging of data and introduces a full pipeline that consists of shuffle, merge and reduce phases. However, Hadoop-A is implemented based on Infiniband RDMA technology, which is not commonly deployed on commercial servers. On the other hand, data transmission based on the TCP/IP protocol is a robust technology, its speed is becoming faster and faster. Thus, we deem that it worthwhile to complement our RDMA-based connection with an implementation that is built on TCP/IP protocol. In this article, I will describe the details of design and implementation of a TCP/IP Implementation of Hadoop-A. Two components MOFSupplier (Server) and NetMerger (Client) are introduced to realize the TCP/IP connection, which can fetch data from Maptasks and send them to Reducetasks within the new network-levitated merge mechanism. Multithreading technologies are used to manage memory pool, send/receive and merge data segments. The experiment results show that the TCP/IP implementation can bring good performance for Hadoop-A on TCP/IP. Its execution time outperforms original Hadoop by 26.7% and can also achieve good scalability.