HadioFS: Improve the Performance of HDFS by Off-loading I/O to ADIOS
Type of Degree: thesis
The Hadoop Distributed File System (HDFS) is the underlying storage layer for the whole Hadoop stack, which includes MapReduce, HBase, Hive, Pig, and other components. Because of its robustness and portability, HDFS has been widely adopted, often without the accompanying subsystems. However, as a user-level distributed filesystem designed for portability and implemented in Java, HDFS relies on the standard POSIX I/O interfaces to access disk. This makes it difficult to take advantage of platform-specific performance-enhancing features and of high-performance I/O techniques that are already mature and popular in the HPC community, such as data staging, asynchronous I/O, and collective I/O, because they are incompatible with POSIX. Although it is feasible to re-implement the disk access functions inside HDFS to exploit these features and techniques, such modification of HDFS can be time-consuming and error-prone. In this paper, we propose a new framework, HadioFS, that enhances HDFS with the Adaptive I/O System (ADIOS), supports many different I/O methods, and enables the upper-level application to select the optimal I/O routines for a particular platform without source code modification or recompilation. Specifically, we first customize ADIOS into a chunk-based storage system so that the semantics of its APIs fit the requirements of HDFS; we then use the Java Native Interface (JNI) to bridge HDFS and the tailored ADIOS. We compare HadioFS with the original HDFS under different I/O patterns, and the experimental results show the feasibility and benefits of the design. We also shed light on the performance of HadioFS under different I/O techniques. To the best of our knowledge, this is the first attempt to leverage ADIOS to enrich the functionality of HDFS.
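The idea of choosing an I/O method at run time behind a chunk-based interface can be illustrated with a small sketch. This is not the HadioFS implementation and does not use the ADIOS API or JNI; it is a language-neutral illustration in Python, and every name in it (`PosixMethod`, `StagedMethod`, `open_store`, the `io.method` key) is hypothetical. It only shows how a caller could switch between a direct-write method and a simple staged (buffer, then flush) method through configuration, without changing the calling code.

```python
import os
import tempfile


class PosixMethod:
    """Hypothetical I/O method: writes each chunk straight to a file."""

    def __init__(self, root):
        self.root = root

    def _path(self, chunk_id):
        return os.path.join(self.root, f"chunk_{chunk_id}")

    def write(self, chunk_id, data):
        with open(self._path(chunk_id), "wb") as f:
            f.write(data)

    def read(self, chunk_id, offset, length):
        with open(self._path(chunk_id), "rb") as f:
            f.seek(offset)
            return f.read(length)

    def flush(self):
        pass  # nothing is buffered, so there is nothing to flush


class StagedMethod(PosixMethod):
    """Hypothetical I/O method: stages chunks in memory until flush()."""

    def __init__(self, root):
        super().__init__(root)
        self.pending = {}

    def write(self, chunk_id, data):
        self.pending[chunk_id] = data

    def read(self, chunk_id, offset, length):
        # Serve reads from the staging buffer if the chunk is not on disk yet.
        if chunk_id in self.pending:
            return self.pending[chunk_id][offset:offset + length]
        return super().read(chunk_id, offset, length)

    def flush(self):
        for chunk_id, data in self.pending.items():
            super().write(chunk_id, data)
        self.pending.clear()


# Registry of available methods; the key names are assumptions for this sketch.
METHODS = {"posix": PosixMethod, "staged": StagedMethod}


def open_store(config, root):
    """Pick the I/O method named in the configuration (default: posix)."""
    return METHODS[config.get("io.method", "posix")](root)


def round_trip(method_name):
    """Write one chunk, flush, and read part of it back with the given method."""
    with tempfile.TemporaryDirectory() as root:
        store = open_store({"io.method": method_name}, root)
        store.write(0, b"hello chunk")
        store.flush()
        return store.read(0, 6, 5)
```

In this sketch the caller only ever sees `write`, `read`, and `flush` on chunk identifiers, so switching from `"posix"` to `"staged"` is a configuration change rather than a code change, which mirrors the selection-without-recompilation goal described in the abstract.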