This Is AuburnElectronic Theses and Dissertations

Informed Prefetching in Distributed Multi-Level Storage Systems

Date

2011-12-08

Author

Al Assaf, Maen

Type of Degree

dissertation

Department

Computer Science

Abstract

In this dissertation, we present pipelined prefetching mechanisms that use application-disclosed access patterns to prefetch hinted blocks in multi-level storage systems. The fundamental concept in our approach is to split an informed prefetching process into a set of independent prefetching steps among multiple storage levels (e.g., main memory, solid state disks, and hard disk drives). In the first part of this study, we show that a prefetching pipeline across multiple storage levels is an viable and effective technique for allocating file buffers at the multiple-level storage devices. Our approaches (a.k.a., iPipe and IPO) extends previous ideas of informed prefetching in two ways: (1) our approach reduces applications' I/O stalls by keeping hinted data in caches residing in the main memory, solid state disks, and hard drives; (2) we propose a pipelined prefetching scheme in which multiple informed prefetching mechanisms semi-dependently work to fetch blocks from low-level (slow) to high-level (fast) storage devices. Our iPipe and IPO strategies integrated with the pipelining mechanism significantly reduce overall I/O access time in multiple-level storage systems. Next, we propose a third prefetching scheme called IPODS that aims to maximize the benefit of informed prefetches as well as to hide network latencies in a distributed storage systems. Finally, we develop a simulator to evaluate the performance of the proposed informed prefetching schemes in the context of multiple-level storage systems. We implement a prototype to validate the accuracy of the simulator. Our results show that our iPipe improves system performance by 56\% in most informed prefetching cases, IPO and IPODS improve system performance by 56\% and 6\% respectively in informed prefetching critical cases across a wide range of real-world I/O traces.