Improving Reliability, Energy-Efficiency, and Security of Storage Systems and Real-Time Systems

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Kiranmai Bellam

Certificate of Approval:

David Umphress
Associate Professor
Computer Science and Software Engineering

Xiao Qin, Chair
Assistant Professor
Computer Science and Software Engineering

Cheryl Seals
Assistant Professor
Computer Science and Software Engineering

George T. Flowers
Interim Dean
Graduate School

Improving Reliability, Energy-Efficiency, and Security of Storage Systems and Real-Time Systems

Kiranmai Bellam

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 10, 2009

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author

Date of Graduation

DISSERTATION ABSTRACT

Improving Reliability, Energy-Efficiency, and Security of Storage Systems and Real-Time Systems

Kiranmai Bellam

Doctor of Philosophy, August 10, 2009
(M.S., New Mexico Tech, 2006)
(B.S., University of Madras, India, 2003)

150 Typed Pages

Directed by Xiao Qin

Many life-critical systems are required to operate without a system failure for a given period of time; examples include nuclear, aerospace, and spacecraft systems.
Many of these applications are either storage-intensive or real-time-intensive, so it is crucial to examine reliability improvements for computer systems, mainly storage and real-time systems. With respect to storage system reliability, in the business arena, data preservation and data mining have proven to be a boon in shaping business strategy. For individuals, storage is being called upon to preserve sentimental and historical artifacts such as photos, movies, and personal documents. In both areas, storage must keep pace with a growing need for efficient, reliable, long-term storage. For real-time systems, performance is the most important characteristic right up until the point where the system stops working; then, suddenly, speed no longer matters, and all that matters is restoring correct operation. These observations make clear the importance of reliability in both storage and real-time systems. The reliability of these two classes of systems can be affected by a wide range of novel technologies, including energy conservation techniques and security mechanisms. In what follows, we describe two challenging issues in improving reliability: the first is improving the reliability and energy efficiency of storage systems; the second is improving the reliability and security of real-time systems. Modern storage systems offer high levels of performance and disk capacity at low cost. In this dissertation, we propose methods for building energy-efficient and reliable large-scale storage systems. The primary focus of our research is to achieve the twin goals of maximizing reliability and minimizing energy consumption by incorporating energy-efficient and reliability-aware techniques into large-scale storage systems.
Experimental results using both synthetic and real-world applications (traces) show that energy consumption can be significantly reduced while guaranteeing maximum reliability for storage disks, with only a marginal degradation of performance. The reliability of real-time systems, the second challenging issue, is studied in the later part of the dissertation. For real-time embedded systems, we propose techniques that integrate fault recovery and security services. Simulation results show that our techniques can significantly improve security over conventional approaches while achieving an efficient means of fault recovery.

ACKNOWLEDGMENTS

First, I would like to express my sincere gratitude to Dr. Xiao Qin for his supervision, advice, and guidance throughout this research. Above all, he provided me with unflinching encouragement and support in various ways; without his persistent help this dissertation would not have been possible. I gratefully acknowledge my committee members, Dr. David Umphress and Dr. Cheryl Seals, for guiding me through the dissertation process. I am very thankful to Dr. Shiwen Mao for serving as the outside reader and proofreading my dissertation. I am also thankful to my research group, Xiaojun Ruan, Ziliang Zong, Adam Manzanares, and Shu Yin, for their support and collaboration. I would like to take this opportunity to thank all my friends in Auburn who helped me during my study at Auburn University. My deepest gratitude goes to my parents, Venkateshwar Rao Bellam and Rani Bellam, for their encouragement throughout my Ph.D. Special thanks to my brother, Shyam Bellam, for injecting a motivating and competitive spirit into me. I would like to extend my gratitude to my new family members, my parents-in-law, for bringing new joy to my life. Nitish Kosaraju, my husband: without him in my life this effort would have been worth nothing. Thank you for your endless love, support, and encouragement.
Thank you for believing in me.

Style manual: IEEE Standard for Research Papers
Software used: Microsoft Word 2007, Microsoft Excel 2007, Linux GCC Compiler, Microsoft Visio 2007, Eclipse, Adobe Photoshop, C/C++/Java

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
1. Introduction
   1.1 Problem Statement
       1.1.1 The Era of Large-Scale Storage Systems
       1.1.2 The Data Center Energy Crisis
       1.1.3 Reliability of the Disk Systems
       1.1.4 Security Issues of Real-Time Systems
       1.1.5 Reliability Concerns of Real-Time Systems
   1.2 Scope of Research
   1.3 Contributions
   1.4 Dissertation Organization
2. Literature Review
   2.1 Related Work on Energy-Efficiency of Disks
   2.2 Related Work on Reliability of Disks
   2.3 Related Work on Reliability of Real-Time Embedded Systems
   2.4 Related Work on Security of Real-Time Embedded Systems
   2.5 Summary
3. Energy Efficiency and Reliability of Storage Disks
   3.1 System Model
   3.2 Disk Failure Model
   3.3 Reliability-Aware Energy Conservation Model
   3.4 Analysis
   3.5 Performance Evaluation
   3.6 Summary
4. Effects of Power State Transitions on Reliability of Energy-Efficient Storage Systems
   4.1 Motivation
   4.2 System Model
       4.2.1 Energy Efficiency
       4.2.2 Disk Reliability
   4.3 Control Algorithm for Power State Transition (CAPST)
   4.4 Performance Evaluation
       4.4.1 Experimental Setup
       4.4.2 Experimental Results
   4.5 Summary
5. Utilization-Based Reliable Energy-Efficient Disks (UREED)
   5.1 Motivation
   5.2 Model Description
   5.3 UREED Algorithm
   5.4 Mathematical Models of the File Servers
       5.4.1 Massive Arrays of Idle Disks (MAID)
       5.4.2 Mathematical Model of MAID
       5.4.3 Popular Data Concentration (PDC)
       5.4.4 Mathematical Model of PDC
       5.4.5 Utilization-Based Reliable Energy-Efficient Disks (UREED)
       5.4.6 UREED Model
   5.5 Performance Evaluation
       5.5.1 Simulation
       5.5.2 Simulation Results
   5.6 Summary
6. Integrating Security and Reliability in Real-Time Embedded Systems
   6.1 Motivation
   6.2 Checkpoints for Fault Recovery
       6.2.1 Real-Time Application
       6.2.2 Fault Recovery Model
   6.3 Security Implementation
       6.3.1 Confidentiality Model
       6.3.2 Integrity Model
   6.4 Integration of Fault Recovery with Security Mechanisms
   6.5 Performance Evaluation
       6.5.1 Confidentiality versus Reliability
       6.5.2 Integrity versus Reliability
       6.5.3 Security versus Reliability
   6.6 Summary
7. Interplay of Fault Recovery and Quality of Security Using Non-Uniform Checkpoints
   7.1 Motivation
   7.2 Checkpoints for Fault Recovery
       7.2.1 Fault-Tolerant Model for the Non-Uniform Checkpoint Strategy
   7.3 Security Implementation
       7.3.1 Integration of Security Mechanisms with Fault Recovery Using Non-Uniform Checkpoints
   7.4 Performance Evaluation
       7.4.1 Security versus Reliability (Non-Uniform Checkpoints)
   7.5 Summary
8. Conclusions and Future Work
   8.1 Main Contributions
       8.1.1 Reliability-Aware Energy-Efficient Algorithm for Storage Systems
       8.1.2 Control Algorithm for Power State Transitions
       8.1.3 Utilization-Based Reliable and Energy-Efficient Disk Systems
       8.1.4 Security and Reliability for Real-Time Embedded Systems
       8.1.5 Security and Reliability of Real-Time Systems Using a Non-Uniform Checkpoint Strategy
   8.2 Future Work
       8.2.1 Power-Sensitive Applications
       8.2.2 Metadata Information
       8.2.3 Dynamic Data Access Pattern Prediction
       8.2.4 Write, Create, and Purge Operations
References

LIST OF FIGURES

Figure 3.1 System model of parallel I/O systems with mirroring disks.
Figure 3.2 Annual failure rate for 6-month-old disks with respect to utilization.
Figure 3.3 Annual failure rate for 1-year-old disks with respect to utilization.
Figure 3.4 Annual failure rate for 3-year-old disks with respect to utilization.
Figure 3.5 State transition diagram.
Figure 3.6 Reliability-aware energy-efficient (RAREE) algorithm for disk drives.
Figure 3.7 Energy dissipation (Joules), IBM 40GNX.
Figure 3.8 Energy dissipation (Joules), IBM 73LZX.
Figure 3.9 Energy dissipation (Joules), IBM 36Z15.
Figure 3.10 Spin-down energy (Joules) vs. energy dissipation (Joules).
Figure 3.11 Spin-down energy (Joules) vs. energy dissipation (Joules).
Figure 3.12 Spin-up energy (Joules) vs. energy dissipation (Joules).
Figure 3.13 Spin-up energy (Joules) vs. energy dissipation (Joules).
Figure 3.14 Active power (Watts) vs. energy dissipation (Joules).
Figure 3.15 Active power (Watts) vs. energy dissipation (Joules).
Figure 3.16 Idle power (Watts) vs. energy dissipation (Joules).
Figure 3.17 Idle power (Watts) vs. energy dissipation (Joules). Disk age = 1 year.
Figure 3.18 Arrival rate vs. response time.
Figure 4.1 CAPST algorithm.
Figure 4.2 Effect of arrival rate on power state transitions.
Figure 4.3 Power state transitions with CAPST.
Figure 4.4 Variation of energy consumed with power transition frequency.
Figure 4.5 Variation of reliability with power transition frequency.
Figure 5.1 Energy savings per file request rate.
Figure 5.2 Energy savings per file popularity.
Figure 5.3 Energy savings per number of disks.
Figure 6.1 Fault recovery model.
Figure 6.2(a) Number of checkpoints vs. security level (confidentiality).
Figure 6.2(b) Rho (C/D) vs. security level.
Figure 6.2(c) Deadline vs. security level.
Figure 6.3(a) Number of checkpoints vs. security level (integrity).
Figure 6.3(b) Rho (C/D) vs. security level (integrity).
Figure 6.3(c) Deadline vs. security level (integrity).
Figure 6.4 Security level (confidentiality) vs. fault recovery (checkpoints).
Figure 6.5 Security level (integrity) vs. fault recovery (checkpoints).
Figure 7.1 Uniform and non-uniform distribution of checkpoints with security overhead.
Figure 7.2 Confidentiality levels for uniform and non-uniform checkpointing when a fault occurs at k = 1, 2, 3.
Figure 7.3 Uniform vs. non-uniform checkpointing for confidentiality.
Figure 7.4 Integrity levels for uniform and non-uniform checkpointing when a fault occurs at k = 1, 2, 3.
Figure 7.5 Uniform vs. non-uniform checkpointing for integrity.

LIST OF TABLES

Table 3.1 List of parameters for the disk utilization model.
Table 3.2 Main characteristics of two SCSI disks and an IDE laptop disk.
Table 3.3 Failure rate (%) of the parallel disk systems.
Table 4.1 Disk parameters of the IBM 36Z15.
Table 4.2 Comparison of reliability.
Table 5.2 Disk parameters of the IBM 36Z15.
Table 5.3 Simulator validation with the mathematical model (energy in Joules).
Table 5.4 Failure rate of the file servers.
Table 6.1 Cryptographic algorithms for confidentiality.
Table 6.2 Hash functions for integrity.
Table 7.1 Security levels for uniform and non-uniform checkpoints for different k.

Chapter 1
Introduction

The increasing number of large-scale complex applications calls for high-performance computing platforms such as large-scale storage systems, which provide a cost-effective infrastructure for the most complicated scientific applications as well as commercial applications. Large-scale storage systems translate directly into an increasing number of spinning disks, creating a huge and growing energy crisis. Reliability is also an important characteristic of large-scale storage systems, because the data stored on the disks may be mission critical; even when it is not, the energy saved by conservation techniques may have to be spent restoring a disk that fails. Hence there is a need for highly reliable and energy-efficient storage systems.
By proposing energy-efficient techniques that guarantee reliability, this problem can be alleviated. The first objective of this research is to explore highly energy-efficient technologies that reduce the power consumption of large-scale storage systems while guaranteeing the reliability of the system. Completing a task before its deadline is a vital criterion in real-time embedded systems. Along with deadlines, task security and fault tolerance are equally important in real-time applications such as online banking and aircraft control systems. Hence there is a need for real-time embedded systems that guarantee security while providing fault tolerance. Many existing algorithms for real-time embedded systems concentrate on one of these goals while ignoring the other. To bridge this technology gap, we propose two approaches. The second objective of this dissertation is to integrate security and fault tolerance in real-time embedded systems. This chapter first presents the problem statement in Section 1.1. In Section 1.2, we describe the scope of this research. Section 1.3 highlights the main contributions of this dissertation, and Section 1.4 outlines the dissertation organization.

1.1 Problem Statement

In this section, we start with an overview of new trends in large-scale storage systems. Section 1.1.2 introduces the serious data center energy crisis, and Section 1.1.3 presents the reliability issues of disk systems; together they provide the initial motivation for the first objective of our dissertation research, improving the reliability and energy efficiency of storage systems. Section 1.1.4 outlines the security concerns of real-time systems, and Section 1.1.5 details the importance of their reliability. These motivated the second objective of our research, integrating security and reliability in real-time systems.
1.1.1 The Era of Large-Scale Storage Systems

Commodity disk drives have made online storage a way of life because of their declining costs. Hundreds of thousands of data servers are deployed in data centers such as those of Google and Yahoo. These data centers make life easy by providing or processing data with a few mouse clicks. Many large-scale applications such as scientific computing, weather forecasting, search engines, and medical research projects became possible only after the advent of large-scale storage systems. These data centers not only make life easy but can even be vital to human safety in the twenty-first century, for example, when weather forecasts are used to issue flood alerts.

1.1.2 The Data Center Energy Crisis

An energy crisis is a great bottleneck to any economy. Current worldwide data growth is estimated at a compound annual growth rate of 50.6% through the decade [7]. Several studies have indicated that data centers are headed toward a serious energy crisis. The problem has a rippling effect because additional power is required beyond the servers themselves: for instance, 22% of total energy consumption is due to the cooling systems used to keep data center temperatures from rising to dangerous levels. The power consumption problem has been addressed at the hardware level to a certain extent, but it remains largely open in many deployments of large-scale storage systems. Energy consumption of storage systems in data centers is becoming a predominant concern in the IT industry. Increasing evidence shows that the powerful computing capability of data centers comes at the cost of huge energy consumption. For example, Energy User News stated that the power requirements of today's data centers range from 75 W/ft2 to 150-200 W/ft2 and will increase to 200-300 W/ft2 in the near future. The new data center capacity projected for 2005 in the U.S.
would require approximately 40 TWh ($4B at $100 per MWh) per year to run 24x7 unless it becomes more efficient. The supercomputing center in Seattle is forecast to increase the city's power demands by 25%. Even worse, the EPA predicted that the power usage of servers and data centers will double again within five years if historical trends continue. However, most previous research on large-scale storage systems focused primarily on improving performance, security, and reliability; energy conservation was neglected or taken for granted. With the growing energy crisis, it has now come to the attention of many researchers and industry practitioners. Several pioneering research groups have started addressing the energy crisis, but much remains to be done before the problem is resolved. Our research is motivated by the current need for energy efficiency in data centers.

1.1.3 Reliability of the Disk Systems

Reliability is a major concern in today's world because of the importance of the data stored on disks. Enormous amounts of data are stored on disks daily, and all of it needs to be preserved for later use or for the next generations. Keeping many redundant copies of this data to improve reliability is not possible with limited resources. Moreover, processing the data saved on these disks consumes a huge amount of energy. To address this issue, a number of energy-efficient techniques have been proposed. Most of them aggressively save energy without considering their effects on disk reliability. Furthermore, once a disk fails it must be recovered, which is only possible if a copy exists, and recovering the data costs a significant amount of energy, so part of the energy saved by the energy-efficient algorithms is spent again on recovery. Hence there is a need for techniques that not only conserve energy but also maintain good reliability.
Our motivation to study the reliability issues in this research comes from the problems defined above.

1.1.4 Security Issues of Real-Time Systems

Real-time systems are very sensitive with respect to security; their security cannot be compromised at any cost. Once the security of a real-time system such as online banking is compromised, it is very hard to regain the trust of consumers, and the breach inflicts a great deal of commercial and organizational loss. For applications like military aircraft control, a security compromise puts human lives at stake. Hence improving the security of these systems not only benefits the world financially but also improves the trustworthiness of the applications, thereby providing peace of mind. The security of real-time applications can be attained by applying the wide variety of security mechanisms available today. A system could be made highly secure by applying every existing security mechanism, but since that is not practically feasible, security is often achieved and improved by carefully selecting the most important mechanisms. This motivated the second part of our research: studying the security mechanisms of real-time systems.

1.1.5 Reliability Concerns of Real-Time Systems

The reliability of a real-time system needs very careful attention. A real-time application such as an aircraft control system cannot tolerate any faults once it is deployed in the field; because of the critical nature of the application itself, such systems need to be one hundred percent foolproof. When an application such as online banking is considered, a fault that occurs during a high-value transaction leaves the entire banking operation in jeopardy. These two examples outline the importance of reliable operation in real-time systems.
Our motivation to study the reliability of real-time systems, and to combine it with their security, comes from the examples above.

1.2 Scope of Research

Our research is focused on two goals. The first is to design energy-efficient and reliable techniques for large-scale data centers. Existing energy conservation techniques, such as operating disks in different power modes, are used in our research in an innovative way to guarantee reliability. To reduce the impact of disk power state transitions on energy efficiency and reliability, we also developed an algorithm to control those transitions. Finally, we proposed a highly energy-efficient and reliable approach in which popular data is skewed onto a few disks to save energy. We coined the term safe utilization zone for the utilization range in which disk reliability is highest; operating disks within this zone guarantees high reliability. For our second design goal, we focused on real-time embedded systems issues, namely security and fault tolerance. We addressed security by implementing confidentiality and integrity services in the system, and achieved fault tolerance using uniform and non-uniform checkpoint strategies.

1.3 Contributions

The major contributions of this research are summarized as follows.

For large-scale storage systems (design goals: energy efficiency and reliability):
- We developed a mathematical reliability model to estimate disk failure rate as a function of disk utilization and age. We showed that high disk reliability is achieved by operating disks only in the safe utilization zones.
- We provided a dynamic power management policy that aims to reduce energy dissipation in parallel I/O systems with mirroring disks.
- We proposed an algorithm to control the power state transitions, which negatively affect reliability and energy efficiency if left uncontrolled.
- We designed a utilization-based, reliability-aware, energy-efficient algorithm to achieve high levels of both energy efficiency and reliability.
- We conducted extensive experiments for large-scale storage systems. These experimental results can serve as a reference for related research.

For real-time and embedded systems (design goals: security and reliability):
- We presented overhead models for the security services and for the fault recovery model using uniform and non-uniform checkpoints.
- We proposed two methods to combine security with reliability in a real-time task. We then integrated both fault recovery mechanisms with adaptive quality of security.

1.4 Dissertation Organization

This dissertation is organized as follows. In Chapter 2, related work in the literature is briefly reviewed. In Chapter 3, we propose our design for energy-efficient and reliable storage systems, including an algorithm that incorporates reliability and energy efficiency into storage systems. To control the power state transitions induced by the algorithm of Chapter 3, we develop a control algorithm for power state transitions in Chapter 4. In Chapter 5, we study utilization-based, reliability-aware, energy-efficient techniques. Chapter 6 details the proposed approaches to integrating security and reliability in real-time embedded systems. Chapter 7 explains the interplay of fault recovery and quality of security, providing two different strategies for security implementation and ways to integrate these strategies with fault-tolerant approaches. Finally, Chapter 8 summarizes the main contributions of this dissertation and comments on future directions for this research.

Chapter 2
Literature Review

In this chapter, we briefly summarize the prior literature most relevant to our research on energy efficiency and reliability for large-scale storage systems.
Next, we summarize prior work on the security and reliability of real-time embedded systems. Section 2.1 introduces related work on energy-efficient storage systems, followed by work on storage system reliability in Section 2.2. Sections 2.3 and 2.4 cover the reliability and security of real-time embedded systems, respectively.

2.1 Related Work on Energy-Efficiency of Disks

Extensive research has been carried out in developing energy-efficient storage systems. Dynamic voltage scaling [14][33][47], dynamic power management [71], and compiler-directed energy optimizations [65][68] are some of the state-of-the-art energy conservation techniques. Du et al. studied a dynamic voltage scaling technique with a real-time garbage collection mechanism to reduce the energy dissipation of flash memory storage systems [85]. A dynamic spin-down technique for mobile computing was proposed by Helmbold et al. [15]. Yuan and Qu built a mathematical model for each dynamic-voltage-scaling-enabled system and analyzed its potential for energy reduction [47]. Carrera et al. [21] proposed four approaches to conserving disk energy in high-performance network servers and concluded that the fourth approach, which uses multiple disk speeds, is the one that can actually provide energy savings. Li et al. presented eRAID [13], an energy saving policy for conventional disk-based RAID-1 systems that exploits redundancy. Energy-efficient disk layouts for RAID-1 systems have been proposed by Lu et al. [12]. Yue et al. investigated the memory energy efficiency of high-end data servers used for supercomputers [34]. Son et al. proposed and evaluated a compiler-driven approach to reduce the disk power consumption of array-based scientific applications executing on parallel architectures [65][68][33]. Dempsey, a disk simulation environment that includes accurate modeling of disk power consumption, was presented by Zedlewski et al. [35].
They also demonstrated that disk power consumption can be simulated both efficiently and accurately. Optimal power management policies for a laptop hard disk were obtained with a system model that can handle non-exponential inter-arrival times in the idle and sleep states [71]. Gurumurthi et al. [58] provided a new approach called DRPM to modulate disk speed (RPM) dynamically, along with a practical implementation that exploits this mechanism; they showed that DRPM can provide significant energy savings without heavily compromising performance. Rosti et al. presented a formal model of the behavior of CPU and I/O interactions in scientific applications, from which they derived various formulas that characterize application performance [20]. Colarelli and Grunwald presented an architecture called Massive Arrays of Idle Disks, or MAID [11]; their work, however, did not consider reliability. A similar framework, Popular Data Concentration (PDC) [18], was proposed by Pinheiro and Bianchini. The basic idea of PDC is to migrate data across disks according to frequency of access, or popularity. The goal is to lay data out in such a way that popular and unpopular data are stored on different disks. This layout leaves the disks that store unpopular data mostly idle, so that they can be transitioned to a low-power mode. However, PDC is a static offline algorithm; in some cases it is impossible for the system to know exactly which data is popular and which is not. This is especially true for ever-changing workloads, in which data that is popular in one period becomes unpopular in the next. In contrast with both MAID and PDC, we implemented a utilization-based, reliability-aware, energy-efficient algorithm to control energy consumption and increase reliability.
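The popularity-concentration idea behind PDC can be made concrete with a short sketch. Python is used here as executable pseudocode; the function name, the simple files-per-disk capacity model, and the popularity scores are illustrative assumptions, not the algorithm as published in [18]:

```python
def popularity_layout(file_popularity, num_disks, files_per_disk):
    """Assign files to disks in descending order of popularity.

    Popular files are concentrated on the first disks, so the
    remaining disks, holding only unpopular data, stay mostly
    idle and can be transitioned to a low-power mode.
    """
    placement = {}
    disk, used = 0, 0
    for fname, _pop in sorted(file_popularity.items(),
                              key=lambda kv: kv[1], reverse=True):
        if used >= files_per_disk:      # current disk full: move on
            disk, used = disk + 1, 0
        if disk >= num_disks:
            raise ValueError("not enough disks for all files")
        placement[fname] = disk
        used += 1
    return placement
```

For instance, with four files of popularity {a: 90, b: 5, c: 80, d: 1} and two disks holding two files each, the hot files a and c land on disk 0 while b and d land on disk 1, leaving disk 1 a candidate for spin-down.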
2.2 Related Work on Reliability of Disks

Schroeder and Gibson presented and analyzed field-gathered disk replacement data from five systems in production use at three organizations [5]. They found evidence that failure rate is not constant with age, that there was a significant infant mortality effect, and that wear-out degradation set in early. Significant levels of correlation between failures, including autocorrelation and long-range dependence, were also found. Pinheiro et al. [19] presented failure statistics and analyzed the correlation between failures and several parameters generally believed to impact longevity. Elerath and Shah [32] elaborated four causes of variability and explained how each is responsible for a possible gap between expected and measured drive reliability. All of the previously mentioned work concentrated either on power conservation or on disk reliability; few researchers address both. It is very important for a data disk to be highly reliable while consuming less power. The importance of energy efficiency and reliability, and the lack of research on their relationship, motivates the research conducted in this dissertation.

2.3 Related Work on Reliability of Real Time Embedded Systems

Much attention has been paid to fault-tolerant real-time scheduling. Most conventional real-time scheduling algorithms used timeline or backup approaches for providing fault recovery [63][1]. Such low-overhead approaches are adequate for real-time embedded systems [26][45][4][50] when storage is not a primary concern. Tasks are usually re-executed to recover from transient faults [24][84]. Sultan et al. developed management algorithms that are efficient in bounding checkpoints and logs [24].
A protocol that supports both security and reliability aspects in database systems is presented in [44][4]. A variety of fault-tolerant activities [57][39][24][63] are incorporated in both static and dynamic scheduling without impairing the feasibility of pre-guaranteed tasks, while minimizing the number of reliability activities [16]. In addition, a method called slot shifting was used to integrate static and dynamic scheduling. Uniform and non-uniform checkpointing policies were developed to recover from failures and reduce power consumption in real-time embedded systems [57][55][88]. We leverage a similar approach to support fault recovery. However, our work is fundamentally different from the previous studies in that ours addresses the issue of improving security [72][16] without adversely affecting reliability, whereas the previous studies focused on improving the reliability of real-time embedded systems.

2.4 Related Work on Security of Real Time Embedded Systems

Song et al. developed security-driven scheduling algorithms for Grids [57][66]. Tao et al. designed and implemented security-aware real-time scheduling algorithms, which make use of three types of security services to guard real-time embedded systems from potential threats [72]. They investigated the integration of security services to protect against a diversity of threats and attacks in cluster computing environments. A very interesting security application, in which life-and-death stakes define the security challenges, was addressed in [84]. Bertram et al. proposed a set of algorithms for security-constrained optimal power flow (SCOPF) and their configuration in an integrated package for real-time security enhancements [6]; they also introduce non-convex SCOPF methods for modeling utility operating policy. This study relies on a security overhead model presented in [72], which depicts the overhead for an array of security services.
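As background for the uniform checkpointing policies cited above, the classical first-order rule due to Young relates the checkpoint interval to the checkpoint cost and the mean time between failures. The sketch below illustrates that rule only; it is not the overhead model of [72] or the policies of [57][55][88], and the parameter values in the example are assumptions:

```python
import math

def young_interval(checkpoint_cost, mtbf):
    """Young's first-order approximation of the optimal interval
    between uniform checkpoints: t = sqrt(2 * C * MTBF),
    where C is the cost of taking one checkpoint."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)

def uniform_checkpoint_times(task_length, interval):
    """Uniform strategy: place a checkpoint every `interval`
    time units, excluding the task's end point itself."""
    count = int(task_length // interval)
    return [k * interval for k in range(1, count + 1)
            if k * interval < task_length]
```

With an assumed checkpoint cost of 10 time units and an MTBF of 500, the interval works out to 100, so a 350-unit task would checkpoint at times 100, 200, and 300.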
Protocols for replication and voting in a family of applications are investigated in [86][27]. Myers et al. proposed a method of building a trustworthy distributed system by construction. The literature shows that very limited research has been done on integrating fault tolerance and security in real-time systems. To bridge this gap, we conducted research to integrate security and reliability in real-time embedded systems.
2.5 Summary
There are two objectives in this dissertation research. The first objective is to develop energy-aware and reliable storage disk systems. The second is to integrate security and reliability mechanisms in real-time embedded systems. This chapter outlines a variety of existing techniques related to (1) energy efficiency of storage disks and their reliability issues and (2) reliability and security issues in real-time and embedded systems.
Chapter 3 Energy Efficiency and Reliability of Storage Disks
Numerous energy saving techniques have been developed to aggressively reduce energy dissipation in parallel disks. However, many existing energy conservation schemes have substantial adverse impacts on the reliability of disks. To remedy this deficiency, we address the problem of making tradeoffs between energy efficiency and reliability in parallel disk systems with data mirroring. Among several factors affecting disk reliability, the two most important factors, disk utilization and age, are the focus of this study. In this chapter, we build a mathematical reliability model to quantify the impacts of disk age and utilization on the failure probabilities of mirrored disk systems. In light of the reliability model, we propose a novel concept of the safe utilization zone, within which energy dissipation in disks can be reduced without degrading reliability.
We developed an approach to improving both the reliability and the energy efficiency of disk systems through disk mirroring and a utilization control that forces disk drives to operate in safe utilization zones. This is the first utilization-based control scheme that seamlessly integrates reliability with energy saving techniques in the context of fault-tolerant systems. Experimental results show that our approach can significantly improve reliability while achieving high energy efficiency for disk systems under a wide range of workload situations.
This chapter is organized as follows. Section 3.1 introduces the system model considered in this research. Section 3.2 presents the disk failure model. Next, Section 3.3 provides the reliability-aware energy conservation model along with the Reliability Aware and Energy Efficient (RAREE) algorithm. Section 3.4 analyzes the reliability of mirrored disks under our approach. The experimental environment and simulation results are presented in Section 3.5. Finally, Section 3.6 concludes this chapter by summarizing its main contributions.
3.1 System Model
Energy consumption can be reduced by dynamically operating parallel disks in three power states: active, idle and sleep. Mirrored disks (a.k.a. RAID 1), which use a minimum of two disks, one primary and one backup, are widely adopted to provide fault tolerance. Therefore, in this research we devote attention to parallel I/O systems with disk mirroring. Traditional energy conservation schemes have concentrated on reducing energy dissipation by waking up backup disks only when the utilization of primary disks exceeds a certain threshold. However, these techniques do not address reliability. Load balancing techniques keep both primary and backup disks equally busy to optimize I/O performance. Unfortunately, existing load balancing methods are not energy efficient.
To remedy the above deficiencies, we aim to develop a reliability-aware power management policy, in which disks are operated at different power modes in accordance with disk utilization, which can lead to reduced disk failure rates.
Figure 3.1. System model of parallel I/O systems with mirroring disks.
Fig. 3.1 depicts the system model of parallel I/O systems with mirroring disks. The processor in Fig. 3.1 generates read requests to be processed by the mirrored disks. These requests are queued up, and the utilization levels are calculated accordingly. The utilization levels are checked against the limits of the safe utilization zone, which is specified in light of our novel reliability model. If the calculated utilization level falls below the safe utilization zone, then both disks stay in the sleep state. If the utilization is within the limits of the safe zone, then the primary disk stays active whereas the backup disk sleeps. When the utilization exceeds the safe utilization zone, both disks are kept in the active state to achieve high performance through load sharing. Empirical results show our approach not only improves the reliability of RAID 1, but also saves a significant amount of energy.
3.2 Disk Failure Model
When one disk in a disk I/O system fails, there is a strong likelihood that failures occur in other disks of the system. To quantitatively investigate the impacts of energy conservation techniques on the reliability of disk systems, we focus on the effects of disk utilization and age on the reliability of disks. Utilization is of paramount importance for disk reliability because it is a critical bridge between energy saving and reliability in disk systems. More specifically, high disk utilizations generated by heavy workload conditions give rise to large disk energy consumption. Meanwhile, different disk utilizations lead to different disk annual failure rates, depending on disk age.
Since the utilization of a disk system is a centerpiece in the development of the disk failure model, we propose a disk utilization model before proceeding to develop the disk failure model. The salient feature of our utilization model is that it captures the characteristics and behavior of data-intensive tasks issuing disk requests to a disk system. A second unique aspect of the utilization model is that it can be readily applied to storage systems at the application level. Our utilization model is general in the sense that it can deal with mixed workloads comprising a set of data-intensive tasks with a variety of disk I/O requirements.
Let Z+ be the set of positive integers. Without loss of generality, we consider a workload condition with m (m in Z+) disk I/O phases. The utilization Ui of the ith (1 ≤ i ≤ m) I/O phase is a constant that can be straightforwardly derived from the disk I/O requirements of the data-intensive tasks running within the ith I/O phase. Let ni be the number of data-intensive tasks running in the ith phase. Let λij denote the arrival rate of disk requests submitted by the jth data-intensive task to the disk system, and let sij be the average data size of the disk requests of the jth task. The disk I/O requirement of the jth task in the ith phase is the product of the task's request arrival rate and the average data size of its disk requests, i.e., λij · sij. The accumulative disk I/O requirement Ri, measured in MByte/sec, of all the tasks running in the ith phase can be written as:
Ri = Σj=1..ni λij · sij. (3.1)
Note that Ri in Eq. (3.1) can be envisioned as the accumulative amount of data accessed per time unit. The utilization of a disk system within a given I/O phase equals the ratio of the accumulative disk requirement Ri to the bandwidth of the disk system. Thus, the utilization Ui of the ith I/O phase can be expressed as
Ui = Ri / Bdisk, (3.2)
where Bdisk is the bandwidth of the disk system.
The utilization U of the disk system over the m I/O phases is the weighted sum of the utilizations of all m phases. Thus, the utilization U is expressed by Eq. (3.3) as follows:
U = Σi=1..m wi · Ui, (3.3)
where wi is the fraction of time the disk system spends in the ith phase. Given the I/O requirements of the data-intensive tasks issuing disk requests to a disk system, one can leverage the above model to quantify the utilization of the disk system. The notation for the utilization model is summarized in Table 3.1.
Table 3.1 List of parameters for the disk utilization model.
m: total number of I/O phases
i: the ith I/O phase
ni: total number of I/O-intensive tasks in phase i
j: the jth task
λij: arrival rate of disk requests of task j in phase i
sij: data size of disk requests of task j in phase i
Ri: accumulative disk requirement of phase i
Ui: utilization of phase i
U: disk utilization
It is worth noting that the reliability of the disk system can be measured in terms of disk annual failure rates. Using previously published data [19], we build a disk failure model in the form of annual failure rate percentiles as functions of both disk age and disk utilization level. We obtained annual failure rate percentiles for the low (i.e., 25%), medium (i.e., 50%), and high (i.e., 90%) utilization levels for disks of ages from 3 months to 5 years. These values are used to calculate the annual failure rate percentiles at the other utilization levels for disks aged from 3 months to 5 years. To determine the failure rate for a given utilization, we took the data points from the Google study [19] and used cubic spline interpolation to approximate the annual failure rate of a disk with a certain utilization and age. The disk failure model can be represented as an (n+1)-dimensional vector F = (φ0, φ1, ..., φn), where each component φi = (ui, fi), 0 ≤ i ≤ n, captures the correlation between utilization ui and disk failure rate fi. To develop the disk failure rate model, we have to determine this (n+1)-dimensional vector.
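To make the utilization model concrete, the following sketch evaluates Eqs. (3.1) through (3.3) for a hypothetical two-phase workload; the arrival rates, request sizes, bandwidth and phase weights are illustrative values, not measurements from this dissertation.

```python
# Sketch of the disk utilization model (Eqs. 3.1-3.3).
# All workload numbers below are hypothetical examples.

def phase_requirement(arrival_rates, data_sizes):
    """Accumulative I/O requirement Ri (MB/s) of one phase, Eq. (3.1)."""
    return sum(lam * s for lam, s in zip(arrival_rates, data_sizes))

def phase_utilization(r_i, bandwidth):
    """Utilization Ui of one phase, Eq. (3.2)."""
    return r_i / bandwidth

def overall_utilization(utilizations, weights):
    """Time-weighted utilization U over all m phases, Eq. (3.3)."""
    return sum(w * u for w, u in zip(weights, utilizations))

bandwidth = 55.0                                    # MB/s, 36Z15-like rate
# Phase 1: two tasks; Phase 2: one task (rates in req/s, sizes in MB)
r1 = phase_requirement([50.0, 30.0], [0.2, 0.1])    # 13.0 MB/s
r2 = phase_requirement([100.0], [0.25])             # 25.0 MB/s
u1 = phase_utilization(r1, bandwidth)
u2 = phase_utilization(r2, bandwidth)
u = overall_utilization([u1, u2], [0.5, 0.5])       # equal-length phases
print(round(u1, 4), round(u2, 4), round(u, 4))
```

The per-phase weights here are simply the fractions of total time spent in each phase, so U reduces to a time average of the per-phase utilizations.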
Thus, given a utilization value ui, one can use the failure rate model to obtain the failure rate fi in component φi of the vector. To achieve this goal, we adopted cubic spline interpolation to generate the (n+1)-dimensional vector of failure rates such that it yields a smooth curve for a given disk age. We plot the annual failure rate percentiles for disks of age 6 months (see Fig. 3.2), 1 year (see Fig. 3.3) and 3 years (see Fig. 3.4) as functions of disk utilization. The failure rate plotted in Fig. 3.2 is modeled by a cubic-spline function fitted to the data points; similar functions are used to generate the annual failure rate graphs for 1-year and 3-year-old disks, depicted in Figs. 3.3 and 3.4.
Figure 3.2. Annual failure rate for 6-month old disks with respect to utilization.
Figure 3.3. Annual failure rate for 1-year old disks with respect to utilization.
Figure 3.4. Annual failure rate for 3-year old disks with respect to utilization.
Interestingly, the results shown in Figs. 3.2, 3.3 and 3.4 contradict the findings of previous studies, which indicate that lower utilization levels produce lower failure rates whereas higher utilization levels correspond to higher failure rates. The trend of the disk annual failure rate as utilization grows differs from that reported in the literature. Specifically, our disk failure rate model, built from real-world disk failure data, suggests that the probability that a disk fails under either very low or very high utilization levels is very high. In contrast, when the utilization stays within a certain range (e.g., between 20% and 60% when the disk is 6 months old, see Fig. 3.2), the probability of failure is far smaller than a specified threshold (e.g., smaller than 2%). We term this range of utilization levels the safe utilization zone; it largely depends on disk age.
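The interpolation idea can be sketched as follows. The dissertation fits a cubic spline through the three published anchor points; for brevity this sketch uses piecewise-linear interpolation as a stand-in, and the AFR values at the anchors are purely illustrative, not the Google-study figures.

```python
# Sketch: approximating AFR as a function of utilization by interpolating
# between the low/medium/high-utilization anchor points. The dissertation
# uses a cubic spline; a linear stand-in is used here to keep the sketch
# short. The AFR numbers are hypothetical.

ANCHORS = [(0.25, 3.0), (0.50, 1.5), (0.90, 4.0)]  # (utilization, AFR %)

def afr(u):
    """Interpolated annual failure rate (%) at utilization u."""
    if u <= ANCHORS[0][0]:
        return ANCHORS[0][1]          # clamp below the first anchor
    if u >= ANCHORS[-1][0]:
        return ANCHORS[-1][1]         # clamp above the last anchor
    for (x0, y0), (x1, y1) in zip(ANCHORS, ANCHORS[1:]):
        if x0 <= u <= x1:             # linear segment between anchors
            return y0 + (y1 - y0) * (u - x0) / (x1 - x0)

print(afr(0.375))   # halfway between the first two anchors
```

A cubic spline would replace each linear segment with a cubic polynomial whose first and second derivatives match at the anchors, producing the smooth curves of Figs. 3.2 through 3.4.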
Given a disk system, its safe utilization zone is a function of the disk's age and a specified low-failure-rate threshold. The definition of the safe utilization zone is formally given below.
Definition: Given a disk system, we denote the disk's utilization, age and low-failure-rate threshold as u, a and θ, respectively. Recall that the failure rate of the disk is a function of utilization u and age a, i.e., f(u, a). A utilization range [umin, umax] is called a safe utilization zone with respect to disk age a and failure rate threshold θ if the following condition holds:
f(u, a) ≤ θ, for all u in [umin, umax]. (3.4)
The above condition indicates that the failure rate at any utilization level within a safe utilization zone is always smaller than or equal to the low-failure-rate threshold. We formally describe this property below:
Property: Given a disk with age a and low-failure-rate threshold θ, one can ensure that a utilization level within its safe utilization zone results in a failure rate no higher than the threshold θ.
With the concept of the safe utilization zone in place, disk utilization can be dynamically controlled to keep a disk's failure rate below a specified threshold. The above statements hold for the results depicted in Figs. 3.2 through 3.4.
3.3 Reliability-Aware Energy Conservation Model
RAID 1 is popular and widely used for disk drives. RAID 1 is implemented with a minimum of two disks, the primary and backup disks. Data is first stored on the primary disk and then mirrored to the backup disk. Mirroring makes it possible to recover the data when the primary disk fails. It also helps to increase the performance of the RAID 1 system by sharing the workload between the disks. We considered RAID 1 for all of our experiments. The processor in the system generates the I/O stream, which is queued in the buffer. The utilization of the disk is calculated using the request arrival rate.
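Given any failure-rate function f(u, a) and a threshold θ, the zone bounds of Eq. (3.4) can be extracted numerically. The sketch below scans a utilization grid; the bathtub-shaped failure-rate curve is a hypothetical stand-in for the spline-based model, not the fitted model itself.

```python
# Sketch of extracting a safe utilization zone [umin, umax]: the range of
# grid points where f(u, age) stays at or below the threshold, per Eq. (3.4).
# The failure-rate curve below is a hypothetical bathtub shape.

def safe_zone(failure_rate, threshold, step=0.01):
    """Scan utilizations in [0, 1]; return (umin, umax) or None."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    safe = [u for u in grid if failure_rate(u) <= threshold]
    if not safe:
        return None                     # no utilization level is safe
    return (min(safe), max(safe))

# Hypothetical 6-month-old disk: AFR (%) high at utilization extremes.
f = lambda u: 8.0 * (u - 0.4) ** 2 + 1.0
zone = safe_zone(f, threshold=2.0)
print(zone)
```

Note that this scan assumes the sub-threshold region is contiguous, which matches the single-dip (bathtub) shape of the curves in Figs. 3.2 through 3.4.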
Please refer to Section 3.2 for the details of the disk utilization model. It should be noted that all requests here are considered read requests. At any given point in time the disks can be in one of the following three states.
State 1: Both disks in sleep mode.
State 2: Primary disk active and backup disk in sleep mode.
State 3: Both disks active, sharing the load.
Let us assume that the disks are in state 1 at the beginning. Once the utilization is calculated, it is compared with the safe utilization zone range. If the calculated value falls below the range, the disks stay in state 1. If the calculated value is within the range, the primary disk is made active while the backup disk continues to sleep; this represents a transition to state 2. If the calculated value is beyond the range, both disks are made active and share the load, which corresponds to state 3. Moving from one power mode to another involves disk spin-ups and/or spin-downs, which also consume a considerable amount of energy. The state transition diagram (see Fig. 3.5) details the state transitions at different utilizations and idle times. It can be observed from the diagram that there are 5 possible state transitions:
Figure 3.5 State transition diagram.
State 1 to state 2: If the calculated utilization falls within the safe utilization zone, the system transitions from state 1 to state 2.
State 2 to state 3: When the utilization exceeds the safe zone, the system transitions from state 2 to state 3.
State 3 to state 1: When the utilization becomes zero, or falls below the safe zone, or when the idle time is longer than the break-even time, the system transitions from state 3 to state 1.
State 2 to state 1: When the utilization becomes zero, or falls below the safe zone, or when the idle time is longer than the break-even time, the system transitions from state 2 to state 1.
State 3 to state 2: When the utilization falls back into the safe zone range, or when the idle time is longer than the break-even time.
Input:
M: number of processors in the system
λ: Poisson arrival rate of requests generated by each of the M processors
δ: time interval between arrivals, following an exponential distribution
Su: spin-up power
Sd: spin-down power
Pa: active power
Pi: idle power
Ps: sleep power
β: break-even time (spin-up + spin-down overhead)
Q: buffer
1. Insert arriving requests into buffer Q
2. Calculate the utilization u of the disk based on the number of requests in the buffer
3. Calculate the safe utilization zones (umin, umax) for disks of different age groups
4. Compare the value of u against umin and umax
5. If the system is in state 1: if u ≥ umin, then the system transitions from state 1 to state 2
6. If the system is in state 2: if u ≥ umin && u > umax, then the system transitions from state 2 to state 3
7. If the system is in state 3: if u ≥ umin && u > umax, then no state transition occurs
Figure 3.6 Reliability aware energy efficient (RAREE) algorithm for disk drives.
Table 3.2: Main characteristics of two SCSI disks and an IDE laptop disk.
Parameter | IBM 36Z15 Ultrastar (high perf) | IBM 73LZX Ultrastar (low perf) | IBM 40GNX Travelstar (laptop)
Standard interface | SCSI | SCSI | IDE
Capacity | 18 GBytes | 18 GBytes | 20 GBytes
Number of platters | 4 | 2 | 2
Rotations per minute | 15000 | 10000 | 5400
Disk controller cache | 4 MBytes | 4 MBytes | 8 MBytes
Average seek time | 3.4 msec | 4.9 msec | 12 msec
Average rotation time | 2 msec | 3 msec | 5.5 msec
Internal transfer rate | 55 MB/sec | 53 MB/sec | 25 MB/sec
Power (active) | 13.5 W | 9.5 W | 3.0 W
Power (idle) | 10.2 W | 6.0 W | 0.82 W
Power (standby) | 2.5 W | 1.4 W | 0.25 W
Energy (spin down) | 13.0 J | 10.0 J | 0.4 J
Time (spin down) | 1.5 sec | 1.7 sec | 0.5 sec
Energy (spin up) | 135.0 J | 97.9 J | 8.7 J
Time (spin up) | 10.9 sec | 10.1 sec | 3.5 sec
3.4 Analysis
In this section, we analyze the reliability of a mirrored disk system under our approach, which we term RAREE (Reliability-Aware and Energy-Efficient approach). We aim to derive the failure rate p(u, a) of a parallel I/O system with disk mirroring. The failure rate largely depends on the ages a and utilizations u of a pair of primary and backup disks. Let a = (aP, aB) represent the ages of the primary and backup disks, where subscripts P and B denote the primary and backup disks, respectively. We denote pP(aP) and pB(aB) as the failure rates of the primary and backup disks. The pair of disks is considered failed only if both disks have failed. Given that disk failures are independent, we compute the reliability r(a) of the parallel I/O system as
r(a) = 1 - pP(aP) · pB(aB). (3.5)
Let qP(u) denote the probability that the utilization of the primary disk is u, and qB(u) the probability that the utilization of the backup disk is u. f(u, a) represents the failure rate of an a-year-old disk with utilization u. The failure rate pP(aP) in Eq. (3.5) can be expressed as
pP(aP) = Σu qP(u) · f(u, aP). (3.6)
Similarly, the failure rate pB(aB) in Eq.
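The state logic of Figs. 3.5 and 3.6 can be sketched as a small transition function. The zone bounds and utilization trace below are hypothetical; for brevity the break-even idle-time rule is omitted, and a jump from state 1 directly to state 3 is allowed here even though Fig. 3.5 lists only five transitions.

```python
# Sketch of the RAREE power-state logic:
# state 1 = both disks asleep, 2 = primary active, 3 = both active.

def next_state(state, utilization, u_min, u_max):
    """One control step of the RAREE state machine (simplified)."""
    if utilization < u_min:
        return 1            # below the safe zone: sleep both disks
    if utilization <= u_max:
        return 2            # inside the safe zone: primary only
    return 3                # above the safe zone: share the load

state = 1
for u in [0.05, 0.30, 0.70, 0.40, 0.02]:   # example utilization trace
    state = next_state(state, u, u_min=0.20, u_max=0.60)
    print(u, state)
```

Each control step therefore depends only on where the current utilization falls relative to the safe zone, which is what makes the scheme cheap to evaluate online.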
(3.5) is given by
pB(aB) = Σu qB(u) · f(u, aB). (3.7)
Now we analyze the energy efficiency of our approach. Let PP,A and PB,A denote the power of the primary and backup disks when they are in the active state. Given a list R = (R1, R2, ..., Rn) of disk requests, we can compute the energy EA consumed in serving all the requests as
EA = Σi=1..n [xi,P · PP,A · ti,P + (1 - xi,P) · PB,A · ti,B], (3.8)
where xi,P is 1 if request i is served by the primary disk and 0 otherwise, and ti,P and ti,B are the service times of request i on the primary and backup disks. We obtain the analytical formula for the energy consumed while the disks are in the sleep state:
ES = PP,S · TP,S + PB,S · TB,S, (3.9)
where PP,S and PB,S are the powers of the primary and backup disks in the sleep state, and TP,S and TB,S are the time intervals during which the disks are in the sleep state. Let fi be the completion time of request i. Then TP,S and TB,S can be derived from the I/O processing times and the completion time of the last request served by each disk. Thus, we have
TP,S = maxi fi - Σi xi,P · ti,P, (3.10)
TB,S = maxi fi - Σi (1 - xi,P) · ti,B. (3.11)
The total energy consumption E of the parallel I/O system with disk mirroring can be derived from Eqs. (3.8) and (3.9) as
E = EA + ES. (3.12)
3.5 Performance Evaluation
We conducted extensive experiments on three types of IBM disks: Ultrastar 36Z15, Ultrastar 73LZX, and Travelstar 40GNX. We compared our approach, RAREE, with the load balancing scheme and the traditional dynamic power management (DPM) scheme. Read requests are generated according to a Poisson process. The experimental results plotted in Figs. 3.7-3.9 show the energy consumed by parallel I/O systems with the three different types of IBM disks. Fig. 3.7 clearly shows that for the IBM 40GNX disk, our approach significantly improves energy efficiency over the traditional energy saving and load balancing schemes by up to 22.4% and 31.3%, respectively.
Figure 3.7 Energy dissipation (Joules). IBM 40GNX.
Figure 3.8 Energy dissipation (Joules). IBM 73LZX.
Figure 3.9 Energy dissipation (Joules). IBM 36Z15.
Interestingly, we observe from Figs.
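The energy accounting of Eqs. (3.8) through (3.12) can be sketched as follows, with a hypothetical request assignment and service times; the active and standby powers are taken to be 36Z15-like values for both disks, and the makespan stands in for the completion time of the last request.

```python
# Sketch of the energy model of Section 3.4 (Eqs. 3.8-3.12) with
# hypothetical request data. x[i] is 1 if request i is served by the
# primary disk, 0 if by the backup; t[i] is its service time (s).

def total_energy(x, t, p_active, p_sleep, makespan):
    """E = EA (active service) + ES (sleep intervals) for a disk pair."""
    pa_p, pa_b = p_active
    ps_p, ps_b = p_sleep
    busy_p = sum(ti for xi, ti in zip(x, t) if xi == 1)
    busy_b = sum(ti for xi, ti in zip(x, t) if xi == 0)
    e_active = pa_p * busy_p + pa_b * busy_b        # Eq. (3.8)
    t_sleep_p = makespan - busy_p                   # Eq. (3.10)
    t_sleep_b = makespan - busy_b                   # Eq. (3.11)
    e_sleep = ps_p * t_sleep_p + ps_b * t_sleep_b   # Eq. (3.9)
    return e_active + e_sleep                       # Eq. (3.12)

# 36Z15-like powers: 13.5 W active, 2.5 W standby (both disks alike).
e = total_energy(x=[1, 1, 0, 1], t=[2.0, 1.0, 3.0, 2.0],
                 p_active=(13.5, 13.5), p_sleep=(2.5, 2.5), makespan=10.0)
print(e)
```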
3.8 and 3.9 that for the IBM 36Z15 and IBM 73LZX disks, RAREE's energy efficiency lies between those of the load balancing and traditional DPM techniques. This result can be partially explained by the low disk spin-up and spin-down power of the IBM 40GNX, and it indicates that disk spin-down and spin-up power can play a vital role in the energy dissipation of parallel I/O systems. An intriguing conclusion drawn from this set of experiments is that our scheme is especially energy-efficient for mobile disks as opposed to high-performance disks. Figs. 3.10 and 3.11 show the impact of spin-down energy on the energy dissipation of the parallel disk system when the disks are 6 months and 1 year old, respectively. Though the energy efficiency of our approach is slightly lower than that of the traditional DPM technique, ours substantially improves the reliability of parallel disk systems over the DPM and load-balancing schemes.
Figure 3.10 Spin-Down Energy (Joules) vs. Energy Dissipation (Joules).
Figs. 3.12 and 3.13 show the effect of spin-up energy on the energy dissipation in the parallel disk system with 6-month-old and 1-year-old Ultrastar disks. In all cases, our strategy provides noticeable energy savings compared with the other two schemes. In addition, Figs. 3.12 and 3.13 illustrate that for all three strategies, energy consumption increases slightly as spin-up energy grows. RAREE is more sensitive to spin-up energy than the alternatives are.
Figure 3.11. Spin-Down Energy (Joules) vs. Energy Dissipation (Joules).
Figure 3.12. Spin-Up Energy (Joules) vs. Energy Dissipation (Joules).
Figure 3.13. Spin-Up Energy (Joules) vs. Energy Dissipation (Joules).
Figure 3.14. Active Power (Watts) vs. Energy Dissipation (Joules).
Figure 3.15. Active Power (Watts) vs. Energy Dissipation (Joules).
Figure 3.16. Idle Power (Watts) vs. Energy Dissipation (Joules).
Figure 3.17. Idle Power (Watts) vs. Energy Dissipation (Joules). Disk Age = 1 Year.
Figs.
3.14 and 3.15 show the impact of active power on the energy consumed by parallel disk systems. They reveal that, regardless of disk age, the energy savings provided by RAREE become more pronounced as active power is reduced. Figs. 3.16 and 3.17 show that when the idle power exceeds 6 W, the energy efficiency of RAREE is no longer sensitive to idle power. Fig. 3.18 shows that although the average response time of our approach is slightly longer than those of the other two schemes in some cases, the performance degradation is usually less than 2 milliseconds. We believe it is worthwhile to trade marginal performance degradation for high reliability (see Table 3.3) and energy efficiency.
Figure 3.18 Arrival rate vs. response time.
Table 3.3: Failure rate (%) of the parallel disk systems.
Age (Year) | Traditional / Load Balancing | RAREE | Improvement (%)
0.25 | 9.827 | 0.017 | 99.8
0.5 | 47.080 | 5.502 | 88.3
1 | 52.729 | 34.646 | 34.3
2 | 24.957 | 17.205 | 31.1
3 | 0.923 | 0.425 | 54.0
4 | 8.263 | 3.996 | 51.6
5 | 24.414 | 13.003 | 46.7
Table 3.3 shows the failure rates of parallel I/O systems with disk mirroring. The results summarized in Table 3.3 illustrate that our strategy significantly improves the reliability of parallel I/O systems over the two existing schemes. For example, RAREE reduces the failure rate by up to 99.8%, with an average of 58.0%.
3.6 Summary
Although an array of energy conservation techniques has been proposed for disk systems, most energy saving schemes come at the cost of disk reliability. To solve this problem, we first built a model that quantifies the failure probabilities of disk systems as a function of disk age and utilization. In particular, we focused on parallel I/O systems with mirrored disks, where data sets are mirrored from primary disks to backup disks. Traditional power management methods wake up the backup disks when the utilization of the primary disks exceeds a certain threshold.
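The improvement column of Table 3.3 is the relative reduction in failure rate, (old - new) / old × 100; the short check below reproduces the table's per-row improvements and its 58.0% average from the tabulated failure rates.

```python
# Reproducing the Improvement (%) column and its average from Table 3.3.

def improvement(old, new):
    """Relative failure-rate reduction in percent."""
    return (old - new) / old * 100.0

rows = [(9.827, 0.017), (47.080, 5.502), (52.729, 34.646),
        (24.957, 17.205), (0.923, 0.425), (8.263, 3.996),
        (24.414, 13.003)]
imps = [improvement(old, new) for old, new in rows]
print([round(v, 1) for v in imps])      # per-row improvements
print(round(sum(imps) / len(imps), 1))  # average improvement
```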
Load balancing techniques, on the other hand, keep both primary and backup disks always active to balance the load between the two disks and achieve high performance. However, load balancing methods consume a massive amount of energy in large-scale parallel I/O systems. Hence, we aimed at developing a utilization-based control mechanism to reduce energy dissipation in parallel disk systems without degrading system reliability. This goal was achieved by forcing parallel disks to operate in safe utilization zones, within which disk failure probability is minimized. This is the first utilization-based control scheme that seamlessly integrates reliability with energy saving techniques in the context of parallel I/O systems. Experimental results show that our approach can significantly improve reliability while achieving high energy efficiency for parallel I/O systems with disk mirroring.
Chapter 4 Effects of Power State Transitions on Reliability of Energy Efficient Storage Systems
In the previous chapter, we designed a reliable and energy-efficient algorithm for large-scale storage systems. The algorithm switches the disks between different power modes to save energy and operates them in the safe utilization zone to maintain reliability. However, switching the disk system from one state to another to save energy involves an overhead caused by spin-ups and spin-downs. Therefore, the analysis of the power management mechanism described in Chapter 3 has to take into account the overheads involved in disk spin-ups and/or spin-downs during transitions between power modes. This overhead affects both the reliability and the energy efficiency of the disks. In this chapter, we present an analytical model to quantify energy efficiency and reliability, and the effect of power state transitions on both. This chapter is organized as follows. Section 4.1 presents the motivation of this study.
Section 4.2 defines the system model used in this study and explains its energy efficiency and disk reliability components. Next, Section 4.3 discusses the control algorithm for power state transitions in detail. Section 4.4 presents the performance evaluation of our approach, with detailed descriptions of the experimental setup and experimental results in Sections 4.4.1 and 4.4.2, respectively. Finally, Section 4.5 summarizes the chapter.
4.1 Motivation
The ever-increasing number of large-scale, complex applications calls for high-performance data centres. This translates directly into a growing volume of spinning disks, creating a huge and growing energy problem. To provide reliable storage services, disk-array-based storage systems (such as RAID) are often used. However, these systems have been designed to gain performance without considering energy constraints. Current worldwide data growth is estimated at a compound annual growth rate of 50.6% through the decade [7]. Several studies have indicated that data centres are headed towards a serious energy crisis. The energy crisis problem has a rippling effect, as additional power is required; for instance, 22% of the total energy consumption is due to the cooling systems used to keep data centre temperatures from rising too high. The power consumption problem has been addressed at the hardware level to a certain extent, but it remains largely open in many deployments of large-scale storage systems. Energy consumption of storage systems in data centres is becoming a predominant concern in the IT industry. On the other hand, RAID-based solutions have been used to dramatically reduce the downtime due to hardware problems, minimizing data loss due to drive failures. RAID-based solutions have been widely used in the storage industry over the last ten years, in part because of the historically affordable price of energy.
However, with the increasing shortage of power resources, this is no longer true. RAID implementations provide a degree of fault tolerance depending on the specific RAID configuration used. Since reliability and fault tolerance are two sides of the same coin, the overall reliability of a RAID-based storage system depends on the reliability of the individual disks. Therefore, we still see a large scope for studying ways to improve disk reliability. Meanwhile, energy is becoming a major expense in every business's budget. Indeed, the extensive success and use of RAID-based technology has led the industrial and research communities to focus more on ways to lower the unaffordable amount of energy consumed. To address this problem, we developed an algorithm that saves energy by operating the disks in different power modes and maintains reliability by operating the disks only in the safe utilization zone. However, operating disks in different power modes means switching them back and forth, which involves a significant overhead in disk spin-ups and spin-downs. Disk manufacturers also warn that a disk can sustain only a limited number of spin-ups and spin-downs, after which its reliability is affected. Hence, there is a need to control the number of state transitions that occur in a disk system. In this chapter, we propose an algorithm to control the disk transitions, thereby improving disk reliability while achieving the desired energy efficiency.
4.2 System Model
Disk age and utilization are studied as the primary factors that affect disk drive reliability. Based on the failure statistics presented by Google [19] for disk systems, we studied the failure probabilities with respect to disk age and utilization. We then estimated safe utilization zones for disks of differing ages.
The safe utilization zone is the range of utilization levels within which the probabilities of disk failures are minimal. We designed a policy where disks operate at different power modes based on the current utilization levels. As for reliability, we first take a step towards quantifying the effects of disk utilization and disk age on overall disk reliability. We present an empirical reliability metric called the Disk Reliability Indicator (DRI), which incorporates the effects of utilization and disk age on disk drive reliability in terms of the Annual Failure Rate (AFR). We then study the relative trade-offs between percentage energy savings and reliability in terms of AFR with every power state transition. Finally, we propose a control mechanism to ensure the operation of disks in the safe utilization zone in an energy-efficient manner with an optimal reliability tradeoff.
4.2.1 Energy Efficiency
Based on the state transition diagram in Fig. 3.5, it is quite evident that maximum energy efficiency is achieved by operating disks in State 1. This, however, is not feasible for server-side disk storage systems, as their workloads often have idle time slots too small to justify the overhead caused by frequent spin-ups and spin-downs. At the same time, no energy is saved by keeping the disk system in State 3, as both disks run in active mode all the time in this state. A key factor that determines the amount of energy consumed in such a system is the power state transition frequency f. It represents the number of state transitions that occur per day and is cumulative, i.e., it includes transitions between all states. Let fP and fB represent the transition frequencies of the primary and backup disks. The values of fP and fB can be derived from the state transition history of the entire RAID-1 system per day. Note that both fP and fB comprise equal numbers of spin-ups and spin-downs.
This is because we assume only 2-speed disks in our study, meaning the disks can only switch between two states. The total energy consumed is simply the sum of the energies consumed by the individual disks, i.e.,
E = EP + EB. (4.1)
Let PP,A and PB,A denote the power of the primary and backup disks when they are in the active state, and PP,S and PB,S the corresponding powers when they are in sleep mode. We use the following equations to calculate EP and EB:
EP = PP,A · TP,A + PP,S · TP,S + (fP / 2) · (Pspin-up · Tspin-up + Pspin-down · Tspin-down), (4.2)
EB = PB,A · TB,A + PB,S · TB,S + (fB / 2) · (Pspin-up · Tspin-up + Pspin-down · Tspin-down), (4.3)
where
PP,A and PB,A are the powers consumed in the active state by the primary and backup disks, respectively;
PP,S and PB,S are the powers consumed in the sleep state by the primary and backup disks, respectively;
TP,A and TP,S are the times spent by the primary disk in the active and sleep states, respectively;
TB,A and TB,S are the times spent by the backup disk in the active and sleep states, respectively;
Pspin-down and Pspin-up are the spin-down and spin-up power consumptions, and Tspin-down and Tspin-up the corresponding durations, which are the same for both the primary and backup disks.
Given a list R = (R1, R2, ..., Rn) of disk requests with a given pattern, we can calculate utilization values, from which we can derive the sequence of state transitions and the amount of time spent in each state. Using Eq. (4.1) we can then calculate the amount of energy consumed over the total duration. Note that our analysis also accounts for the energy cost of power state transitions.
4.2.2 Disk Reliability
The most challenging aspect of research on storage system reliability is quantifying the relationship between utilization (and therefore power state transitions) and reliability. A frequency-reliability function based on a combination of the spindle start/stop failure rate adder suggested by IDEMA [30] and the modified Coffin-Manson model was built in [76]. The rationale behind that work is that disk speed transitions and spindle start/stops generate essentially the same disk failure mode, spindle motor failure, though to different extents.
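The per-disk energy accounting of Eqs. (4.1) through (4.3) above can be sketched as follows. The times, frequencies and spin-transition energies are hypothetical; the spin energies (power multiplied by duration) roughly follow the IBM 36Z15 entries of Table 3.2.

```python
# Sketch of Eqs. (4.1)-(4.3): per-disk energy including the overhead of
# f state transitions per day (f/2 spin-ups and f/2 spin-downs, since
# every spin-down is eventually followed by a spin-up).

def disk_energy(p_active, t_active, p_sleep, t_sleep,
                f, e_spin_up, e_spin_down):
    """Daily energy (J) of one disk, Eq. (4.2)/(4.3)."""
    transition_cost = (f / 2) * (e_spin_up + e_spin_down)
    return p_active * t_active + p_sleep * t_sleep + transition_cost

# Hypothetical day: times in seconds, 36Z15-like powers and spin energies.
e_p = disk_energy(13.5, 3600.0, 2.5, 82800.0, f=20,
                  e_spin_up=135.0, e_spin_down=13.0)
e_b = disk_energy(13.5, 600.0, 2.5, 85800.0, f=6,
                  e_spin_up=135.0, e_spin_down=13.0)
total = e_p + e_b          # Eq. (4.1): E = EP + EB
print(e_p, e_b, total)
```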
Since power state transitions directly correspond to speed transitions of a disk, we believe the results of [76] are a good starting point for our studies. Each time a hard disk drive undergoes a power cycle or speed transition, damage is caused by the temperature change, and this damage accumulates over repeated cycles. Such cycles induce a cyclical stress, which weakens materials and eventually makes the disk fail [21]. A well-known mathematical model that evaluates the reliability effects of cycles of stress, or of frequent changes in temperature, is the Coffin-Manson model. A modified Coffin-Manson model based on quadratic curve fitting is used to obtain the following reliability-frequency function, where R is the reliability in AFR, f is the disk power state transition frequency, and a, b, and c are the fitted coefficients [76]:

R(f) = a f^2 + b f + c    (4.4)

Next, we analyze the reliability of a mirrored disk system with our approach. We aim to derive the failure rate p(τ, ρ) of a parallel I/O system with disk mirroring. The failure rate largely depends on the age τ and utilization ρ of a pair of primary and backup disks. Let τ = (τ_P, τ_B) represent the ages of the primary and backup disks, where subscripts P and B denote the primary and backup disks, respectively. We denote p_P(τ_P) and p_B(τ_B) as the failure rates of the primary and backup disks. Given that disk failures are independent, we compute the reliability r(τ) of the parallel I/O system as

r(τ) = 1 - R(f_P) R(f_B)    (4.5)

where R(f_P) and R(f_B) represent the failure rates of the primary and backup disks (of given age) at the given power state transition frequencies.

Figure 4.1 CAPST Algorithm

4.3 Control Algorithm for Power State Transition (CAPST)

We have established that, to achieve energy-efficient and reliable operation, disks must be operated in the safe utilization zone.
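The reliability model just described can be sketched in a few lines. The quadratic coefficients below are placeholders, not the fitted constants of [76], and the mirrored-pair combination follows the independence argument of equation (4.5): the pair fails only if both disks fail.

```python
def afr(f, a=0.0, b=0.0, c=0.02):
    """Quadratic frequency-reliability fit in the shape of equation (4.4).
    a, b, c are placeholder coefficients, NOT the fitted values of [76]."""
    return a * f * f + b * f + c

def mirrored_reliability(f_p, f_b):
    """Reading of equation (4.5) under independent failures: the mirrored
    pair fails only if both disks fail, so r = 1 - AFR(f_P) * AFR(f_B)."""
    return 1.0 - afr(f_p) * afr(f_b)

# With the placeholder 2% AFR, a mirrored pair is far more reliable
# than either disk alone.
r_pair = mirrored_reliability(2, 2)
```

With real coefficients, afr() grows with the transition frequency f, which is exactly the energy/reliability tension the control mechanism of Section 4.3 manages.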
In this section we propose a control mechanism called CAPST (Control Algorithm for Power State Transition) that imposes a degree of control over the fluctuation of utilization values caused by dynamic workloads. To control these fluctuations, we use a time-window based mechanism. To determine the operational state of the storage system, CAPST first forecasts the utilization levels in the next time-window based on the history of observed values. At the beginning of each time-window, CAPST sets the utilization values based on the current and historical values, and performs a power state transition only if the current utilization values are consistent with the historical data. Every time a new utilization value is generated, CAPST slides the time-window forward by one position, adding the latest value to its historical data. The algorithm is summarized in Figure 4.1.

4.4 Performance Evaluation

4.4.1 Experimental Setup

We developed an execution-driven simulator that models a RAID-1 array of 2-speed disks. Multi-speed disks have not yet been manufactured and deployed at scale, so there are few or no reported results on the impact of utilization on the reliability of multi-speed disks. Owing to this infancy of multi-speed disks, we derived the corresponding 2-speed disk statistics from the parameters of an IBM Ultrastar 36Z15 disk. For the sake of simplicity we considered an array of two disks; however, the results and trends hold for larger sets of disks as well. The disk parameters used in the experiments are given in Table 4.1.
Table 4.1: Disk Parameters of IBM 36Z15 Ultrastar

Standard interface: SCSI
Capacity: 18 GBytes
Number of platters: 4
Rotations per minute: 15000
Disk controller cache: 4 MBytes
Average seek time: 3.4 msec
Average rotation time: 2 msec
Internal transfer rate: 55 MB/sec
Power (active): 13.5 W
Power (idle): 10.2 W
Power (standby): 2.5 W
Energy (spin down): 13.0 J
Time (spin down): 1.5 sec
Energy (spin up): 135.0 J
Time (spin up): 10.9 sec

The experimental results are compared against two traditional state-of-the-art methods. In the first method, "load balancing", both disks are always active; it achieves very high performance because both disks share the load. The second, the "traditional method", keeps the primary disk always active and the backup disk in sleep mode; the backup disk is activated only when the utilization of the primary disk exceeds 100%, a condition known as saturation.

4.4.2 Experimental Results

We first evaluated our approach and then compared it against the traditional state-of-the-art methods. Our main goal in evaluating our approach was to study the variation in disk utilization over the duration of the experiment, which triggered state transitions. This in turn was used to calculate the energy consumed, using equation (4.1). For different request arrival rates we obtained different power state transition frequencies. We then used those frequency values to study the cost of energy savings in terms of reliability, as given by equation (4.5).

Figure 4.2 Effect of arrival rate on Power State Transitions

From Figure 4.2 we observe that for lighter workloads the system makes many power state transitions. At this stage the system operates in State 1 and State 2. As the arrival rate increases, the power state transition frequency becomes steady. We believe it is at this stage that the system operates most energy-efficiently with optimal reliability.
With a further increase in workload, the fluctuations in the arrival pattern also increase, causing more state transitions; the transition frequency rises further and the system switches more towards State 3. At very high workloads, where utilization levels are high, the system operates in State 3 with both disks serving requests, reducing the power transition frequency. At this stage the energy consumed is high, and the disks operate outside the safe utilization zone as well. We note the range of values of the arrival rate λ for which utilization stays within the safe utilization zone; this indicates the rate at which requests must be serviced for the disk system to operate in a safe utilization zone. Next, we ran the simulation and recorded the fluctuation in utilization values with and without CAPST. Fig 4.3 plots the recorded values over the length of the simulation. It is evident from Fig 4.3 that the power state transitions are evened out, minimizing the number of state transitions and thereby reducing their reliability overhead. Note that the utilization values with CAPST vary between 20% and 60%, which is the safe utilization zone. We calculated the total energy consumed to be 21425.3 J and the reliability to be 99.3%, as per equations (4.1) and (4.5).

Figure 4.3 Power State Transitions with CAPST

Based on the disk parameters in Table 4.1, we next studied the variation in energy consumption and reliability. The results are depicted in Fig 4.4 and Fig 4.5: for f < 3, reliability increases and energy consumption decreases in a steady manner. This is in strong agreement with the results shown in Fig 3.2. Looking back, we notice that f < 3 holds for workloads with arrival rates of 14 and above, i.e., when the disk system operates in the safe utilization zone.
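The window-based control that produces the evened-out transitions of Fig 4.3 can be sketched as follows. The window length, the mean-based forecast, and the tolerance test are simplifying assumptions of this sketch; the dissertation's Figure 4.1 gives the authoritative CAPST steps.

```python
from collections import deque

class CapstWindow:
    """Sliding-window utilization control in the spirit of CAPST: keep
    the last w observed utilizations, forecast the next value as their
    mean, and allow a power-state transition only when the current
    observation is consistent with that forecast."""

    def __init__(self, w=5, tolerance=0.1):
        self.history = deque(maxlen=w)   # the sliding time-window
        self.tolerance = tolerance       # allowed deviation from forecast

    def observe(self, utilization):
        """Slide the window forward by one, adding the newest value."""
        self.history.append(utilization)

    def forecast(self):
        """Forecast the next utilization as the window mean."""
        return sum(self.history) / len(self.history)

    def allow_transition(self, utilization):
        """Permit a state change only if the new value agrees with history."""
        if not self.history:
            return False
        return abs(utilization - self.forecast()) <= self.tolerance

ctrl = CapstWindow(w=3, tolerance=0.1)
for u in (0.40, 0.42, 0.44):
    ctrl.observe(u)
steady = ctrl.allow_transition(0.45)   # consistent with history
spike = ctrl.allow_transition(0.90)    # transient spike, suppressed
```

Suppressing transitions on transient spikes is what caps the transition frequency f, and hence the reliability overhead charged per transition by the frequency-reliability function.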
Traditional methods wake up the backup disk when the utilization of the primary disk exceeds 100 percent. The load balancing technique keeps both the primary and backup disks always active to share the load. These methods consume a massive amount of energy, as the disks stay active even when there are no requests to serve for long periods of time, and the reliability of the disks is largely ignored.

Figure 4.4 Variation of Energy consumed with Power Transition Frequency

Figure 4.5 Variation of Reliability with Power Transition Frequency

Table 4.2 depicts the reliability improvement compared to the other two methods.

Table 4.2: Comparison of Reliability

Age (Year) | Traditional / Load Balancing | CAPST-based Speed Control | Improvement (%)
0.25 | 9.827 | 0.023 | 99.8
0.5 | 47.08 | 4.214 | 91.0
1 | 52.729 | 23.752 | 55.0
2 | 24.957 | 14.328 | 42.6
3 | 0.923 | 0.302 | 67.3
4 | 8.263 | 2.172 | 73.7
5 | 24.414 | 10.202 | 58.2

4.5 Summary

In this chapter, after proposing the novel concept of a safe utilization zone, within which disk energy can be conserved without degrading reliability, we designed and implemented a utilization control mechanism called CAPST to improve both the reliability and energy efficiency of disk systems. The utilization control mechanism ensures that disk drives operate in safe utilization zones to minimize the probability of disk failure. We integrate the energy conservation technique that operates the disks at different power modes with our proposed reliability approach. Simulation results show that our approach can significantly improve reliability while achieving high energy efficiency for disk systems.

Chapter 5

Utilization based Reliable Energy Efficient Disks - UREED

In the previous chapters, we designed an energy-efficient and reliable algorithm for storage disks. We also proposed a way to control power state transitions to increase energy efficiency while guaranteeing reliability.
In this chapter we further extend our work by designing the utilization-based reliable energy-efficient disks algorithm, called UREED. The main idea of this chapter is to skew popular data to buffer disks and control the utilization of the disks, achieving high levels of reliability along with maximum energy efficiency. This chapter is organized as follows. Section 5.1 presents the motivation of this study. In Section 5.2, we define the model of our file server. Next, in Section 5.3, we discuss the UREED algorithm. In Section 5.4, we present the mathematical models of the UREED, MAID and PDC file servers. Section 5.5 details the performance evaluation, along with the simulation description and simulation results. Finally, Section 5.6 summarizes the chapter.

5.1 Motivation

Large-scale storage systems have been the focus of much research due to the substantial growth of high-performance data centres. Large-scale complex applications running in these data centres consume an enormous amount of energy. Current worldwide data growth is estimated at a compound annual growth rate of 50.6% through the decade [7]. Several studies have indicated that data centres are headed towards a serious energy crisis. The energy crisis problem has a rippling effect, as additional power is required for support systems; for instance, 22% of total energy consumption is due to the cooling systems that keep data centre temperatures from rising. The importance of energy conservation for large-scale disk systems has been identified by many pioneering researchers and has been studied thoroughly [18], but most of the published work on storage systems concentrates on energy conservation of the disks, whereas the equally important concern of reliability is ignored in many studies. Operating a disk with high energy efficiency pays off only if it runs without causing the disk to fail.
Most energy conservation techniques adversely affect disk reliability, and once a disk fails, the energy spent rebuilding it outweighs the energy conserved by operating the disks in energy-efficient mode. Hence there is a need for highly reliable and highly energy-efficient storage disks. The research in this chapter is motivated by these observations. We use a data skewing technique to keep very few disks active in the system to serve requests, thereby increasing energy efficiency. By operating these buffer disks and data disks within the safe utilization zone, except when it is necessary to operate outside it, we also attain good reliability.

5.2 Model Description

In this section we propose the utilization-based reliable energy-efficient disk model, referred to as UREED. The model outlines the working conditions and the strategies followed to achieve energy efficiency and reliability. Let the total number of disks in a disk array be D, with U data disks and V buffer disks:

D = U + V    (5.1)

where U and V are positive integers with 1 ≤ U < D and 1 ≤ V < D, and the data and buffer disks are indexed as U = {1, 2, ..., p} and V = {1, 2, ..., q}. We assign n files f1, f2, ..., fn among the U data disks. Each file is partitioned into blocks I = {1, ..., m}, where Ii is the set of indices corresponding to the files assigned to disk Ui. For simplicity of presentation, file partitioning is not considered in this work; thus each file is assigned in its entirety to one disk. Similar assumptions were made in [48]. This does not restrict the generality of our model, since if a file were partitioned, each partition could be viewed as a standalone file. The disk accesses to each file are modeled as a Poisson process with a mean access rate λi, known a priori [8]. We assume a fixed service time for each file fi (similar assumptions were made in [48]). Disk accesses on each file are directly linked to the popularity of the data block.
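Under this model, a file's expected load is its Poisson arrival rate times its fixed service time, so a disk's expected utilization is the sum over the files placed on it. A small sketch, with purely illustrative rates and service times:

```python
def disk_utilization(files):
    """Expected utilization of one disk under the Poisson model:
    sum of (arrival rate * fixed service time) over its files.
    `files` is a list of (rate_per_sec, service_time_sec) pairs."""
    return sum(rate * service for rate, service in files)

# Hypothetical example: three files on one data disk.
rho = disk_utilization([(2.0, 0.05), (1.0, 0.10), (4.0, 0.05)])
# rho = 0.1 + 0.1 + 0.2 = 0.4
```

In this illustration the disk's expected utilization of 40% falls inside the 20%-60% safe utilization zone observed in the previous chapter, which is the operating condition UREED tries to maintain.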
It has been shown that many network server workloads have highly skewed file access frequencies [14][47][18], and that web server file access frequency conforms to a Zipf distribution. We therefore use Zipf's law to predict the frequency of access, or popularity, of a file. Initially, all files are sorted in descending order of popularity into a list X of size n. The sorted files are then copied onto the data disks by randomly choosing a disk and allocating to it the next contiguous file from the sorted list; this distributes the files across the disks more or less evenly, giving a balanced file assignment on the data disks. A list Y of size k (where 1 ≤ k < n) is then generated to hold the popular data for the buffer disks.
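The placement just described, Zipf-ranked files dealt out randomly across the data disks with the most popular files copied to a buffer-disk list Y, can be sketched as below. The Zipf exponent of 1 and the seeded random deal are simplifying assumptions of this sketch.

```python
import random

def zipf_popularity(n, s=1.0):
    """Normalized Zipf popularities for n files ranked 1..n
    (exponent s = 1 assumed)."""
    weights = [1.0 / (rank ** s) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def assign_files(n, num_data_disks, k, seed=0):
    """Sort files by popularity (list X), deal them one by one to
    randomly chosen data disks so the assignment is roughly even, and
    copy the k most popular files to the buffer-disk list Y."""
    pops = zipf_popularity(n)
    order = sorted(range(n), key=lambda i: pops[i], reverse=True)  # list X
    rng = random.Random(seed)
    disks = [[] for _ in range(num_data_disks)]
    for f in order:
        rng.choice(disks).append(f)   # random disk, next contiguous file
    buffer_list = order[:k]           # list Y of size k, 1 <= k < n
    return disks, buffer_list

disks, Y = assign_files(n=12, num_data_disks=3, k=4)
```

Because the popularities are strictly decreasing in rank, list Y always holds the k highest-ranked files; the random deal only affects which data disk holds each file's primary copy.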