Energy Source Lifetime Optimization for a Digital System through Power Management

by Manish Kulkarni

A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Science

Auburn, Alabama
Dec 13, 2010

Keywords: Low Power Architecture, Power Source Optimization, Li-ion Battery Simulations

Copyright 2010 by Manish Kulkarni

Approved by
Vishwani Agrawal, Chair, James J. Danaher Professor, Electrical and Computer Engineering
Adit Singh, James B. Davis Professor, Electrical and Computer Engineering
Victor Nelson, Professor, Electrical and Computer Engineering

Abstract

This work analyzes a typical battery powered digital electronic system and proposes a system level voltage scaling method and a functional power management method called instruction slowdown for low power. In the first part, we examine a circuit with voltage scaling capability and observe its impact on the energy efficiency of the battery. We study the system with a power source under throughput constraints and propose a method to find the right size of battery to satisfy given system requirements. For systems with a limit on battery weight or volume, we suggest the right circuit voltage operating point. We also observe that performance metrics such as battery discharge-delay or the number of cycles per recharge are more relevant when power source optimization is the primary goal. In the latter part of this work, an instruction named slowdown for low power (SLOP) is introduced. Functionally, it resembles the conventional NOP but requires a power-specific hardware implementation. Depending upon the power reduction requirement, an adequate number of SLOPs are automatically inserted in the instruction stream by the power management hardware. A possibility also exists to allow the compiler or programmer to insert SLOPs in order to create programs that have the flexibility to run in either normal mode or low power mode.
While processing a SLOP, additional power control signals are generated for various units so that they can be powered down or clock gated. Simulation of a five-stage pipelined 32-bit MIPS processor shows that the SLOP method, termed instruction slowdown (ISD), becomes more effective than a conventional clock slowdown (CSD) when leakage is high. For 32nm CMOS technology, ISD can save more than 70% power compared to about 40% by CSD. The work shows that power reduction through a judicious choice of the slowdown factor and of the method adopted (clock slowdown for low leakage, instruction slowdown for high leakage) can enhance the battery lifetime.

Acknowledgments

My advisor and committee were the people most directly involved with the completion of my thesis. I would like to express my appreciation and sincere thanks to my advisor Dr. Vishwani Agrawal, who patiently shaped this work as it developed through a series of false starts and dead ends. I benefited greatly from his ability to approach problems from many different directions. His advice and attitude towards life will remain a guiding light for me throughout my career. I also wish to thank my advisory committee members, Dr. Adit Singh and Dr. Victor Nelson, for their guidance and advice on this work. My work could not have been completed without substantial support from Dr. Prathima Agrawal, for which I am grateful. I would also like to thank my advisor for providing me with an opportunity to work as a teaching assistant for CPU design projects in his Computer Architecture and Design class. This was one of the most enjoyable and instructive experiences of my master's studies. A number of people at Auburn University, including Nitin, Kim, Sree and Wei, provided help during this work, for which I am thankful. Thanks are also expressed to the integration team, especially Sumeeth and Raghu, at ARM, Bangalore, for a truly memorable first industry experience.
My special thanks to Ellie and Glynn O'Steen, who treated me as a family member, cared for me, and whose loving support kept me going. I gratefully acknowledge financial support at Auburn University derived from a research grant received as a gift from Intel Corporation. Finally, I would like to thank my parents, siblings and my friends Anand, Aniket, Deepti, Ameya, Saba, Indraneil and Salil for their encouragement and support during this work. Thank you, all of you.

Manish
September 28, 2010

Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables
List of Abbreviations
1 Introduction
2 Theory and Background Work on Low Power Design
2.1 Introduction
2.1.1 Need for Low Power VLSI chips
2.1.2 Power Vs. Energy
2.2 Where Does All the Power Go?
2.2.1 Dynamic Power Dissipation
2.2.2 Static Power Dissipation
2.2.3 The conflict between Dynamic and Static Power
2.3 Low Power Design Methods
2.3.1 Circuit Level Methods
2.3.2 Gate Level Methods
2.3.3 Architecture or System Level Methods
2.4 Power Source Optimization: A System Approach
2.4.1 Choice of Metric
2.4.2 Classification of Power Source Optimization Methods
2.4.3 A Typical Battery Powered Electronic System
3 Lithium-ion Battery Background and Modelling
3.1 Background
3.2 Electro-chemistry
3.3 Description of Terminology
3.3.1 Capacity
3.3.2 Rate Dependent Capacity
3.3.3 Temperature Effect
3.3.4 Capacity Fading
3.4 Modelling
3.4.1 Physical Models
3.4.2 Empirical Models
3.4.3 Abstract Models
3.4.4 Analytical/Mixed Models
3.5 Model Used for This Work
3.5.1 Description
3.5.2 Battery Lifetime
3.5.3 Voltage and Current Characteristics
3.6 Summary
4 DC to DC Converter
4.1 Necessity
4.2 Topologies of Switching Regulators
4.2.1 Buck Converter
4.3 Summary
5 System Approach for Power Source Optimization
5.1 Introduction
5.2 Problem Definition
5.3 Case I: System is performance bound
5.3.1 Step 1: Determine circuit characteristics
5.3.2 Step 2: Determine smallest battery size
5.3.3 Step 3: Meeting the lifetime requirement
5.3.4 Step 4: Determine minimum energy modes
5.4 Case II: Battery size or weight is a primary concern
5.5 Summary
6 Instruction Slowdown Method
6.1 Problem Statement
6.2 Background on Clock Slowdown (CSD) for Power Reduction
6.3 Use of NOP for Power
6.4 Instruction Slowdown (ISD)
6.5 Hardware Implementation of SLOP
6.6 Estimating Leakage Factor, k
6.7 Power Management for SLOP
6.8 Results
7 Conclusion
Bibliography

List of Figures

2.1 Growth in energy densities of Lithium-ion batteries
2.2 Limit on the growth of battery energy densities
2.3 A CMOS Inverter circuit
2.4 Short circuit currents of CMOS inverter during input transition
2.5 Leakage Currents for nMOS transistor
2.6 Design flow and type of tools at different levels of abstraction [22]
2.7 Two different implementations of a 4-input AND gate [22]
2.8 Various Implementations of Signal Gating [20]
2.9 Different Sleep modes supported by Intel Pentium 4 Mobile [16]
2.10 Power Dissipation of uniprocessing and parallel processing systems
2.11 Powering an Electronic System
3.1 An Electrical Model for Lithium-ion battery
4.1 Types of Converters
4.2 A Simple Buck Converter
4.3 Buck Converter output waveform
5.1 Circuit Delay and Current versus VDD obtained from HSPICE simulations
5.2 VBatt Vs Time when a battery of 1.2 AHr capacity is subjected to load current, IBatt = 3.6A
5.3 Battery efficiency versus battery size for various load currents
5.4 Simulation of a 400 mAHr battery for a range of supply voltages (VDD)
5.5 Battery lifetimes in clock cycles as a function of chip voltage for 400 mAHr and 1600 mAHr batteries
5.6 Battery lifetimes in clock cycles as a function of chip voltage for 400 mAHr and 1600 mAHr batteries
5.7 Battery lifetimes in number of clock cycles for CR2032 with max. Ibattery = 3mA
6.1 Clock slowdown (CSD) power and battery lifetime ratios for low and high leakage technologies.
6.2 Instruction slowdown (ISD) power and battery lifetime ratios for low and high leakage technologies.
6.3 A MIPS program used for power estimation.
6.4 Clock slowdown (CSD) power ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. CSD is more effective for low leakage (180nm) technology.
6.5 Clock slowdown (CSD) battery lifetime ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. Ratios greater than 1 indicate increased battery lifetime through clock slowdown for low leakage 90nm and 180nm technologies.
6.6 Instruction slowdown (ISD) power ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. ISD gives greater power saving for higher leakage technologies.
6.7 Instruction slowdown (ISD) battery lifetime ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. Ratios greater than 1 indicate increased or undegraded battery lifetime through instruction slowdown for high leakage 32nm and 45nm technologies.
6.8 Clock slowdown (CSD) vs. instruction slowdown (ISD) power ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. Ratio > 1.0 indicates the advantage of ISD for 32nm and 45nm technologies.
6.9 Clock slowdown (CSD) vs. instruction slowdown (ISD) battery lifetime ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. Ratio < 1.0 indicates the advantage of ISD for 32nm and 45nm technologies.
6.10 Power ratio, energy ratio and ideal battery lifetime ratio plotted against slowdown factor, n, for ISD in 32nm
6.11 Circuit energy, battery lifetime and task completion time plotted against number of SLOPs, for ISD in 32nm

List of Tables

2.1 ITRS predictions on power dissipation of technology nodes
5.1 High performance and minimum energy modes of operation.
6.1 HSPICE simulation (32nm CMOS, 90°C).
6.2 Leakage factor (k) and SLOP power factor (?).

List of Abbreviations

CG Clock Gating
CISC Complex Instruction Set Computer
CSD Clock SlowDown
DPCT Dynamic Power Cut-off Technique
DVFS Dynamic Voltage and Frequency Scaling
EPC Energy Per Cycle
GIDL Gate Induced Drain Leakage
HDL Hardware Description Language
ISA Instruction Set Architecture
ISD Instruction SlowDown
ITRS International Technology Roadmap for Semiconductors
MIDs Mobile Internet Devices
MIPS Million Instructions Per Second
NiMH Nickel Metal Hydride
NOP No-OPeration
PG Power Gating
PMU Power Management Unit
PTM Predictive Technology Models
RISC Reduced Instruction Set Computer
SLOP Slowdown for LOw Power

Chapter 1
Introduction

Every processor chip has a physical limit on the power dissipation it can support. For systems that use these processors, performance and power become opposing requirements.
Modern computing systems, therefore, have built-in power control schemes. For example, thermal sensors on a processor chip may trigger a slowdown of the processor clock [35].

For mobile systems, energy consumption and the rate of consumption (power) are directly related to the battery capacity. A higher discharge rate reduces the capacity, requiring bulkier batteries with higher current rating [3] or more frequent recharging. Thus, it is important to control the power consumption. Traditional metrics like minimization of power and energy are not really suitable when power source (battery) optimization is a concern. For battery operated portable devices, an obvious objective is to maximize the battery lifetime. In spite of this fact, the discussions of low power design metrics and methodologies have focused entirely on VLSI sub-system optimizations. The energy stored in a battery is assumed to be constant and available at any possible rate. In reality, however, the energy stored in a battery may not be used to its full extent. The delivery of energy from battery to system depends on the mean value of the current drawn from the battery. Battery lifetime does not have a simple linear relationship with the power consumption of the circuit; for example, a 2X increase in system power can cause a 3X decrease in battery lifetime. These facts motivate us to consider approaches whose design goal is power source optimization. Various approaches have been suggested in the literature and they demonstrate a potential to optimize battery energy consumption. These approaches can be classified into three broad categories:

• Voltage Management Methods
• Throughput Management Methods
• Functional Management Methods

In Chapter 5, we suggest a general system level method to identify the load current on the battery and then choose a battery of minimum size that can satisfy the required current.
We also discuss various modes in which a system can operate in order to achieve maximum energy efficiency. The latter part of the chapter focuses on optimizing the battery lifetime and finding the right size of battery for a given load current. As far as portable electronic devices are concerned, the ultimate aim is to achieve a longer battery lifetime or, for a rechargeable source, to perform the most operations between consecutive recharges. Optimization of the circuit alone for power and energy may not always result in equivalent optimization of battery lifetime. So a study of the system consisting of the battery and the circuit under consideration has been carried out in order to achieve maximum battery lifetime. In general, this lifetime should be measured in terms of the duration of the system operation. A relevant measure is the number of useful clock cycles obtained per battery life or per battery recharge.

Size and weight of the batteries are major design constraints for mobile computing devices. Battery weights are generally proportional to their AHr ratings. Given an application with its load current requirement, a relevant problem is to find a battery with minimum size and weight to run the application. Since the energy drawn from the battery is not always equal to the energy consumed in the device, understanding the battery's discharge behaviour and its own dissipation is essential for optimal system design.

When excessive power consumption forces clock slowdown (CSD), the completion time of the ongoing system task increases. Because leakage power is then drawn for a longer duration, the energy consumption increases. The energy penalty of the CSD method can be severe for high-leakage technologies. CSD is, therefore, not recommended without voltage scaling [8]. There is, however, another consideration. The reduced power slows the current drain from the battery. For a given battery capacity, this can increase the lifetime of the battery [21, 38].
Lifetime here refers to the useful life of a primary battery or the time between recharges for a secondary (rechargeable) battery. If the increase in the battery lifetime for a portable device is more than the increase in the execution time of the task, then CSD can be beneficial [2]. Unless the efficiency aspect of the power source is properly considered, the slowing down of a computing task for power reduction would not be recommended. The lack of such consideration often results in the use of oversized batteries as well as over-design for unnecessary power dissipation, cooling, etc. In the later sections of this thesis, we discuss a scenario where CSD may be necessary. We also find that its power saving advantage diminishes in higher leakage technologies. This leads to our motivation for finding an alternative with a lower energy penalty.

Because clock slowdown (CSD) allows larger delay for hardware, we can further reduce power by lowering the supply voltage. Voltage reduction reduces both power and energy. However, this has limited potential in nanometer technologies, where the voltage, already lowered due to the electric field requirement, is closer to the threshold voltage. This is particularly so for dual-threshold designs in which high-threshold devices are used to reduce leakage. When the voltage has been scaled down to the limit set by the technology, any further power reduction by CSD that the system or operational requirements demand will increase the task completion time and the leakage energy. To reduce the energy, a dynamic power cutoff technique (DPCT) has been proposed [27]. While DPCT can save both power and energy, it requires turning power off and on for different parts of the combinational logic at different times within the clock period. Asynchronous delays for power control signals make the design complex and especially sensitive to process variation.

In this work, we address the need for a power saving method with emphasis on the energy penalty.
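The break-even condition stated above (CSD helps only when battery lifetime grows by more than the execution time) can be illustrated with a minimal sketch. It assumes a Peukert-type battery model and a simple load model in which the dynamic part of the current scales as 1/n under a slowdown factor n while the leakage part stays constant. The function names, the Peukert exponent of 1.3, the 1.2 AHr capacity rated at one hour, and the leakage fractions are all illustrative assumptions, not values or code from this thesis.

```python
"""Sketch: does clock slowdown (CSD) by a factor n extend battery life enough
to pay for the n-times-longer task? All numbers are illustrative assumptions."""


def peukert_lifetime_hours(capacity_ah, rated_hours, current_a, k):
    """Peukert's law: t = H * (C / (I * H)) ** k, with k >= 1."""
    return rated_hours * (capacity_ah / (current_a * rated_hours)) ** k


def csd_current(base_current_a, leak_fraction, n):
    """Assumed load model: dynamic current scales as 1/n, leakage is unchanged."""
    dynamic = base_current_a * (1.0 - leak_fraction) / n
    static = base_current_a * leak_fraction
    return dynamic + static


def adjusted_lifetime_ratio(n, leak_fraction, k=1.3,
                            capacity_ah=1.2, rated_hours=1.0,
                            base_current_a=1.2):
    """Battery lifetime gain divided by the execution-time stretch n.

    A value above 1 means more clock cycles per charge than at full speed.
    """
    t_base = peukert_lifetime_hours(capacity_ah, rated_hours, base_current_a, k)
    slow_current = csd_current(base_current_a, leak_fraction, n)
    t_slow = peukert_lifetime_hours(capacity_ah, rated_hours, slow_current, k)
    return (t_slow / t_base) / n


if __name__ == "__main__":
    for leak in (0.0, 0.4, 0.8):
        for n in (1, 2, 4):
            ratio = adjusted_lifetime_ratio(n, leak)
            print(f"leak fraction {leak:.1f}, n={n}: adjusted ratio {ratio:.3f}")
```

Under these assumptions the adjusted ratio exceeds 1 when leakage is negligible and falls below 1 when leakage dominates the current, which is in line with the observation that the advantage of CSD diminishes in higher leakage technologies.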
We propose an instruction slowdown (ISD) method, which inserts NOP-like instructions. A new instruction named SLOP (slowdown for low power) is automatically inserted by the processor control, which also generates power-down, sleep mode, or clock gating signals for various hardware units. We have analyzed several technologies ranging from 180nm to 32nm and shown that the ISD method is equally or more effective than the CSD method in higher leakage technologies.

In general, the slowdown of a computing task can consume more energy. In fact, it would always turn out that way if we considered the raw energy consumption from an ideal source. The conclusions differ when we consider a real source, such as the battery in a portable device. A relevant parameter is the lifetime, or the time between consecutive recharges, of a battery. A battery's capacity, usually in mA-hours, is a valid indicator of the recharge time if the battery supplies close to the rated current. At higher currents, the capacity degrades. Thus, a reduction in power consumption (or current drain) can enhance the lifetime [38]. We use a battery model based on the classical Peukert's law [21] to represent the battery lifetime, which is adjusted for the increased task execution time. Alternatively, a battery efficiency model [43] can also be used. Slowdown for power reduction is considered beneficial only if the adjusted lifetime is enhanced. This advantage of ISD becomes more pronounced as the technology becomes leakier.

Instruction slowdown (ISD) can be compared to another proposed power saving method called fetch throttling [33, 34]. This method, when applied to multiple issue processors, slows down the rate of instruction fetch based on the lack of any parallel execution opportunity in the program being executed. Thus, the instructions that would have waited in the pipeline due to data, resource, or control conflicts are fetched after suitable delays.
The reported average reduction in energy delay product is 6.7% for static throttling and could go up to 15% with dynamic throttling. These savings are due to the avoidance of incorrect speculations. We can reduce the performance penalty of instruction slowdown (ISD) by inserting the NOPs after those instructions that require speculation. However, this aspect is not discussed in this work and should be explored in the future.

The objective of the present work is to reduce power with minimal energy cost and to maximize the number of operations performed in a single recharge.

Chapter 2
Theory and Background Work on Low Power Design

2.1 Introduction

2.1.1 Need for Low Power VLSI chips

Higher performance and lower chip area have always been major concerns for chip designers. Low power dissipation of VLSI chips has now become one of the primary goals. In the past, device densities were low enough that power dissipation was not a constraining factor in chips. As the scale of integration improves, more transistors, faster and smaller than their predecessors, are being packed into a chip. This leads to steady growth of operating frequency and processing capacity per chip, resulting in increased power dissipation. New generation devices are still at a safe distance from their fundamental physical limits, so this evolution is likely to continue for a while. The need for low power VLSI chips arises from these evolutionary forces in integrated circuits.

Another factor that fuels the need for low power chips is the increased market demand for Mobile Internet Devices (MIDs) powered by batteries. The craving for smaller, lighter and more durable products directly translates to low power requirements. Batteries have not experienced a similarly rapid density growth compared to electronic devices. The specific weight (stored energy per unit weight) of batteries barely doubles in several years [61] (Figure 2.1).
Also, a further increase in battery specific weight will create safety concerns, as the energy density will approach that of explosive chemicals, as shown in Figure 2.2. So battery technology is not going to solve the power demand problem in future devices; the devices, on the other hand, will have to use battery energy in a smart way.

Figure 2.1: Growth in energy densities of Lithium-ion batteries

Figure 2.2: Limit on the growth of battery energy densities

Table 2.1: ITRS predictions on power dissipation of technology nodes

Node                      90nm    65nm    45nm
Dynamic Power per cm^2    1X      1.4X    2X
Static Power per cm^2     1X      2.5X    6.5X
Total Power per cm^2      1X      2X      4X

High performance computing systems characterized by large power dissipation also drive the need for low power. The power dissipation of a typical high performance microprocessor is about 150 watts with an average power density of 50-75 watts per square centimeter. The power density at local hot spots on the die can be many times higher than the average. This has a direct impact on the packaging cost of the chip and the cooling cost of the system. A chip that operates at 3.3V and consumes 10 watts of power draws an average current of about 3A. Transient currents would be much higher than this. This creates problems in the design of power supply rails and poses a challenge in the analysis of digital noise. It also poses a threat to the reliability of the chip, as the mean time to failure decreases with increase in temperature. The problems are expected to get worse as we move to new technology nodes, as predicted by the International Technology Roadmap for Semiconductors (ITRS) and shown in Table 2.1.

Another driving force behind the demand for low power chips comes from environmental concerns. Computers are the fastest growing electricity loads in the commercial sector. Since electricity generation is a major source of air pollution, inefficient energy usage in computing equipment directly contributes to environmental pollution.

2.1.2 Power Vs. Energy

For MIDs operating on batteries, the distinction between power and energy is critical. While power is decided by the instantaneous current drawn by the device, energy also depends on the duration for which the current is drawn. The power drawn by a portable device such as a cell phone or a Personal Digital Assistant (PDA) varies according to the type of task being performed; for example, an active call or a web browsing task will consume a considerable amount of current, while a standby mode will not consume as much power. In both cases, however, energy is being drawn from the battery, and in many practical circumstances the standby time of the device is long enough that standby consumes as much energy as the active tasks. For portable equipment operating on a battery, therefore, better energy management and maximizing battery life are more logical design goals than power management.

2.2 Where Does All the Power Go?

Not all the power consumed by a CMOS device produces useful activity. Part of the power is dissipated in the ON resistance of the device while charging and discharging the output capacitance. This is known as dynamic power dissipation. Dynamic power dissipation also includes short circuit power dissipation, which is caused by a momentary short between $V_{DD}$ and ground when both the P-type and N-type networks in a device are ON. Part of the power is also dissipated in the OFF resistance of the device due to the flow of leakage current from supply to ground while the device is turned OFF. This is known as static power dissipation. The following subsections describe each of them in detail.

2.2.1 Dynamic Power Dissipation

Until the 65nm CMOS technology process, dynamic power dissipation was the dominant source of power dissipation in CMOS. It is caused by the charging and discharging of the output node capacitance. Following is the formula used for calculation of dynamic power dissipation.
$$P_D = C_L V^2 f \tag{2.1}$$

where

$C_L$ = Total load capacitance of the circuit. This capacitance largely consists of the parasitic capacitances inherent in the circuit, such as CMOS gate capacitances, source to drain capacitances and interconnect capacitances. Although these capacitances cannot be avoided entirely, certain measures can minimize them, which is one of the methods of reducing dynamic power dissipation.

$V$ = Supply voltage of the circuit. This is one of the important factors in controlling the power consumption, as the power scales quadratically with the voltage. Supply voltage also affects static power consumption, as we will see in the next subsection.

$f$ = Frequency of operation. Slower circuits consume less power than faster ones.

As mentioned before, short circuit power consumption also contributes to dynamic power dissipation. Figure 2.3 shows an inverter circuit and the currents associated with its operation. The circuit operates at $V_{DD}$ with $V_i$ as input voltage, $V_{tn}$ as the threshold for NMOS and $V_{tp}$ as the threshold for PMOS. When the input $V_i$ changes from low (0 V) to high ($V_{DD}$), there is a short time during which the input lies between the NMOS turn-on level $V_{tn}$ and the PMOS turn-off level $V_{tp}$, as shown in Figure 2.4. During this time both PMOS and NMOS conduct, and hence a short circuit current flows from $V_{DD}$ to ground. The shape of the short circuit current curve depends on:

• The duration and slope of the input signal.
• The I-V curves of the P and N transistors, which depend on their sizes, process technology, temperature, etc.
• The output load capacitance of the inverter.

2.2.2 Static Power Dissipation

Ideally, CMOS circuits dissipate no static power when they are not switching. But semiconductor devices conduct or leak through reverse biased channels and provide a path from $V_{DD}$ to ground, and this constitutes leakage power consumption.
Leakage current is a form of current which is generally not intended for normal operation of a digital circuit. This leakage current is not useful in most cases. There are various sources of leakage current, as shown in Figure 2.5, and we will discuss three primary sources.

Figure 2.3: A CMOS Inverter circuit (supply $V_{DD}$, input $V_i$, output $V_o$, load $C_L$; the currents satisfy $i_p = i_c + i_n$)

Figure 2.4: Short circuit currents of CMOS inverter during input transition (input voltage $V_i$ with levels $V_{tp}$ and $V_{tn}$, and short circuit current, both plotted against time $t$)

Figure 2.5: Leakage Currents for nMOS transistor

• Sub-threshold Channel conduction current ($I_{sub}$)
In the OFF state, even though the transistor is logically turned off, there is a non-zero leakage current flowing through the channel. This is known as sub-threshold leakage. Other than device dimensions and fabrication process, the magnitude of this current depends on the threshold voltage, $V_t$; gate voltage, $V_{gs}$; drain voltage, $V_{ds}$; and temperature. During the OFF state, $V_{ds} \approx V_{DD}$, so the sub-threshold current essentially depends on $V_{gs}$. It is given by the following equation [20].

$$I_{sub} = I_0 \, e^{(V_{gs} - V_t)/(\alpha V_{th})} \tag{2.2}$$

where $V_t$ is the device threshold voltage, $V_{th}$ is the thermal voltage, which is 25.9mV at room temperature (300K), $I_0$ is the current when $V_{gs} = V_t$, and $\alpha$ ranges from 1.0 to 2.5 and depends on the device fabrication process.

Sub-threshold current is becoming a limiting factor in low voltage and low power chip design. When the operating voltage is reduced, the device threshold voltage $V_t$ has to be reduced accordingly to compensate for the loss in switching speed.

• Gate Tunnelling Current ($I_G$)
With scaling of the channel length, a good transistor aspect ratio can be maintained only by comparable scaling of oxide thickness, junction depth and depletion depth. Maintaining this aspect ratio is a challenge since scaling in the vertical direction is difficult. The silicon dioxide gate dielectric thickness is approaching scaling limits and there is a rapid increase in the gate tunnelling current.
The oxide thickness limit will be reached approximately when the gate-to-channel tunnelling current (IG) becomes equal to the off-state source-to-drain sub-threshold leakage (Isub).

This limitation can be resolved by using different materials with high permittivity as the gate dielectric. This results in a thicker and easier to fabricate dielectric with potential for significant reduction in leakage current [19]. One such successful implementation is the hafnium-based high-k dielectric in 45nm technology used by Intel for their processor series code named "Penryn". Hafnium silicate based dielectric materials help reduce leakage currents, but they also suffer from trapped leakage currents which affect the device life.

• Reverse biased PN-junction current (ID)

This current flows when (for an nMOS transistor) the source is at VDD and the drain is at ground. The current flows due to a PN-junction formed at the source or drain of transistors by the parasitic effect of the bulk CMOS device structure. The junction current at the source of the transistor is picked up through the bulk or a well contact. The magnitude of this current is given by the following equation [20]:

\[ I_D = I_s \left( e^{V/V_{th}} - 1 \right) \tag{2.3} \]

where Is is the reverse saturation current and Vth is the thermal voltage, given by Vth = kT/q, where k = 1.38 × 10^-23 Joule/K is Boltzmann's constant, q is the electronic charge in Coulombs and T is the device operating temperature. ID is largely independent of operating voltage but depends, in general, on temperature, process, bias voltage and area of the PN-junction.

Other sources of leakage current, such as Gate Induced Drain Leakage current (IGIDL) and drain-source Punch Through current (IPT), also contribute to the total leakage current.

2.2.3 The conflict between Dynamic and Static Power

Dynamic power can be reduced by reducing the supply voltage. Supply voltage reduction has been a constant phenomenon with technology scaling.
Voltages for semiconductor devices have been reduced from 5 V to 0.8 V in the most recent technologies. But when the voltage is lowered, the transistor ON current IDS reduces, which makes devices switch slower. The approximate equation for IDS is given by

\[ I_{DS} = \frac{\mu C_{ox}}{2} \, \frac{W}{L} \, (V_{GS} - V_t)^2 \tag{2.4} \]

where μ is the carrier mobility, Cox is the gate capacitance, W/L is the width-to-length ratio of the transistor, Vt is the threshold voltage and VGS is the gate-source voltage.

So to maintain a higher IDS we need to lower Vt as we lower VDD (or VGS). However, lowering Vt results in an exponential increase in the sub-threshold leakage current, as indicated by the Isub equation (Equation 2.2). Thus the methods to lower dynamic power and leakage power in a device contradict each other. This situation has worsened for 65nm and lower CMOS process technologies, as the static power is equal to or more than the dynamic power in the device.

2.3 Low Power Design Methods

Low power methods for design of circuits can be classified in many different ways. One of the classic papers in this area [8] describes these techniques in three simple categories:

1. Trade area or speed for power,
2. Don't waste power and
3. Find a low power problem.

Though this functional classification gives good insight into the subject, classification of these methods based on abstraction level is more practical from an engineer's point of view.

System or architecture level techniques are most effective for managing power, since a problem can often be implemented with an algorithm that consumes less power [22]. Algorithmic level changes in the solution to a problem can only be incorporated at the system or architectural level. On the other hand, estimation of power is most accurate at the transistor level and least accurate at the system level. The selection of an abstraction level is generally based on the overhead involved with the technique. This overhead may include area, speed, complexity and verification time.
In any modern chip design flow, efforts to reduce power consumption in a circuit are incorporated at all possible stages and levels of abstraction, as shown in Figure 2.6. The following subsections discuss some of the techniques of low power design at various levels of abstraction, in a bottom-up fashion.

Figure 2.6: Design flow and type of tools at different levels of abstraction [22]

2.3.1 Circuit Level Methods

At the circuit level, the power reduction techniques are quite limited in number and they generally do not result in more than 25% power reduction. However, these techniques can have a major impact on the power consumption of a design because these circuits, e.g. standard cells for the most common gates and flip-flops, are repeated thousands of times on a chip. So circuit techniques with even a small percentage of power savings cannot be overlooked.

Transistor sizing for Leakage Power reduction

Leakage current of a transistor increases with a decrease in channel length and threshold voltage. But a lower threshold voltage and a shorter channel length can provide higher saturation current, resulting in faster switching frequencies. Thus there is a trade-off between leakage power and delay. One of the techniques used to reduce leakage power is to size one or more transistors in the transistor network.

Consider a simple two-transistor inverter. If the output of the inverter is logic high (P-transistor conducting), then the leakage power is determined by the N-transistor, whereas when the output is low, leakage power is determined by the P-transistor. Assuming that, in dormant mode, the inverter output is at logic high, we can reduce leakage power by increasing the channel length of the N-transistor. This also affects the switching speed of the N-transistor, so the falling transition for the inverter will be affected.
If the falling transition is not part of the critical path, then this method can save leakage energy without any change in the circuit speed. If the falling transition is on a critical path, then we can select logic low to be the default output value during dormant mode and size the P-transistor instead. Similar effects can be obtained by increasing the threshold voltage of either the P or the N transistor.

Transistor Network Restructuring

Boolean functions are implemented as combinations of simple logic gates like NAND and NOR. These gates are then mapped to their equivalent transistor networks. These networks can be organized in different ways to achieve the same functionality. The choice of arrangement of transistors inside the network can be based on leakage current minimization. Transistor stacking is a well known technique for reduction of leakage current in stand-by mode. Any implementation of a function has an input combination that results in minimum leakage current flow from VDD to ground. This input combination can be applied to the function when it is in stand-by mode. A very good summary of leakage reduction through stacking is given in Chapter 2 of [19].

Similarly, transistor re-organization also plays an important role in reducing overall power consumption. Simple boolean functions can be implemented as a single complex network of transistors, but as the function complexity increases, the number of serial transistors in a network starts to increase. This number has to be limited to ensure proper operation of the circuit. When the number of serial transistors increases, the effective resistance of the serial transistor chain increases. To compensate for the increased resistance, transistor sizes have to be increased to maintain an acceptable delay.
The number of parallel transistors is similarly limited, since each additional parallel transistor adds its own drain diffusion capacitance, which increases the total capacitance at the output node and slows down the circuit. These limits on the number of serial and parallel transistors are technology dependent and may also depend on operating voltage, system speed and other factors. Given an arbitrary boolean function, there can be different organizations of the circuit. Figure 2.7 shows how a 4-input AND can be implemented in two different organizations.

Figure 2.7: Two different implementations of a 4-input AND gate [22]: (a) 12 transistors, (b) 14 transistors

Low Power Cell Libraries

Most digital designs today are described in high level Hardware Description Languages (HDL) and synthesized using automated computer aided design (CAD) tools. The basic building blocks of these designs are customized logic gates, or cells. So the quality of the overall design depends on the quality of these cells. Low power design is no exception. The cells can be custom designed and characterized with power as the primary constraint. The most important attribute that contributes to a good low power design is the availability of a variety of cell sizes for commonly used gates and functions. The smallest cell that satisfies the delay constraint can be chosen from all the available sizes. Therefore, fine granularity of the cell sizes is important. For instance, if the delay requirement demands a cell size of 3X and the closest available size is 4X, then we are unnecessarily wasting power by using an oversized cell. While deciding the range of cell sizes, the capacitance and area requirements are also important factors. The overall circuit capacitance should be taken into consideration.
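As a minimal sketch of this selection rule, the following Python fragment picks the smallest available cell that meets a required size. The function name, the two lists of available sizes and the idea of expressing the delay requirement directly as a minimum cell size are illustrative assumptions; they do not come from any particular cell library or synthesis tool.

```python
from bisect import bisect_left


def smallest_adequate_cell(available_sizes, required_size):
    """Return the smallest available size >= required_size, or None if none fits."""
    sizes = sorted(available_sizes)
    index = bisect_left(sizes, required_size)  # first position with size >= required
    if index == len(sizes):
        return None  # even the largest cell is too small
    return sizes[index]


# Hypothetical libraries, with sizes expressed as multiples of the unit cell X.
coarse_library = [1, 2, 4, 8]
fine_library = [1, 1.5, 2, 3, 4, 6, 8]

required = 3  # cell size demanded by the delay constraint (assumed)
coarse_choice = smallest_adequate_cell(coarse_library, required)
fine_choice = smallest_adequate_cell(fine_library, required)
print(f"coarse library picks {coarse_choice}X, fine library picks {fine_choice}X")
```

With the coarse list the 3X requirement is rounded up to a 4X cell, while the finer list can supply the 3X cell directly, which is the situation described in the text.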
Although it would be efficient for the design to have as many cell sizes as possible in the cell library, the increase in simulation and synthesis time of the design may limit the number of sizes per cell.

2.3.2 Gate Level Methods

Gate level design, or logic design in general, is the most basic form of design, where logic synthesis starts. Due to the complexity of today's designs, the synthesis process is not done manually. Although the design process at the logic level is carried out with HDLs, power optimization at the logic level can still be performed by modifying the synthesis algorithms. The most common theme in power optimization at the logic level is reduction of switching activity. Switching activity contributes directly to dynamic power, and hence elimination of unnecessary switching activity should be a primary goal.

Gate Reorganization

Gate reorganization is a technique similar to the transistor restructuring described in the last section. In general, this reorganization is an operation that transforms one logic circuit into another that is functionally equivalent. Since there are many possible combinations, it is important to choose an organization which does not differ drastically from the existing one in terms of area and delay while consuming lower power. Logic synthesis produces an initial logic network of gates from the HDL description. Then, depending on power constraints, some local transformations are applied to optimize the circuit. Some of the local transformations are:

1. Combine several gates into a single gate.
2. Decompose a single gate into several gates.
3. Duplicate a gate and redistribute its output connections.
4. Delete a wire.
5. Add a wire.
6. Eliminate unconnected gates.

This reorganization can be targeted towards low power design of functions. The transformation from a function to a gate structure is called technology mapping. An excellent discussion of technology mapping for low power design has been carried out by Tiwari et
al. [30].

Signal Gating

Signal gating is a technique to stop unwanted switching activity from propagating forward and causing unnecessary power dissipation. Since signal activities can be monitored and analysed better at gate level, these techniques are generally applied at gate level. There are many different methods to implement signal gating. Figure 2.8 shows some of these implementations.

Figure 2.8: Various implementations of signal gating [20]: (a) simple gate, (b) tri-state buffer, (c) a latch / FF, (d) transmission gate

All signal gating methods require control signals to stop the propagation of switching activities. These control signals are generated by additional logic in the controller. This can add to area and cause additional leakage power. So a designer must take this fact into account and check whether the design leads to an overall power saving. The identification of signals to be gated is application dependent and is subject to feasibility of implementation. Potential candidates for signal gating are clock signals, address buses and signals with high activity or glitches.

Logic and State Machine Encoding

Reduction in the logic activity of signals can also be achieved by changing the encoding of the combinational or sequential circuits. For instance, a 3-bit counter can be implemented in both binary and Gray encoding. In binary encoding the number of bit transitions over one full count cycle is 14, whereas for Gray encoding it is 8. For a 6-bit counter this difference is 126 for binary coding against 64 for Gray coding. So dynamic power can be greatly reduced by using Gray encoding.

Another example is Bus Invert encoding, in which the signals transmitted over a parallel bus are examined and sent either in normal form or in complemented form. The decision logic inspects two consecutive signal vectors for their activity and decides whether to complement the next vector or not.
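The decision logic just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the bus is assumed to start at all zeros, and the rule "complement the word when more than half of the bus lines would otherwise toggle" is the common majority rule, which the text above does not spell out. Each encoded entry carries one extra flag recording whether the word was complemented.

```python
def hamming_distance(a, b):
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def bus_invert_encode(words, width, initial_bus=0):
    """Return a list of (bus_value, flag) pairs; flag is 1 if the word was complemented."""
    mask = 2 ** width - 1
    bus = initial_bus & mask
    encoded = []
    for word in words:
        word &= mask
        # Assumed rule: complement when more than half of the lines would toggle.
        if 2 * hamming_distance(bus, word) > width:
            bus, flag = ~word & mask, 1
        else:
            bus, flag = word, 0
        encoded.append((bus, flag))
    return encoded


def bus_invert_decode(encoded, width):
    """Recover the original words from (bus_value, flag) pairs."""
    mask = 2 ** width - 1
    return [(~value & mask) if flag else value for value, flag in encoded]


def data_line_transitions(values, initial_bus=0):
    """Total number of bit toggles on the data lines for a sequence of bus values."""
    total, previous = 0, initial_bus
    for value in values:
        total += hamming_distance(previous, value)
        previous = value
    return total


words = [0x00, 0xFF, 0x0F, 0xF0]  # hypothetical 8-bit traffic
encoded = bus_invert_encode(words, width=8)
plain_toggles = data_line_transitions(words)
encoded_toggles = data_line_transitions([value for value, _ in encoded])
print(plain_toggles, encoded_toggles)
```

The count above covers only the data lines; the extra flag line toggles as well, so a fair comparison has to charge those transitions to the encoded bus.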
A polarity signal is transmitted along with the vector so that the vector can be converted to its original form at the receiving end.

State machines perform transitions from one state to another depending on the present state and the input. To define this behaviour, a state transition graph or diagram is prepared first, and then a synthesis tool converts this graph (generally an HDL description) into a combination of flip-flops and logic gates. Allocating binary codes to the states in a state transition graph is called state machine encoding. This encoding is one of the important factors that decide the area, power and speed of the state machine. One goal is to reduce the number of states in the machine so as to minimize the number of flip-flops. Another key decision is which state encoding method to use. One-hot or one-cold methods have the fewest transitions but use a larger number of flip-flops. Binary encoding, on the other hand, needs very few flip-flops but may have many transitions if not properly designed. Gray encoding achieves a balance between the number of flip-flops and the number of transitions for a state machine.

2.3.3 Architecture or System Level Methods

As we move up in abstraction level, the optimization problems become less exact and more obscure due to more freedom in design configuration and decisions. As a result, higher level techniques rely more on human intuition and the art of chip design.

System Power Management

Low power Standby or Sleep modes: System level power management ensures that we do not waste power by designing hardware that has more performance than necessary. Also, when the system throughput requirement is low, a low power oriented system should be able to adapt to the change and consume less power. Low power standby modes, or sleep modes, for a microprocessor are examples of such power management schemes.
The best way to achieve better power efficiency is to shut down functional units which are not being used, or to gate the clock to these units in order to suppress their activity. In modern microprocessor design, a variety of sleep modes is available, which can be activated depending on the state of the processor and the performance requirements. These modes can be extremely effective, reducing standby power to a small fraction of the power consumed during normal operation. Figure 2.9 shows the state diagram of a processor for its transitions from one mode of operation to another in order to achieve maximum power efficiency.

Figure 2.9: Different sleep modes supported by Intel Pentium 4 Mobile [16]

If the processor clock frequency is reduced, functionality is maintained during the low power mode and the processor can still service low priority tasks that do not require full frequency performance. Processor clocks can only be stopped if the machine state is maintained statically. Clocks can also be gated in only a portion of the design so that some functions remain active. Examples of functions which need clocking even during a sleep mode are bus snooping controllers and the control logic which actually provides the sleep and clock gating signals.

Low power modes may be implemented with software or hardware control. Software control requires specific instructions to enter a sleep mode when the processor is idle. This code can be part of the operating system (OS). The OS enters the sleep mode when the system has been idle for a pre-decided period of time. The system returns to the normal mode of operation when a high priority interrupt is detected. To provide this support in an OS, certain provisions in the hardware are necessary. The power management unit on the chip needs to clock gate or power gate functional units depending on how deep a sleep mode has been requested by the OS.
Supply voltage selection for Standby mode: Reduction in supply voltage reduces both dynamic and static power dissipation, but it also increases delay, so the throughput is lower. During a sleep mode, the system throughput requirements can be very low, and the voltage can potentially be reduced to a level which just satisfies the throughput requirement. It is also possible to turn off power to a chip if all the memory states can be saved off chip and reloaded when the activity resumes. This approach can only be considered if the overhead (in time and power) of storing the data off chip and reloading it is justified by the overall saving achieved by turning off the chip. Modern chips also have different voltage domains that can be turned off independently in order to achieve maximum efficiency.

Architectural Methods

Architectural methods are quite commonly used to develop microprocessors that are more power efficient and have equal or nearly equal performance compared to their power hungry counterparts. Architectural modifications can save power either with no compromise on speed and area or with trade-offs. These architectural decisions are made depending on the application for which the processor is being designed. A processor designed for a portable computer, PDA or smart phone can use power saving techniques that trade performance for power, but may have strict area constraints. On the other hand, a processor designed for a server cannot use techniques that sacrifice speed for power. Support from the compiler, operating system or application is also an important factor to be considered while making architectural modifications in order to reduce power. Following are some of the areas in architecture design where decisions and modifications for power efficiency can be made.

Instruction Set Architecture: Instruction fetch is performed for every single instruction. So a large portion of energy is spent in the fetch operation.
Instructions can be designed in a way that gives programs higher code density, smaller instruction lengths and reduced code size. This allows on-chip cache memories to hold the complete program, which saves greatly on fetch energy. Decisions regarding the choice of a CISC or RISC type of ISA can also affect energy efficiency. CISC has greater code density and smaller program length, but it also needs complex decoding hardware. So CISC can be energy efficient if the ISA contains only a few types of instructions. RISC, on the other hand, can use simpler decoding logic but has longer program lengths, so it can only be energy efficient if a wide variety of small instructions is required. The number of instructions accessing memory variables directly should be limited. Such instructions reduce code length but also need more energy due to their longer execution times. A proper combination of memory-to-memory accesses and register operations in an ISA can obtain maximum energy efficiency. Whether to use a fixed reduced instruction length or a variable instruction length is another decision that a designer needs to make while designing an ISA.

Datapath: Pipelining is a common way of implementing the ISA due to its inherent throughput advantage. Two important parameters to be considered while designing the pipeline are the number of pipeline stages and the number of execution pipelines. Two pipelining strategies which emphasize these two factors are described below [22].

Superscalar
Performance: increased throughput by providing multiple execution units so that parallel execution may be implemented.
Power: increase in design complexity and area; data dependency check requirements increase dispatch logic area.

Superpipelined
Performance: increased number of simple pipeline stages, which perform faster so that a higher clock frequency can be achieved.
Power: increased number of clocked elements; inherent increase in dynamic power due to increased frequency.
Microprocessors chosen for low power typically have five pipeline stages or fewer. Use of register files can save a lot of energy by reducing traffic to memory. Register files themselves can also be made power efficient by power or clock gating them during pipeline stalls and by disabling read ports when data is being provided from other sources.

Parallel Architecture with Voltage Reduction: Parallelism has traditionally been used to boost system throughput. It does so without increasing the operating frequency but requires additional hardware to perform multiple functions at the same time. In short, parallelism trades area for performance. This trade-off can also be used to reduce power. Voltage scaling has a quadratic effect on dynamic power and reduces leakage power linearly. So scaling down the voltage is an attractive solution for a power efficient design. But since circuit delay is inversely proportional to the voltage, reduction in voltage increases delay, and hence there is a performance penalty. This problem can be overcome by using a parallel architecture, which allows the voltage to be lowered while still maintaining the throughput.

Consider a signal processing system whose throughput requirement is satisfied by a frequency f. Let V be the system voltage and C the total capacitance being switched; then the power consumption is given by

\[ P = C V^2 f \tag{2.5} \]

Figure 2.10: Power dissipation of uniprocessing and parallel processing systems (single processor: voltage V, frequency f, capacitance C; two processors with an output multiplexer: voltage 0.6V, frequency 0.5f, capacitance 2.2C)

If the number of processors is doubled as shown in Figure 2.10, each of the processors can be operated at half the frequency, f/2, and the output is multiplexed at the desired frequency f.
Now assuming that, due to the increase in components, the total capacitance switched is 2.2C and the voltage can be scaled down to 0.6V, the new power dissipation is given by

\[ P' = (2.2C)(0.6V)^2(0.5f) = 0.396P \tag{2.6} \]

So in the best case, we get about 60% power reduction compared to the single processor system. But there are other factors which prevent this technique from achieving higher power reduction. One important factor is leakage power. Since we have additional components in the system, the leakage current will be at least twice that of the single processor configuration. So, according to the formula P_leakage = V × I_leakage, the leakage power is at least 2 × 0.6 = 1.2 times its original value. Another factor is the availability of inputs in parallelizable form. When considering the system with two processors, we assumed that the input can be split into two equal length parts and that these parts are independent. But, in practice, only very few types of inputs, such as images and certain matrix operations, have such properties. Most other problems are sequential and have variables that depend on each other. This realization has changed the direction of new research towards making applications, programs and basic algorithms more parallelizable [60].

Dynamic Voltage and Frequency Scaling (DVFS): The total power at each node of a CMOS circuit can be represented by

\[ P = C_L V_{dd}^2 f + I_{SC} V_{dd} + I_{leakage} V_{dd} \tag{2.7} \]

It is apparent from the above equation that each of the contributors to total power can be reduced by reducing the supply voltage Vdd. Also, the first term, which represents dynamic power, reduces quadratically with the voltage. Voltage reduction has been one of the most common techniques of power reduction. Low voltage modes are used in conjunction with lowered clock frequencies to minimize the power consumption of components such as CPUs and DSPs; only when significant computational power is needed will the voltage and frequency be raised.
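To make these scaling relations concrete, the short Python sketch below evaluates Equation 2.7 at two supply voltages and rechecks the parallel-processing figures quoted earlier in this section (0.396 for dynamic power and 1.2 for leakage power). The function names and the numeric operating point (1 nF, 1 GHz, 0.05 A short circuit current, 0.2 A leakage current) are hypothetical values chosen only for illustration. The sketch also assumes, as a simplification, that the two currents stay constant when Vdd changes.

```python
def dynamic_power(c_load, v_dd, freq):
    """Dynamic term of Equation 2.7: C_L * Vdd^2 * f."""
    return c_load * v_dd ** 2 * freq


def total_power(c_load, v_dd, freq, i_sc, i_leak):
    """All three terms of Equation 2.7 (currents treated as constants here)."""
    return dynamic_power(c_load, v_dd, freq) + i_sc * v_dd + i_leak * v_dd


# Parallel architecture, normalised to the single processor (C = V = f = 1).
p_single = dynamic_power(1.0, 1.0, 1.0)
p_parallel = dynamic_power(2.2, 0.6, 0.5)
dynamic_ratio = p_parallel / p_single

# Leakage: twice the leakage current at 0.6 times the voltage.
leakage_ratio = (2.0 * 0.6) / (1.0 * 1.0)

# Hypothetical operating point for Equation 2.7, before and after voltage scaling.
nominal = total_power(c_load=1e-9, v_dd=1.0, freq=1e9, i_sc=0.05, i_leak=0.2)
scaled = total_power(c_load=1e-9, v_dd=0.8, freq=1e9, i_sc=0.05, i_leak=0.2)

print(f"dynamic power ratio (parallel / single): {dynamic_ratio:.3f}")
print(f"leakage power ratio (parallel / single): {leakage_ratio:.1f}")
print(f"total power at 1.0 V: {nominal:.2f} W, at 0.8 V: {scaled:.2f} W")
```

In this sketch the dynamic term falls with the square of the voltage while the other two terms fall only linearly, so the share of leakage in the total grows as the voltage is lowered.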
Many modern chips also contain multiple voltage domains that can be operated at different voltages depending on their critical delay requirements, and each domain can have multiple voltage assignments (including 0 V).

Dynamic frequency scaling (also known as CPU throttling) is a technique in computer architecture whereby the frequency of a microprocessor can be automatically adjusted at run time, either to conserve power or to reduce the amount of heat generated by the chip. Dynamic frequency scaling is commonly used in laptops and other mobile devices, where energy comes from a battery and thus is limited. It is also used in quiet computing settings and to decrease energy and cooling costs for lightly loaded machines. Less heat output, in turn, allows the system cooling fans to be throttled down or turned off, reducing noise levels and further decreasing power consumption. Dynamic frequency scaling reduces the number of instructions a processor can issue in a given amount of time, thus reducing performance. Hence, it is generally used when the performance requirements are not critical.

Dynamic frequency scaling by itself is rarely worthwhile as a way to conserve switching power. Saving the most power requires dynamic voltage scaling too, because of the V^2 component and the fact that modern CPUs are strongly optimized for low power idle states. In most constant-voltage cases it is more efficient to run briefly at peak speed and stay in a deep idle state for longer (called "race to idle") than it is to run at a reduced clock rate for a long time and only stay briefly in a light idle state. However, reducing voltage along with clock rate can change those trade-offs. Dynamic voltage and frequency scaling (DVFS) can also be used to prevent computer system overheating, which can result in program or operating system crashes and possibly hardware damage.
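The "race to idle" argument can be illustrated with a small energy calculation. The sketch below is a toy model, not a measurement: the helper function, the one second period, the 0.4 s of full-speed work and all power figures are hypothetical. It assumes dynamic power proportional to V^2 f, active leakage power proportional to V, a deep idle state after running at full speed and a light idle state after running at half speed.

```python
def interval_energy(active_power, active_time, idle_power, period):
    """Energy (J) over one period: active phase followed by an idle phase."""
    return active_power * active_time + idle_power * (period - active_time)


# Hypothetical figures for one scheduling period.
PERIOD = 1.0            # s
FULL_SPEED_TIME = 0.4   # s needed to finish the work at full clock rate
P_DYNAMIC_FULL = 1.0    # W, dynamic power at full voltage and full clock
P_STATIC_FULL = 0.3     # W, leakage power while active at full voltage
P_DEEP_IDLE = 0.02      # W, deep idle state
P_LIGHT_IDLE = 0.10     # W, light idle state
VOLTAGE_SCALE = 0.7     # assumed supply voltage ratio usable at half clock

# Case 1: run at peak speed, then stay in deep idle ("race to idle").
race_to_idle = interval_energy(
    P_DYNAMIC_FULL + P_STATIC_FULL, FULL_SPEED_TIME, P_DEEP_IDLE, PERIOD)

# Case 2: half clock at the same voltage, twice as long, then light idle.
half_clock = interval_energy(
    0.5 * P_DYNAMIC_FULL + P_STATIC_FULL, 2 * FULL_SPEED_TIME, P_LIGHT_IDLE, PERIOD)

# Case 3: half clock with the voltage lowered as well.
half_clock_scaled = interval_energy(
    0.5 * VOLTAGE_SCALE ** 2 * P_DYNAMIC_FULL + VOLTAGE_SCALE * P_STATIC_FULL,
    2 * FULL_SPEED_TIME, P_LIGHT_IDLE, PERIOD)

print(f"race to idle:          {race_to_idle:.3f} J")
print(f"half clock, same V:    {half_clock:.3f} J")
print(f"half clock, reduced V: {half_clock_scaled:.3f} J")
```

Under these assumptions the constant-voltage slowdown spends the same switching energy but leaks for twice as long, so racing to idle wins; once the voltage is lowered together with the clock, the slower run becomes the cheaper one. Different idle and leakage figures would shift the break-even point.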
Examples of DVFS implementations are Intel's CPU throttling technology, SpeedStep, which is used in its mobile processors, and AMD's two CPU throttling technologies: Cool'n'Quiet, which is used on its desktop and server processor lines, and PowerNow, which is used in its mobile processor line.

2.4 Power Source Optimization: A System Approach

2.4.1 Choice of Metric

Traditional metrics like minimization of power and energy are not really suitable when power source (battery) optimization is a concern. For battery operated portable devices, an obvious objective is to maximize the battery lifetime. In spite of this fact, discussions of low power design metrics and methodologies have focused entirely on VLSI sub-system optimizations. The energy stored in a battery is assumed to be constant and available at any possible rate. In reality, however, the energy stored in a battery may not be usable to its full extent. The delivery of energy from battery to system depends on the mean value of the current drawn from the battery. Battery lifetime does not have a simple linear relationship with the power consumption of the circuit. For example, a 2X increase in system power can cause a 3X decrease in battery lifetime. These facts motivate us to consider other metrics for the design goal of power source optimization.

Weiser et al. [56] present Millions of Instructions Per Joule (MIPJ) as a quality metric for dynamic voltage scaling (DVS). The key idea is to eliminate idle time by reducing the processor voltage and clock for a given segment of computation. To predict processor utilization, either a fixed-size window of future events or a fixed-size window of past events is analyzed, and the corresponding DVS decisions are evaluated using trace-based simulations. This method has limited practicality since measurement and tracking of battery energy in terms of joules is difficult.

Rakhmatov et al.
[44, 45] use an analytical model of the battery to minimize a cost function σ(t). This cost is a function of the load current i(t) and is the sum of l(t) and u(t), where l(t) is the charge lost to the load and u(t) is the unavailable charge. The cost function is evaluated in the context of DVS for task scheduling and battery optimization. Minimization of this cost function is subject to constraints such as task dependencies and task deadlines.

Pedram et al. [43] propose the battery discharge-delay (BD-delay) product as the metric. This metric is similar to the energy-delay product while accounting for the battery characteristics and the DC/DC conversion efficiency. The BD-delay product states that the design goal should be to minimize delay and maximize battery lifetime at the same time.

2.4.2 Classification of Power Source Optimization Methods

Since the primary aim is to optimize the energy of the power source, the methods normally used for low power design are only a part of power source optimization methods. Various methods have already been proposed [56, 43, 38, 46] and these can, in general, be classified into the following three categories.

Voltage Management Methods

The most common of the voltage management methods is dynamic voltage management. Here the system has the capability of statically or dynamically varying VDD. A relevant problem is to find an optimum value of supply voltage which minimizes the energy drawn from the battery and still maintains the throughput requirements. Pedram et al. [43] propose a method to find the optimum operating voltage for minimization of the battery discharge-delay product. The first of our two proposed techniques falls into this category. We discuss this technique in detail in Chapter 5.

Throughput Management Methods

Dynamic frequency scaling is one of the most used methods in this category. CPU frequency scaling for battery powered computers is examined in [48] in terms of its impact on battery life, system performance, and power consumption.
Frequency scaling approaches use information from a battery model to vary the clock frequency of system components dynamically at run time. They also use workload characteristics, such as run-time and idle-time percentages, and models of system power and performance. These approaches can be used to ensure efficient use of the battery without significantly compromising system performance [46].

Functional Management Methods

These methods include most of the methods discussed earlier in this chapter. Most of them focus on power management of the system in order to reduce the average current drawn from the battery. Battery aware dynamic task scheduling is one such technique [45]. The second of our two proposed methods, which exploits idleness in a pipelined processor to dynamically manage power to different units, falls under the functional management category.

Dynamic voltage and frequency scaling (DVFS) is a combination of voltage and throughput management methods, and architecture level parallelism is a combination of all three methods mentioned above.

2.4.3 A Typical Battery Powered Electronic System

A typical power supply for an electronic system is shown in Figure 2.11. The primary source of energy is a battery, normally an electrochemical device [21]. The battery can be a primary type that is discarded after it is discharged, or a rechargeable type. As shown in Figure 2.11, a fully charged Lithium-ion battery supplies 4.2 volts, and when the voltage drops below 3.0 volts it is recharged. The electronic system is supplied a voltage VDD that is close to 1 volt or lower for modern nanometer technologies. A DC-to-DC converter [55, 43] provides the voltage transformation as well as the capability to vary VDD for power management.

Figure 2.11: Powering an Electronic System (Li-ion battery, 4.2V to 3.5V; DC-to-DC voltage converter; decoupling capacitor; electronic system between VDD and GND)
Because the current requirement of the electronic system is often pulsed and time varying, decoupling capacitors are used to smooth the transient ripples. The decoupling capacitance is, in general, distributed in the power grid of the system.

In the subsequent chapters, we discuss these components of the system in detail. Chapter 3 describes Lithium-ion batteries, including their background, electrochemistry and terminology. It also discusses the battery models that have been proposed and the model used for this work. Chapter 4 summarizes theory and background work on DC-to-DC converters. Chapter 5 describes the proposed technique for power source optimization, which falls into the first class of methods, i.e., voltage management. Chapter 6 describes a proposed functional method of power source optimization, for which we demonstrate gains in battery lifetime. Chapter 7 makes concluding remarks on the methods.

Chapter 3
Lithium-ion Battery Background and Modelling

3.1 Background

For many years, nickel-cadmium had been the only suitable battery for portable equipment from wireless communications to mobile computing. Nickel-metal-hydride (NiMH) and lithium-ion emerged in the early 1990s and today, lithium-ion is the fastest growing and most promising battery chemistry.

Lithium is the lightest of all metals, has the greatest electrochemical potential and provides the largest energy density per weight. Attempts to develop rechargeable lithium batteries failed due to safety problems. Because of the inherent instability of lithium metal, especially during charging, research shifted to a non-metallic lithium battery using lithium ions. Although slightly lower in energy density than lithium metal, lithium-ion is safe, provided certain precautions are met when charging and discharging. In 1991, the Sony Corporation commercialized the first lithium-ion battery.
Other manufacturers like Hitachi, Panasonic, and LG followed suit.

The energy density of lithium-ion is typically twice that of the standard nickel-cadmium, and there is potential for higher energy densities. The load characteristics are reasonably good and behave similarly to nickel-cadmium in terms of discharge. The high cell voltage of 3.6 volts allows battery pack designs with only one cell. Most of today's mobile phones run on a single cell. A nickel-based pack would require three 1.2-volt cells connected in series.

Lithium-ion is a low maintenance battery. There is no memory effect and no scheduled cycling is required to prolong the battery's life. In addition, the self-discharge is less than half that of nickel-cadmium, making lithium-ion well suited for modern portable computing applications. Lithium-ion cells cause little harm when disposed.

Despite its overall advantages, lithium-ion has its drawbacks. It is fragile and requires a protection circuit to maintain safe operation. Built into each pack, the protection circuit limits the peak voltage of each cell during charge and prevents the cell voltage from dropping too low on discharge. In addition, the cell temperature is monitored to prevent temperature extremes. The maximum charge and discharge current on most packs is limited to between 1C and 3C. With these precautions in place, the possibility of metallic lithium plating occurring due to overcharge is virtually eliminated.

Ageing is a concern with most lithium-ion batteries. Some capacity deterioration is noticeable after one year, whether the battery is in use or not. The battery frequently fails after two or three years. It should be noted that other chemistries also have age-related degenerative effects. This is especially true for nickel-metal-hydride if exposed to high ambient temperatures.

Storage in a cool place slows the ageing process of lithium-ion (and other chemistries).
Manufacturers recommend storage temperatures of 15°C (59°F). In addition, the battery should be partially charged during storage; manufacturers recommend a 40% charge.

The most economical lithium-ion battery in terms of cost-to-energy ratio is the cylindrical 18650 cell (18 mm in diameter and 65.0 mm in length). This cell is used for mobile computing and other applications that do not demand ultra-thin geometry. If a slim pack is required, the prismatic lithium-ion cell is the best choice. These cells come at a higher cost in terms of stored energy.

Advantages of lithium-ion batteries
• High energy density - potential for yet higher capacities.
• Does not need prolonged priming when new. One regular charge is all that's needed.
• Relatively low self-discharge - self-discharge is less than half that of nickel-based batteries.
• Low maintenance - no periodic discharge is needed; there is no memory effect.
• Speciality cells can provide very high current to applications such as power tools.

Limitations of lithium-ion batteries
• Requires a protection circuit to maintain voltage and current within safe limits.
• Subject to ageing, even if not in use - storage in a cool place at 40% charge reduces the ageing effect.
• Transportation restrictions - shipment of larger quantities may be subject to regulatory control. This restriction does not apply to personal carry-on batteries.
• Expensive to manufacture - about 40 percent higher in cost than nickel-cadmium.
• Not fully mature - metals and chemicals are changing on a continuing basis.

The Lithium Polymer battery

The lithium-polymer battery differentiates itself from conventional battery systems in the type of electrolyte used. The original design, dating back to the 1970s, uses a dry solid polymer electrolyte.
This electrolyte resembles a plastic-like film that does not conduct electricity but allows the exchange of ions (electrically charged atoms or groups of atoms). The polymer electrolyte replaces the traditional porous separator, which is soaked with electrolyte.

The dry polymer design offers simplifications with respect to fabrication, ruggedness, safety and thin-profile geometry. With a cell thickness as little as one millimeter (0.039 inches), equipment designers are left to their own imagination in terms of form, shape and size.

Unfortunately, the dry lithium-polymer suffers from poor conductivity. The internal resistance is too high, so the cell cannot deliver the current bursts needed to power modern communication devices and spin up the hard drives of mobile computing equipment. Heating the cell to 60°C (140°F) and higher increases the conductivity, a requirement that is unsuitable for portable applications.

As a compromise, some gelled electrolyte has been added. The commercial cells use a separator, or electrolyte membrane, prepared from the same traditional porous polyethylene or polypropylene separator filled with a polymer, which gels upon filling with the liquid electrolyte. Thus the commercial lithium-ion polymer cells are very similar in chemistry and materials to their liquid electrolyte counterparts.

Lithium-ion-polymer has not caught on as quickly as some analysts had expected. Its superiority to other systems and its low manufacturing costs have not been realized. No improvements in capacity are achieved; in fact, the capacity is slightly less than that of the standard lithium-ion battery. Lithium-ion-polymer finds its market niche in wafer-thin geometries, such as batteries for credit cards and other such applications.

3.2 Electro-chemistry

The three participants in the electrochemical reactions in a lithium-ion battery are the anode, cathode, and electrolyte. Both the anode and cathode are materials into which, and from which, lithium can migrate.
The process of lithium moving into the anode or cathode is referred to as insertion (or intercalation), and the reverse process, in which lithium moves out of the anode or cathode, is referred to as extraction (or de-intercalation). When a lithium-based cell is discharging, the lithium is extracted from the anode and inserted into the cathode. When the cell is charging, the reverse process occurs: lithium is extracted from the cathode and inserted into the anode.

The anode of a conventional Li-ion cell is made from carbon, the cathode is a metal oxide, and the electrolyte is a lithium salt in an organic solvent.

Useful work can only be extracted if electrons flow through a (closed) external circuit. The following equations are written in units of moles, making it possible to use the coefficient x. The cathode half-reaction (with charging being forward) is:

\[ \mathrm{LiCoO_2} \rightleftharpoons \mathrm{Li}_{1-x}\mathrm{CoO_2} + x\,\mathrm{Li^+} + x\,e^- \qquad (3.1) \]

The anode half-reaction is:

\[ x\,\mathrm{Li^+} + x\,e^- + 6\,\mathrm{C} \rightleftharpoons \mathrm{Li}_x\mathrm{C_6} \qquad (3.2) \]

Overcharge up to 5.2 V leads to the synthesis of cobalt(IV) oxide, as evidenced by x-ray diffraction:

\[ \mathrm{LiCoO_2} \rightarrow \mathrm{Li^+} + e^- + \mathrm{CoO_2} \qquad (3.3) \]

The overall reaction has its limits. Over-discharge will supersaturate lithium cobalt oxide, leading to the production of lithium oxide, possibly by the following irreversible reaction:

\[ \mathrm{Li^+} + e^- + \mathrm{LiCoO_2} \rightarrow \mathrm{Li_2O} + \mathrm{CoO} \qquad (3.4) \]

In a lithium-ion battery the lithium ions are transported to and from the cathode or anode, with the transition metal, Co, in LixCoO2 being oxidized from Co3+ to Co4+ during charging, and reduced from Co4+ to Co3+ during discharge.

3.3 Description of Terminology

3.3.1 Capacity

The capacity of a battery is its ability to hold and supply charge. For practical purposes, capacity is expressed in ampere-hours (AHr), so a 1 AHr battery is able to provide a current of 1 A for one hour. For modelling purposes, capacity can be categorized into different types.
Full charge capacity is the remaining capacity of a fully charged battery at the beginning of a discharge cycle, and full design capacity is the remaining capacity of a newly manufactured battery. Further, theoretical capacity is the maximum amount of charge that can be extracted from a battery based on the amount of active material it contains, standard capacity is the amount of charge that can be extracted from a battery when discharged under standard load and temperature conditions, and actual capacity is the amount of charge a battery delivers under given load and temperature conditions.

3.3.2 Rate Dependent Capacity

Battery capacity decreases as the discharge rate increases. In a fully charged cell, the electrode surface contains the maximum concentration of active ions. When the cell is connected to a load, a current flows through the external circuit; active ions are consumed at the electrode surface and replenished by diffusion from the bulk of the electrolyte. However, this diffusion process cannot keep up with the reaction process, and a concentration gradient builds up across the electrolyte. A higher load current results in a higher concentration gradient and thus a lower concentration of active ions at the electrode surface. When this concentration falls below a certain threshold, which corresponds to the voltage cut-off, the electrochemical reaction can no longer be sustained at the electrode surface. At this point, the charge that was unavailable at the electrode surface due to the gradient remains unusable and is responsible for the reduction in capacity.

However, the unused charge is not physically lost, but simply unavailable due to the lag between reaction and diffusion rates. Decreasing the discharge rate effectively reduces this lag as well as the concentration gradient. If the battery load goes to zero, the concentration gradient flattens out after a sufficiently long time, reaching equilibrium again.
The increased concentration of active ions near the electrode surface following this rest period makes some of the unused charge available for extraction. This charge recovery effect can be exploited by controlling the discharge rate so as to maximize battery lifetime under performance constraints. At sufficiently low discharge rates, the battery behaves like an ideal energy source.

3.3.3 Temperature Effect

Temperature strongly affects battery capacity and shelf life. Temperatures much lower than room temperature reduce the internal activity of the battery, resulting in higher internal resistance and hence a steeper discharge curve. On the other hand, temperatures much above room temperature reduce the internal resistance, so the battery can deliver its full rate of discharge and voltage. However, this results in quicker self-discharge, and the battery has less capacity to start with. Temperature effects on a battery in a device are rather difficult to manage.

3.3.4 Capacity Fading

Because of their high energy density and capacity, lithium-ion batteries are the popular choice for many portable applications. However, these batteries lose a portion of their capacity with each discharge-charge cycle. This capacity fading results from unwanted side reactions including electrolyte decomposition, active material dissolution, and passive film formation. These irreversible reactions increase the cell internal resistance, ultimately causing battery failure. To deal with this problem, system users can attempt to control the depth of discharge before recharging. Typically, a battery subjected to shallow discharge, that is, recharged while its voltage is still relatively high, will be good for more cycles than a battery subjected to deep discharge, for example, one discharged until the cut-off voltage is reached.

3.4 Modelling

Battery modelling, a mathematical description of batteries, is an important part of battery design and battery related system design.
Several types of battery models have been reported in the literature. The choice of a particular model is decided by its suitability for the application. For instance, a physical model may be suitable for constructing a battery, whereas an abstract or analytical model is suitable for designing a system containing batteries and for optimizing battery parameters for the system. The following subsections briefly describe different types of models.

3.4.1 Physical Models

Physical models are the most accurate and have great utility for battery designers as a tool to optimize a battery's physical parameters. However, they are also the slowest to produce predictions and the hardest to configure. These models may need as many as 50 parameters, such as structure, chemical composition and temperature, for their configuration. They also provide very limited analytical insight for system designers.

Doyle et al. [39, 40] developed an isothermal electrochemical model which describes the charging and discharging of a lithium-ion polymer battery over one cycle. The model uses concentrated solution theory to derive a set of differential equations which, when solved, provide the battery voltage as a function of time. Dualfoil [41] is a Fortran program written to model the lifetime of the battery. The program reads a sequence of constant current steps and compares the output voltage to the cut-off voltage. This program has been widely used by many researchers for lifetime computation.

3.4.2 Empirical Models

Empirical models are the easiest to configure, and they quickly produce predictions, but they generally are the least accurate. Although they work well in certain special cases, the constants used have no physical significance, which seriously limits their analytical insight. Peukert's law attempts to capture non-ideal discharge behavior using relatively simple equations.
While an ideal battery with capacity C, discharged at a constant current I, would be expected to have a lifetime L given by C = L × I, Peukert's law expresses this as a power law relationship, C = L × I^α. The exponent α provides a simple way to account for rate dependence. However, the values of α for different temperatures must be obtained empirically, and the fit is not always accurate. Though easy to configure and use, Peukert's law does not account for time-varying loads. Most batteries in portable devices experience widely varying loads; for example, an iPhone user may run a movie player application followed by a text editor, which yields a profile with two very different loads for the battery.

Massoud Pedram and Qing Wu [43] model battery efficiency, the ratio of actual capacity to theoretical capacity, as a linear or quadratic function of the load current. They derive bounds on the actual power consumed for different current distributions with the same average current and show that these bounds depend on the maximum and minimum values of the current. Among all distributions with the same mean, a constant current (least variance) would give the longest battery lifetime, and a uniformly distributed current (highest variance) would give the shortest. This model accounts for rate dependence and can handle variable loads. Researchers have used it, with slight modifications, to maximize the lifetime of multi-battery systems, to minimize the discharge-delay product in an interleaved dual-battery system design, and in static task scheduling for real-time embedded systems.

3.4.3 Abstract Models

Instead of modelling discharge behaviour either by describing the electrochemical processes in the cell or by empirical approximation, abstract models attempt to provide an equivalent representation of a battery. Although the number of parameters is not large, such models employ lookup tables that require considerable effort to configure.
In addition, despite acceptable accuracy and computational complexity, these models have limited utility for design exploration because they lack analytical expressions for many variables of interest. Electrical-circuit and discrete-time models are particularly useful when compatible models of other system components, such as circuit models or VHSIC Hardware Description Language (VHDL) models, are available to simulate the entire system in a single continuous-time or discrete-time environment.

Gold [53] proposed a PSpice model which uses linear passive elements along with voltage sources and lookup tables to model the battery behaviour. This continuous-time model can represent capacity fading and the effect of temperature on internal resistance. Benini [54] proposed a discrete-time model which makes use of high level hardware description languages such as VHDL. Besides modelling basic parameters, the advantage of this model is its compatibility with system level power management designs. Other models include Hageman's PSpice model [52] for NiMH batteries, Bergveld's electrical circuit model [50] for NiCd batteries and, more recently, Chen's accurate electrical model [49] for runtime and lifetime prediction, which we use for this work and discuss in Section 3.5.

3.4.4 Analytical/Mixed Models

Some mixed models based on mathematical analysis have also been proposed. They use results obtained from a series of experiments to create system level models. Reference [44] proposes one such model, which describes a battery using two parameters, α and β, derived from the lifetime values for a series of constant load tests. The parameter α is a measure of the battery's theoretical capacity, and β models the rate at which the active charge carriers are replenished at the electrode surface. The accuracy of battery lifetime predictions with this model has been verified against the Dualfoil model.
Figure 3.1: An Electrical Model for Lithium-ion battery (left part, battery lifetime: RSelf-Discharge, CCapacity, IBatt, VSOC from 0 to 1 volt; right part, voltage-current characteristics: VOC(VSOC), RSeries, RTransient_S, CTransient_S, RTransient_L, CTransient_L, VBatt)

Peng Rong and Pedram [47] proposed a high level battery model to estimate remaining capacity that considers both the temperature effect and capacity fading with successive cycles. They derived an expression for cell terminal voltage as a function of time and, using the Arrhenius dependence on temperature of cell kinetics and transport phenomena, obtained an expression for the bulk properties of the active material as a function of the temperature. They also derived an expression for film thickness as a function of the temperature, discharge rate, and number of cycles.

3.5 Model Used for This Work

As mentioned before, we use the electrical model provided by [49], shown in Figure 3.1. One of the reasons for choosing this model is its capability of predicting both lifetime and I-V performance. Besides load current, it considers the effects of temperature, number of cycles and storage time on the capacity and hence on battery lifetime. The model is also scalable, as it models batteries of varying AHr ratings and predicts runtime for different load current profiles. It can be used for Lithium-ion, polymer Lithium-ion and NiMH batteries.

3.5.1 Description

On the left side of Figure 3.1, a capacitor CCapacity represents the present state of charge (SOC) of the battery and a current source IBatt models the discharge. The right side of the circuit models the voltage and current characteristics of the battery based on the current drawn from the battery.
These two parts are connected to each other by a voltage-controlled voltage source VOC(VSOC), whose value depends on VSOC, the voltage across the capacitor CCapacity.

Assuming a battery is discharged from an equally charged state to the same end-of-discharge voltage, the extracted energy, called usable capacity, declines as cycle number, discharge current, and/or storage time (self-discharge) increases, and/or as temperature decreases [49]. The usable capacity can be modelled by a full-capacity capacitor (CCapacity), a self-discharge resistor (RSelf-Discharge), and an equivalent series resistor (the sum of RSeries, RTransient_S, and RTransient_L). The full-capacity capacitor CCapacity represents the whole charge stored in the battery, i.e., SOC, by converting the nominal battery capacity in AHr to charge in coulombs. Its value is defined as

\[ C_{\mathrm{Capacity}} = 3600 \cdot \mathrm{Capacity} \cdot f_1(\mathrm{Cycles}) \cdot f_2(\mathrm{Temp}) \qquad (3.5) \]

where Capacity is the nominal capacity in AHr, f1(Cycles) is a correction factor for the number of cycles, and f2(Temp) is a temperature-dependent correction factor.

A fully charged battery can be initialised by setting the initial voltage across CCapacity (VSOC) equal to 1 V, and a fully discharged battery by setting VSOC to 0 V. In other words, VSOC represents the SOC of the battery quantitatively and 0 ≤ VSOC ≤ 1.

As seen from Equation 3.5, CCapacity does not change with current variation, which is reasonable for the battery's full capacity because energy is conserved. The variation of current-dependent usable capacity comes from the different SOC values reached at the end of discharge for different currents, owing to different voltage drops across the internal resistance (the sum of RSeries, RTransient_S, and RTransient_L) and the same end-of-discharge voltage. When the battery is being charged or discharged, the current-controlled current source IBatt is used to charge or discharge CCapacity so that the SOC, represented by VSOC, will change dynamically.
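As a minimal sketch of how the left-hand part of the model behaves, the following Python fragment evaluates Equation 3.5 and discharges CCapacity with a constant current, so that VSOC falls linearly from 1 V. The function names, the constant-current load and the unit correction factors are illustrative assumptions rather than part of the model in [49], and self-discharge is ignored.

```python
def c_capacity(capacity_ahr, f1_cycles=1.0, f2_temp=1.0):
    """Equation 3.5: full-capacity capacitor in farads (coulombs per volt of VSOC)."""
    return 3600.0 * capacity_ahr * f1_cycles * f2_temp


def v_soc_after(capacity_ahr, load_current_a, seconds, v_soc_initial=1.0):
    """VSOC after a constant-current discharge.

    Illustrative assumptions: constant load current, no self-discharge,
    so dVSOC/dt = -IBatt / CCapacity.
    """
    v = v_soc_initial - load_current_a * seconds / c_capacity(capacity_ahr)
    return min(1.0, max(0.0, v))  # keep VSOC within the 0..1 V range


# A 0.4 AHr battery discharged at 0.2 A for half an hour.
print(c_capacity(0.4))              # about 1440 F
print(v_soc_after(0.4, 0.2, 1800))  # about 0.75, i.e. 75% state of charge
```

In this simplified view the state of charge depends only on the charge removed; the current dependence of usable capacity enters through the voltage drops on the right-hand side of the circuit.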
Therefore, the battery runtime is obtained when the battery voltage reaches the end-of-discharge voltage.

The self-discharge resistor RSelf-Discharge is used to characterize the self-discharge energy loss when batteries are stored for a long time. Theoretically, RSelf-Discharge is a function of SOC, temperature, and, frequently, cycle number. Practically, it can be simplified as a large resistor, or even ignored, because usable capacity decreases only slowly with time when no load is connected to the battery. In our implementation of the model we set its value to a very large resistance of about 1 GΩ.

The open-circuit voltage (VOC) varies with the capacity level, i.e., SOC. The non-linear relation between VOC and SOC must be included in the model. Thus, the voltage-controlled voltage source VOC(VSOC) is used to represent this relation.

In a step load current event, the battery voltage responds slowly. Its response curve usually includes instantaneous and curve-dependent voltage drops. Therefore, the transient response is characterized by the shaded RC network in Figure 3.1. The electrical network consists of the series resistor RSeries and two parallel RC networks composed of RTransient_S, CTransient_S, RTransient_L, and CTransient_L. The series resistor RSeries is responsible for the instantaneous voltage drop of the step response. RTransient_S, CTransient_S, RTransient_L, and CTransient_L are responsible for the short and long time constants of the step response. Theoretically, all the parameters in the proposed model are multi-variable functions of SOC, current, temperature, and cycle number.

3.5.2 Battery Lifetime

The state of charge (SOC) is defined as 1.0 for a fully charged battery. It is represented by a voltage VSOC, which ranges between 0 and 1 volt. The charge of the battery is stored in a capacitor CCapacity whose value is determined as follows.
\[ C_{\mathrm{Capacity}} = 3600 \cdot \mathrm{Capacity} \cdot f_1(\mathrm{Cycles}) \cdot f_2(\mathrm{Temp}) \qquad (3.6) \]

where Capacity is the AHr rating of the battery. Multiplying the AHr rating by 3600 seconds per hour gives the total amount of charge in coulombs; for example, 1 AHr corresponds to 3600 coulombs. As the battery goes through cycles of charging and discharging, its capacity to hold charge is affected, reducing the usable capacity. This is represented by f1(Cycles). Similarly, temperature affects the usable capacity, and this is represented by f2(Temp). For simplicity, we have assumed both factors to be unity in the present discussion.

The resistance RSelf-Discharge represents leakage when the battery is stored over a long period. For a reasonable time between recharges, it can be considered to be large or practically infinite. The current source IBatt represents a source when the battery is being charged or a load when the battery is powering a circuit. In the latter case, it is the current being supplied to the DC-to-DC converter and, after conversion, to the circuit. When the model is used to simulate the behavior of a battery that is fully charged, VSOC is initialized to 1 volt.

3.5.3 Voltage and Current Characteristics

The circuit on the right in Figure 3.1 emulates the terminal voltage of the battery as it supplies current. This part is linked to the part on the left by the state of charge (SOC), a quantity in the range 0.0 to 1.0. VOC(SOC) is the open circuit voltage. For Lithium-ion batteries, Chen and Rincon-Mora [49] empirically derive expressions for the circuit components, which all depend on SOC.
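The fitted expressions that follow (Equations 3.7 to 3.12) are simple closed forms in SOC, so they are easy to evaluate numerically. The sketch below transcribes them into Python; the function name and dictionary keys are illustrative choices, and the coefficients are copied from the equations as they appear in this section.

```python
import math


def battery_parameters(soc):
    """Evaluate Equations 3.7-3.12 at a state of charge soc (0.0 to 1.0).

    Returns volts, ohms and farads; the key names are illustrative.
    """
    return {
        "v_oc": (-1.031 * math.exp(-35.0 * soc) + 3.685 + 0.2156 * soc
                 - 0.1178 * soc ** 2 + 0.3201 * soc ** 3),
        "r_series": 0.1562 * math.exp(-24.37 * soc) + 0.07446,
        "r_transient_s": 0.3208 * math.exp(-29.14 * soc) + 0.04669,
        "c_transient_s": -752.9 * math.exp(-13.51 * soc) + 703.6,
        "r_transient_l": 6.6038 * math.exp(-155.2 * soc) + 0.04984,
        "c_transient_l": -6056.0 * math.exp(-27.12 * soc) + 4475.0,
    }


for soc in (1.0, 0.5, 0.1):
    p = battery_parameters(soc)
    print(f"SOC={soc:.1f}  VOC={p['v_oc']:.3f} V  RSeries={p['r_series']:.4f} ohm")
```

At SOC = 1 the exponential terms are negligible and VOC evaluates to roughly 4.10 V. As SOC approaches zero the exponential terms dominate, so VOC drops and the resistances rise steeply.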
\[ V_{OC}(SOC) = -1.031\,e^{-35\,SOC} + 3.685 + 0.2156\,SOC - 0.1178\,SOC^2 + 0.3201\,SOC^3 \qquad (3.7) \]

\[ R_{Series}(SOC) = 0.1562\,e^{-24.37\,SOC} + 0.07446 \qquad (3.8) \]

\[ R_{Transient\_S}(SOC) = 0.3208\,e^{-29.14\,SOC} + 0.04669 \qquad (3.9) \]

\[ C_{Transient\_S}(SOC) = -752.9\,e^{-13.51\,SOC} + 703.6 \qquad (3.10) \]

\[ R_{Transient\_L}(SOC) = 6.6038\,e^{-155.2\,SOC} + 0.04984 \qquad (3.11) \]

\[ C_{Transient\_L}(SOC) = -6056\,e^{-27.12\,SOC} + 4475 \qquad (3.12) \]

3.6 Summary

Many unique properties of Lithium-ion batteries, including high energy density and quick recharging, have made them a popular choice of power source for portable computing devices. Since the focus of the computing community has shifted to mobile, multi-function, wireless communication devices, the study of batteries and research in power source optimization techniques have become an important part of product design. The various proposed battery models help designers understand the impact of design decisions on battery energy and create better designs without having to set up time consuming experiments.

Chapter 4
DC to DC Converter

4.1 Necessity

There are various reasons to convert a DC voltage of one magnitude to another. First, most of the commercially used lithium-ion batteries have rated voltages in the range of 3.7 V to 4.2 V, but modern VLSI chips run at much smaller voltages of 1 V to 1.5 V. Second, battery operated portable systems have several different chips working together to provide the varied functionality. These are analog, digital and mixed signal chips, and they may operate on different supply voltages. A single chip may also contain multiple voltage domains. Third, as a fully recharged battery is being used, the battery voltage drops as the stored charge drains. Regulation of the output voltage is therefore required in order to maintain a steady supply to the chips.

DC to DC converters are switching regulators in general. Switching regulators are more efficient than linear regulators.
Linear regulators are cheap and have simple structures, but they can only convert from a high voltage level to a low voltage level. The excess voltage appears across a resistor and produces heating in the resistor. This heat has to be dissipated. So linear regulators are useful only for conversions with low output current ratings and a low difference in the voltage levels, as this limits the power dissipation.

Switching regulators, on the other hand, are highly efficient, with typical efficiencies in the range of 75% to 98%. This conversion efficiency is necessary to make efficient use of the limited battery energy. Switching regulators store the input energy temporarily in magnetic (inductor) or electric (capacitor) storage elements in one phase of operation and release it to the output in the next phase at a different voltage. They can convert voltages from low to high and from high to low levels. They can also be designed to produce negative voltages. The drawbacks of switching regulators include design complexity, high switching noise and higher cost. They also require energy management in the form of a control loop.

4.2 Topologies of Switching Regulators

Switching regulators can be up converters (boost), down converters (buck) and inverters (flyback), as shown by (a), (b) and (c) in Figure 4.1, respectively. Some regulators also provide isolation between input and output. A power switch is the key element of a switching regulator. Vertical Diffused MOS (VDMOS), also known as Double Diffused MOS (DMOS), is used as the power switching transistor. These transistors have high switching frequencies and low power dissipation. An inductor is used to control the DC current through the switch, thus reducing the heating. The inductor also serves as a storage element in the charge cycle and provides the energy to the load in the discharge cycle. This makes switching regulators very efficient. The following subsection describes the operation of a buck converter.
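To see why a linear regulator is a poor fit for stepping a battery voltage down to a core supply, a rough estimate helps. The sketch below assumes an ideal linear regulator whose input current equals its output current (quiescent current neglected), so that its efficiency is simply Vout/Vin. The function names and the example voltages and current are illustrative assumptions.

```python
def linear_regulator_efficiency(v_in, v_out):
    """Best-case efficiency of a linear regulator, assuming the input current
    equals the load current (quiescent current neglected)."""
    if not 0.0 < v_out <= v_in:
        raise ValueError("a linear regulator can only step the voltage down")
    return v_out / v_in


def linear_regulator_loss_w(v_in, v_out, load_current_a):
    """Power dissipated as heat in the regulator, in watts."""
    return (v_in - v_out) * load_current_a


# Li-ion cell at a 3.7 V rated voltage feeding a 1.0 V chip that draws 0.5 A.
print(f"{linear_regulator_efficiency(3.7, 1.0):.0%}")      # roughly 27%
print(f"{linear_regulator_loss_w(3.7, 1.0, 0.5):.2f} W")   # 1.35 W
```

Under these assumptions almost three quarters of the battery energy is lost as heat, compared with the 75% to 98% efficiency range quoted above for switching regulators.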
Figure 4.1: Types of Converters. (a) Boost Converter, (b) Buck Converter, (c) Inverter (Flyback)

4.2.1 Buck Converter

A basic buck converter is shown in Figure 4.2 [51]. A Single Pole Double Throw (SPDT) switch is connected to the input DC voltage Vg. The switch output voltage Vs(t) is equal to Vg when the switch is in position 1 and equal to 0 when the switch is in position 2. The switch position varies periodically, such that Vs(t) is a rectangular waveform having period Ts and duty cycle D, as shown in Figure 4.3. The duty cycle is equal to the fraction of time that the switch is connected in position 1, and hence 0 ≤ D ≤ 1. The switching frequency fs is equal to 1/Ts. In practice, the SPDT switch is realized using semiconductor devices such as diodes, power MOSFETs, IGBTs, BJTs, or thyristors. Typical switching frequencies lie in the range 1 kHz to 1 MHz, depending on the speed of the semiconductor devices.

The switch network changes the DC component of the output waveform. This component is the average value of the waveform, given by Equation 4.1:

\[ V = \frac{1}{T_s} \int_0^{T_s} V_s(t)\,dt = D\,V_g \qquad (4.1) \]

The integral is equal to the area under the waveform, or the height Vg multiplied by the time DTs. The switch network reduces the DC component of the voltage by a factor equal to the duty cycle D. Since 0 ≤ D ≤ 1, the DC component of Vs is less than or equal to Vg.

In addition to the desired DC voltage component Vs, the switch waveform Vs(t) also contains undesired harmonics of the switching frequency. In most applications, these harmonics must be removed, such that the converter output voltage v(t) is essentially equal to the DC component V = Vs. A low-pass filter is employed for this purpose. The converter of Figure 4.2 contains a single-section L-C low-pass filter.
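Equation 4.1 can be checked numerically. The sketch below samples the rectangular switch waveform over one period, averages it, and compares the result with D times Vg; it also inverts the relation to find the duty cycle needed for a given output. The function names, the sample count and the example voltages are illustrative assumptions, and the converter is taken to be ideal.

```python
def switch_waveform(t, v_g, duty, period):
    """Vs(t): Vg while the switch is in position 1 (first D*Ts of each period), else 0."""
    return v_g if (t % period) < duty * period else 0.0


def dc_component(v_g, duty, period, samples=10_000):
    """Average of Vs(t) over one period (midpoint-rule approximation of Equation 4.1)."""
    dt = period / samples
    total = sum(switch_waveform((k + 0.5) * dt, v_g, duty, period)
                for k in range(samples))
    return total * dt / period


def duty_for_output(v_out, v_g):
    """Ideal buck converter: V = D * Vg, so D = V / Vg."""
    if not 0.0 <= v_out <= v_g:
        raise ValueError("an ideal buck converter cannot raise the voltage")
    return v_out / v_g


v_g, period = 3.7, 1e-6                  # 3.7 V input, 1 MHz switching frequency
print(dc_component(v_g, 0.25, period))   # close to 0.25 * 3.7 = 0.925 V
print(duty_for_output(1.0, v_g))         # about 0.27
```

The averaged value does not depend on the switching period, only on the duty cycle, which is why D alone sets the ideal output voltage.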
The filter has corner frequency f0 given by Equation 4.2:

\[ f_0 = \frac{1}{2\pi\sqrt{LC}} \tag{4.2} \]

The corner frequency f0 is chosen to be sufficiently less than the switching frequency fs, so that the filter passes only the DC component of Vs(t).

Ideally, the power dissipated by the converter is zero. For the switch network, when the switch contacts are closed, the voltage across the contacts is equal to zero and hence the power dissipation is zero. When the switch contacts are open, there is zero current and the power dissipation is again equal to zero. Therefore, the ideal switch network is able to change the DC component of the voltage without dissipation of power. In practice, however, since the switch is realized by a MOSFET device, it has a finite resistance in ON mode and a leakage current in OFF mode, which result in power dissipation. Similarly, an ideal filter removes the switching harmonics without dissipation of power, but practical inductors and capacitors have finite DC resistances which cause power dissipation. Thus, the converter produces a DC output voltage whose magnitude is controllable via the duty cycle D, using circuit elements that (ideally) do not dissipate power.

Figure 4.2: A Simple Buck Converter

Figure 4.3: Buck Converter output waveform

The conversion ratio, M(D), is the ratio of the output DC voltage to the input DC voltage. For a buck converter,

\[ M(D) = \frac{V}{V_g} = D \tag{4.3} \]

The efficiency, η, of a DC-to-DC converter is defined as the ratio of the output DC power to the input DC power:

\[ \eta = \frac{P_{out}}{P_{in}} \tag{4.4} \]

4.3 Summary

Although switching techniques are more difficult to implement, switching circuits have almost completely replaced linear power supplies in a wide range of portable and stationary designs. MOSFET power switches are now integrated with controllers to form single-chip solutions.
With switching frequencies in the MHz range, the output inductor and filter capacitors can be reduced in size, further saving valuable space and component count. As MOSFET power-switch technologies continue to improve, so will switch-mode performance, further reducing cost, size, and thermal management problems.

Chapter 5
System Approach for Power Source Optimization

5.1 Introduction

Most of the work on low power design is focused on designing circuits which consume less energy and power. As far as portable electronic devices are concerned, the ultimate aim is to achieve a longer battery lifetime or, for a rechargeable source, to perform the most operations between consecutive recharges. Optimization of the circuit alone for power and energy may not always result in an equivalent optimization of battery lifetime. A study of the system consisting of the battery and the circuit under consideration is therefore required in order to achieve maximum battery lifetime. In general, this lifetime should be measured in terms of the duration of the system operation. A relevant measure is the number of useful clock cycles obtained per battery life or per battery recharge.

Size and weight of the batteries are major design constraints for mobile computing devices. Battery weights are generally proportional to their AHr ratings. Given an application with its load current requirement, a relevant problem is to find a battery with minimum size and weight to run the application. Since the energy drawn from the battery is not always equal to the energy consumed in the device, understanding the battery discharge behaviour and its own dissipation is essential for optimal system design. Finding and using a suitable model for a battery is an important part of the problem.

5.2 Problem Definition

Consider a typical battery powered system mentioned in Section 2.4.3. The size of a battery is specified in terms of the electrical charge it can supply. A Lithium-ion battery of 400mAHr can supply 400mA for one hour.
It will supply 200mA for two hours. While 400mA is the rated current for this battery, up to three times the rated current, or 1.2A, can be drawn for a duration of 20 minutes. However, a discharge rate higher than this can cause noticeable losses in the internal impedance of the battery, resulting in heating. This results in a loss of efficiency as defined below. The time for which a fully charged battery can supply current before requiring recharge is called its lifetime. Thus,

\[ \text{Ideal Lifetime} = \frac{\text{AHr rating}}{\text{Load Current in Amperes}} \tag{5.1} \]

The end of lifetime is indicated by a significant drop in the terminal voltage. Thus, the end of lifetime for a 4.2 volt Lithium-ion battery is indicated by a drop in terminal voltage below 3 volts. In practice, a battery can maintain an ideal lifetime for load currents smaller than three times the rated current. Thus, a 400mAHr battery can supply up to 1.2A current. For higher currents, there is generally a reduction in actual lifetime due to internal losses. Therefore,

\[ \text{Efficiency} = \frac{\text{Actual Lifetime}}{\text{Ideal Lifetime}} \tag{5.2} \]

To avoid a loss in efficiency, we must use a larger battery. For Lithium-ion batteries, 400mAHr is considered a unit cell. Using multiple cells in parallel enhances the current capacity and lifetime. Thus, a battery of size N means a battery consisting of N unit cells. For example, a battery of size N = 5 will be rated at 2AHr.

The problems we address here are [59]:

1. Determine the minimum supply voltage VDD for a synchronous clocked digital system that will meet the performance (critical path delay) requirement. Obtain the load current for the battery.

2. Determine the minimum battery size (efficiency ≥ 85%) for the required load current. The lifetime of the minimum size battery will be 20 minutes. Determine the battery size for the given recharge interval. For example, if the minimum battery size is N = 2 and the system recharge time is one hour, then we select a battery of size N = 6 or 2.4AHr.

3.
For the selected size of the battery, we determine a low performance, energy saving supply voltage VDD for which the lifetime of the battery in clock cycles is maximized.

We examine these problems under various system constraints as described by the following cases:

- Case I: System is performance bound
- Case II: Battery size or weight is a primary concern

5.3 Case I: System is performance bound

We analyze the above problem statements for the case where the system has to meet a certain throughput requirement, and propose a stepwise solution to find a matching battery for an electronic system [59].

5.3.1 Step 1: Determine circuit characteristics

To understand the effects of voltage scaling on battery efficiency, we consider a hypothetical 70 million gate system. We assume that the critical path consists of a 32-bit ripple-carry adder consisting of 352 NAND gates. The technology assumed is 45 nanometer bulk CMOS. For simulation, the predictive technology model (PTM) is used [1, 37]. The 32-bit adder was simulated using the HSPICE simulator [42]. The description of this circuit follows:

- Function: 32-bit ripple-carry adder
- Inputs: Operand A (32-bit), Operand B (32-bit), Carry-in (1-bit)
- Outputs: Sum (32-bit), Carry-out (1-bit)
- Transistors: 1,472 (352 two and three input NAND gates)
- Technology: 45nm bulk CMOS
- Critical path: B(0) to Carry-out. Sensitizing vectors (3): A = 8hFFFF FFFF, B = 8h0000 000x, where x changes 0-1-0, Carry-in = 0.

Figure 5.1: Circuit Delay and Current versus VDD obtained from HSPICE simulations (axes: delay in seconds and battery current IBatt in amperes versus VDD in volts).

Using the HSPICE simulator [42] and the 45nm PTM [1, 37], we determined the critical path delay of the 32-bit adder for VDD ranging from 1.0V to 0.1V at intervals of 0.1V. This is shown in Figure 5.1.
We found that although the circuit slows down by more than three orders of magnitude, it works correctly down to VDD = 0.1V, which is below the threshold voltage of 0.292V for the 45nm PTM devices [37]. Next, to determine the average current, we simulated the circuit using 100 random vectors. The simulation was repeated for the same values of VDD as before. In each case, vectors were applied at an interval equal to the corresponding critical path delay. Assuming a similar activity for the entire 70 million gate system, the average current measured for the 352-gate adder from the HSPICE simulation was multiplied by 200,000. Considering a 100% efficient DC-to-DC converter that translates VDD to the 4.2V rated terminal voltage of the Lithium-ion battery, we determine the battery load current IBatt by multiplying the circuit current by VDD/4.2. The resulting IBatt as a function of VDD is shown in Figure 5.1.

Now, as mentioned in the problem statement, we determine the operating voltage of the circuit based on the throughput requirements. For example, if the circuit needs to work at 200MHz, then from Figure 5.1 the operating voltage is 0.6V and the corresponding current drawn from the battery is 477mA.

5.3.2 Step 2: Determine smallest battery size

The model of the selected battery type is simulated for the various current loads obtained in the previous step. Every battery type has terminal voltages corresponding to its fully charged state and its fully discharged state. Using the load current, scaled for the ratio of battery voltage to circuit VDD, the battery model is simulated to determine the terminal voltage as a function of time. In practice this scaling is achieved by a DC-to-DC converter that is known to have high conversion efficiency (greater than 90%) [54, 43]. Alternatively, the circuit of the DC-to-DC converter can be attached to the battery model. The time between the fully charged state and the fully discharged state gives the battery lifetime in time units (seconds).
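The current scaling used in Step 1, and again for the battery simulation above, can be sketched as follows. This is only an illustration: the function name and the `converter_efficiency` parameter are hypothetical, and the per-adder current in the example is back-calculated from the 477mA figure quoted in the text rather than taken from the HSPICE data.

```python
BATTERY_VOLTAGE = 4.2   # rated Li-ion terminal voltage (V), from the text
SYSTEM_SCALE = 200_000  # adder-to-system multiplier used in the text


def battery_current(adder_current_a, vdd, converter_efficiency=1.0,
                    scale=SYSTEM_SCALE, v_batt=BATTERY_VOLTAGE):
    """Battery load current IBatt for a given average adder current at VDD.

    The adder current is scaled up to the whole system, then translated to
    the battery side by power balance: IBatt * Vbatt * eff = Icircuit * VDD.
    """
    circuit_current = adder_current_a * scale
    return circuit_current * vdd / (v_batt * converter_efficiency)


# Assumed adder current, chosen so that the ideal converter reproduces
# the 477 mA operating point at VDD = 0.6 V.
adder_current = 0.477 * BATTERY_VOLTAGE / (0.6 * SYSTEM_SCALE)
i_ideal = battery_current(adder_current, 0.6)
i_lossy = battery_current(adder_current, 0.6, converter_efficiency=0.9)
print(f"IBatt (100% converter) = {i_ideal * 1e3:.0f} mA")
print(f"IBatt (90% converter)  = {i_lossy * 1e3:.0f} mA")
```

A converter efficiency below 100% raises the current drawn from the battery for the same circuit load, which is why the ideal-converter assumption gives a lower bound on IBatt.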
This is repeated for increasing battery sizes, normalized with respect to the smallest unit. A lower bound on battery size is determined for a minimum of 85% efficiency. While the selected battery should not be smaller, its actual size is determined by the recharge interval requirement of the system.

We assume the use of Lithium-ion batteries with a unit battery (N = 1) of 400mAHr rating. As an example, consider the battery load current IBatt = 3.6A for VDD = 0.9V in Figure 5.1. Figure 5.2 shows the battery terminal voltage VBatt obtained from HSPICE [42] simulation of the battery model of Figure 3.1. In this figure, the battery size is N = 3, i.e., Capacity = 1.2AHr. The leakage resistance, usually very large, was taken as 1 gigaohm. All other parameters of the battery model have been described in Section 3.5.

Figure 5.2: VBatt versus time when a battery of 1.2AHr capacity is subjected to load current IBatt = 3.6A

From Figure 5.2, the terminal voltage drops to 3.0V, i.e., the battery needs recharge, after it supplies current for 1008 seconds. This is the actual lifetime for this battery. From Equation 5.1, the ideal lifetime is 3600 × 1.2/3.6 = 1200 seconds. This, according to Equation 5.2, gives an 84% efficiency. Figure 5.3 shows the battery efficiencies obtained in this way for various battery sizes and for varying load currents.

Figure 5.3: Battery efficiency versus battery size (N = 1 corresponds to 400mAHr) for load currents IBatt from 0.6A to 6.0A

We observe:

1. When the load current is small compared to the AHr rating, the efficiency is 100% or higher. For example, for a battery of size N = 5 (2AHr) the efficiency for IBatt = 0.6A is 107%.

2.
When the load current is large compared to the AHr rating, the efficiency can be significantly lower.

The 85% line is shown to indicate that a power source with lower efficiency may be considered unacceptable. For any given load current, this 85% line allows us to determine the smallest battery that can be used.

Continuing with our example from the previous subsection, with a current of 477mA and an efficiency ≥ 85%, a battery of size 400mAHr is chosen. This battery is then simulated for the entire range of voltages, and a graph of supply voltage versus number of cycles per recharge is plotted as shown in Figure 5.4. This graph also indicates that as we move to the right of the dotted line the circuit throughput increases and the battery efficiency decreases, while moving to the left increases the battery lifetime and decreases the throughput.

Figure 5.4: Simulation of a 400 mAHr battery for a range of supply voltages (VDD)

5.3.3 Step 3: Meeting the lifetime requirement

While the smallest size battery has advantages of weight and cost, it can provide a lifetime (time between recharges) of only about 1,000 seconds. This is often not sufficient. Figure 5.1 is used to determine the battery current IBatt for the given performance requirement.

Table 5.1: High performance and minimum energy modes of operation.

  Battery size    200MHz, VDD = 0.6V                      5MHz, VDD = 0.3V
  N     AHr       Effic. (%)  Lifetime (s)  Cycles        Effic. (%)  Lifetime (s)  Cycles
  1     0.4       98          3000          619×10^9      > 100       414×10^3      1660×10^9
  4     1.6       103         12300         2540×10^9     > 100       1364×10^3     6630×10^9

Again continuing with our previous example, suppose the system has a battery lifetime requirement of 3 hours. From Figure 5.3, the minimum size battery, i.e., 400mAHr (N = 1), gives 98% efficiency and hence the lifetime is 3600 × 0.98 × 0.4/0.477 ≈ 2952 seconds. To meet the requirement of 3 hours, i.e., 10800 seconds, we therefore use a battery of size N = 10800/2952 = 3.658, rounded up to 4. So we select a battery of 1600mAHr.
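The arithmetic of Equations 5.1 and 5.2 and the sizing rule of Step 3 can be sketched in a few lines. This is a minimal sketch under the assumption, used in the text, that lifetime scales linearly with the number of parallel unit cells. The function names are hypothetical, and the measured lifetime and efficiency values are the ones quoted above.

```python
import math

UNIT_CELL_AH = 0.4  # one Li-ion unit cell (N = 1), from the text


def ideal_lifetime_s(capacity_ah, load_a):
    """Equation 5.1, converted from hours to seconds."""
    return 3600.0 * capacity_ah / load_a


def efficiency(actual_lifetime_s, capacity_ah, load_a):
    """Equation 5.2: actual lifetime over ideal lifetime."""
    return actual_lifetime_s / ideal_lifetime_s(capacity_ah, load_a)


def cells_for_interval(required_s, unit_lifetime_s):
    """Smallest N whose lifetime covers the recharge interval,
    assuming lifetime grows linearly with N."""
    return math.ceil(required_s / unit_lifetime_s)


# Example of Figure 5.2: N = 3 (1.2 AHr) at 3.6 A lasts 1008 s.
eff = efficiency(1008, 3 * UNIT_CELL_AH, 3.6)

# Step 3 example: 477 mA load, 98% efficiency for N = 1, 3 hour requirement.
unit_lifetime = 0.98 * ideal_lifetime_s(UNIT_CELL_AH, 0.477)
n_cells = cells_for_interval(3 * 3600, unit_lifetime)
print(f"efficiency = {eff:.0%}, selected battery size N = {n_cells}")
```

Rounding up rather than to the nearest integer is what guarantees that the recharge interval is met.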
The number of cycles obtained per recharge with these batteries is shown in Figure 5.5.

Figure 5.5: Battery lifetimes in clock cycles as a function of chip voltage for 400 mAHr and 1600 mAHr batteries

5.3.4 Step 4: Determine minimum energy modes

The previous step determines two battery sizes, namely, the smallest usable battery that meets the performance requirement and another size that can meet both the performance and the recharge interval requirements. We now determine maximum lifetime modes for each battery. In this mode the performance requirement is completely relaxed and the supply voltage (VDD) is determined for maximum lifetime in clock cycles. For some nanometer technologies, this VDD can be in the sub-threshold region, i.e., below the threshold voltage [57].

Most electronic systems have performance and uninterrupted operation requirements that determine the battery size as discussed above. But a system does not always operate in the maximum performance environment. Lowering VDD, which can be easily done by the DC-to-DC converter, reduces IBatt and hence extends the battery lifetime. The critical path delay, however, increases and the clock frequency must be reduced. A relevant measure of lifetime, therefore, is the lifetime in number of clock cycles. Thus, instead of expressing the lifetime in raw seconds, we express it in terms of computational work units.

Figure 5.6 shows the lifetime in clock cycles as a function of VDD for the two batteries of Table 5.1. According to Figure 5.1, the critical path delay for VDD = 0.3V is 0.2µs, giving a clock frequency of 5MHz. The high performance mode and the minimum energy mode are summarized in Table 5.1. The minimum energy mode increases the time between recharges by more than a hundredfold. That is misleading because the clock frequency is reduced 40 times, from 200MHz to 5MHz.
However, it does provide a more than two-fold increase in the number of clock cycles per battery recharge.

Figure 5.6: Battery lifetimes in clock cycles as a function of chip voltage for 400 mAHr and 1600 mAHr batteries

5.4 Case II: Battery size or weight is a primary concern

Some applications call for a special set of requirements from the circuit due to a stringent limit on battery size and weight. Applications such as bio-implantable devices, wearable computing devices and hearing aids cannot exceed a certain battery volume or weight. Such devices often do not have very high performance requirements. These devices make use of lithium ion batteries, which are lightweight, have high energy density and are less bulky. One such popular battery is the CR2032 (CR), whose properties are listed below. Note that even though the battery rating is 225mAHr, the maximum current that the battery can provide is only 3mA.

CR2032 Lithium ion battery:

- Nominal Voltage: 3V
- Capacity: 225mAHr
- Nominal Current: 0.3mA
- Maximum Current: 3mA

A four step analysis, similar to that explained for the previous case, can be carried out for this case. Simulation of the CR2032 (CR) battery is shown in Figure 5.7. It is clear from Figure 5.7 that although an ideal battery could keep providing a higher number of cycles for voltages ≥ 0.3V, in practice it would have lower efficiency since the maximum current the battery can supply is only 3mA.

Figure 5.7: Battery lifetimes in number of clock cycles for CR2032 with max. Ibattery = 3mA

5.5 Summary

This chapter shows how a power source is selected to economically satisfy the operational requirements of a system. An electrical model of a battery allows the determination of its lifetime and efficiency. Lifetime measured in terms of clock cycles is shown to be a useful measure. Simulation of the battery as well as that of the circuit being powered allows determination of high performance and minimum energy operational modes.
Other applications of battery analysis may be in assessing and optimizing power management techniques. Given the size of the battery, its efficiency reduces for higher currents. While power reduction is necessary because of temperature and other environmental requirements of semiconductor chips, the influence of power reduction on battery lifetime is important for portable devices.

Chapter 6
Instruction Slowdown Method

6.1 Problem Statement

Consider a processor built in a certain semiconductor technology. If we reduce the supply voltage V, the critical path delay will increase and hence the maximum clock frequency f will have to be decreased. This will reduce the dynamic power in proportion to V^2 f. Static power will also decrease as V^2. However, a measure of the energy a computing task will use is the total energy per cycle (EPC), consisting of dynamic EPC and static EPC. Dynamic EPC is proportional to V^2 and static EPC is proportional to V^2/f. We notice that dynamic EPC always reduces with voltage scale down. However, static EPC is proportional to 1/f, which will increase rapidly as V approaches the threshold voltage. Thus, for a given technology (i.e., given threshold voltage), there is an optimum supply voltage and a corresponding clock frequency that minimize the total EPC. Any further power reduction by voltage scaling beyond this optimum value will incur an increase in the total EPC, although power will reduce.

As the supply voltage gets closer to the threshold voltage, the performance also becomes sensitive to the process variation that is common in nano-scale technologies. In practice, therefore, the supply voltage has a lower bound [61]. If further power reduction is required, say, due to battery characteristics, thermal factors or other operational considerations, then the clock frequency alone would have to be reduced. This will reduce power but increase energy per cycle (EPC).
Dynamic voltage control within a clock period [27] can reduce the EPC but, as pointed out earlier, requires complex control circuitry.

We assume a situation where the voltage is at its lowest permissible limit and power must be reduced. Traditionally, we would slow down the clock and let EPC increase. This is a performance-power trade off that involves an essential energy penalty. We explore an alternative solution in which the clock is not slowed down but performance is degraded, similar to clock slowdown, for power reduction, while the energy penalty is reduced, especially for high leakage technologies.

6.2 Background on Clock Slowdown (CSD) for Power Reduction

Clock slowdown (CSD) is a known technique for power reduction and we use it as a reference for evaluating the proposed method. When we slow down the clock, dynamic power is reduced in proportion to the clock rate, while leakage power remains unchanged. The computing task now takes longer to complete. This results in the same dynamic energy consumption, whereas more leakage energy is consumed. We will use a processor slowdown factor n. Without loss of generality, n is assumed to be an integer. Thus, n = 1 is the normal (rated-clock) operation. Let us define:

n = processor slowdown factor (6.1)
f = rated clock frequency in Hz (6.2)
Pd = dynamic power with rated clock (6.3)
Ps = static power with rated clock (6.4)
k = Ps/Pd = static power ratio (6.5)
T = time duration of a computing task (6.6)

When the processor is slowed down by a factor of n, its power consumption is given by

\[ P_{CSD}(n) = \frac{P_d}{n} + P_s = P_d\,\frac{1 + kn}{n} \tag{6.7} \]

We notice that a computing task of original duration T is now completed in duration nT. However, we may expect that a reduced current from the battery will result in an enhanced capacity to supply energy and increase the lifetime, L.
This is often represented by Peukert's law [21, 38]:

\[ L = \frac{C_1}{I^{\alpha}} = \frac{C_2}{P^{\alpha}} \tag{6.8} \]

where C1 and C2 are constants related to the battery capacity, I is the current, and P is the power, assumed to be drawn at a constant rated voltage. This condition assumes a steady current. Though not a reality for digital circuits, this condition can be maintained by using a supercapacitor and battery combination [31]. In this case, the current fluctuations are smoothed by a large capacitor of several farads capacity. The exponent, α, in Equation 6.8 can take different values depending on the type of battery; for the present illustration we use α = 1.3.

Next, we denote the power and energy savings by the following ratios:

\[ P_{CSDratio} = \frac{P_{CSD}(n)}{P_{CSD}(1)} = \frac{1 + kn}{n(1 + k)} \tag{6.9} \]

\[ L_{CSDratio} = \frac{1}{n} \cdot \frac{1}{(P_{CSDratio})^{\alpha}} \tag{6.10} \]

and,

\[ E_{CSDratio} = n\,P_{CSDratio} \tag{6.11} \]

We observe that for very low leakage, k → 0, Equations 6.9 and 6.10 give PCSDratio = 1/n and LCSDratio = n^0.3, which show power saving together with lifetime enhancement. To consider very high leakage technologies, let us assume k = 1. Then PCSDratio = (1 + n)/(2n). CSD now cannot reduce the power ratio below 0.5, and there is battery lifetime degradation for any clock slowdown factor n. These trends are illustrated in Figure 6.1.

Figure 6.1: Clock slowdown (CSD) power and battery lifetime ratios for low and high leakage technologies.

6.3 Use of NOP for Power

In the next section, we will introduce a new power reduction method called instruction slowdown (ISD) [10]. The processor is slowed down not by clock slowdown but by inserting NOP cycles. The NOP instruction has been used for power optimization before. Najeeb et al. [25] mix NOP instructions into an instruction sequence to produce a maximum power consuming cycle, which they term a power virus.
Such an instruction sequence is useful for the design and test of the processor. Lotfi-Kamran et al. [23] suggest freezing certain data bits in a pipelined processor whenever a NOP, either contained in the instruction stream or generated due to hazards, is executed. They report about 10% power saving with a modest hardware overhead of 0.1%. Hurd [13] describes a technique of manipulating the positions of NOP instructions in a multiple instruction word architecture so that certain instructions need not be fetched. In another technique, also due to Hurd [12], a NOP instruction is replaced by another instruction called a "proxy NOP". This instruction uses the data patterns of its neighboring instruction but executes like a NOP. It thus reduces activity in the datapath. None of these techniques performs power management as discussed in the following section.

6.4 Instruction Slowdown (ISD)

In this new methodology [10], the operation of a processor is slowed down for power reduction by inserting non-functional cycles while the rated clock frequency (f) is maintained. This is similar to inserting an instruction that we call SLOP (slowdown for low power). Although it is described as a purely hardware induced operation, SLOP can be included in the software instruction set. In a typical implementation, a power management unit (PMU) monitors the system and, if necessary, determines an appropriate slowdown factor (n), which is supplied to the control. The control then inserts the required number of SLOPs in the pipeline. The factor n is assumed to be an integer here but, in general, can be any number that determines the percentage of SLOPs to be inserted in the instruction stream.

Hardware execution of SLOP resembles a conventional NOP, stall or bubble [26] with a few differences. First, its execution in a pipeline requires no "fetch" because the control generates it locally. Second, the control generates low power mode signals for various hardware units.
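The effect of the control on the instruction stream can be pictured with a small software model. This is only an illustrative sketch, not the hardware implementation: it assumes an integer slowdown factor n and that n − 1 SLOPs follow each regular instruction, which is the pattern used in the power analysis of this section. The names `insert_slops` and `SLOP` are hypothetical.

```python
from typing import Iterable, Iterator

SLOP = "SLOP"  # placeholder token for a locally generated slowdown cycle


def insert_slops(instructions: Iterable[str], n: int) -> Iterator[str]:
    """Yield each instruction followed by n - 1 SLOP cycles.

    n = 1 leaves the stream unchanged (rated operation).
    """
    if n < 1:
        raise ValueError("slowdown factor n must be at least 1")
    for instr in instructions:
        yield instr
        for _ in range(n - 1):
            yield SLOP


program = ["LW", "ADD", "SUB"]
stream = list(insert_slops(program, 3))
print(stream)
```

With n = 3 the stream is three times as long as the program, so the task takes nT instead of T while the clock keeps running at its rated frequency.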
To analyze the power and energy relations, we will use the same symbol definitions as in the previous section. We also define a SLOP power factor:

\[ \beta = \frac{\text{power consumed by SLOP}}{\text{av. power consumed by non-NOP instr.}} \tag{6.12} \]

where 0 ≤ β ≤ 1. For a slowdown factor n, we insert n−1 SLOPs after each instruction. Consider a period of 1 second, containing f clock cycles. The energy consumed during a regular instruction (assumed to be non-NOP) cycle is Pd(1+k)/f and that during a SLOP cycle is βPd(1+k)/f. Of those f cycles, f/n are regular instruction cycles and (n−1)f/n are SLOP cycles. Thus, the total power consumption, or energy dissipated per second, is obtained as

\[ P_{ISD}(n) = \frac{P_d(1+k)}{f} \cdot \frac{f}{n} + \frac{\beta P_d(1+k)}{f} \cdot \frac{(n-1)f}{n} = P_d(1+k)\,\frac{\beta n - \beta + 1}{n} \tag{6.13} \]

Similar to CSD, a computing task of original duration T will now require nT time. We find the power and battery lifetime ratios as follows:

\[ P_{ISDratio} = \frac{P_{ISD}(n)}{P_{ISD}(1)} = \frac{\beta n - \beta + 1}{n} \tag{6.14} \]

\[ L_{ISDratio} = \frac{1}{n} \cdot \frac{1}{(P_{ISDratio})^{\alpha}} \tag{6.15} \]

Figure 6.2: Instruction slowdown (ISD) power and battery lifetime ratios for low leakage (β = 0.5) and high leakage (β = 0.1) technologies.

These lifetime and power ratios as functions of the slowdown factor n are shown in Figure 6.2. Ratios below 1 indicate both power reduction (desirable) and lifetime reduction (undesirable). Notice that power (solid line) is always reduced. More reduction is achieved for the higher leakage (β = 0.1) technology. Lifetime (dotted line) for high leakage improves for small n and then degrades because the SLOP cycles consume non-zero energy. However, the lifetime degrades for the low leakage technology in a similar way as it did for CSD with high leakage.

6.5 Hardware Implementation of SLOP

We used a 32-bit MIPS pipelined processor for evaluation of the ISD and CSD methods.
It has a conventional five-stage pipeline containing the fetch (IF), decode (ID), execute (EX), memory (DM) and write-back (WB) stages [26]. It also contains hazard and forwarding units. We obtained an available VHDL model [9] and synthesized it using Mentor Graphics Leonardo Spectrum. This provided us with a gate-level model for power analysis. Various blocks of the processor were extracted as transistor-level netlists using Mentor Graphics Design Architect. Each block was simulated in HSPICE for 1,000 random input vectors with a 10ns clock period (f = 100MHz) to determine the average per cycle dynamic and static energy dissipation. This evaluation was repeated for five CMOS technologies, 180nm, 90nm, 65nm, 45nm and 32nm, using the predictive technology models (PTM) [1, 4, 37]. The simulation assumed a temperature of 90°C. A sample result for 32nm is shown in Table 6.1. The last three columns of this table are discussed in a later subsection. Communication buses are not considered separately because all drivers and buffers are included as parts of various hardware blocks.

6.6 Estimating Leakage Factor, k

We wrote a MIPS program that multiplies hexadecimal integers FFFF and 0004 by repeated additions. Our processor has separately addressable instruction (IM) and data (DM) memories. Initially, DM(2) = FFFF, DM(3) = 4, DM(4) = 1. The final result is DM(5) = 0003FFFC. The MIPS code is given in Figure 6.3.

    0000  LW   $1, X:0002($0)
    0001  ADD  $4, $1, $0
    0002  ADD  $1, $0, $0
    0003  LW   $3, X:0004($0)
    0004  LW   $2, X:0003($0)
    0005  BEQ  $2, $0, X:0003
    0006  SUB  $2, $2, $3
    0007  ADD  $1, $1, $4
    0008  J    X:0000005
    0009  SW   $1, X:0004($3)
    000A  #J   X:000000A(HALT)

Figure 6.3: A MIPS program used for power estimation.

This program completes in 34 cycles. The numbers of times the pipeline stages are activated are: 34 IF, 29 ID, 18 EX, 4 DM and 14 WB. The execution statistics of the hardware stages and the instruction mix, as well as the number of cycles, can be easily changed by varying the parameters in the program.
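What the program of Figure 6.3 computes can be mirrored by a short functional model. This is a hedged sketch that only reproduces the arithmetic, not the pipeline timing or the 34-cycle count. It assumes word-addressed data memory, as the DM(2) to DM(5) notation suggests, and the function name is hypothetical.

```python
def multiply_by_repeated_addition(dm):
    """Functional model of the Figure 6.3 program on a data memory dict."""
    r4 = dm[2]          # LW $1, 2($0); ADD $4, $1, $0  -> multiplicand
    r1 = 0              # ADD $1, $0, $0                -> accumulator
    r3 = dm[4]          # LW $3, 4($0)                  -> decrement step
    r2 = dm[3]          # LW $2, 3($0)                  -> loop counter
    while r2 != 0:      # BEQ $2, $0 leaves the loop when the counter is 0
        r2 -= r3        # SUB $2, $2, $3
        r1 = (r1 + r4) & 0xFFFFFFFF  # ADD $1, $1, $4 (32-bit wraparound)
    dm[r3 + 4] = r1     # SW $1, 4($3)
    return dm


data_memory = {2: 0xFFFF, 3: 4, 4: 1}
multiply_by_repeated_addition(data_memory)
print(f"DM(5) = {data_memory[5]:08X}")
```

Changing DM(3) changes the number of loop iterations, which is the parameter the text refers to for varying the instruction mix and cycle count.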
It was assembled by hand and the gate-level model was simulated using Mentor Graphics ModelSim. The final result was verified. For power, the active blocks in a pipeline stage were identified. The total energy of a pipeline stage was computed by adding the dynamic and static energies of its active blocks. After characterizing each pipeline stage for its energy, the total energy of the program was computed by adding the energies of the pipeline stages as per the numbers obtained above. The dynamic energy was added up for active stages while the static energy was added up for all blocks for 34 cycles, using the technology-specific data (e.g., Table 6.1 for 32nm). The ratio of total static energy to dynamic energy for each technology gives the respective value of the leakage factor k shown in Table 6.2.

6.7 Power Management for SLOP

Table 6.1 quantitatively shows how power was reduced by clock gating (CG), power gating (PG) and drowsy memories.

Power gating (PG) focuses on leakage. Circuit level approaches for leakage reduction include body bias control [6], dual threshold domino logic [5, 17], input vector control [15] and power gating [11, 18, 29]. We adopt power gating for combinational blocks. It is assumed that the supply line will be gated by pull-up or pull-down devices that will be put in the cutoff mode during SLOP cycles. This will almost completely eliminate both static and dynamic power during those cycles [14]. We must, however, realize that power gating at the clock cycle level represents a design challenge. Studies [6, 32] show that improvements in both the speed and the energy cost of power control will be needed before it can be implemented in present-day designs. The basic strategy in power gating is to provide two modes: a low power idle mode and an active mode. The goal is to switch between these modes at the appropriate time and in the appropriate manner so as to maximize power savings while minimizing the effect on performance.
Power gating can be done at the system level, which includes software (OS) controlled power gating of an entire CPU or core when the OS detects an idle loop of sufficient duration. Dynamically power gating selected units within the pipeline of a processor is another technique, which exploits workload phases and characteristics [11]. Power gating can be implemented in a fine grained or a coarse grained manner. In the fine grained approach, the gating switch is placed in the standard cell library, which increases cell area. In the coarse grained approach, a component or a set of gates is switched by a collection of switches [18]. The coarse grained approach has less area overhead but involves design complexity to control the switches.

Drowsy mode for caches: Cache memories represent a significant fraction of chip area in modern microprocessors. These include multiple levels of instruction caches and data caches. The dynamic and leakage power consumed by instruction and data caches is a sizeable portion of the total power consumed by the processor. In the instruction slowdown approach we have considered clock gating in order to reduce the dynamic power consumption, but the leakage power remains the same. There are techniques to reduce this leakage power consumption so as to achieve additional saving. For a given period of time, cache memories generally have their active operations centered on a small number of cells and hence the other cells are not in the active state. During SLOP cycles, the memory cells are put into a low voltage "drowsy mode", which can allow up to 75% energy reduction with no more than 1% performance overhead [7]. In addition, the decoder and sense amplifier can be power gated.

Figure 6.4: Clock slowdown (CSD) power ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. CSD is more effective for low leakage (180nm) technology.
Another technique identifies an application's cache requirements dynamically and uses a circuit-level mechanism, "gated-Vdd", to gate the supply voltage to the SRAM cells of the cache's unused sections to reduce leakage [29].

Clock gating (CG) is applied to registers. Their power is not gated because the state must be preserved. A significant fraction of the dynamic power in a processor is consumed by the clock network and flip-flops. It is a major component because the clock is fed to most of the circuit blocks and it switches every cycle. The clock buffers can consume 50% or more of the total dynamic power [18, 36]. Clock gating turns off the clocks when they are not required, or stops them from feeding components that are not being used. Results show that up to 43% power saving can be achieved, with a possible 20% reduction in area, when clock gating replaces the state-retention feedback logic of flip-flops [28]. Clock gating employed in a register file with a high switching activity of about 0.25 shows that a power saving of about 70% can be achieved [24].

Figure 6.5: Clock slowdown (CSD) battery lifetime ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. Ratios greater than 1 indicate increased battery lifetime through clock slowdown for low leakage 90nm and 180nm technologies. (Axes: slowdown factor n versus L_CSD(n)/L_CSD(1).)

At the time of this writing, we have not completed an evaluation of these techniques. The data in the last two columns of Table 6.1 are based on the references cited here. To compute the SLOP power factor (β), we first weight columns 2 and 3 of Table 6.1 by columns 5 and 6, respectively. The dynamic and static power of a SLOP cycle are then calculated in the same way as described before for a regular instruction. The ratio of the power of a SLOP cycle to that of a regular instruction cycle is the SLOP power factor β given in Table 6.2.
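The weighting step described above can be sketched as follows for the 32nm data of Table 6.1. This is only an illustration under stated assumptions: the dictionary layout and the function names are hypothetical, and the regular-instruction cycle energy needed to form the SLOP power factor is left as an input, because it comes from the per-stage characterization of the program and not from this table alone. The sketch is therefore not expected to reproduce the values of Table 6.2 by itself.

```python
# Table 6.1 (32nm): block -> (dynamic energy, static energy,
#                             SLOP dynamic %, SLOP static %)
# Energies are in the units given in Table 6.1.
TABLE_6_1_32NM = {
    "PC":           (85114, 17742, 25, 100),
    "PC+1 adder":   (28947, 6536, 0, 0),
    "IM":           (6780, 3209, 25, 25),
    "Regfile":      (98262, 192375, 30, 100),
    "Forwarding":   (31297, 4090, 0, 0),
    "Hazard":       (25421, 3744, 0, 0),
    "Controller":   (14338, 2973, 100, 100),
    "32-b ALU":     (263815, 22346, 0, 0),
    "32-b comp":    (39710, 5695, 0, 0),
    "DM":           (64343, 50699, 25, 25),
    "3-1 mux":      (392374, 56299, 0, 0),
    "2-1 mux":      (204456, 44106, 0, 0),
    "BrnchAddrCal": (181878, 13680, 0, 0),
    "IF/ID reg":    (156027, 32048, 50, 100),
    "ID/EX reg":    (213447, 58412, 50, 100),
    "EX/DM reg":    (131033, 34324, 50, 100),
    "DM/WB reg":    (127885, 33481, 50, 100),
    "ForwDM/WB":    (5820, 1009, 0, 0),
}


def slop_cycle_energy(table):
    """Weight each block's energy (columns 2, 3) by its SLOP percentage
    (columns 5, 6) and return the (dynamic, static) totals of one SLOP cycle."""
    dynamic = sum(dyn * dyn_pct / 100 for dyn, _, dyn_pct, _ in table.values())
    static = sum(stat * stat_pct / 100 for _, stat, _, stat_pct in table.values())
    return dynamic, static


def slop_power_factor(slop_cycle_total, regular_cycle_total):
    """Ratio of SLOP-cycle energy to regular-instruction cycle energy.
    regular_cycle_total is an assumed input (not derivable from the table alone)."""
    return slop_cycle_total / regular_cycle_total


if __name__ == "__main__":
    dyn, stat = slop_cycle_energy(TABLE_6_1_32NM)
    print(f"SLOP cycle: dynamic = {dyn:.2f}, static = {stat:.2f}")
```

Blocks marked PG contribute nothing during a SLOP cycle, while clock-gated registers keep their full static energy, which is why the static share of a SLOP cycle is comparatively large in this sketch.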
Figure 6.6: Instruction slowdown (ISD) power ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. ISD gives greater power saving for higher leakage technologies. (Axes: slowdown factor n versus P_ISD(n)/P_ISD(1).)

6.8 Results

Figures 6.4 and 6.5 display power and battery lifetime ratios as functions of the clock slowdown (CSD) factor n for five CMOS technologies. These graphs were computed from equations 6.9 and 6.10, respectively, using values of the leakage factor k taken from Table 6.2. We observe that the CSD method degrades for technologies that are finer than 65nm. This is because, as n increases, leakage power becomes the dominant factor in the total power. Moreover, the saving in dynamic energy is offset by the increase in leakage energy.

Figures 6.6 and 6.7 display power and battery lifetime ratios as functions of the instruction slowdown (ISD) factor n for five CMOS technologies. These graphs were computed from equations 6.14 and 6.15, respectively, using values of the SLOP power factor β taken from Table 6.2. Because ISD is assisted by hardware in reducing leakage for the SLOP cycles, we see greater savings of power for the high leakage 32nm technology.

To compare the two methods directly, we use equations 6.7 and 3.11 to obtain the following ratio:

$$ \frac{P_{CSD}}{P_{ISD}} = \frac{1 + kn}{(1 + k)(\beta n - \beta + 1)} \tag{6.16} $$

Figure 6.7: Instruction slowdown (ISD) battery lifetime ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. Ratios greater than 1 indicate increased or undegraded battery lifetime through instruction slowdown for high leakage 32nm and 45nm technologies. (Axes: slowdown factor n versus L_ISD(n)/L_ISD(1).)

The graph in Figure 6.8 shows this ratio as a function of the slowdown factor n for five technologies in the range 180nm through 32nm. The ratio = 1 horizontal line divides this graph into two parts.
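As a rough numerical check, the ratio of Eq. (6.16) can be evaluated for the five technologies using the leakage factor and SLOP power factor values of Table 6.2. The sketch below is illustrative only; the function and variable names are assumptions and do not come from the original work.

```python
# Table 6.2: technology -> (leakage factor k, SLOP power factor)
TABLE_6_2 = {
    "180nm": (0.097, 0.265081),
    "90nm":  (0.124, 0.23699),
    "65nm":  (0.268, 0.212003),
    "45nm":  (0.353, 0.183881),
    "32nm":  (0.413, 0.159012),
}


def csd_over_isd(n, k, slop_factor):
    """Eq. (6.16): P_CSD / P_ISD for slowdown factor n."""
    return (1 + k * n) / ((1 + k) * (slop_factor * n - slop_factor + 1))


if __name__ == "__main__":
    for tech, (k, slop_factor) in TABLE_6_2.items():
        ratios = [csd_over_isd(n, k, slop_factor) for n in range(1, 10)]
        print(tech, " ".join(f"{r:.3f}" for r in ratios))
```

At n = 1 the ratio is 1 for every technology, since no slowdown is applied by either method. Values above 1 mark slowdown factors at which ISD consumes less power than CSD.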
Points above the ratio = 1 line favor ISD and those below favor CSD. The curves will shift upward with improved dynamic power management in high leakage technologies. Results for battery lifetime are shown in Figure 6.9.

Since Peukert's law models only limited properties of a battery, we simulated a representative case of ISD for 32nm with the battery model [49] mentioned in section 3.5. For such a model, we define the ideal lifetime as

$$ \text{Ideal Lifetime} = \frac{\text{AHr rating}}{\text{Load current in amperes}} \tag{6.17} $$

A graph of power ratios, energy ratios and ideal battery lifetime ratios against the slowdown factor n is shown in Figure 6.10. From this graph, it is clear that with increasing slowdown factor, power reduces, energy increases and the ideal battery lifetime also reduces due to the increase in energy. The ideal battery, however, does not account for the increase in battery efficiency due to reduced power (and hence reduced current drawn from the battery). When the ideal battery was replaced with a practical battery, as represented by the model mentioned in section 3.5, we see different results, as shown in Figure 6.11. Here zero SLOPs corresponds to a slowdown factor n of 1, one SLOP corresponds to n = 2, and so on. As we can observe in Figure 6.11, the increase in battery lifetime achieved using ISD exceeds the increase in task completion time for 1, 2 and 3 SLOPs, with the peak saving at 2 SLOPs. This indicates that, for these cases, we gain in terms of battery lifetime with slowdown.

Figure 6.8: Clock slowdown (CSD) vs. instruction slowdown (ISD) power ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. Ratio > 1.0 indicates the advantage of ISD for 32nm and 45nm technologies. (Axes: slowdown factor n versus P_CSD/P_ISD.)

Figure 6.9: Clock slowdown (CSD) vs. instruction slowdown (ISD) battery lifetime ratios for 180nm, 90nm, 65nm, 45nm and 32nm CMOS technologies. Ratio < 1.0 indicates the advantage of ISD for 32nm and 45nm technologies. (Axes: slowdown factor n versus L_CSD/L_ISD.)

Figure 6.10: Power ratio, energy ratio and ideal battery lifetime ratio plotted against slowdown factor n, for ISD in 32nm.

Table 6.1: HSPICE simulation (32nm CMOS, 90°C).

Hardware        Energy/cycle (nJ)      Power     SLOP power (%)
block           Dyn.        Stat.      mode      Dyn.     Stat.
PC              85114       17742      CG        25       100
PC+1 adder      28947       6536       PG        0        0
IM              6780        3209       Drowsy    25       25
Regfile         98262       192375     CG        30       100
Forwarding      31297       4090       PG        0        0
Hazard          25421       3744       PG        0        0
Controller      14338       2973       None      100      100
32-b ALU        263815      22346      PG        0        0
32-b comp       39710       5695       PG        0        0
DM              64343       50699      Drowsy    25       25
3-1 mux         392374      56299      PG        0        0
2-1 mux         204456      44106      PG        0        0
BrnchAddrCal    181878      13680      PG        0        0
IF/ID reg       156027      32048      CG        50       100
ID/EX reg       213447      58412      CG        50       100
EX/DM reg       131033      34324      CG        50       100
DM/WB reg       127885      33481      CG        50       100
ForwDM/WB       5820        1009       PG        0        0

Table 6.2: Leakage factor (k) and SLOP power factor (β).

Technology    Leakage factor k    SLOP power factor β
180nm         0.097               0.265081
90nm          0.124               0.23699
65nm          0.268               0.212003
45nm          0.353               0.183881
32nm          0.413               0.159012

Figure 6.11: Circuit energy, battery lifetime and task completion time plotted against number of SLOPs, for ISD in 32nm.

Chapter 7

Conclusion

This work provides an insight into power source optimization techniques. We present a broad categorization of optimization techniques and propose two methods, which fall in the voltage management and functional management categories.

The first method demonstrates how a power source is selected to economically satisfy the operational requirements of a system. An electrical model of a battery allows the determination of its lifetime and efficiency. Lifetime measured in terms of clock cycles is shown to be a useful measure. Simulation of the battery as well as that of the circuit being powered allows determination of high performance and minimum energy operational modes.
Other applications of battery analysis may be in assessing and optimizing power management techniques. For a given battery size, the efficiency of the battery reduces at higher currents. While power reduction is necessary because of temperature and other environmental requirements of semiconductor chips, the influence of power reduction on battery lifetime is important for portable devices.

The other proposed method, instruction slowdown (ISD), has advantages in power saving for high leakage technologies. We suggest combining the slowdown methods with overall supply voltage scaling. Voltage reduction will save dynamic and static power as well as energy, but the increased hardware delay will necessitate a clock slowdown. Thus, for n = 2, CSD may be used. Thereafter, for n > 2, slowdown should use ISD.

The throughput aspect of slowdown methods is not studied. CSD preserves all hazard penalties and throughput drops as 1/n. ISD will eliminate hazards progressively as n increases. SLOP is presented purely as an internal mechanism supported by power management and control hardware. Its inclusion in the instruction set will allow compilers to explore creative ways to use the power management hardware.

Bibliography

[1] http://www.eas.asu.edu/ptm.
[2] L. Benini and G. D. Micheli, "Dynamic Power Management, Design Techniques and CAD Tools", Springer, 1998.
[3] I. Buchmann, "Batteries in a Portable World: A Handbook on Rechargeable Batteries for Non-Engineers", Richmond, British Columbia: Cedex Electronics, Inc., second edition, 2001.
[4] Y. Cao, T. Sato, D. Sylvester, M. Orshansky, and C. Hu, "New Paradigm of Predictive MOSFET and Interconnect Modeling for Early Circuit Design", in Proc. Custom Integrated Circuits Conference, 2000, pp. 201-204.
[5] S. Dropsho, V. Kursun, D. H. Albonesi, S. Dwarkadas, and E. G. Friedman, "Managing Static Leakage Energy in Microprocessor Functional Units", in Proc. 35th Annual International Symp. Microarchitecture, MICRO, 2002, pp. 321-332.
[6] D. Duarte, Y. F. Tsai, N. Vijaykrishnan, and M. J. Irwin, "Evaluating Run-Time Techniques for Leakage Power Reduction", in Proc. 15th International Conf. VLSI Design, 2002.
[7] K. Flautner, N. S. Kim, S. Martin, D. Blaauw, and T. Mudge, "Drowsy Caches: Simple Techniques for Reducing Leakage Power", in Proc. International Symposium on Computer Architecture, 2002, pp. 148-157.
[8] M. Horowitz, T. Indermaur, and R. Gonzalez, "Low-Power Digital Design", in Proc. International Symp. Low Power Electronics and Design, 1994, pp. 8-11.
[9] A. Arthurs and L. Ngo, "Analysis of the MIPS 32-Bit, Pipelined Processor Using Synthesized VHDL", Technical report, University of Arkansas, Department of Computer Science and Engineering. www.csce.uark.edu/~ajarthu/papers/mips_vhdl.pdf.
[10] Khushaboo Sheth, "A Hardware-Software Processor Architecture using Pipeline Stalls for Leakage Power Management", Master's Thesis, Auburn University, December 2008.
[11] Z. Hu, A. Buyuktosunoglu, V. Srinivasan, V. Zyuban, H. Jacobson, and P. Bose, "Microarchitectural Techniques for Power Gating of Execution Units", in Proc. International Symp. Low Power Electronics and Design, 2004, pp. 32-37.
[12] L. L. Hurd, "Power Reduction for Multiple-Instruction-Word Processors with Proxy NOP Instructions", U.S. Patent 6535984, March 18, 2003.
[13] L. L. Hurd, "Power Saving by Disabling Memory Block Access for Aligned NOP Slots During Fetch of Multiple Instruction Words", U.S. Patent 6442701, August 27, 2002.
[14] J. Frenkil and S. Venkatraman, "Power Gating Design Automation", in D. Chinnery and K. Keutzer, "Closing the Power Gap Between ASIC and Custom: Tools and Techniques for Low-Power Design", chapter 10, pp. 251-280, Springer, 2007.
[15] M. C. Johnson, D. Somasekhar, L.-Y. Chiou, and K. Roy, "Leakage Control with Efficient Use of Transistor Stacks in Single Threshold CMOS", IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 10, no. 1, pp. 1-5, Feb. 2002.
[16] "Mobile Intel Pentium 4 Processor with 533 MHz Front Side Bus", Intel Corporation, January 2004.
[17] J. T. Kao and A. P. Chandrakasan, "Dual-Threshold Voltage Techniques for Low-Power Digital Circuits", IEEE Journal of Solid-State Circuits, vol. 35, no. 7, pp. 1009-1018, July 2000.
[18] M. Keating, D. Flynn, R. Aitken, A. Gibbons, and K. Shi, "Low Power Methodology Manual for System On Chip Design", Boston: Springer, 2008.
[19] S. Narendra and A. Chandrakasan, "Leakage in Nanometer CMOS Technologies", Springer, 2006.
[20] Gary Yeap, "Practical Low Power Digital VLSI Design", Boston: Kluwer Academic Publishers, 1998.
[21] D. Linden and T. Reddy, "Handbook of Batteries", 3rd Edition. McGraw-Hill, 2001.
[22] J. M. Rabaey and M. Pedram, "Low Power Design Methodologies", Kluwer Academic Publishers, 1996.
[23] P. Lotfi-Kamran, A. Rahmani, A. Salehpour, A. Afzali-Kusha, and Z. Navabi, "Stall Power Reduction in Pipelined Architecture Processors", in Proc. of 21st International Conference on VLSI Design, 2008, pp. 541-546.
[24] M. Mueller, A. Wortmann, S. Simon, M. Kugel, and T. Schoenauer, "The Impact of Clock Gating Schemes on the Power Dissipation of Synthesizable Register Files", in Proc. International Symp. Circuits and Systems, volume 2, 2004, pp. 609-612.
[25] K. Najeeb, V. V. R. Konda, S. S. Hari, V. Kamakoti, and V. M. Vedula, "Power Virus Generation Using Behavioral Models of Circuits", in Proc. 25th IEEE VLSI Test Symposium, 2007, pp. 35-40.
[26] D. A. Patterson and J. L. Hennessy, "Computer Organization and Design: The Hardware/Software Interface", Fourth Edition. Morgan Kaufmann, 2009.
[27] B. Yu and M. L. Bushnell, "A Novel Dynamic Power Cutoff Technique (DPCT) for Active Leakage Reduction in Deep Submicron CMOS Circuits", Proc. International Symp. Low Power Electronics and Design, pp. 214-219, 2006.
[28] K. C. Pokhrel, "Physical and Silicon Measures of Low Power Clock Gating Success: An Apple to Apple Case Study", Synopsys Users Group (SNUG), 2007.
[29] M. Powell, S.-H. Yang, B. Falsafi, K. Roy, and T. N. Vijaykumar, "Gated-Vdd: A Circuit Technique to Reduce Leakage in Deep-Submicron Cache Memories", in Proc. International Symp. Low Power Electronics and Design, 2000, pp. 90-95.
[30] V. Tiwari, P. Ashar, and S. Malik, "Technology Mapping for Low Power", 30th Design Automation Conference, 1993, pp. 74-79.
[31] R. F. Service, "New Supercapacitor Promises to Pack More Electrical Punch", Science, vol. 313, p. 902, 18 Aug. 2006.
[32] J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar, and V. De, "Dynamic Sleep Transistor and Body Bias for Active Leakage Power Control of Microprocessors", IEEE Jour. Solid-State Circuits, vol. 38, no. 11, pp. 1838-1845, Nov. 2003.
[33] O. S. Unsal, I. Koren, C. M. Krishna, and C. A. Moritz, "Cool-Fetch: Compiler-Enabled Power-Aware Fetch Throttling", IEEE Computer Architecture Letters, vol. 1, Apr. 2002.
[34] H. Wang, Y. Guo, I. Koren, and C. M. Krishna, "Compiler-Based Adaptive Fetch Throttling for Energy Efficiency", in IEEE International Symp. on Performance Analysis of Systems and Software, Mar. 2006, pp. 112-119.
[35] W. Wolf, "Cyber-physical Systems", Computer, vol. 42, no. 3, pp. 88-89, Mar. 2009.
[36] K.-S. Yeo and K. Roy, "Low-Voltage, Low-Power VLSI Subsystems", McGraw-Hill, 2005.
[37] W. Zhao and Y. Cao, "New Generation of Predictive Technology Model for Sub-45nm Early Design Exploration", IEEE Transactions on Electron Devices, vol. 53, pp. 2816-2823, Nov. 2006.
[38] R. Rao, S. Vrudhula, and D. N. Rakhmatov, "Battery Modeling for Energy-Aware System Design", Computer, vol. 36, no. 12, pp. 77-87, Dec. 2003.
[39] M. Doyle, T.F. Fuller, and J. Newman, "Modeling of Galvanostatic Charge and Discharge of the Lithium/Polymer/Insertion Cell", J. Electrochemical Soc., vol. 140, no. 6, 1993, pp. 1526-1533.
[40] T.F. Fuller, M. Doyle, and J. Newman, "Simulation and Optimization of the Dual Lithium Ion Insertion Cell", J. Electrochemical Soc., vol. 141, no. 1, 1994, pp. 1-10.
[41] J.S. Newman, "FORTRAN Programs for Simulation of Electrochemical Systems, Dualfoil.f Program for Lithium Battery Simulation"; www.cchem.berkeley.edu/jsngrp/fortran.html.
[42] Synopsys, Inc., "HSPICE The Gold Standard for Accurate Circuit Simulation", www.synopsys.com/Tools/Verification/AMSVerification/CircuitSimulation/HSPICE/Documents/hspice_ds.pdf.
[43] M. Pedram and Q. Wu, "Design Considerations for Battery-Powered Electronics", Proc. 36th ACM/IEEE Design Automation Conference, ACM Press, 1999, pp. 861-866.
[44] D.N. Rakhmatov and S.B.K. Vrudhula, "An Analytical High-Level Battery Model for Use in Energy Management of Portable Electronic Systems", Proc. 2001 IEEE/ACM Intl Conf. Computer-Aided Design, IEEE Press, 2001, pp. 488-493.
[45] D. Rakhmatov, S. Vrudhula, and C. Chakrabarti, "Battery-Conscious Task Sequencing for Portable Devices Including Voltage/Clock Scaling", Proc. 39th Design Automation Conf., ACM Press, 2002, pp. 189-194.
[46] Kanishka Lahiri, Sujit Dey, Debashis Panigrahi, and Anand Raghunathan, "Battery-Driven System Design: A New Frontier in Low Power Design", Proceedings of the 2002 Conference on Asia South Pacific Design Automation/VLSI Design, p. 261, January 7-11, 2002.
[47] P. Rong and M. Pedram, "An Analytical Model for Predicting the Remaining Battery Capacity of Lithium-Ion Batteries", Proc. 2003 Design, Automation and Test in Europe Conf. and Exposition, IEEE CS Press, 2003, pp. 1148-1149.
[48] T. L. Martin, "Balancing Batteries, Power and Performance: System Issues in CPU Speed-Setting for Mobile Computing", PhD thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1999.
[49] M. Chen and G. A. Rincon-Mora, "Accurate Electrical Battery Model Capable of Predicting Runtime and I-V Performance", IEEE Transactions on Energy Conversion, vol. 21, no. 2, pp. 504-511, June 2006.
[50] H.J. Bergveld, W.S. Kruijt, and P.H.L. Notten, "Electronic-Network Modeling of Rechargeable NiCd Cells and Its Application to the Design of Battery Management Systems", J. Power Sources, vol. 77, no. 2, 1999, pp. 143-158.
[51] R. W. Erickson, "DC-DC power converters", Wiley Encyclopedia of Electrical and Electronics Engineering, pp. 1988:Wiley
[52] S.C. Hageman, "PSpice Models Nickel-Metal-Hydride Cells", EDN Access, 2 Feb. 1995; www.reedelectronics.com/ednmag/archives/1995/020295/03di1.htm.
[53] S. Gold, "A PSPICE Macromodel for Lithium-Ion Batteries", Proc. 12th Ann. Battery Conf. Applications and Advances, IEEE Press, 1997, pp. 215-222.
[54] L. Benini, G. Castelli, A. Macii, E. Macii, M. Poncino, and R. Scarsi, "Discrete-Time Battery Models for System-Level Low-Power Design", IEEE Trans. VLSI Systems, vol. 9, no. 5, pp. 630-640, Oct. 2001.
[55] L. Benini, G. Castelli, A. Macii, E. Macii, M. Poncino, and R. Scarsi, "A Discrete-Time Battery Model for High-Level Power Estimation", in Proceedings Conference on Design, Automation and Test in Europe, Mar. 2000, pp. 35-41.
[56] M. Weiser, B. Welch, A. Demers, and S. Shenker, "Scheduling for Reduced CPU Energy", Proceedings of OS Design and Implementation, 1994.
[57] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, "Sub-Threshold Design for Ultra Low-Power Systems", Springer, 2006.
[58] H. Wang, Y. Guo, I. Koren, and C. M. Krishna, "Compiler-Based Adaptive Fetch Throttling for Energy-Efficiency", IEEE International Symp. on Performance Analysis of Systems and Software, pp. 112-119, Mar. 2006.
[59] M. Kulkarni and V. Agrawal, "Matching Power Source to Electronic System: A Tutorial on Battery Simulation", VLSI Design and Test Symposium, July 2010.
[60] D. A. Patterson, "The Trouble with Multi-Cores", IEEE Spectrum, vol. 47, no. 7, pp. 28-32 and 52-53, July 2010.
[61] Jan Rabaey, "Low Power Design Essentials", Springer, 2009.