POWER AND PERFORMANCE OPTIMIZATION OF STATIC CMOS CIRCUITS WITH PROCESS VARIATION

Except where reference is made to the work of others, the work described in this dissertation is my own or was done in collaboration with my advisory committee. This dissertation does not include proprietary or classified information.

Yuanlin Lu

Certificate of Approval:

Fa Foster Dai, Associate Professor, Electrical & Computer Engineering
Vishwani D. Agrawal, Chair, James J. Danaher Professor, Electrical & Computer Engineering
Charles E. Stroud, Professor, Electrical & Computer Engineering
Joe F. Pittman, Interim Dean, Graduate School

A Dissertation Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
August 4, 2007

Permission is granted to Auburn University to make copies of this dissertation at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author
Date of Graduation

VITA

Yuanlin Lu, daughter of Rongchang Lu and Afeng Kong, was born in Nanjing, P. R. China. She attended Southeast University in 1995 and graduated with a Bachelor of Engineering degree in Electronic Information Engineering in 1999. She entered the Graduate School at Southeast University in 1999 and received the Master of Science degree in Circuit and System in 2002. In January 2004, she joined the Ph.D. program of the Department of Electrical and Computer Engineering, Auburn University.

DISSERTATION ABSTRACT

Yuanlin Lu

Doctor of Philosophy, August 4, 2007
(M.S., Southeast University, 2002)
(B.S., Southeast University, 1999)

142 Typed Pages

Directed by Vishwani D.
Agrawal

With the continuing trend of technology scaling, leakage power has become a main contributor to power consumption. Dual threshold (dual-Vth) assignment has emerged as an efficient technique for decreasing leakage power. In this work, a mixed integer linear programming (MILP) technique simultaneously minimizes the leakage and glitch power consumption of a static CMOS (Complementary Metal Oxide Semiconductor) circuit for any specified input-to-output critical path delay. Using dual-threshold devices, the number of high-threshold devices is maximized and a minimum number of delay elements is inserted to reduce the differential path delays below the inertial delays of the incident gates. The key features of the method are that the constraint set size for the MILP model is linear in the circuit size and a power-performance tradeoff is allowed. Experimental results show 96%, 28% and 64% reductions of leakage power, dynamic power and total power, respectively, for the benchmark circuit C7552 implemented in BPTM 70nm CMOS technology.

Due to the exponential relation between subthreshold current and process parameters, such as the effective gate length, oxide thickness and doping concentration, process variations can severely affect both power and timing yields of the designs obtained by the MILP formulation. We propose a statistical mixed integer linear programming method for dual-Vth design that minimizes the leakage power and circuit delay in a statistical sense such that the impact of process variation on the respective yields is minimized. Experimental results show that 30% more leakage power reduction can be achieved by using a statistical approach when compared with the deterministic approach that has to consider the worst case in the presence of process variations.

Compared to subthreshold leakage, dynamic power is less sensitive to the process variation due to its linear dependency on the process parameters.
However, deterministic techniques that use path balancing to eliminate glitches become ineffective when process variation is considered. This is because the perfect hazard filtering conditions can easily be destroyed even by a small variation in some process parameters. We present a statistical MILP formulation to achieve a process-variation-resistant glitch-free circuit. Experimental results on an example circuit demonstrate the effectiveness of this method.

ACKNOWLEDGMENTS

I would like to express my appreciation and sincere thanks to my advisor, Dr. Vishwani D. Agrawal, who guided and encouraged me throughout my studies. His advice and research attitude have provided me with a model for my entire future career. I also wish to thank my advisory committee members, Dr. Fa Foster Dai and Dr. Charles E. Stroud, for their guidance and advice on this work. Appreciation is expressed to Badhri Uppiliappan, who gave me great help during my internship at Analog Devices Inc. I also appreciate those who have made contributions to my research. Thanks to Jins Alexander, Hillary Grimes, Kyungseok Kim, Khushboobenumesh Sheth, Fan Wang and Nitin Yogi for their cooperation and helpful discussions throughout the course of this research. Finally, I would like to thank, although this is too weak a word, my parents and sister, all the other family members and my friends for their continual encouragement and support throughout this work.

Style manual or journal used: Bibliography follows those of the transactions of the Institute of Electrical and Electronics Engineers and is sorted in alphabetical order.

Computer software used: Microsoft Word 2003.

TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
  1.1 Motivation
    1.1.1 Leakage Power
    1.1.2 Glitch Power
    1.1.3 Process Variation
  1.2 Problem Statement
  1.3 Original Contributions
  1.4 Organization of the Dissertation
CHAPTER 2 PRIOR WORK: TECHNIQUES FOR LOW POWER DESIGN
  2.1 Components of Power Consumption
    2.1.1 Dynamic Power
    2.1.2 Leakage Power
  2.2 Techniques for Leakage Reduction
    2.2.1 Dual-Vth Assignment
    2.2.2 Multi-Threshold-Voltage CMOS
    2.2.3 Adaptive Body Bias
    2.2.4 Transistor Stacking
    2.2.5 Optimal Standby Input Vectors
    2.2.6 Power cutoff
  2.3 Techniques for Dynamic Power Reduction
    2.3.1 Logic Switching Power Reduction
    2.3.2 Glitch Power Elimination
  2.4 Power Optimization with Process Variation
    2.4.1 Leakage Minimization with Process Variation
    2.4.2 Glitch Power Optimization with Process Variation
  2.5 Summary
CHAPTER 3 DETERMINISTIC MILP FOR LEAKAGE AND GLITCH MINIMIZATION
  3.1 Leakage and Delay
  3.2 A Deterministic MILP for Power Minimization
    3.2.2 Objective Function
    3.2.3 Constraints
  3.3 Delay Element Implementation
    3.3.1 Delay Element Comparison
    3.3.2 Capacitances of a Transmission-Gate Delay Element
  3.4 MILP and Heuristic Algorithms
  3.5 Summary
CHAPTER 4 STATISTICAL MILP FOR LEAKAGE OPTIMIZATION UNDER PROCESS VARIATION
  4.1 Effects of Process Variation on Leakage Power
  4.2 Overview of Deterministic Dual-Vth Assignment by MILP
  4.3 Statistical Dual-Vth Assignment
    4.3.1 Statistical Subthreshold Leakage Modeling
    4.3.2 Statistical Delay Modeling
    4.3.3 MILP for Statistical Dual-Vth Assignment
  4.4 Linear Approximations
  4.5 Summary
CHAPTER 5 TOTAL POWER MINIMIZATION WITH PROCESS VARIATION BY DUAL-THRESHOLD DESIGN, PATH BALANCING AND GATE SIZING
  5.1 Deterministic MILP for Total Power Optimization by Dual-Vth, Path Balancing and Gate Sizing
    5.1.1 Gate Sizing for Dynamic Power Reduction
    5.1.2 Deterministic MILP for Total Power Reduction
    5.1.3 Results
  5.2 Statistical MILP for Total Power Optimization
    5.2.1 The Impact of Process Variation on Dynamic Power
    5.2.2 Statistical MILP for Power Optimization with Process Variation
    5.2.3 Minimizing Impact of Process Variation on Leakage or Glitch Power
  5.3 Summary
CHAPTER 6 RESULTS
  6.1 Results of Deterministic MILP (Chapter 3) for Total Power Optimization
    6.1.1 Leakage Power Reduction
    6.1.2 Leakage, Dynamic Glitch and Total Power Reduction
    6.1.3 Tradeoff Between Glitch Power Reduction and Area/Power Overhead Contributed by the Delay Elements
  6.2 Results of Statistical MILP (Chapter 4) for Leakage Optimization
  6.3 Run Time of MILP Algorithms
  6.4 Summary
CHAPTER 7 CONCLUSION AND FUTURE WORK
  7.1 Conclusion
  7.2 Future Work
    7.2.1 Gate Leakage
    7.2.2 Techniques for Glitch Elimination with Process Variation
    7.2.3 Improvement of the MILP formulation
    7.2.4 Complexity of the MILP formulation
BIBLIOGRAPHY

LIST OF FIGURES

Figure 2.1 Leakage currents in an inverter.
Figure 2.2 An example dual-Vth circuit.
Figure 2.3 Schematic of MTCMOS, (a) original MTCMOS, (b) PMOS insertion MTCMOS, (c) NMOS insertion MTCMOS.
Figure 2.4 Scheme of an adaptive body biased inverter.
Figure 2.5 Comparison of leakage for (a) one single off transistor in an inverter and (b) two serially-connected off transistors in a 2-input NAND gate.
Figure 2.6 Scheme of cluster voltage scaling.
Figure 2.7 Example circuit for illustrating ECVS.
Figure 2.8 Timing window for an n-input NAND gate.
Figure 2.9 Glitch elimination methods, (a) glitches at the output of a NAND gate, (b) glitch elimination by hazard filtering, and (c) glitch elimination by path delay balancing.
Figure 2.10 Using redundant implicant to eliminate hazards, (a) a multiplexer with hazards, and (b) a redundant implementation of the multiplexer free from certain hazards.
Figure 3.1 Circuit for explaining MILP constraints.
Figure 3.2 (a) An unoptimized circuit with high leakage and potential glitches, and (b) its corresponding optimized glitch-free circuit with low leakage.
Figure 3.3 A full adder circuit with all gates assigned low Vth (Ileak = 161 nA).
Figure 3.4 (a) Dual-Vth assignment and delay element insertion for Tmax = Tc (Ileak = 73 nA), and (b) Dual-Vth assignment and delay element insertion for Tmax = 1.25Tc (Ileak = 16 nA).
Figure 3.5 Delay elements: (a) CMOS transmission gate and (b) Cascaded inverters.
Figure 3.6 Capacitances in a MOS transistor.
Figure 3.7 (a) Distributed and (b) Lumped RC models of an NMOS transmission gate.
Figure 3.8 Comparison of MILP with heuristic backtracking algorithm.
Figure 4.1 Leakage power distribution of un-optimized C432 under local effective gate length variation.
Figure 4.2 Leakage power distributions of the deterministically optimized dual-Vth C432 due to process parameter variations, (a) global variations, (b) local variations, (c) effective gate length variations, and (d) threshold voltage variations.
Figure 4.3 Basic idea of using MILP to optimize leakage.
Figure 4.4 Detailed deterministic MILP formulation for leakage minimization.
Figure 4.5 Monte Carlo Spice simulation for leakage distribution of one MUX cell in TSMC 90nm CMOS technology.
Figure 4.6 Basic MILP for statistical dual-Vth assignment.
Figure 4.7 Detailed formulation of statistical dual-Vth assignment MILP.
Figure 5.1 Extended cell library with 6 corners for gate sizing.
Figure 5.2 Comparison of dynamic power optimization of circuits implemented by 2-corner and 6-corner cell library with different weight factors.
Figure 5.3 Optimization space comparison between leakage and dynamic power of C432 @ 90°C.
Figure 5.4 Achieving the minimum total power by adjusting the weight factor (W).
Figure 5.5 Three possible glitch filtering conditions.
Figure 5.6 Three possible glitch filtering conditions under process variation.
Figure 5.7 Dynamic power distribution of un-optimized (with-glitch) C432 under local delay variation.
Figure 5.8 Dynamic power distribution of optimized (glitch-free) C432 under local delay variation.
Figure 5.9 Comparison of the impacts of 15% local process variation on the dynamic power in C432 which is optimized by the statistical MILP with the emphasis on the resistance of dynamic power to process variation in Section 5.2.3.1, or by the deterministic MILP in Section 5.1.2. (N = 1 is the expected normalized minimum dynamic power in the optimized glitch-free C432.)
Figure 5.10 Comparison of the impacts of 15% local Leff process variation on the leakage power in C432 which is optimized by the statistical MILP with the emphasis on the resistance of dynamic power to process variation in Section 5.2.3.1, or by the deterministic MILP in Section 5.1.2. (N1 and N2 are the normalized nominal leakage power in the optimized glitch-free C432.)
Figure 5.11 Flowchart of making a decision as to which one, leakage or dynamic power, should be optimized with process variation.
Figure 6.1 Tradeoffs between leakage power and performance.
Figure 6.2 (a) dynamic power reduction by delay elements with a certain delay D, and (b) cumulative dynamic power reduction by delay elements with delay 0~D.
Figure 6.3 The relation between the number of inserted delay elements (assorted by their contribution to the dynamic power reduction) and the corresponding percentage of glitch power reduction.
Figure 6.4 Power-delay curves of deterministic and statistical approaches for C432.
Figure 6.5 Leakage power distribution of dual-Vth C7552 optimized by deterministic method, statistical methods with 99% and 95% timing yields, respectively.
Figure 7.1 An example circuit used for illustrating the timing violation.
Figure 7.2 Flowchart of an iterative power optimization procedure.

LIST OF TABLES

Table 3.1 Leakage currents for low and high Vth NAND gates.
Table 3.2 Delays of low and high Vth NAND gates.
Table 4.1 Leakage power distribution of un-optimized C432 under local effective gate length variation.
Table 4.2 Comparison of leakage power of deterministically optimized dual-Vth C432.
Table 5.1 Extended cell library with 6 corners for gate sizing.
Table 5.2 Comparison of dynamic power optimization of C432 implemented by 2 corners and 6 corners cell library, respectively.
Table 5.3 Normalized dynamic power distribution of un-optimized C432 under local delay variation.
Table 5.4 Normalized dynamic power distribution of optimized C432 under local delay variation.
Table 6.1 Leakage reduction alone due to dual-Vth assignment (27°C).
Table 6.2 Comparison of the percentage of glitches in unoptimized circuits with the real percentage of dynamic power reduction achieved by path balancing with considering the additional loading capacitances contributed by delay elements.
Table 6.3 Leakage, glitch and total power reduction for ISCAS'85 benchmark circuits (90°C).
Table 6.4 Number of delay elements for optimization.
Table 6.5 Comparison of leakage power saving due to statistical modeling with two different timing yields (?).
Table 6.6 Monte Carlo Spice simulation results for the mean and the standard deviation of the leakage distributions of ISCAS'85 circuits optimized by deterministic method, statistical methods with 99% and 95% timing yields, respectively.

CHAPTER 1 INTRODUCTION

The primary contribution of this work is a new design methodology to minimize the total power consumption in a static CMOS (Complementary Metal Oxide Semiconductor) circuit. A mixed integer linear programming (MILP) formulation is proposed to optimize leakage power and dynamic glitch power, without reducing circuit performance, by dual-Vth assignment, path balancing and gate sizing. To account for process variation, statistical delay and leakage models are adopted to optimize power consumption in a statistical sense such that the impact of process variation on the power and timing yields is minimized.

1.1 Motivation

With the continuous increase in the density and performance of integrated circuits due to the scaling down of CMOS technology, reducing power dissipation has become a serious problem that every circuit designer has to face.

1.1.1 Leakage Power

In the past, dynamic power dominated the total power dissipation of a CMOS device. Since dynamic power is proportional to the square of the power supply voltage, lowering the voltage reduces the power dissipation. However, to maintain or increase the performance of a circuit, its threshold voltage should be decreased by the same factor, which causes the subthreshold leakage current of transistors to increase exponentially and makes leakage a major contributor to power consumption.

To reduce leakage power, many techniques have been proposed, including transistor sizing [45, 72], multi-Vth [12, 19, 103], dual-Vth [31, 45, 70, 72, 96-101], optimal standby input vector selection [69, 84], transistor stacking [64, 65, 106], body bias [10, 91], etc. As the threshold voltage (Vth) of transistors in a CMOS logic gate is increased, the leakage current is reduced but the gate slows down. Dual-Vth assignment is an efficient technique for leakage reduction.
The basic idea is to use the timing slack on non-critical paths to minimize the leakage power by assigning high Vth to some or all gates on those paths.

1.1.2 Glitch Power

Glitches, which are unnecessary signal transitions, account for 20%-70% of the dynamic switching power [20]. To eliminate glitches, a designer can adopt the techniques of hazard filtering [7, 38, 46, 83, 104] and path balancing [8, 46, 74]. In hazard filtering, gate sizing or transistor sizing is used to increase the gate's inertial delay to filter out the glitches. An obvious disadvantage of hazard filtering, when used alone, is that it may increase the circuit delay because the gate delay increases. Alternatively, any given performance can be maintained by path delay balancing, although the area overhead and additional power consumption of the inserted delay elements can become a major concern. The best way to eliminate glitches is to combine these two techniques [8].

1.1.3 Process Variation

The increase in variability of several key process parameters can significantly affect the design and optimization of low power circuits in the nanometer regime [61]. Due to the exponential relation of leakage current with some process parameters, such as the effective gate length, oxide thickness and doping concentration, process variations can cause a significant increase in the leakage current. There are two principal components of leakage current. Gate leakage is most sensitive to the variation in oxide thickness (Tox), while the subthreshold current is extremely sensitive to the variation in effective gate length (Leff), oxide thickness (Tox) and doping concentration (Ndop). Compared to gate leakage, subthreshold leakage is more sensitive to parameter variations [66].

Dynamic power is normally much less sensitive to process variation because of its approximately linear dependency on the process parameters.
However, any deterministic path balancing technique used for eliminating glitches becomes less effective under process variation, since the perfect hazard filtering conditions can be easily corrupted even by a small variation in some process parameters. To make the glitch-free circuits optimized by path balancing resistant to process variations, a statistical delay model is developed in this work.

1.2 Problem Statement

The problem solved in this work is: Find a deterministic mixed integer linear programming (MILP) formulation to optimize the total power consumption by dual threshold voltage (dual-Vth) assignment, path balancing and gate sizing. Further, derive a statistical mixed integer linear programming formulation to minimize the impact of process variations on the optimal leakage and dynamic glitch power.

1.3 Original Contributions

In this dissertation, we first propose a deterministic mixed integer linear programming (MILP) formulation to minimize the leakage and dynamic power consumption of a static CMOS circuit for a given performance. In a dual-threshold circuit, this method maximizes the number of high-threshold devices and simultaneously eliminates glitches by balancing paths with the smallest number of delay elements. Gate sizing is also considered to further minimize the dynamic switching power by reducing the loading capacitances of gates.

Since leakage depends exponentially on some key process parameters, it is very sensitive to process variations. We treat gate delay and leakage current as random variables to reflect the impact of process variation. A mixed integer linear programming (MILP) method for dual-Vth design is proposed to minimize the leakage power and circuit delay in a statistical sense such that the effect of process variation on the respective yields is minimized. Two types of yields are considered.
Leakage yield refers to the probability of an optimized circuit retaining the leakage current below the specified value in the presence of random process variations. Similarly, timing yield is the probability of the critical path delay staying below the specification. The experimental results show that 30% more leakage power reduction can be achieved by using the statistical approach, referred to as statistical MILP, when compared with the deterministic approach. 5 Glitch-free circuits optimized by path balancing are also quite sensitive to process variations. We further extend the statistical MILP formulation to optimize the dynamic switching power considering process variation and achieve process-variation-resistant glitch-free circuits. 1.4 Organization of the Dissertation In Chapter 2, the basic components of power consumption in a static CMOS circuit are first discussed, followed by a survey of the relevant published literature on low power design techniques at the gate level. Chapter 3 proposes an original mixed integer linear programming (MILP) method for total power minimization by dual-Vth assignment and path balancing. To consider process variation, statistical MILP optimization of leakage power and dynamic glitch power are presented in Chapter 4 and Chapter 5, respectively. In Chapter 6, experimental results are presented. Finally, a conclusion and recommendations for future work are given in Chapter 7. 6 CHAPTER 2 PRIOR WORK: TECHNIQUES FOR LOW POWER DESIGN 2.1 Components of Power Consumption Power consumption in a static CMOS circuit basically comprises three components: dynamic switching power, short circuit power and static power. Compared to the other two components, short circuit power normally can be ignored in submicron technology. 2.1.1 Dynamic Power Dynamic power is due to charging and discharging the loading capacitances. It can be expressed by the following equation [73]: FAVCP ddLdyn ??= 221 (2.1) where ? 
• CL is the loading capacitance, including the gate capacitance of the driven gate, the diffusion capacitance of the driving gate and the wire capacitance;
• Vdd is the power supply voltage;
• A is the switching activity;
• F is the circuit operating frequency.

Equation (2.1) shows that dynamic switching power is directly proportional to the switching activity, A, i.e., the number of signal transitions. The more signal transitions there are, the higher the dynamic power consumption. After a transition is applied at the input, the output of a gate may have multiple transitions before reaching a steady state (see Figure 2.9(a)). Among these transitions, at most one is the essential transition, and all others are unnecessary transitions that are called glitches or hazards. Hence, dynamic power is composed of two parts: logic switching power, which is contributed by the signal transitions necessary for the logic functions, and glitch power, which is caused by glitches or hazards.

2.1.2 Leakage Power

The leakage current of a transistor is mainly the result of reverse-biased PN junction leakage, subthreshold leakage and gate leakage, as illustrated in Figure 2.1.

Figure 2.1 Leakage currents in an inverter.

In submicron technology, the reverse-biased PN junction leakage is much smaller than subthreshold and gate leakage and hence can be ignored. The subthreshold leakage is the weak inversion current between the source and drain of an MOS transistor when the gate voltage is less than the threshold voltage [99]. It is given by [42]:

$$I_{sub} = \mu_0 C_{ox} \frac{W}{L_{eff}} V_T^2 e^{1.8} \exp\left(\frac{V_{gs} - V_{th}}{n V_T}\right) \left[1 - \exp\left(-\frac{V_{ds}}{V_T}\right)\right] \qquad (2.2)$$

where μ0 is the zero bias electron mobility, Cox is the oxide capacitance per unit area, n is the subthreshold slope coefficient, Vgs and Vds are the gate-to-source voltage and drain-to-source voltage, respectively, VT is the thermal voltage, Vth is the threshold voltage, W is the channel width and Leff is the effective channel length. Due to the exponential relation between Isub and Vth, an increase in Vth sharply reduces the subthreshold current.

Gate leakage is the oxide tunneling current caused by the small oxide thickness and the high electric field, which increase the probability that carriers tunnel through the gate oxide. Tunneling current becomes a factor and may even be comparable to subthreshold leakage when the oxide thickness is less than 15-20 Å [102]. Unlike subthreshold leakage, which only exists in weakly turned-off transistors, gate leakage always exists, whether the transistor is turned on or off [100]. Equation (2.3) gives the expression of the gate leakage [64]:

$$I_{gate} = W_{eff} L_{eff} A \left(\frac{V_{ox}}{T_{ox}}\right)^2 \exp\left(\frac{-B\left[1 - \left(1 - \frac{V_{ox}}{\phi_{ox}}\right)^{3/2}\right]}{V_{ox}/T_{ox}}\right) \qquad (2.3)$$

where Vox is the potential drop across the thin oxide, φox is the barrier height for the tunneling particle (electron or hole), and Tox is the oxide thickness. A and B are physical parameters given by [64],

$$A = \frac{q^3}{16 \pi^2 \hbar \phi_{ox}} \quad \text{and} \quad B = \frac{4 \sqrt{2m}\, \phi_{ox}^{3/2}}{3 \hbar q},$$

where m is the effective mass of the tunneling particle, q is the electronic charge, and ħ is the reduced Planck's constant. The oxide thickness Tox decreases with technology scaling to avoid short channel effects. Equation (2.3) shows that gate leakage increases significantly as Tox decreases.

In this work, we use BPTM (Berkeley Predictive Technology Models) 70nm technology [1] to implement our designs.
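The exponential dependence of Isub on Vth in Equation (2.2) can be illustrated numerically. The following is a minimal sketch, not part of the original experiments: it keeps only the exp((Vgs − Vth)/(nVT)) factor of Equation (2.2), so every other term cancels in the ratio. The subthreshold slope coefficient n = 1.5 and the thermal voltage VT = 0.0259 V are assumed illustrative values, and the function name is hypothetical. The threshold voltages 0.20 V and 0.32 V are the low and high NMOS values quoted in Chapter 6.

```python
import math


def subthreshold_ratio(vth_low, vth_high, n=1.5, v_t=0.0259):
    """Ratio I_sub(vth_low) / I_sub(vth_high) implied by Equation (2.2).

    All other terms of Equation (2.2) are assumed identical for the two
    devices, so only the exp(-Vth / (n * V_T)) dependence remains.
    n and v_t are assumed illustrative values, not fitted BPTM parameters.
    """
    return math.exp((vth_high - vth_low) / (n * v_t))


# Low-Vth versus high-Vth NMOS device (threshold values quoted in Chapter 6).
ratio = subthreshold_ratio(0.20, 0.32)
print(f"low-Vth device leaks about {ratio:.1f}x more than the high-Vth device")
```

Under these assumptions a 0.12 V increase in Vth lowers the subthreshold current by more than an order of magnitude, which is the effect that dual-Vth assignment exploits.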
Since BPTM 70nm technology is characterized by BSIM3.5.2, which cannot correctly model gate leakage, gate leakage is omitted in this work, and all the techniques discussed in Section 2.2 aim at subthreshold leakage reduction.

2.2 Techniques for Leakage Reduction

Leakage is becoming comparable to dynamic switching power with the continuous scaling down of CMOS technology. To reduce leakage power, many techniques have been proposed, including dual-Vth, multi-Vth, optimal standby input vector selection, transistor stacking, and body bias.

2.2.1 Dual-Vth Assignment

Dual-Vth assignment is an efficient technique for leakage reduction. In this method, each cell in the standard cell library has two versions, low Vth and high Vth. Gates with low Vth are fast but have high subthreshold leakage, whereas gates with high Vth are slower but have much reduced subthreshold leakage. Traditional deterministic approaches for dual-threshold assignment use the timing slack of non-critical paths to assign high Vth to some or all gates on those paths to minimize the leakage power.

Figure 2.2 An example dual-Vth circuit.

Figure 2.2 gives an example dual-Vth circuit. The bold lines represent the critical paths. To keep the highest circuit performance, all gates on the critical paths are assigned low Vth (white gates), while some gates on the non-critical paths can be assigned high Vth (black gates) to reduce the leakage, since there is timing slack left on those paths.

Based on the technique used for determining which gates on non-critical paths should be assigned high Vth, the dual-Vth approaches can be divided into two groups: heuristic algorithms [45, 72, 96-101] and linear programming algorithms [31, 70]. Among heuristic algorithms, the backtracking algorithm [97, 98] used to determine the dual-Vth assignment only gives a possible solution, not usually an optimal one (see the example in Figure 3.8 in Section 3.4).
Because the backtracking search direction for non-critical paths is always from primary outputs to primary inputs, the gates close to the primary outputs have a higher priority for high Vth assignment, even though their leakage power savings may be smaller than those of gates close to the primary inputs. In [96], dual-Vth assignment is described as a constrained 0-1 programming problem with non-linear constraint functions. Wang et al. use a heuristic algorithm based on circuit graph enumeration to solve this problem. Although their swapping algorithm tries to avoid local optima, a global optimum still cannot be guaranteed.

Unlike a heuristic algorithm that can only guarantee a locally optimal solution, a linear programming (LP) formulation ensures a global optimum by describing both the objective function and the constraints as linear functions. Nguyen et al. [70] use LP to minimize the leakage and dynamic power by gate sizing and dual-Vth device assignment. The optimization work is separated into several steps. An LP is first used to distribute slack to gates with the objective of maximizing total power reduction. Then, an independent algorithm is needed to resize gates and assign threshold levels. This means that in [70] LP still needs the assistance of a heuristic algorithm to complete the optimization. The method of [31] also uses MILP to optimize the total power consumption by dual-threshold assignment and gate sizing.

Dual-Vth assignment can reduce leakage in both active and standby modes since some gates remain idle even when the whole circuit or system is in the active mode. But the effectiveness of this method depends on the circuit structure. A symmetric circuit with many critical paths leaves a much reduced optimization space for leakage reduction.
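The difference between using slack on non-critical paths and guaranteeing a global optimum can be made concrete on a toy circuit. The sketch below is only an illustration: it replaces the MILP solver by exhaustive enumeration, which is feasible only for a handful of gates, and the four-gate circuit, its delays and its leakage values are all hypothetical numbers chosen for the example.

```python
from itertools import product

# Hypothetical circuit: gate -> (fanin gates, delay_low, delay_high, leak_low, leak_high).
# Low Vth is fast and leaky; high Vth is slow with much lower leakage.
GATES = {
    "g1": ((), 1.0, 1.4, 10.0, 1.0),
    "g2": ((), 1.0, 1.4, 10.0, 1.0),
    "g3": (("g1", "g2"), 1.0, 1.4, 12.0, 1.2),
    "g4": (("g2",), 0.5, 0.7, 8.0, 0.8),
}
ORDER = ["g1", "g2", "g3", "g4"]  # topological order of the gates


def critical_delay(high):
    """Longest path delay for a Vth assignment (high[g] is True for high Vth)."""
    arrival = {}
    for g in ORDER:
        fanin, d_lo, d_hi, _, _ = GATES[g]
        start = max((arrival[f] for f in fanin), default=0.0)
        arrival[g] = start + (d_hi if high[g] else d_lo)
    return max(arrival.values())


def leakage(high):
    """Total leakage for a Vth assignment."""
    return sum(GATES[g][4] if high[g] else GATES[g][3] for g in ORDER)


def best_assignment(t_max):
    """Exhaustively find the minimum-leakage assignment with delay <= t_max."""
    best = None
    for bits in product([False, True], repeat=len(ORDER)):
        high = dict(zip(ORDER, bits))
        if critical_delay(high) <= t_max + 1e-9:
            if best is None or leakage(high) < best[0]:
                best = (leakage(high), high)
    return best


t_c = critical_delay({g: False for g in ORDER})  # all gates at low Vth
print(best_assignment(t_c))         # no performance sacrifice
print(best_assignment(1.25 * t_c))  # 25% performance sacrifice
```

With Tmax equal to the all-low-Vth critical delay, only the gate off the critical paths (g4) can take high Vth. Relaxing Tmax by 25% lets the search trade the critical path gates as well, which mirrors the two optimization cases reported in Chapter 6.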
2.2.2 Multi-Threshold-Voltage CMOS

A Multi-Threshold-Voltage CMOS (MTCMOS) circuit [12, 19, 103] is implemented by inserting high Vth transistors between the power supply voltage and the original transistors of the circuit [68]. Figure 2.3(a) shows a schematic of an MTCMOS NAND gate. The original transistors are assigned low Vth to enhance the performance, while high-Vth transistors are used as sleep controllers. In the active mode, SL is set low and the sleep control high-Vth transistors (MP and MN) are turned on. Their on-resistance is so small that VSSV and VDDV can be treated as almost equal to the real power supply. In the standby mode, SL is set high, MN and MP are turned off and the leakage current is low. The large leakage current in the low-Vth transistors is suppressed by the small leakage in the high-Vth transistors. By using the sleep control high-Vth transistors, the requirements for high performance in the active mode and low static power consumption in the standby mode can both be satisfied.

To reduce the area, power and speed overhead contributed by the sleep control high-Vth transistors, only one high-Vth transistor is needed. Figures 2.3(b) and 2.3(c) show the PMOS insertion MTCMOS and the NMOS insertion MTCMOS. NMOS insertion MTCMOS is preferred because, for any given size, an NMOS transistor has a smaller on-resistance than a PMOS transistor [100]. Compared to the dual-Vth technique, MTCMOS can only reduce leakage in the standby mode and has additional area, power, and speed overheads.

Figure 2.3 Schematic of MTCMOS, (a) original MTCMOS, (b) PMOS insertion MTCMOS, (c) NMOS insertion MTCMOS.

2.2.3 Adaptive Body Bias

The threshold voltage of a short-channel NMOSFET can be expressed by the following equation [47]:

$$V_{th} = V_{th0} + \gamma \left( \sqrt{\phi_s - V_{bs}} - \sqrt{\phi_s} \right) - \theta_{DIBL} V_{dd} + \Delta V_{NW} \qquad (2.4)$$

where Vth0 is the threshold voltage with a zero body bias, φs, γ and θDIBL are constants for a given technology, Vbs is the voltage applied between the body and source of the transistor, ΔVNW is a constant that models the narrow width effect, and Vdd is the supply voltage. Equation (2.4) shows that a reverse body bias leads to an increase of the threshold voltage and a forward body bias decreases the threshold voltage.

Leakage power reduction can be achieved by dynamically adjusting the threshold voltage through adaptive body bias according to the different operation modes. In the active mode, forward (or zero) body bias is used to reduce the threshold voltage, which results in a higher performance. In the standby mode, leakage power is greatly reduced by the optimal reverse body bias, which increases the threshold voltages. The basic scheme of an adaptive-body-biased inverter is shown in Figure 2.4 [100].

Similar to MTCMOS, adaptive body bias [11, 13, 28, 54, 63, 90] only reduces the leakage power in the standby mode. With continued technology scaling, the optimal reverse body bias moves closer to the zero body bias and thus the technique of adaptive body bias becomes less effective [44].

Figure 2.4 Scheme of an adaptive body biased inverter.

2.2.4 Transistor Stacking

Two serially-connected devices in the off state have significantly lower leakage current than a single off device. This is called the stacking effect [64, 65, 106]. In Figure 2.5(b), when both M1 and M2 are turned off, Vm has a positive value due to the leakage current flowing through M1 and M2. Assuming the bodies of M1 and M2 are both connected to ground, Vbs of M1 becomes negative and leads to an increase of M1's threshold voltage. At the same time, Vgs and Vds of M1 are both reduced. According to Equation (2.2), the subthreshold leakage in M1 is decreased sharply and suppresses the relatively larger leakage current in M2.
In contrast, Vm in Figure 2.5(a) is always equal to zero and has no effect on Vbs, Vgs and Vds of M and hence on its subthreshold leakage.

Figure 2.5 Comparison of leakage for (a) one single off transistor in an inverter and (b) two serially-connected off transistors in a 2-input NAND gate.

With transistor stacking [40, 51, 55], by replacing one single off transistor with a stack of serially-connected off transistors, leakage can be significantly reduced. The disadvantages of this technique are also obvious. Such a stack of transistors causes either performance degradation or more dynamic power consumption.

2.2.5 Optimal Standby Input Vectors

Subthreshold leakage current depends on the vectors applied to the gate inputs because different vectors cause different transistors to be turned off. From the illustration in Section 2.2.4, a 2-input NAND gate has the smallest subthreshold leakage, due to the stacking effect, when the input vector is "00". When a circuit is in the standby mode, one can carefully choose an input vector so that the total leakage of the whole circuit is minimized [6, 22, 32, 52, 69, 84].

Gao et al. [32] model leakage current by means of linearized pseudo-Boolean functions. An exact ILP model is first discussed to minimize leakage with respect to a circuit's input vector. A fast heuristic MILP is then proposed that selectively relaxes some binary constraints of the ILP model to make a tradeoff between runtime and optimality.

2.2.6 Power Cutoff

Yu and Bushnell [108, 109] present a novel active leakage power reduction method called the dynamic power cutoff technique (DPCT). The power supply to each gate is only connected in its switching window, during which the gate makes its transition within a clock cycle.
The circuit is optimally partitioned into groups based on the minimal switching window (MSW) of gates, and power cutoff transistors are inserted into each group to control the power connection of that group. Since the power supply of each gate is only turned on during a small timing window within a clock cycle, significant active leakage reduction can be achieved. One key to this leakage reduction technique is the implementation of the cutoff transistors, which can be implemented either by high-Vth transistors, as discussed in Section 2.2.2, or by low-Vth transistors that are overdriven by a power supply larger than Vdd for PMOS cutoff transistors or lower than Vss for NMOS cutoff transistors.

2.3 Techniques for Dynamic Power Reduction

Dynamic power comprises logic switching power and glitch power, and can be expressed by the following equation [73]:

$$P_{dyn} = \frac{1}{2} C_L V_{dd}^2 A F \qquad (2.4)$$

To reduce dynamic power at a specified operating frequency F, we can either reduce the dynamic power consumption per logic transition, which is determined by the loading capacitance CL and the power supply Vdd, or reduce the number of logic transitions in the circuit, represented by the switching activity A.

2.3.1 Logic Switching Power Reduction

2.3.1.1 Dual power supply

Reducing the supply voltage, or voltage scaling [15, 23, 27, 29, 107], is the most effective technique for dynamic power reduction because dynamic power is proportional to the square of the power supply. Similar to the dual-Vth approach, the dual-Vdd technique assigns high Vdd to all the gates on the critical paths and low Vdd to some of the gates on the non-critical paths. When a gate operating at a lower Vdd directly drives a higher-Vdd gate, a level converter is required to avoid the undesirable short circuit power in the higher-Vdd gate due to the possibly large DC current caused by the low voltage fanin.
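The quadratic dependence of dynamic power on the supply voltage can be checked directly from the dynamic power equation above. The following sketch is illustrative only: the load capacitance, switching activity, frequency and the 1.0 V and 0.7 V supply values are assumed numbers, and the function name is hypothetical.

```python
def dynamic_power(c_load, v_dd, activity, freq):
    """Dynamic power P = 0.5 * C_L * Vdd^2 * A * F, in watts."""
    return 0.5 * c_load * v_dd ** 2 * activity * freq


# Assumed example values: 10 fF load, activity 0.2, 1 GHz clock.
p_high = dynamic_power(10e-15, 1.0, 0.2, 1e9)  # gate on the high supply
p_low = dynamic_power(10e-15, 0.7, 0.2, 1e9)   # same gate on the low supply

saving = 1 - p_low / p_high
print(f"power saving from lowering Vdd by 30%: {saving:.0%}")
```

In this example a 30% lower supply roughly halves the switching power of the gate, before accounting for the power consumed by any level converters.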
Since the level converters contribute additional power, minimizing the number of level converters is also important in voltage scaling [9].

Figure 2.6 Scheme of clustered voltage scaling.

Clustered voltage scaling (CVS) [94] is an effective voltage scaling technique. The basic idea is shown in Figure 2.6 [9]. Low-Vdd gates are not allowed to drive high-Vdd gates, and level converters are only used to convert low voltage signals to high voltage at the inputs of flip-flops (FFs), so that the total number of level converters is minimized.

In contrast to CVS, extended clustered voltage scaling (ECVS) [95] allows level conversion anywhere, and the supply voltage assignment to the gates is much more flexible. Thus greater dynamic power saving can be achieved compared to CVS. The algorithm of ECVS is more complicated than that of CVS, since CVS may use a backtracking algorithm to determine just two clusters: one high-Vdd cluster and one low-Vdd cluster. Figure 2.7 gives an example circuit whose dynamic power is optimized by ECVS. The bold lines represent the critical paths.

Figure 2.7 Example circuit for illustrating ECVS.

2.3.1.2 Gate sizing

Non-critical paths have timing slack, and the delays of some gates on these paths can be increased without affecting the performance. Since the lengths of the devices (transistors) in a gate are usually minimal for a high speed application, the gate delay can be increased by reducing the device width. As a result, the dynamic power decreases accordingly due to the smaller loading capacitance CL, which is proportional to the device size.

Gate sizing is a technique that determines device widths for gates. Traditional gate sizing approaches use Elmore delay models in a polynomial formulation.
Heuristics-based greedy approaches [23-25, 67, 78, 86, 101] can be used to solve such a polynomial problem. In general, a heuristic algorithm is relatively fast but cannot guarantee a globally optimal solution. The gate delay with respect to its device size, used in [23-25, 67, 78, 101], is generally given by the following equation:

$$d_i = gd_i + \frac{C_i \, C_{out_i}}{GS_i} \qquad (2.5)$$

where di is the delay of gate i, gdi is the intrinsic gate delay of gate i, Ci is a constant, Couti is the fanout load of gate i and GSi is the width of gate i. The total loading capacitance Couti is determined based on the fanout of the gate and is given as [78]:

$$C_{out_i} = \sum_{j \in FO(i)} \left( C_{wire_{ij}} + C \cdot GS_j \right) \qquad (2.6)$$

where FO(i) is the set of gates that form the fanout of gate i, Cwireij is the capacitance of the wire connecting gates i and j, and C is a constant. When the wiring capacitance is ignored, Equation (2.5) can be rewritten as Equation (2.7):

$$d_i = gd_i + k_i \frac{\sum_{j \in FO(i)} GS_j}{GS_i} \qquad (2.7)$$

where ki = C·Ci.

A linear programming method is proposed in [14], in which a piecewise linear delay model is adopted to achieve a globally optimal solution. A non-linear programming approach [59] gives the most accurate optimal solution but at the cost of long run times.

2.3.1.3 Transistor sizing

The basic idea of transistor sizing is exactly the same as that of gate sizing, except that in gate sizing all the transistors in one gate are sized together by the same factor, whereas in transistor sizing each transistor can be sized independently.

The intrinsic delay of a gate actually depends on the current and previous input vectors, which determine the internal IO path (from the gate inputs to the gate output). Different internal IO paths have different on-resistances that cause distinct path delays (gate intrinsic delays). For a gate on a critical path, only some of its transistors contribute to the largest intrinsic gate delay, so the remaining transistors can still be sized to reduce the capacitances.
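The gate sizing delay model of Equation (2.7) can be evaluated directly to see how the width of a gate trades delay against capacitance. The sketch below is a minimal illustration, assuming hypothetical values for gdi, ki and the gate widths (in arbitrary units); the function name is not taken from the cited work.

```python
def gate_delay(gd, k, gs, fanout_sizes):
    """Equation (2.7): d_i = gd_i + k_i * sum(GS_j for j in FO(i)) / GS_i.

    gd           -- intrinsic gate delay gd_i
    k            -- constant k_i = C * C_i
    gs           -- width GS_i of the gate itself
    fanout_sizes -- widths GS_j of the gates in the fanout set FO(i)
    """
    return gd + k * sum(fanout_sizes) / gs


# Hypothetical gate driving two unit-size gates.
d_wide = gate_delay(gd=1.0, k=0.5, gs=2.0, fanout_sizes=[1.0, 1.0])
d_narrow = gate_delay(gd=1.0, k=0.5, gs=1.0, fanout_sizes=[1.0, 1.0])
print(d_wide, d_narrow)
```

Halving the width of the gate increases its delay in this example, which is acceptable on a path with enough slack, while the smaller gate presents a smaller load to its own drivers.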
In gate sizing, gdi, the intrinsic gate delay of gate i in Equations (2.5) and (2.7), is a fixed value, which makes it impossible to differentiate among the internal IO paths. In contrast, transistor sizing [16, 43, 85, 105] explores the maximum possible optimization space by sizing transistors independently.

2.3.2 Glitch Power Elimination

When transitions are applied at the inputs of a gate, the output may have multiple transitions before reaching a steady state (Figure 2.9(a)). Among these, at most one is the essential transition, and all others are unnecessary transitions, often called glitches or hazards. Because the switching power consumed by the gate is directly proportional to the number of output transitions, glitches reportedly account for 20%-70% of dynamic power [20].

Agrawal et al. [8] prove that a combinational circuit is a minimum transient energy design, i.e., there is no glitch at the output of any gate, if the difference of the signal arrival times at every gate's inputs remains smaller than the inertial delay of the gate, which is the time interval that elapses after a primary input change before the gate can produce a change at its output. This condition is expressed by the following inequality:

$$t_n - t_1 < d_i$$

where t1 and tn are the earliest and latest signal arrival times at the inputs of gate i and di is its inertial delay.

$$\text{Min} \left\{ W_1 \sum_i I_{leak}[i] + W_2 \left( \sum_i size[i] + \sum_{i,j} \Delta d[i,j] \right) + W_3 \sum_i \delta[i] \right\} \qquad (5.37)$$

Although the MILP tries to minimize Σδ[i], δ[i] for some gates may still be positive since constraint (5.34) is too tight to be satisfied without the help of a positive δ[i]. Every positive δ[i] may cause glitch generation at the output of gate i. From Table 5.4, we can also see that the average dynamic power increases approximately linearly with the process variation. This increase is contributed by the glitch power that is generated under process variation.
To counteract the increase in the average dynamic power due to those glitches, i.e., to keep the actual average dynamic power under process variation close to that achieved by the deterministic MILP formulation, we have to sacrifice some leakage power to obtain a smaller logic switching power in advance. This can be achieved by setting W1 and W2 both equal to 1 in the MILP objective function (5.38) and adding a new constraint (5.39) to the statistical MILP formulation.

$$\text{Min} \left\{ \sum_i C_1 I_{leak}[i] + \left( \sum_i C_2\, size[i] + \sum_{i,j} C_3\, \Delta d[i,j] \right) + W_3 \sum_i \delta[i] \right\} \qquad (5.38)$$

$$\sum_i C_2\, size[i] + \sum_{i,j} C_3\, \Delta d[i,j] < \frac{P_{dyn\_opt}}{\alpha} \quad (\alpha > 1) \qquad (5.39)$$

Pdyn_opt is the optimal dynamic power obtained by the deterministic MILP in Section 5.1.2 and α is a constant determined by the process variation. By setting α larger than 1, the statistical MILP formulation gives an optimized circuit that has less dynamic power.

Figure 5.9 Comparison of the impacts of 15% local process variation on the dynamic power in C432, which is optimized by the statistical MILP with the emphasis on the resistance of dynamic power to process variation in Section 5.2.3.1, or by the deterministic MILP in Section 5.1.2. (N = 1 is the expected normalized minimum dynamic power in the optimized glitch-free C432.) Axes: normalized dynamic power (horizontal), probability (vertical). Statistical: μ = 1.04, 3σ/μ = 2.82%, (μ−N)/N = 3.63%. Deterministic: μ = 1.14, 3σ/μ = 5.13%, (μ−N)/N = 13.53%.

In C432 optimized by the deterministic MILP formulation in Section 5.1.2, the optimized total power comprises 59.3 μW of dynamic power and 5.5 μW of leakage power, as shown in Figure 5.4. The data in Table 5.4 show that with 15% local process variation, its average dynamic power increases by 13.53% with a 5.34% standard deviation.
To reduce the impact of process variation on its dynamic power, the objective function (5.38) and constraint (5.39) (with Pdyn_opt = 59.3 μW and α = 1.10) are adopted in the statistical MILP formulation. The two curves in Figure 5.9 show that the average dynamic power only increases by 3.63% instead of 13.53%, and the standard deviation is also reduced to 2.82% from 5.13%, when 15% local process variation is applied to the optimized glitch-free C432. This comes at the cost of a 94% increase in average leakage power (from 1.0 to 1.94) and a slightly wider spread of the leakage power distribution, as shown in Figure 5.10.

Figure 5.10 Comparison of the impacts of 15% local Leff process variation on the leakage power in C432, which is optimized by the statistical MILP with the emphasis on the resistance of dynamic power to process variation in Section 5.2.3.1, or by the deterministic MILP in Section 5.1.2. (N1 and N2 are the normalized nominal leakage power in the optimized glitch-free C432.) Axes: normalized leakage (horizontal), probability (vertical). Statistical: N2 = 1.94, μ = 2.25, σ/μ = 10.24%, (μ−N2)/N2 = 15.22%. Deterministic: N1 = 1.00, μ = 1.17, σ/μ = 6.64%, (μ−N1)/N1 = 16.97%.

5.2.3.2 Minimizing the impact of process variation on leakage

In case 2, leakage is almost equal to or even larger than the dynamic power. Because leakage is so sensitive to process variation, we can no longer minimize the effect of process variation on the dynamic power by sacrificing leakage. The technique of using path balancing to eliminate glitches has to be discarded, since the increase in the average dynamic power under process variation may be close to or even larger than the glitch power eliminated by path balancing. To make the leakage of optimized circuits resistant to process variation, we can still use the MILP proposed in Chapter 4, except that every gate has six possible choices instead of just two.
5.3 Summary

This chapter first introduces the technique of using gate sizing to reduce dynamic power. Then a deterministic MILP formulation is proposed to optimize the total power consumption by dual-Vth assignment, path balancing and gate sizing without considering any process variation. The impact of process variation on dynamic power is analyzed, and a statistical MILP formulation is presented to minimize the impact of process variation on the dynamic power by giving up some leakage power if the dynamic power is still the dominant component under process variation. Figure 5.11 gives the flowchart for deciding which one, leakage or dynamic power, should be optimized considering process variation.

Figure 5.11 Flowchart of making a decision as to which one, leakage or dynamic power, should be optimized with process variation. Flowchart elements: "Is the circuit in standby mode most of the time?"; "Use deterministic MILP to get the optimal power"; "Simulate the optimized circuit with a certain process variation, get mean and standard deviation of leakage"; "Can leakage still be ignored under a certain process variation?"; "Use statistical MILP to minimize process variation impact on dynamic power"; "Use statistical MILP to minimize process variation impact on leakage power".

CHAPTER 6
RESULTS

To study the increasingly dominant effect of leakage power, we use the BPTM 70nm CMOS technology [1]. The low Vth values for NMOS and PMOS devices are 0.20 V and −0.22 V, respectively. The high Vth values for NMOS and PMOS devices are 0.32 V and −0.34 V, respectively. We regenerated the netlists of the ISCAS'85 benchmark circuits using a 2-corner cell library in which the maximum gate fanin is 5. Two look-up tables, for gate delays and leakage currents, respectively, were constructed for each type of cell using Spice simulation. A C program parses the netlist and generates the constraint set for the CPLEX LP solver in the AMPL software package [30].
CPLEX then gives the optimal Vth assignment as well as the value and position of every delay element. The dynamic power is estimated by an event driven logic simulator that incorporates an inertial delay glitch filtering analysis.

6.1 Results of Deterministic MILP (Chapter 3) for Total Power Optimization

6.1.1 Leakage Power Reduction

The results of leakage power reduction for the ISCAS'85 benchmark circuits are shown in Table 6.1. Here the objective of the MILP in Section 3.2 was set to minimize the leakage alone. All Δdi,j variables were forced to be 0 and constraints (3.9) and (3.10) were suppressed. The numbers of gates in column 2 are for our gate library and differ from those in the original benchmark netlists. Tc in column 3 is the minimum delay of the critical path when all gates have low Vth. This was determined by the LP discussed in Section 3.2 in the paragraph following Equation (3.16). Column 4 shows the total leakage current with all gates assigned low Vth. Column 5 shows the optimized circuit leakage current with gate Vth reassigned according to the MILP optimization. Column 6 shows the leakage reduction (%) for optimization without sacrificing any performance. Column 9 shows the leakage reduction with a 25% performance sacrifice.

Table 6.1 Leakage reduction alone due to dual-Vth assignment (27°C).
                                         Optimized (Tmax = Tc)                 Optimized (Tmax = 1.25Tc)
Circuit  # gates  Tc     Unopt. Ileak    Ileak   Leakage    Sun OS 5.7        Ileak   Leakage    Sun OS 5.7
name              (ns)   (μA)            (μA)    reduction  CPU s             (μA)    reduction  CPU s
C432     160      0.751  2.620           1.022   61.0%      0.42              0.132   95.0%      0.3
C499     182      0.391  4.293           3.464   19.3%      0.08              0.225   94.8%      1.8
C880     328      0.672  4.406           0.524   88.1%      0.24              0.153   96.5%      0.3
C1355    214      0.403  4.388           3.290   25.0%      0.1               0.294   93.3%      2.1
C1908    319      0.573  6.023           2.023   66.4%      59                0.204   96.6%      1.3
C2670    362      1.263  5.925           0.659   90.4%      0.38              0.125   97.9%      0.16
C3540    1097     1.748  15.622          0.972   93.8%      3.9               0.319   98.0%      0.74
C5315    1165     1.589  19.332          2.505   87.1%      140               0.395   98.0%      0.71
C6288    1177     2.177  23.142          6.075   73.8%      277               0.678   97.1%      7.48
C7552    1046     1.915  22.043          0.872   96.0%      1.1               0.445   98.0%      0.58

From Table 6.1, we see that by Vth reassignment, the leakage current of most benchmark circuits is reduced by more than 60% without any performance sacrifice (column 6). For several large benchmarks (C2670, C3540 and C7552), leakage is reduced by 90% or more because a smaller percentage of gates lie on critical paths. However, for some highly symmetrical circuits that have many critical paths, such as C499 and C1355, the leakage reduction is smaller. Column 9 shows that the leakage reduction reaches the highest level, around 98%, with some performance sacrifice.

The curves in Figure 6.1 show the relation between normalized leakage power and normalized critical path delay in a dual-Vth process. Unoptimized circuits with all low Vth gates are at point (1, 1) and have the largest leakage power and the smallest delay. With optimal Vth assignment, leakage power can be reduced sharply, by 61% (from point (1, 1) to point (1, 0.4)) for C432 or by 88% (from point (1, 1) to point (1, 0.1)) for C880, depending on the circuit, without sacrificing any performance. When the normalized Tmax becomes greater than 1, i.e., we sacrifice some performance, leakage power decreases further, but at a slower rate. When the delay increase is more than 30%, the leakage reduction saturates at about 98%.
Thus, Figure 6.1 provides a guide for making tradeoffs between leakage power and performance.

Figure 6.1 Tradeoffs between leakage power and performance. Axes: normalized critical path delay (horizontal, 1 to 1.5), normalized leakage power (vertical, 0 to 1). Curves: C432, C880, C1908.

6.1.2 Leakage, Dynamic Glitch and Total Power Reduction

The leakage current increases with temperature because VT (thermal voltage, kT/q) and Vth both depend on the temperature. Our Spice simulation shows that for a 2-input NAND gate with low Vth, when the temperature increases from 27°C to 90°C, the leakage current increases by a factor of 10. For a 2-input NAND gate with high Vth, this factor is 20. The leakage in our look-up table is from simulation for 27°C operation. To demonstrate the dominant effect of the leakage power, we estimate the leakage currents at 90°C by multiplying the total leakage current obtained from the CPLEX LP solver [30] by a factor between 10 and 20, as determined by the proportion of low to high threshold transistors.

The dynamic power is estimated by a glitch filtering event driven simulator, and is given by

$$P_{dyn} = \frac{E_{dyn}}{T} = \frac{0.5 \cdot C_{inv} \cdot V_{dd}^2 \cdot \sum_i \left( T_i \cdot FO_i \right)}{1000 \cdot 1.2\, T_c} \qquad (6.1)$$

where Cinv is the gate capacitance of an inverter, Ti is the number of transitions at the output of gate i when 1,000 random vectors are applied at the primary inputs (PIs), and FOi is the number of fanouts of gate i. The vector period is assumed to be 20% greater than the critical path delay, Tc. By simulating each gate's number of transitions, we can estimate the glitch power reduction.

When path balancing is used to eliminate glitches, the additional loading capacitances contributed by the inserted delay elements consume extra dynamic power. Whether the technique of path balancing is effective depends on the ratio of this dynamic power overhead to the eliminated glitch power.
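Equation (6.1) can be written as a short function to make the role of each term explicit. This is only a sketch of the formula, not the event driven simulator used in this work: the per-gate transition counts would come from that simulator, and the inverter capacitance, supply voltage, transition counts and critical path delay used below are assumed example values.

```python
def dynamic_power_estimate(c_inv, v_dd, transitions, fanouts, t_c,
                           n_vectors=1000, period_factor=1.2):
    """Equation (6.1): average dynamic power over n_vectors input vectors.

    transitions[i] -- number of output transitions T_i of gate i
    fanouts[i]     -- number of fanouts FO_i of gate i
    t_c            -- critical path delay; the vector period is period_factor * t_c
    """
    switched = sum(t * fo for t, fo in zip(transitions, fanouts))
    energy = 0.5 * c_inv * v_dd ** 2 * switched
    return energy / (n_vectors * period_factor * t_c)


# Assumed example: two gates, 1 fF inverter input capacitance, 1 V supply, Tc = 1 ns.
p = dynamic_power_estimate(1e-15, 1.0, transitions=[500, 1200], fanouts=[2, 1], t_c=1e-9)
print(f"{p * 1e6:.3f} uW")
```

Running the same function on transition counts collected with and without glitch filtering gives the glitch power reduction as the difference of the two estimates.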
Data in column 3 of Table 6.2 show that less than 10% dynamic power reduction can be achieved for some circuits, for instance, C432, C1908 and C2670, when the loading capacitances of the delay elements are considered. This is mainly because we use a 2-corner cell library, which has a limited optimization space. As discussed and illustrated in Section 5.1, with a 6-corner cell library we can normally achieve more dynamic power reduction, since this type of cell library makes it possible to simultaneously eliminate glitches by path balancing and reduce the loading capacitance for each logic transition by gate sizing.

Table 6.2 Comparison of the percentage of glitches in unoptimized circuits with the real percentage of dynamic power reduction achieved by path balancing considering the additional loading capacitances contributed by the delay elements.

Circuit  Glitch % in           Dynamic power reduction
name     unoptimized circuits  with CL of delay elements
C432     27.4%                 8.63%
C499     29.0%                 18.13%
C880     27.8%                 16.23%
C1355    43.5%                 35.79%
C1908    22.4%                 8.39%
C2670    21.6%                 7.42%
C3540    31.5%                 14.04%
C5315    34.6%                 12.08%
C6288    76.0%                 68.73%
C7552    40.2%                 27.74%

To demonstrate the projected dominant effect of leakage power in a sub-micron CMOS technology, we compare the leakage power and dynamic power at 90°C in Table 6.3. "All low Vth" means the unoptimized circuit that has all low threshold gates, and "Dual Vth" means the optimized circuit whose Vth has been optimally assigned for minimum leakage. Column 6 gives the dynamic power of the optimized design, which is further reduced as shown in column 7 when glitches are eliminated by path balancing and the power overhead contributed by the delay elements is considered. We observe that for the 70nm BPTM CMOS technology at 90°C, the unoptimized leakage power (column 3) of some large ISCAS'85 benchmark circuits can account for about one half or more of the total power consumption (column 9).
With Vth reassignment, the optimized leakage power of most benchmark circuits is reduced to around 10%. With further glitch (dynamic) power reduction, the average total power reduction for the ISCAS'85 benchmarks is 40%. Some circuits have a total reduction of up to 70%.

Table 6.3 Leakage, glitch and total power reduction for ISCAS'85 benchmark circuits (90°C).

                  Leakage power (μW)           Dynamic power (μW)           Total power (leakage + dynamic) (μW)
Circuit  # gates  All low  Dual    Reduc.      Dual    Delay   Reduc.       All low  Dual Vth +  Reduc.
name              Vth      Vth     %           Vth     opt.    %            Vth      delay opt.  %
C432     160      35.77    11.87   66.8%       101.0   73.3    8.63%        136.8    104.15      23.86%
C499     182      50.36    39.94   20.7%       225.7   160.3   18.13%       276.1    224.72      18.61%
C880     328      85.21    11.05   87.0%       177.3   128.0   16.23%       262.5    159.57      39.21%
C1355    214      54.12    39.96   26.3%       293.3   165.7   35.79%       347.4    228.29      34.29%
C1908    319      92.17    29.69   67.8%       254.9   197.7   8.39%        347.1    263.20      24.17%
C2670    362      115.4    11.32   90.2%       128.6   100.8   7.42%        244.0    130.38      46.57%
C3540    1097     302.8    17.98   94.1%       333.2   228.1   14.04%       636.0    304.40      52.14%
C5315    1165     421.1    49.79   88.2%       465.5   304.3   12.08%       886.6    459.06      48.22%
C6288    1177     388.5    97.17   75.0%       1691    405.6   68.73%       2079.7   625.95      69.90%
C7552    1046     444.4    18.75   95.8%       380.9   227.8   27.74%       825.3    293.99      64.38%

6.1.3 Tradeoff Between Glitch Power Reduction and Area/Power Overhead Contributed by the Delay Elements

The area overhead due to the inserted delay elements is somewhat large. From Table 6.4, we observe that the number of delay elements (Δdi #) is almost equal to the number of gates (Gates #), except for C1355. If we assume that the average number of transistors in a gate is 4 (e.g., consider a 2-input NAND gate), and that each delay element implemented by a CMOS transmission gate has 2 transistors, the rough area overhead due to delay element insertion will be around 50%. The main reason is that our cell library has some complex gates, for example, AOI (AND-OR-INVERT) gates whose fanin may be as large as 5. Some NAND or NOR gates can also have as many as 4 inputs.
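The rough 50% area estimate above follows from simple transistor counting, which the following sketch reproduces. It is a back-of-the-envelope check under the stated assumptions (4 transistors per gate, 2 per delay element) and ignores routing area and the actual sizes of the delay elements; the function name is hypothetical. The C432 and C1355 counts are those reported in Table 6.4.

```python
def area_overhead(n_gates, n_delay_elements, t_per_gate=4, t_per_delay=2):
    """Transistor-count overhead of the delay elements relative to the gates."""
    return n_delay_elements * t_per_delay / (n_gates * t_per_gate)


# C432 in Table 6.4: 160 gates and 160 delay elements.
print(f"C432:  {area_overhead(160, 160):.0%}")
# C1355 in Table 6.4: 214 gates and 112 delay elements.
print(f"C1355: {area_overhead(214, 112):.0%}")
```

When the number of delay elements equals the number of gates, the estimate is exactly 50%, and it drops for circuits such as C1355 that need fewer delay elements.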
As a result, it is quite possible that more than one delay buffer is inserted for a gate. The solution is to use a simpler and smaller cell library, which will be used in our subsequent research.

Table 6.4 Number of delay elements for optimization.

Circuit   Gates #   Δdi #
C432      160       160
C499      182       128
C880      328       303
C1355     214       112
C1908     319       313
C2670     362       330
C3540     1097      1258
C5315     1165      1198
C6288     1177      1307
C7552     1046      845

Considering the usually large routing area in an ASIC chip, and the fact that a large percentage of delay elements have quite small delays (see the following discussion in this section) and hence small sizes, the actual area overhead should be much less than 50%.

We also applied the path balancing technique to an ADI (Analog Devices Inc.) RFID chip, which is implemented in TSMC 0.35μm CMOS technology and has 46,000 placeable cells (39,000 combinational cells and 7,000 sequential cells). The power simulation results from PrimePower [5] show that 11.8% of the logic transitions are glitches, which consume 8% of the dynamic power. Here the internal logic switching inside a standard cell is not considered. Although this RFID chip does not consume much glitch power, the analysis of the values and number of the delay elements is still instructive.

Figure 6.2 (a) Dynamic power reduction by delay elements with a certain delay D, and (b) cumulative dynamic power reduction by delay elements with delay 0~D.

Figure 6.3 The relation between the number of inserted delay elements (sorted by their contribution to the dynamic power reduction) and the corresponding percentage of glitch power reduction.

Figure 6.2(a) gives the PDF (probability density function) of the delay elements, or the dynamic power reduction by delay elements with a certain delay D. It shows that most of the delay elements inserted for glitch elimination have small delays. This coincides with the nature of the circuit structure in a high speed ASIC design.
The logic depth of any combinational logic between two flip-flops cannot be very large in a high speed ASIC chip, and hence the timing window that determines the value of a delay element is not wide. Figure 6.2(b) gives the CDF (cumulative distribution function) of the delay elements, or the cumulative dynamic power reduction by delay elements with delay 0~D. Delay elements whose delays are larger than 5ns or 10ns for the best case or worst case, respectively, contribute very little to the dynamic power reduction. Therefore, Figure 6.2 provides guidance for selecting delay elements when a standard cell library of delay elements is constructed.

The relation between the number of inserted delay elements and the corresponding percentage of glitch power reduction is shown in Figure 6.3. Delay elements are sorted by their contribution to the dynamic power reduction. The first 10,000 delay elements play a much more important role in glitch elimination, while the contribution of the remaining 4,000 cells is very small. Figure 6.3 thus gives circuit designers a clue about how to trade off glitch reduction against the power and area overhead introduced by the delay elements.

It should be noted that the glitches propagated to the outputs of buffers and inverters disappear automatically when all the paths are balanced. In this RFID chip, this type of glitch consumes 25% and 39% of the total glitch power for the worst case and best case, respectively. Therefore, the maximum glitch power contributed by all the remaining glitches is 75% and 61% of the total glitch power for the worst case and best case, respectively.

6.2 Results of Statistical MILP (Chapter 4) for Leakage Optimization

To compare the power optimization results of the statistical MILP with those from the deterministic approach, we assume that all the gates have the same ci1 and ci2 (sensitivities of gate delay to the variation of different process parameters) in equation (4.9).
Therefore, each gate has the same ri, and we assume that 3σ/μ of ri is 15%. This assumption is made only for simplicity and does not change the efficacy of the statistical approach.

In the deterministic method, the worst case is applied, which means that all gate delays increase by 15% and hence Tmax increases by 15% accordingly. To make the comparison between the statistical method and the deterministic approach reasonable, Tmax in the statistical approach is also set to 115% of the original value.

Table 6.5 Comparison of leakage power saving due to statistical modeling with two different timing yields (η).

                              Deterministic            Statistical optimization          Statistical optimization
                              optimization (η = 100%)  (η = 99%)                         (η = 95%)
Circuit   # gates  Unopt.     Opt. leak.  Run          Opt. leak.  Extra      Run        Opt. leak.  Extra      Run
name               leak.      power       time         power       power      time       power       power      time
                   power (μW) (μW)        (s)          (μW)        saving     (s)        (μW)        saving     (s)
C432      160      2.620      1.003       0.00         0.662       33.9%      0.44       0.589       41.3%      0.32
C499      182      4.293      3.396       0.02         3.396       0.0%       0.22       2.323       31.6%      1.47
C880      328      4.406      0.526       0.02         0.367       30.2%      0.18       0.340       35.4%      0.18
C1355     214      4.388      3.153       0.00         3.044       3.5%       0.17       2.158       31.6%      0.48
C1908     319      6.023      1.179       0.03         1.392       21.7%      11.21      1.169       34.3%      17.5
C2670     362      5.925      0.565       0.03         0.298       47.2%      0.35       0.283       49.8%      0.43
C3540     1097     15.622     0.957       0.13         0.475       50.4%      0.24       0.435       54.5%      1.17
C5315     1165     19.332     2.716       1.88         1.194       56.0%      67.63      0.956       64.8%      19.7
C7552     1046     22.043     0.938       0.44         0.751       20.0%      0.88       0.677       27.9%      0.58
Average of ISCAS'85 benchmarks            0.24                     29.2%      9.04                   41.3%      4.64
ARM7      15.5k    686.56     495.12      15.69        425.44      14.07%     36.79      425.44      14.07%     36.4

In Table 6.5, columns 4, 6 and 9 give the optimized leakage power obtained by the deterministic MILP, by the statistical MILP with 99% timing yield and by the statistical MILP with 95% timing yield, respectively.
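The "Extra Power Saving" columns of Table 6.5 appear to be the reduction of the statistically optimized leakage relative to the deterministic result in column 4. The sketch below works under that assumption, which is not stated explicitly in the text, and recomputes the savings for three rows of the table. All names in the code are hypothetical.

```python
# (deterministic, statistical 99%, statistical 95%) optimized leakage in
# microwatts, copied from columns 4, 6 and 9 of Table 6.5.
OPT_LEAKAGE_UW = {
    "C880": (0.526, 0.367, 0.340),
    "C3540": (0.957, 0.475, 0.435),
    "ARM7": (495.12, 425.44, 425.44),
}


def extra_saving_percent(deterministic_uw, statistical_uw):
    """Assumed definition: saving relative to the deterministic optimum."""
    return 100.0 * (1.0 - statistical_uw / deterministic_uw)


# Map each circuit to its (99% yield, 95% yield) extra saving in percent.
extra_savings = {
    name: (extra_saving_percent(det, s99), extra_saving_percent(det, s95))
    for name, (det, s99, s95) in OPT_LEAKAGE_UW.items()
}

for name, (saving99, saving95) in extra_savings.items():
    print(f"{name}: {saving99:.2f}% (99% yield), {saving95:.2f}% (95% yield)")
```

With this definition the recomputed values agree with the tabulated savings for these rows to within rounding of the three-digit table entries.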
From Table 6.5, we see that, compared to the deterministic method, which uses fixed values, the statistical models for gate delay and subthreshold leakage current give the ISCAS'85 benchmarks on average 29% greater leakage power saving with 99% timing yield and 41% greater saving with 95% timing yield. The reason is that the statistical model has a more flexible optimization space, while the deterministic approach assumes the worst case. For C499 and C1355, which have many critical paths due to their extremely symmetrical circuit structures, the optimization space is limited and therefore the additional power saving contributed by the statistical optimization is much smaller, especially with the higher timing yield (99%). It is also clear that with a decreased timing yield, higher power saving can be achieved because the relaxed timing constraints result in a larger optimization space.

[Figure 6.4 plot: normalized leakage power versus normalized timing; curves: Deterministic LP, Statistical LP (99% timing yield), Statistical LP (95% timing yield).]

Figure 6.4 Power-delay curves of deterministic and statistical approaches for C432.

Figure 6.4 shows the power-delay curves for the leakage optimization of C432 by the deterministic and statistical approaches. The starting points of the three curves, (1,1), (1,0.66) and (1,0.59), indicate that if the deterministic approach reduces the leakage power to 1 unit, the statistical approach achieves 0.66 unit and 0.59 unit with 99% and 95% timing yields, respectively. The lower the timing yield, the higher the power saving. With a further relaxed Tmax, all three curves give more reduction in leakage power because more gates are assigned high Vth.

[Figure 6.5 plot: probability versus leakage power (uW); curves: C7552_d, C7552_p99, C7552_p95.]

Figure 6.5 Leakage power distribution of dual-Vth C7552 optimized by the deterministic method and by the statistical methods with 99% and 95% timing yields, respectively.

Figure 6.5 compares the leakage power distributions of dual-Vth C7552 optimized by the deterministic method and by the statistical methods with 99% and 95% timing yields, respectively. Both the mean and the standard deviation of the leakage distribution of C7552 are reduced by the statistical approaches as compared to the deterministic method. Although it is not very obvious, leakage optimization with 95% timing yield indeed has a smaller spread than that with 99% timing yield.

The reason for the narrower leakage distribution and lower average leakage is that more gates can be assigned high threshold voltage by the statistical method than by the deterministic method. This is because, when the leakage is optimized under process variation by the deterministic approach, we have to analyze the worst case, which is too pessimistic. The leakage in high Vth gates is less sensitive to process variation: although high Vth gates may have the same percentage of leakage variation as low Vth gates, the absolute variation in high Vth gates is much smaller. Therefore, a higher percentage of high Vth gates in a dual-Vth circuit ensures a narrower spread and a lower mean of leakage power.

Table 6.6 Monte Carlo Spice simulation results for the mean and the standard deviation of the leakage distributions of ISCAS'85 circuits optimized by the deterministic method and by the statistical methods with 99% and 95% timing yields, respectively.

                    Deterministic optimization   Statistical optimization     Statistical optimization
                    (η = 100%)                   (η = 99%)                    (η = 95%)
Circuit   # gates   Nom.     Mean     S.D.       Nom.     Mean     S.D.       Nom.     Mean     S.D.
name                leak.    leak.    (nW)       leak.    leak.    (nW)       leak.    leak.    (nW)
                    (nW)     (nW)                (nW)     (nW)                (nW)     (nW)
C432      160       0.907    1.059    0.104      0.603    0.709    0.074      0.522    0.614    0.069
C499      182       3.592    4.283    0.255      3.592    4.283    0.255      2.464    2.905    0.197
C880      328       0.551    0.645    0.086      0.430    0.509    0.080      0.415    0.491    0.079
C1355     214       3.198    3.744    0.200      3.090    3.606    0.202      2.199    2.610    0.175
C1908     319       1.803    2.123    0.170      1.356    1.601    0.116      1.140    1.341    0.127
C2670     362       0.635    0.750    0.078      0.405    0.473    0.046      0.395    0.461    0.043
C3540     1097      1.055    1.243    0.119      0.527    0.611    0.032      0.493    0.575    0.031
C5315     1165      2.688    3.128    0.165      1.229    1.420    0.088      1.034    1.188    0.067
C7552     1045      0.924    1.073    0.069      0.774    0.903    0.049      0.701    0.823    0.045
Average of ISCAS'85 benchmarks        0.138                        0.105                        0.093

In global process variation, all the gate delays have the same percentage of variation, which has no effect on the timing window constraints in the statistical MILP; the assignment of the dual threshold voltages is therefore unchanged. On the other hand, subthreshold current is most sensitive to the Leff variation. Therefore, in Table 6.6, we simulate by Spice the leakage distributions of all the deterministically and statistically optimized ISCAS'85 benchmark circuits with local Leff variation (3σ/μ = 15%). As expected, almost all of the means and standard deviations of the leakage distributions are decreased by the statistical approaches. A narrower spread and a lower mean can be achieved by the statistical method with 95% timing yield compared to that with 99% timing yield.

6.3 Run Time of MILP Algorithms

The run time of MILP is always a big concern since, in the worst case, its complexity is exponential in the number of variables and constraints of the problem.
However, our experimental results show that the real computing time may depend on the circuit structure, logic depth, etc., and may not be exponential. The CPU times shown in columns 7 and 10 of Table 6.1 are for the deterministic MILP in Chapter 3. From the data in Table 6.1, it is hard to express any relation between the CPU time and the problem size, such as the number of gates in the circuit. For example, the MILP solution time for the 1046-gate C7552 is only 1.1 CPU seconds, which is much less than the 140 CPU seconds used for the 1165-gate C5315. Even for problems of the same size, different constraints require varying solution times. Consider the 1177-gate C6288 circuit as an example. When the timing constraints for primary outputs (POs) are relaxed by 25%, the CPU time decreases from 277 CPU seconds to 7.48 CPU seconds. As a result, the MILP formulation may still solve some very large circuits and provide a possibly better solution to the dual-Vth assignment problem through global optimization.

Running on a 2.4GHz AMD Opteron 150 processor with 3GB memory, many CPU run times for solving the statistical MILP problem (Chapter 4) were less than one second (columns 5, 8 and 11 in Table 6.5). This is an advantage over other techniques [61] because we achieve 30% more leakage reduction with 99% timing yield in much less CPU time.

Besides the ISCAS'85 benchmark circuits, we also optimized the leakage for an ARM7 IP core, which has 15,500 combinational cells and 2,400 sequential cells implemented in a TSMC 90nm CMOS process. The experimental results in the last row of Table 6.5 show that 14% more leakage reduction is achieved with a 37-second run time, which partly demonstrates the feasibility of applying our MILP approach to real circuits.

Although today's SOC may have over one million gates, it always has a hierarchical structure. MILP constraints can be generated for submodules at a lower level and the run times will be determined by the number of gates in the individual submodules.
Such a technique may not guarantee a global optimum, but it would still give a reasonable result within an acceptable run time.

6.4 Summary

Experimental results are presented and discussed in this chapter. The results show that the deterministic MILP formulation proposed in Chapter 3 for total power reduction by path balancing and dual-Vth assignment can achieve on average 40% total power reduction. If it is combined with the gate sizing technique discussed in Chapter 5, more power reduction can be obtained. The statistical MILP proposed in Chapter 4, for minimizing the impact of process variation on leakage power, can achieve 30% more leakage power reduction compared to the deterministic MILP formulation. Whether it is necessary to minimize the impact of process variation on dynamic power depends upon the circuit application and on which power component is dominant in the optimized circuit, so we only propose the corresponding statistical MILP formulation in Chapter 5 and do not give more detailed results in this chapter.

CHAPTER 7
CONCLUSION AND FUTURE WORK

In this chapter, we summarize the entire work of this dissertation and provide some suggestions for future research.

7.1 Conclusion

With the continuing trend of technology scaling, leakage power has become a main contributor to power consumption. Dual-Vth assignment has emerged as an efficient technique for decreasing leakage power. In Chapter 3, a mixed integer linear programming (MILP) technique simultaneously minimizes the leakage and glitch power consumption of a static CMOS circuit for any specified input to output critical path delay. Using dual-threshold devices, the number of high-threshold devices is maximized and a minimum number of delay elements are inserted to reduce the differential path delays below the inertial delays of the incident gates.
The key features of the method are that the constraint set size for the MILP model is linear in the circuit size and that a power-performance tradeoff is allowed. Experimental results show 96%, 28% and 64% reductions of leakage power, dynamic power and total power, respectively, for the benchmark circuit C7552 implemented in 70nm BPTM CMOS technology.

Due to the exponential relation between subthreshold current and process parameters, such as the effective gate length, oxide thickness and doping concentration, process variations can severely affect both the power and timing yields of the designs obtained by the MILP formulation. In Chapter 4, we propose a statistical mixed integer linear programming method for dual-Vth design that minimizes the leakage power and circuit delay in a statistical sense such that the impact of process variation on the respective yields is minimized. Experimental results show that 30% more leakage power reduction can be achieved by using the statistical approach when compared with the deterministic approach, which has to consider the worst case in the presence of process variations.

Compared to subthreshold leakage, dynamic power is less sensitive to process variation due to its linear dependency on the process parameters. However, the deterministic technique discussed in Chapter 3, which uses path balancing to eliminate glitches, becomes ineffective when process variation is considered. This is because the perfect hazard filtering conditions can easily be destroyed even by a small variation in some process parameters. We present a statistical MILP formulation to achieve a process-variation-resistant glitch-free circuit in Chapter 5. Experimental results on an example circuit demonstrate the effectiveness of this method.

7.2 Future Work

Some ideas and suggestions for future work are given in this section.

7.2.1 Gate Leakage

In this work, the contribution of the gate-tunneling effect to the total leakage is not considered.
Neglecting this effect can result in an underestimation of the total leakage. Our examples use BPTM 70nm technology, which is characterized by BSIM 3.5.2 and may not correctly model gate leakage. However, with appropriate design, the gate leakage of transmission-gate delay elements can be kept small. For example, it is possible to use high-threshold transistors in the delay elements because these transistors are always on and the switching speed is not important. These transistors have a thicker gate oxide layer and hence have a lower gate leakage than low-threshold transistors. Otherwise, in general, the problem of gate leakage will have to be addressed by future research.

7.2.2 Techniques for Glitch Elimination with Process Variation

Although leakage has become a dominant contributor to the total power consumption with continued technology scaling, its contribution drops well below that of dynamic power after the circuit is optimized by efficient techniques, such as dual-Vth assignment and adaptive body bias [11, 13, 28, 54, 63, 90]. Elimination of glitches in a high activity circuit is therefore still imperative. Path balancing is not preferred due to its sensitivity to process variation. Hazard filtering (gate sizing) is fairly resistant to process variation but has its own limitation: a 100% glitch reduction is not guaranteed because the delays of gates on critical paths cannot be increased [8]. Besides, there exists an upper bound on the achievable gate delay in any specific technology. Combining the two methods to achieve both a complete glitch reduction and a process-variation-resistant circuit is a challenging topic. In Chapter 5, we propose such a combined technique, but at the cost of a leakage increase. More efficient algorithms should be developed.

7.2.3 Improvement of the MILP formulation

We have applied our MILP formulation of dual-Vth assignment to some industry circuits.
(This work was done in the CAD group, Analog Devices Inc., during the summer of 2006.) The basic steps were as follows.

1. Assign low Vth to all cells in the circuit. Then extract the LVT (low Vth) delay and leakage for each cell from this LVT design by PrimeTime [4].
2. Similarly, acquire the HVT (high Vth) delay and leakage for each cell from an HVT design.
3. Extract timing (slack, specified clock period, input-delay, output-delay), primary inputs and primary outputs for each timing group by PrimeTime [4].
4. Construct an MILP model based on the above information.
5. Solve this MILP problem to obtain the optimal dual-Vth solution.
6. Update the circuit from the original LVT design to the dual-Vth design according to the CPLEX solution.
7. Check timing and power of the new dual-Vth design by PrimeTime [4] and PrimePower [5], respectively.

Experimental results show that twice the leakage power reduction can be achieved by our dual-Vth assignment MILP model as compared to a design by commercial tools, Physical Compiler [3] and Astro [2]. About 42% of the 15,500 combinational cells were assigned high Vth. The runtime for solving this MILP was only several minutes since, in such an ASIC design, only small combinational logic clouds (sub-circuits) are inserted between registers, between primary inputs and registers, and between registers and primary outputs. Thus, the runtime of an MILP actually depends on the circuit structure of the most complicated or deepest combinational cloud, instead of on the total number of cells in the circuit.

[Figure 7.1 diagram: gates 1, 2 and 3 between two flip-flops (FF) with an 8ns clock period; gate delays in the LVT design 2 + 2 + 3 = 7ns; gate delays in the dual-Vth design 2 + 3 + 3.2 = 8.2ns.]

Figure 7.1 An example circuit used for illustrating the timing violation.

However, there are some timing violations (the actual path delay is larger than the timing specification) in the dual-Vth design optimized by our MILP formulation. A possible reason is that the delays of LVT cells extracted in step 1 are not accurate.
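As a rough illustration of such a violation, the sketch below sums the gate delays annotated in Figure 7.1 and compares each path delay with the 8ns clock period. The function and variable names are hypothetical, and a small tolerance is added (an assumption of this sketch) so that a path delay exactly equal to the clock period is not reported as a violation because of floating-point rounding.

```python
CLOCK_PERIOD_NS = 8.0  # specified clock period in Figure 7.1


def path_delay(gate_delays_ns):
    """Path delay as the sum of the gate delays along the path."""
    return sum(gate_delays_ns)


def violates_timing(gate_delays_ns, clock_period_ns=CLOCK_PERIOD_NS, tol=1e-9):
    """True if the path delay exceeds the clock period (beyond a tolerance)."""
    return path_delay(gate_delays_ns) > clock_period_ns + tol


# Delays of gates 1, 2 and 3 in nanoseconds, taken from Figure 7.1.
lvt_design = [2.0, 2.0, 3.0]      # all gates low Vth
milp_estimate = [2.0, 3.0, 3.0]   # gate 2 high Vth, gate 3 delay as extracted from the LVT design
re_extracted = [2.0, 3.0, 3.2]    # gate 3 delay as seen in the dual-Vth design

for label, delays in [
    ("LVT design", lvt_design),
    ("MILP estimate", milp_estimate),
    ("dual-Vth design", re_extracted),
]:
    status = "violation" if violates_timing(delays) else "ok"
    print(f"{label}: {path_delay(delays):.1f}ns ({status})")
```

The point of the example is that the assignment looks feasible with the delays known to the optimizer, yet fails once the delay of an unchanged LVT gate is extracted again from the dual-Vth netlist.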
We use the example circuit in Figure 7.1 to briefly explain the cause of a timing violation. In the LVT design, the path delay is 7ns (2+2+3), which is less than the specified clock period of 8ns. CPLEX finds that, to reduce leakage, gate 2 can be assigned high Vth without a timing violation, since the HVT delay of gate 2 is 3ns and hence the new path delay should be 8ns (2+3+3). However, we found that in the dual-Vth design, the LVT delay of gate 3 actually changes to 3.2ns because its input transition time increases as a result of the increase in the output transition time of gate 2. Therefore, the real path delay is 8.2ns (2+3+3.2), which exceeds the specified clock period of 8ns. The cause of this phenomenon is the interdependency of gate delays, which was neglected for simplicity in our MILP formulation.

An iterative method, shown in Figure 7.2, may be adopted to get accurate delays and hence avoid the timing violation problem. If any timing violation is found, the new delays for all LVT cells are extracted from the current dual-Vth design and the MILP formulation is updated correspondingly. A different optimal solution with fewer timing violations is then given by the CPLEX solver. We continue the iterations until all timing violations are eliminated.

7.2.4 Complexity of the MILP formulation

As discussed in Section 6.3, for a several-million-gate SOC, MILP constraints can be generated for its submodules at a lower level and the run times will be determined by the number of gates in the individual submodules. Such a technique may not guarantee a global optimum, but it would still obtain a reasonable result within an acceptable run time.

To further reduce the runtime of an MILP or ILP formulation, we may also adopt a relaxed LP, use the LP solution as the starting point and round off the variables such that they satisfy the (M)ILP. Kompella et al. in [48, 49] use branch-and-bound methods to do an exhaustive search in the integer space.
Although those methods can find an optimal solution given enough computing time, feasible non-optimal solutions can be obtained within an acceptable run time. In [41], the authors propose a new recursive rounding approach which can produce solutions that are close to optimal and, most importantly, has polynomial complexity.

[Figure 7.2 flowchart boxes: START; update the delays of all LVT cells in the MIP problem according to the current dual-Vth solution; CPLEX solves the updated MIP problem and gives a new dual-Vth solution; update the dual-Vth netlist; check the timing of the new netlist by PrimeTime; is there any timing violation? (Y: extract the delays of all LVT cells by PrimeTime; N: STOP).]

Figure 7.2 Flowchart of an iterative power optimization procedure.

BIBLIOGRAPHY

[1] BPTM: Berkeley Predictive Technology Model. http://www-device.eecs.berkeley.edu/~ptm/.
[2] http://www.synopsys.com/products/astro/astro.html.
[3] http://www.synopsys.com/products/unified_synthesis/unified_synthesis.html.
[4] http://www.synopsys.com/products/analysis/primetime_ds.html.
[5] http://www.synopsys.com/products/solutions/galaxy/power/power.html.
[6] A. Abdollahi, F. Fallah, and M. Pedram, "Leakage Current Reduction in CMOS VLSI Circuits by Input Vector Control," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 2, pp. 140-154, 2004.
[7] V. D. Agrawal, "Low Power Design by Hazard Filtering," in Proc. 10th International Conference on VLSI Design, 1997, pp. 193-197.
[8] V. D. Agrawal, M. L. Bushnell, G. Parthasarathy, and R. Ramadoss, "Digital Circuit Design for Minimum Transient Energy and a Linear Programming Method," in Proc. of the 12th International Conference on VLSI Design, 1999, pp. 434-439.
[9] B. Amelifard, A. Afzali-Kusha, and A. Khademzadeh, "Enhancing the Efficiency of Cluster Voltage Scaling Technique for Low-Power Application," in IEEE International Symposium on Circuits and Systems, 2005, pp. 1666-1669.
[10] H. Ananthan, C. H. Kim, and K.
Roy, "Larger-than-Vdd Forward Body Bias in Sub-0.5V Nanoscale CMOS," in Proc. of the International Symposium on Low Power Electronics and Design, 2004, pp. 8-13.
[11] H. Ananthan, C. H. Kim, and K. Roy, "Larger-than-Vdd Forward Body Bias in Sub-0.5V Nanoscale CMOS," in Proc. of the 2004 International Symposium on Low Power Electronics and Design, 2004, pp. 8-13.
[12] M. H. Anis, M. K. Mahmoud, and M. I. Elmasry, "Efficient Gate Clustering for MTCMOS Circuits," in Proc. of the 14th IEEE International Conference on ASIC/SOC, 2001, pp. 34-38.
[13] V. K. Arnim, E. Borinski, P. Seegebrecht, H. Fiedler, R. Brederlow, R. Thewes, J. Berthold, and C. Pacha, "Efficiency of Body Biasing in 90-nm CMOS for Low-Power Digital Circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 7, pp. 1549-1556, 2005.
[14] M. R. C. M. Berkelaar and J. A. G. Jess, "Gate Sizing in MOS Digital Circuits with Linear Programming," in Proc. European Design Automation Conference, 1990, pp. 217-221.
[15] Z. Bo, D. Blaauw, D. Sylvester, and K. Flautner, "The Limit of Dynamic Voltage Scaling and Insomniac Dynamic Voltage Scaling," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 13, no. 11, pp. 1239-1252, 2005.
[16] M. Borah, R. M. Owens, and M. J. Irwin, "Transistor Sizing for Low Power CMOS Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 6, pp. 665-671, 1996.
[17] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter Variations and Impact on Circuits and Microarchitecture," in Proc. Design Automation Conference, 2003, pp. 338-342.
[18] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Boston: Springer, 2000.
[19] B. H. Calhoun, F. A. Honore, and A. Chandrakasan, "Design Methodology for Fine-Grained Leakage Control in MTCMOS," in Proc. of the International Symposium on Low Power Electronics and Design, 2003, pp. 104-109.
[20] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design. Boston: Kluwer Academic Publishers, 1995.
[21] H. Chang and S. S. Sapatnekar, "Full-Chip Analysis of Leakage Power under Process Variations, Including Spatial Correlations," in Proc. Design Automation Conference, 2005, pp. 523-528.
[22] X. Chang, D. Fan, Y. Han, Z. Zhang, and X. Li, "Fast Algorithm for Leakage Power Reduction by Input Vector Control," in Proc. of the 6th International Conference on ASIC, 2005, pp. 14-18.
[23] C. Chen and M. Sarrafzadeh, "Simultaneous Voltage Scaling and Gate Sizing for Low-Power Design," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 49, no. 6, pp. 400-408, 2002.
[24] O. Coudert, "Gate Sizing for Constrained Delay/Power/Area Optimization," IEEE Transactions on VLSI Systems, vol. 5, no. 4, pp. 465-472, 1997.
[25] O. Coudert, R. Haddad, and S. Manne, "New Algorithms for Gate Sizing: A Comparative Study," in Proc. Design Automation Conference, 1996, pp. 734-739.
[26] A. Davoodi and A. Srivastava, "Probabilistic Dual-Vth Optimization Under Variability," in Proc. of the International Symposium on Low Power Electronics and Design, 2005, pp. 143-147.
[27] M. Elgebaly and M. Sachdev, "Efficient Adaptive Voltage Scaling System through On-Chip Critical Path Emulation," in Proc. of the 2004 International Symposium on Low Power Electronics and Design, 2004, pp. 375-380.
[28] W. Elgharbawy, P. Golconda, A. Kumar, and M. Bayoumi, "A New Gate-Level Body Biasing Technique for PMOS Transistors in Subthreshold CMOS Circuits," in IEEE International Symposium on Circuits and Systems, 2005, pp. 4697-4700.
[29] A. Forestier and M. R. Stan, "Limits to Voltage Scaling from the Low Power Perspective," in Proc. of the 13th Symposium on Integrated Circuits and Systems Design, 2000, pp. 365-370.
[30] R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A Modeling Language for Mathematical Programming.
South San Francisco, California: The Scientific Press, 1993.
[31] F. Gao and J. P. Hayes, "Total Power Reduction in CMOS Circuits via Gate Sizing and Multiple Threshold Voltages," in Proc. Design Automation Conference, 2005, pp. 31-36.
[32] F. Gao and J. P. Hayes, "Exact and Heuristic Approaches to Input Vector Control for Leakage Power Reduction," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 25, no. 11, pp. 2564-2571, 2006.
[33] M. Hashimoto, H. Onodera, and K. Tamaru, "A Power Optimization Method Considering Glitch Reduction by Gate Sizing," in Proc. of the International Symposium on Low Power Electronics and Design, 1998, pp. 221-226.
[34] F. Hu, "Process-Variation-Resistant Dynamic Power Optimization for VLSI Circuits," PhD Thesis, Auburn, Alabama: Auburn University, May 2006.
[35] F. Hu and V. D. Agrawal, "Dual-Transition Glitch Filtering in Probabilistic Waveform Power Estimation," in Proc. of the 15th Great Lakes Symposium on VLSI, 2005, pp. 357-360.
[36] F. Hu and V. D. Agrawal, "Enhanced Dual-Transition Probabilistic Power Estimation with Selective Supergate Analysis," in Proc. of the 23rd International Conference on Computer Design, 2005, pp. 366-369.
[37] F. Hu and V. D. Agrawal, "Input-Specific Dynamic Power Optimization for VLSI Circuits," in Proc. of the International Symposium on Low Power Electronics and Design, 2006, pp. 232-237.
[38] E. Jacobs and M. Berkelaar, "Using Gate Sizing to Reduce Glitch Power," in Proc. of the PRORISC/IEEE Workshop on Circuits, Systems and Signal Processing, 1996, pp. 183-188.
[39] E. Jacobs and M. Berkelaar, "Gate Sizing Using A Statistical Delay Model," in Proc. Design, Automation and Test in Europe Conference and Exhibition, 2000, pp. 283-290.
[40] M. C. Johnson, D. Somasekhar, C. Lih-Yih, and K. Roy, "Leakage Control With Efficient Use of Transistor Stacks in Single Threshold CMOS," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 1, pp.
1-5, 2002.
[41] K. R. Kantipudi and V. D. Agrawal, "A Reduced Complexity Algorithm for Minimizing N-Detect Tests," in Proc. of the 20th International Conference on VLSI Design, 2007, pp. 492-497.
[42] J. T. Kao and A. P. Chandrakasan, "Dual-Threshold Voltage Techniques for Low-Power Digital Circuits," IEEE Journal of Solid-State Circuits, vol. 35, no. 7, pp. 1009-1018, July 2000.
[43] W. H. Kao, N. Fathi, and L. Chia-Hao, "Algorithms for Automatic Transistor Sizing in CMOS Digital Circuits," in Proc. Design Automation Conference, 1985, pp. 781-784.
[44] A. Keshavarzi, S. Ma, S. Narendra, B. Bloechel, K. Mistry, T. Ghani, S. Borkar, and V. De, "Effectiveness of Reverse Body Bias for Leakage Control in Scaled Dual Vt CMOS ICs," in Proc. of the International Symposium on Low Power Electronics and Design, 2001, pp. 207-212.
[45] M. Ketkar and S. S. Sapatnekar, "Standby Power Optimization via Transistor Sizing and Dual Threshold Voltage Assignment," in Proc. International Conference on Computer-Aided Design, 2002, pp. 375-378.
[46] S. Kim, J. Kim, and S. Y. Hwang, "New Path Balancing Algorithm for Glitch Power Reduction," IEE Proc. of Circuits, Devices and Systems, vol. 148, no. 3, pp. 151-156, 2001.
[47] P. Ko, J. Huang, Z. Liu, and C. Hu, "BSIM3 for Analog and Digital Circuit Simulation," in Proc. IEEE Symposium on VLSI Technology CAD, 1993, pp. 400-429.
[48] S. Kompella, S. Mao, Y. T. Hou, and H. D. Sherali, "Path Selection and Rate Allocation for Video Streaming in Multihop Wireless Networks," in Proc. Military Communications Conference, 2006, pp. 1-7.
[49] S. Kompella, S. Mao, Y. T. Hou, and H. D. Sherali, "Cross-Layer Optimized Multipath Routing for Video Communications in Wireless Networks," IEEE Journal on Selected Areas in Communications, Special Issue on Cross-Layer Optimized Wireless Multimedia Communications, May 2007.
[50] H. W. Kuhn and A. W. Tucker, "Nonlinear Programming," in Proc.
2nd Berkeley Symposium on Mathematical Statistics and Probability, Berkeley, 1951, pp. 481-492.
[51] V. Liberali, E. Malavasi, and D. Pandini, "Automatic Generation of Transistor Stacks for CMOS Analog Layout," in Proc. IEEE International Symposium on Circuits and Systems, 1993, pp. 2098-2101.
[52] Y. Lin and Q. Gang, "A Combined Gate Replacement and Input Vector Control Approach for Leakage Current Reduction," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 14, no. 2, pp. 173-182, 2006.
[53] M. Liu, W.-S. Wang, and M. Orshansky, "Leakage Power Reduction by Dual-Vth Designs Under Probabilistic Analysis of Vth Variation," in Proc. of the International Symposium on Low Power Electronics and Design, 2004, pp. 2-7.
[54] X. Liu and S. Mourad, "Performance of Submicron CMOS Devices and Gates with Substrate Biasing," in Proc. of the 2000 IEEE International Symposium on Circuits and Systems, 2000, pp. 9-12.
[55] Y. Liu and Z. Gao, "Timing Analysis of Transistor Stack for Leakage Power Saving," in Proc. of the 9th International Conference on Electronics, Circuits and Systems, 2002, pp. 41-44.
[56] Y. Lu and V. D. Agrawal, "Leakage and Dynamic Glitch Power Minimization Using Integer Linear Programming for Vth Assignment and Path Balancing," in Proc. of the International Workshop on Power and Timing Modeling, Optimization and Simulation, 2005, pp. 217-226.
[57] Y. Lu and V. D. Agrawal, "CMOS Leakage and Glitch Power Minimization for Power-Performance Tradeoff," Journal of Low Power Electronics, vol. 2, no. 3, pp. 378-387, Dec. 2006.
[58] Y. Lu and V. D. Agrawal, "Statistical Leakage and Timing Optimization for Submicron Process Variation," in Proc. of the 20th International Conference on VLSI Design, 2007, pp. 439-444.
[59] V. Mahalingam and N. Ranganathan, "A Nonlinear Programming Based Power Optimization Methodology for Gate Sizing and Voltage Selection," in Proc. IEEE Computer Society Annual Symposium on VLSI, 2005, pp. 180-185.
[60] N. R.
Mahapatra, S. V. Garimella, and A. Tarbeen, "An Empirical and Analytical Comparison of Delay Elements and a New Delay Element Design," in Proc. IEEE Computer Society Workshop on VLSI, 2000, pp. 81-86.
[61] M. Mani, A. Devgan, and M. Orshansky, "An Efficient Algorithm for Statistical Minimization of Total Power Under Timing Yield Constraints," in Proc. Design Automation Conference, 2005, pp. 309-314.
[62] M. Mani and M. Orshansky, "A New Statistical Optimization Algorithm for Gate Sizing," in Proc. International Conference on Computer Design, 2004, pp. 272-277.
[63] M. Miyazaki, G. Ono, and T. Kawahara, "Optimum Threshold-Voltage Tuning for Low-Power, High-Performance Microprocessor," in Proc. IEEE International Symposium on Circuits and Systems, 2005, pp. 17-20.
[64] S. Mukhopadhyay, C. Neau, R. T. Cakici, A. Agarwal, C. H. Kim, and K. Roy, "Gate Leakage Reduction for Scaled Devices Using Transistor Stacking," IEEE Transactions on VLSI Systems, vol. 11, no. 4, pp. 716-730, 2003.
[65] S. Mukhopadhyay and K. Roy, "Accurate Modeling of Transistor Stacks to Effectively Reduce Total Standby Leakage in Nano-Scale CMOS Circuits," in Proc. Symposium on VLSI Circuits, 2003, pp. 53-56.
[66] S. Mukhopadhyay and K. Roy, "Modeling and Estimation of Total Leakage Current in Nano-Scaled CMOS Devices Considering the Effect of Parameter Variation," in Proc. of the International Symposium on Low Power Electronics and Design, 2003, pp. 172-175.
[67] A. K. Murugavel and N. Ranganathan, "Gate Sizing and Buffer Insertion Using Economic Models for Power Optimization," in Proc. of the 17th International Conference on VLSI Design, 2004, pp. 195-200.
[68] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, "1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS," IEEE Journal of Solid-State Circuits, vol. 30, no. 8, pp. 847-854, 1995.
[69] R. Naidu and E. T. A. F.
Jacobs, "Minimizing Standby Leakage Power in Static CMOS Circuits," in Proc. Design, Automation and Test in Europe, 2001, pp. 370-376.
[70] D. Nguyen, A. Davare, M. Orshansky, D. Chinney, B. Thompson, and K. Keutzer, "Minimization of Dynamic and Static Power Through Joint Assignment of Threshold Voltages and Sizing Optimization," in Proc. of the International Symposium on Low Power Electronics and Design, 2003, pp. 158-163.
[71] S. M. Nowick and D. L. Dill, "Exact Two-Level Minimization of Hazard-Free Logic with Multiple-Input Changes," in Proc. IEEE/ACM International Conference on Computer-Aided Design, 1992, pp. 626-630.
[72] P. Pant, R. K. Roy, and A. Chatterjee, "Dual-Threshold Voltage Assignment with Transistor Sizing for Low Power CMOS Circuits," IEEE Transactions on VLSI Systems, vol. 9, no. 2, pp. 390-394, April 2001.
[73] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice Hall, 2003.
[74] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program," in Proc. of the 16th International Conference on VLSI Design, 2003, pp. 527-532.
[75] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Design of Variable Input Delay Gates for Low Dynamic Power Circuits," in Proc. of the International Workshop on Power and Timing Modeling, Optimization and Simulation, 2005, pp. 436-445.
[76] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Variable Input Delay CMOS Logic for Low Power Design," in Proc. of the 18th International Conference on VLSI Design, 2005, pp. 596-604.
[77] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Transistor Sizing of Logic Gates to Maximize Input Delay Variability," Journal of Low Power Electronics, vol. 2, no. 1, pp. 121-128, Apr. 2006.
[78] N. Ranganathan and A. K. Murugavel, "A Microeconomic Model for Simultaneous Gate Sizing and Voltage Scaling for Power Optimization," in Proc.
International Conference on Computer Design, 2003, pp. 276-281.
[79] R. Rao, A. Devgan, D. Blaauw, and D. Sylvester, "Parametric Yield Estimation Considering Leakage Variability," in Proc. Design Automation Conference, 2004, pp. 442-447.
[80] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, "Statistical Estimation of Leakage Current Considering Inter- and Intra-Die Process Variation," in Proc. of the International Symposium on Low Power Electronics and Design, 2003, pp. 84-89.
[81] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, "Statistical Analysis of Subthreshold Leakage Current for VLSI Circuits," IEEE Transactions on VLSI Systems, vol. 12, no. 2, pp. 131-139, Feb. 2004.
[82] T. Sakurai and A. R. Newton, "Alpha-Power Law MOSFET Model and its Applications to CMOS Inverter Delay and Other Formulas," IEEE Journal of Solid-State Circuits, vol. 25, no. 2, pp. 584-594, Feb. 1990.
[83] C. V. Schimpfle, A. Wroblewski, and J. A. Nossek, "Transistor Sizing for Switching Activity Reduction in Digital Circuits," in Proc. European Conference on Theory and Design, 1999.
[84] W.-T. Shiue, "Leakage Power Estimation and Minimization in VLSI Circuits," in Proc. of the International Symposium on Circuits and Systems, 2001, pp. 178-181.
[85] J. M. Shyu, A. Sangiovanni-Vincentelli, J. P. Fishburn, and A. E. Dunlop, "Optimization-Based Transistor Sizing," IEEE Journal of Solid-State Circuits, vol. 23, no. 2, pp. 400-409, 1988.
[86] J. Singh, V. Nookala, L. Zhi-Quan, and S. Sapatnekar, "Robust Gate Sizing by Geometric Programming," in Proc. Design Automation Conference, 2005, pp. 315-320.
[87] A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, "Modeling and Analysis of Leakage Power Considering Within-Die Process Variations," in Proc. International Symposium on Low Power Electronics and Design, 2002, pp. 64-67.
[88] A. Srivastava, S. Shah, D. Sylvester, D. Blaauw, and S.
Director, "Accurate and Efficient Gate-Level Parametric Yield Estimation Considering Correlated Variations in Leakage Power and Performance," in Proc. Design Automation Conference, 2005, pp. 535-540.
[89] A. Srivastava, D. Sylvester, and D. Blaauw, "Statistical Optimization of Leakage Power Considering Process Variations Using Dual-Vth and Sizing," in Proc. Design Automation Conference, 2004, pp. 773-778.
[90] M. Sumita, S. Sakiyama, M. Kinoshita, Y. Araki, Y. Ikeda, and K. Fukuoka, "Mixed Body Bias Techniques with Fixed Vt and Ids Generation Circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 1, pp. 60-66, 2005.
[91] J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar, and V. De, "Dynamic Sleep Transistor and Body Bias for Active Leakage Power Control of Microprocessors," IEEE Journal of Solid-State Circuits, vol. 38, no. 11, pp. 1838-1845, 2003.
[92] S. Uppalapati, "Low Power Design of Standard Cell Digital VLSI Circuits," Master's Thesis, New Brunswick, New Jersey: Rutgers University, Oct. 2004.
[93] S. Uppalapati, M. L. Bushnell, and V. D. Agrawal, "Glitch-Free Design of Low Power ASICs Using Customized Resistive Feedthrough Cells," in Proc. of the 9th VLSI Design and Test Symposium, 2005, pp. 41-48.
[94] K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power Design," in Proc. International Symposium on Low Power Electronics and Design, 1995, pp. 3-8.
[95] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanzawa, M. Ichida, and K. Nogami, "Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to A Media Processor," IEEE Journal of Solid-State Circuits, vol. 33, no. 3, pp. 463-472, 1998.
[96] Q. Wang and S. B. K. Vrudhula, "Static Power Optimization of Deep Submicron CMOS Circuits for Dual VT Technology," in Proc. International Conference on Computer-Aided Design, 1998, pp. 490-496.
[97] L. Wei, Z. Chen, M. Johnson, and K.
Roy, "Design and Optimization of Low Voltage High Performance Dual Threshold CMOS Circuits," in Proc. Design Automation Conference, 1998, pp. 489-494.
[98] L. Wei, Z. Chen, K. Roy, M. C. Johnson, Y. Ye, and V. K. De, "Design and Optimization of Dual-Threshold Circuits for Low-Voltage Low-Power Applications," IEEE Transactions on VLSI Systems, vol. 7, no. 1, pp. 16-24, Mar. 1999.
[99] L. Wei, Z. Chen, K. Roy, Y. Ye, and V. De, "Mixed-Vth (MVT) CMOS Circuit Design Methodology for Low Power Applications," in Proc. Design Automation Conference, 1999, pp. 430-435.
[100] L. Wei, K. Roy, and V. K. De, "Low Voltage Low Power CMOS Design Techniques for Deep Submicron ICs," in Proc. of the 13th International Conference on VLSI Design, 2000, pp. 24-29.
[101] L. Wei, K. Roy, and C.-K. Koh, "Power Minimization by Simultaneous Dual-Vth Assignment and Gate-Sizing," in Proc. of the IEEE Custom Integrated Circuits Conference, 2000, pp. 413-416.
[102] N. H. E. Weste and D. Harris, CMOS VLSI Design: A Circuits and System Perspective, 3rd ed.: Addison Wesley, 2004.
[103] H.-S. Won, K.-S. Kim, and K.-O. Jeong, "An MTCMOS Design Methodology and Its Application to Mobile Computing," in Proc. of the International Symposium on Low Power Electronics and Design, 2003, pp. 110-115.
[104] A. Wroblewski, C. V. Schimpfle, and J. A. Nossek, "Automated Transistor Sizing Algorithm for Minimizing Spurious Switching Activities in CMOS Circuits," in Proc. IEEE International Symposium on Circuits and Systems, 2000, pp. 291-294.
[105] A. C. H. Wu, N. Vander Zanden, and D. Gajski, "A New Algorithm for Transistor Sizing in CMOS Circuits," in Proc. European Design Automation Conference, 1990, pp. 589-593.
[106] Y. Ye, S. Borkar, and V. De, "A New Technique for Standby Leakage Reduction in High-Performance Circuits," in Proc. Symposium on VLSI Circuits, 1998, pp. 40-41.
[107] C. Yeh and Y.-S.
Kang, "Cell-Based Layout Techniques Supporting Gate-level Voltage Scaling for Low Power," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 9, no. 6, pp. 983-986, 2001.
[108] B. Yu, "A Novel Dynamic Power Cutoff Technology (DPCT) for Active Leakage Reduction in Deep Submicron VLSI CMOS Circuits," PhD Thesis, New Brunswick, New Jersey: Rutgers, The State University of New Jersey, October 2007.
[109] B. Yu and M. L. Bushnell, "A Novel Dynamic Power Cutoff Technique (DPCT) for Active Leakage Reduction in Deep Submicron CMOS Circuits," in Proc. of the International Symposium on Low Power Electronics and Design, 2006.