Simulation Based Power Estimation For Digital CMOS Technologies

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee. This thesis does not include proprietary or classified information.

Jins Davis Alexander

Certificate of Approval:
Victor Nelson, Professor, Electrical and Computer Engineering
Vishwani D. Agrawal, Chair, James J. Danaher Professor, Electrical and Computer Engineering
Adit Singh, James B. Davis Professor, Electrical and Computer Engineering
George T. Flowers, Graduate Dean, Graduate School

Simulation Based Power Estimation For Digital CMOS Technologies
Jins Davis Alexander
A Thesis Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Master of Science
Auburn, Alabama
December 19, 2008

Simulation Based Power Estimation For Digital CMOS Technologies
Jins Davis Alexander
Permission is granted to Auburn University to make copies of this thesis at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.
Signature of Author
Date of Graduation

Vita

Jins Davis Alexander, son of Mr. T. D. Alexander and Mrs. Mary Alexander, was born in Kottayam, Kerala, India. He did his schooling at Our Own English High School, Dubai, United Arab Emirates. He earned the degree of Bachelor of Technology in Electronics and Communication Engineering from National Institute of Technology, Calicut, India (formerly known as Regional Engineering College, Calicut) in 1999. He joined the graduate programme in Electrical & Computer Engineering at Auburn University in August 2005.

Thesis Abstract

Simulation Based Power Estimation For Digital CMOS Technologies
Jins Davis Alexander
Master of Science, December 19, 2008
(B.Tech., National Institute of Technology, Calicut, 2003)
87 Typed Pages
Directed by Vishwani D. Agrawal

The estimation of power in digital CMOS circuits has become a significant problem, especially for present-day semiconductor technologies. Balancing the opposing demands of estimation accuracy and computation speed makes the estimation procedures more challenging. One neglected area is the breakdown of power into its various components. In this thesis, we discuss the algorithms for, and the implementation of, a "total" power estimation tool. This tool performs an event-driven simulation of vectors, either supplied by the user or randomly generated at the user's option. All components of power, namely, dynamic (separate logic and glitch components), leakage, short-circuit and clock power, are estimated. Peak, minimum and average values for the given vector set are also determined. For simulation, the delay, node capacitance, and input-state-specific leakage of each gate are first determined for the given technology, temperature and supply voltage using a circuit-level simulator (Spice), and are saved. This gives the power estimation the necessary accuracy while keeping it fast.

To demonstrate applications of the tool, we examine the variation of leakage with temperature, the variation of short-circuit power with rise time and output load capacitance, and the quadratic reduction in logic transition power with supply voltage. Glitch power is shown to decrease faster than the quadratic function of voltage because the increased gate inertia suppresses many glitches. Such a tool is useful because, in general, any design technique that reduces one component of power also affects the other components.

We analyze the effect of process variation on the estimation of dynamic power dissipation. Taking a novel approach, we model gates with given lower and upper bounds on delays. For given input vectors, we first find logic transitions using zero-delay simulation. Our algorithms then determine the ambiguity (transient) interval during which transitions occur, and the maximum and minimum number of possible transitions.
Computation of these for all gates requires a linear-time analysis of each vector-pair. Weighting the transition counts with node capacitances gives lower and upper bounds on dynamic power. Results compare favorably with power analysis using Monte Carlo simulation, which requires significantly more computing resources. Monte Carlo simulation of ISCAS benchmark circuit c880 for 1000 random vectors (999 vector-pairs) demonstrates the advantages of the bounded delay power analysis. Each vector-pair was simulated for 1000 sample circuits. Sample circuits had gate delays varying ±20% about the nominal values for the TSMC025 2.5V CMOS process [1]. For a vector period of 1000 ps, the minimum power was 1.424 mW and the maximum power was 11.598 mW. The Monte Carlo simulation runs took 262.75 CPU s on a Sun Sparc Ultra 10 with a 4GB shared memory system. Using the same ±20% variability and the same 1000 vectors, the bounded delay analysis obtained a bound (1.35 mW, 11.89 mW) for power in just 0.3 CPU s. Considering that c880 is a small circuit and that the impact of process variation on power continues to grow in importance, this computational efficiency is a strong motivation for using the method developed in this research.

Acknowledgments

I would like to gratefully acknowledge the assistance, encouragement, support, patience and dedication provided to me by my adviser, Dr. Vishwani D. Agrawal, during my stay at Auburn University. I thank Dr. Victor Nelson and Dr. Adit Singh for being on my thesis committee and for their valuable suggestions. I would also like to thank my colleagues Nitin Yogi, Khushboo Sheth, Hillary Grimes, Fan Wang, Kim and Wei Jang for all the helpful discussions throughout this research. Special thanks to all my friends in Auburn for making my graduate study a pleasant experience. Most importantly, my heartfelt gratitude and thanks to my parents and my sister, whose encouragement and love have always been my strength in achieving my goals.
Support and recognition of my research by the Wireless Engineering Research and Education Center (WEREC) at Auburn University was very helpful in completing this work.

Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.

Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style-file aums.sty.

Table of Contents

List of Figures
List of Tables
1 Introduction
1.1 Motivation
1.1.1 Separation of power components
1.1.2 Process variation
1.2 Problem Statement
1.3 Original Contributions
1.4 Organization of the thesis
2 Prior Work on Power Analysis
2.1 Introduction
2.2 Power Dissipation in CMOS circuits
2.2.1 Dynamic power dissipation
2.2.2 Short circuit power dissipation
2.2.3 Leakage power dissipation
2.3 Existing Power Estimation Techniques
2.3.1 Simulation based techniques
2.3.2 Probabilistic techniques
2.3.3 Statistical methods
2.3.4 Other work
2.4 Process Variation and Power Estimation
2.4.1 Delay variation
2.4.2 Existing work
2.5 Summary
3 Power Estimation
3.1 Introduction
3.2 Event Driven Logic Simulation
3.3 Glitch Filtering
3.4 Dynamic Power Estimation
3.5 Static Power Dissipation
3.6 Short Circuit Power Dissipation
3.7 Clock Power and Flip Flop Cell Power
3.8 Saving Flip Flop Cell Power
3.9 Test Power
3.10 Summary
4 Power Estimation Results
4.1 Introduction
4.2 Experimental Procedure
4.3 Experimental Results
4.4 Summary
5 Bounded Delay and Dynamic Power Estimation
5.1 Introduction
5.2 Background and Definitions
5.3 Signal Transition Analysis of Bounded Delay Gates
5.3.1 Ambiguity intervals
5.3.2 Multiple ambiguity regions
5.4 Problem Depiction
5.5 Maximum Number of Transitions
5.6 Minimum Number of Transitions
5.7 Dynamic Power Estimation
5.8 Summary
6 Bounded Delay Analysis Results
6.1 Introduction
6.2 Experimental Procedure
6.3 Benchmark Results
6.4 Summary
7 Conclusion
Bibliography

List of Figures

1.1 Monte Carlo analysis of power dissipation in c880 circuit. 999 random vector-pairs were simulated for 1000 circuit samples.
2.1 The charge flow for an inverter: (a) dynamic charging of the load capacitance, (b) discharging of the load capacitance.
2.2 Short circuit current flow during the switching of transistors.
2.3 Leakage current components for an inverter.
3.1 Short-circuit scenario: Vi(t) is a rising waveform applied to the input of an inverter with Vo(t) the corresponding output waveform. Isc(t) is the short circuit current waveform that peaks at time t2 when the PMOS transistor goes from the linear to the saturation region.
3.2 (a) A standard D flip flop and (b) D flip flop with clock gating.
3.3 (a) A standard scan cell (SFF) and (b) a scan cell with Q output gated with a scan enable signal (SE) (SE = 1 is normal mode).
3.4 A scan cell with output Q gated with a scan enable signal (SE) (SE = 1 is normal mode) and with clock gating.
4.1 Output voltage waveforms for an inverter with a NAND and NOR load in (a) 0.25 micron and (b) 90 nm technologies.
4.2 Leakage power of c880 in 90nm technology for 1000 random vectors.
4.3 Effect of gate sizing on short circuit power dissipation.
5.1 Ambiguity regions of a rising signal (a) and a steady state signal (b), respectively.
5.2 A four-input AND gate with delay bounds (2, 4). Shaded regions are ambiguity intervals.
5.3 A three-input AND gate depicting multiple ambiguity intervals.
5.4 Two-input AND gate transitions.
5.5 Effect of modification factor k on the second upper bound.
5.6 Filtering of transitions in a two-input AND gate.
5.7 Estimating lower bound on output transitions of a 2-input AND gate.
6.1 Monte Carlo simulation versus bounded delay analysis for c880. Each point represents one vector-pair. One hundred sample circuits with nominal ±20% delay variation were simulated and for each vector-pair (a) maximum and (b) minimum power was determined.
6.2 Monte Carlo simulation versus bounded delay analysis for c880. Regression graph for average power.
6.3 Transition statistics for high-activity gate 1407 in c2670 for a random vector-pair. Bounded delay analysis: (a) delay bounds (7ps, 12ps), mintran = 0, maxtran = 8, (b) delay bounds (11ps, 33ps), mintran = 0, maxtran = 4. Histograms were obtained by Monte Carlo simulation.
6.4 A comparison of the maximum power distribution for a vector-pair obtained by bounded delay analysis and Monte Carlo simulation for ISCAS'85 benchmark circuits (a) c880 and (b) c5315. The maximum power values are for 1000 random vector pairs. The Monte Carlo simulation used 1000 circuit samples with random delays to find the maximum power for each vector pair.

List of Tables

3.1 Power dissipation results for ISCAS'89 benchmark circuit s5378.
3.2 Power dissipation in ISCAS'89 benchmark circuit s5378 with scan cells of various types when the circuit is operated in normal mode.
3.3 Power dissipation in ISCAS'89 benchmark circuit s5378 with scan cells of various types when the circuit is being tested.
4.1 Comparison with SPICE using an INVERTER with a NAND and NOR load.
4.2 1-BIT ADDER Simulation
4.3 Average power dissipation for simulation of ISCAS Benchmark circuits using 1000 random vectors in 0.25 micron technology at a supply voltage of 2.5 volts.
6.1 Per vector energy consumption in picojoules in benchmark circuits for 1000 random vectors by Monte Carlo simulation of 1000 sample circuits and bounded delay analysis.

Chapter 1 Introduction

The increasing use of portable and battery operated devices has raised the demand for low power devices. With the higher speed and density of CMOS circuits, power dissipation has become a growing concern.
Newer technologies are also changing the balance of power among the various dissipation mechanisms. A more detailed power estimator is therefore required to facilitate low power design and power optimization of CMOS circuits. How the different components of power vary across technologies is important information for any CMOS designer. This is why a tool that can accurately and efficiently separate each power component is valuable.

However, estimating the nominal power for a particular technology is not sufficient. Manufacturing process variation should also be studied, that is, variation in threshold voltage, channel length, etc. So should the effect of environmental variation in power supply and temperature. Process variation causes uncertainty in delay values, which produces device-to-device differences in power dissipation. The effect of delay variation is most evident in dynamic power estimation. The dynamic power consumed by a digital CMOS circuit depends on logic and glitch transitions, the latter being a function of delays in the circuit. We often use fixed (nominal or worst-case) delay values for design and analysis. This is not correct for two reasons. First, the delay of a gate changes depending on signal states, device temperature, power supply fluctuations, and interconnect coupling noise. Second, and this applies to today's nanoscale technologies, process variations are wider.

1.1 Motivation

1.1.1 Separation of power components

Most existing tools either estimate the total power or are specific to a particular component of power. However, in the process of optimizing circuits for low power, a designer will want to know the effects of specific design techniques on each component of power. Different power dissipation mechanisms often have opposing requirements, such that a reduction in one power component can simultaneously increase another component.
Thus, it is important to give the designer separate information about the effect on each of these components, in addition to the total power. One motivation of this thesis is to implement a single efficient tool. It estimates the individual components of power, namely dynamic, leakage, short circuit and clock power, and determines the contribution of each component to the total power.

1.1.2 Process variation

A second objective of this thesis is to derive new techniques to estimate dynamic power in the presence of gate-delay variation caused by process variability. A common approach is Monte Carlo simulation of sample circuits with varied delays. However, this approach is time consuming, which is why a faster approach is needed. The histogram in Figure 1.1 illustrates this. It shows the Monte Carlo simulation of ISCAS benchmark circuit c880 [2] for 1000 random vectors (999 vector-pairs). Each vector-pair was simulated for 1000 sample circuits whose gate delays varied about the nominal values. The details of technology, process variation, and computing platform are discussed in Chapter 6. For a vector period of 1000 picoseconds, the minimum power was 1.424 mW and the maximum power was 11.598 mW. The Monte Carlo simulation runs took 262.75 CPU s.

Figure 1.1: Monte Carlo analysis of power dissipation in c880 circuit. 999 random vector-pairs were simulated for 1000 circuit samples.

Using the same variability, the bounded delay analysis developed in the present work obtained a bound (1.35 mW, 11.89 mW) in just 0.3 CPU s. Considering that c880 is a small circuit and that the impact of process variation on power continues to grow in importance, this computational efficiency is a strong motivation for the present work.
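The cost contrast between Monte Carlo sampling and a bounds-based analysis can be seen in a much simpler setting than the transition analysis of this thesis: the delay of a single path. The sketch below is a generic illustration only, not the bounded delay algorithm developed in this work. It assumes each gate delay is drawn uniformly within ±20% of a hypothetical nominal value, and the function names are hypothetical. Monte Carlo needs many samples and reports only the range it happened to observe. The (min, max) bound follows from a single pass.

```python
import random

def monte_carlo_path_delay(nominal, variation=0.20, samples=1000, seed=1):
    """Sample every gate delay uniformly in [d*(1-v), d*(1+v)] and return
    the smallest and largest total path delay observed over all samples."""
    rng = random.Random(seed)
    lo, hi = float("inf"), float("-inf")
    for _ in range(samples):
        total = sum(d * rng.uniform(1 - variation, 1 + variation) for d in nominal)
        lo, hi = min(lo, total), max(hi, total)
    return lo, hi

def bounded_path_delay(nominal, variation=0.20):
    """Single pass: with bounded gate delays, the path delay is guaranteed to
    lie between the sum of minimum delays and the sum of maximum delays."""
    return (sum(d * (1 - variation) for d in nominal),
            sum(d * (1 + variation) for d in nominal))

if __name__ == "__main__":
    path_ps = [10.0, 12.0, 8.0, 15.0]  # hypothetical nominal gate delays (ps)
    print("Monte Carlo range:", monte_carlo_path_delay(path_ps))
    print("Bounded range:    ", bounded_path_delay(path_ps))
```

The sampled range always falls inside the bound. It is usually narrower, because all gates rarely reach their extremes in the same sample.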
1.2 Problem Statement

The problem solved in this thesis is to find a method that accurately and efficiently estimates and separates the different power components for digital CMOS technologies. A further goal is to develop an algorithm that uses the bounded delay model for dynamic power analysis and thus takes into account manufacturing process variability in gate delays.

1.3 Original Contributions

We have developed a gate level power estimation tool that accurately and efficiently estimates the various power components as well as the total power, and provides useful information to a designer. The tool first builds SPICE-based libraries, from which it extracts the information needed for power estimation. Each power component is separated so that it can guide power optimization. For example, the tool can provide the maximum and minimum leakage vectors for a particular circuit. The minimum leakage vector can be used to hold the circuit in standby mode.

To deal with variability, significant advances have been made in logic simulation, timing analysis and delay testing through the use of the bounded delay model [82]. The delay of a gate is expressed as a range, typically written as (min, max). In this work, we adopt the bounded delay model to develop a dynamic power analysis method. Although we have focused only on delay variations, other parameters such as capacitance and leakage current may be treated similarly in the future. The analysis may also be extended to include leakage power.

1.4 Organization of the thesis

We give a brief introduction to power analysis in Chapter 2, where we survey existing power estimation techniques and the related research in the literature. Chapter 3 discusses in detail the power estimation techniques we have used in implementing our gate level power estimator. In Chapter 4, we show the power analysis results of our gate level power estimator tool. The results include verification through comparison against HSPICE results on benchmark circuits and some experimental analysis. In Chapter 5, we describe the new bounded delay analysis algorithm to estimate dynamic power in the presence of delay uncertainties. The concepts of bounded delays, ambiguity intervals and multiple ambiguity regions are discussed in detail. We then give techniques to estimate the maximum and minimum number of transitions on a gate node of the circuit, and use them to provide bounds on dynamic power. In Chapter 6, we demonstrate the efficiency of the new bounded delay analysis approach in comparison to conventional Monte Carlo simulation, using benchmark circuits. We conclude this work in Chapter 7, where we propose future work.

Chapter 2 Prior Work on Power Analysis

2.1 Introduction

We begin with a review of the different mechanisms of power dissipation. We then discuss existing power estimation techniques and the limitations that inspired our work. Existing methodologies that incorporate variation into power estimation are also discussed. Our focus is on digital CMOS circuits.

2.2 Power Dissipation in CMOS circuits

There are three main sources of power dissipation in CMOS circuits:
• Dynamic power
• Short circuit power
• Leakage power

2.2.1 Dynamic power dissipation

Dynamic power dissipation is defined as the power spent in charging or discharging the nodal capacitances during a high-to-low or low-to-high transition at the output node. The nodal capacitance consists of the internal capacitance of the gate, the interconnect wire capacitance of the fanout net and the gate capacitances of the fanout gates. The power consumed is given by [30]:

$$P_{dyn} = \frac{1}{2} \alpha C_{load} V_{dd}^2 f$$

where
• $P_{dyn}$: dynamic power dissipation of the gate
• $\alpha$: activity factor of the gate
• $C_{load}$: load capacitance of the gate
• $V_{dd}$: supply voltage
• $f$: clock frequency

The activity factor $\alpha$
determines the amount of switching activity per clock period at the gate output. It lies in the range $0.0 \le \alpha \le 1.0$, with the clock ($\alpha = 1.0$) being the most active signal. Together with $C_{load}$, it determines the switched capacitance, an important factor in the dynamic power component. In general, the activity factor of a gate depends on the input vectors. The intrinsic gate capacitances, included in the total load capacitance, are non-linear functions of the voltages applied to the devices [21, 45]. They also depend on the region of operation of the transistors.

The dynamic energy consumed is due to the charging and discharging of the load capacitance. Figure 2.1 depicts the charging/discharging of the output capacitance $C_L$ of an inverter for a falling/rising input $V_{in}$ and a rising/falling output voltage $V_{out}$, respectively.

Figure 2.1: The charge flow for an inverter: (a) dynamic charging of the load capacitance, (b) discharging of the load capacitance.

The dynamic component can be separated into logic power and glitch (or hazard) power. After the inputs of a combinational logic block change, the output of a gate in the block may undergo a series of transitions before reaching its final steady-state value. At most a single transition (i.e., one or zero) is necessary to reach the final value. This is known as the logic transition and contributes to the logic power component. The remaining unnecessary transitions, caused by unbalanced path delays in the circuit, are called glitches or hazards and contribute to the glitch power. It is useful to know these two components of dynamic power separately for several reasons. For a given circuit, the logic transitions set a lower bound on dynamic power, which can only be changed by changes in the logic design.
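The split between logic and glitch transitions can be made concrete for a single gate output. The sketch below is a minimal illustration under stated assumptions. It uses a hypothetical time-ordered list of (time, value) events on one output for one vector-pair. It charges each output transition $\frac{1}{2} C_{load} V_{dd}^2$, that is, it reads $\alpha$ in the formula above as output transitions per clock period. The logic component is at most one transition, and every remaining transition is counted as glitch.

```python
from dataclasses import dataclass

@dataclass
class NodeActivity:
    logic: int   # necessary transitions: 0 or 1
    glitch: int  # unnecessary (hazard) transitions

def split_transitions(initial, events):
    """Classify the transitions on one gate output for one vector-pair.
    events: time-ordered (time, value) pairs; repeated values are ignored."""
    value, total = initial, 0
    for _, new in events:
        if new != value:
            total += 1
            value = new
    logic = 1 if value != initial else 0
    return NodeActivity(logic=logic, glitch=total - logic)

def dynamic_power(activity, c_load, vdd, period):
    """Return (logic, glitch) power in watts, charging 0.5*C*Vdd^2 per transition."""
    e_transition = 0.5 * c_load * vdd ** 2
    return (activity.logic * e_transition / period,
            activity.glitch * e_transition / period)

if __name__ == "__main__":
    # Output settles 0 -> 1 but passes through 0 -> 1 -> 0 -> 1 on the way.
    act = split_transitions(0, [(10, 1), (25, 0), (40, 1)])
    p_logic, p_glitch = dynamic_power(act, c_load=10e-15, vdd=2.5, period=1000e-12)
    print(act)  # NodeActivity(logic=1, glitch=2)
    print(f"logic {p_logic * 1e6:.2f} uW, glitch {p_glitch * 1e6:.2f} uW")
    # logic 31.25 uW, glitch 62.50 uW
```

In this example the glitch component is twice the logic component. Removing the hazard would cut this node's dynamic power by two thirds without any change to the logic function.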
Glitch power, on the other hand, depends on the physical design, path delays, etc., and can be reduced by other design techniques [4, 5, 74, 75, 76, 77, 78, 91, 92].

2.2.2 Short circuit power dissipation

A short circuit current flows while the transistors are switching state. During this interval both the pull-up and pull-down transistors can be ON, which provides a conducting path from the supply to ground. This is equivalent to momentarily shorting the supply and ground rails. Fortunately, this lasts only a very short time within the rise or fall time of the input signals.

Figure 2.2: Short circuit current flow during the switching of transistors.

The short circuit power dissipation also depends on the output load capacitance. The load determines how much of the current drawn from the supply goes into charging or discharging that capacitance, and the remaining current is the short circuit current. Hence a larger output node capacitance reduces the overall short circuit current. Short circuit current is therefore at its maximum for low capacitive loads, that is, when the output rise/fall times are much smaller than the input rise/fall times. Figure 2.2 shows the short circuit current path by a downward arrow. We discuss this in greater detail in Section 3.6, where Figure 3.1 depicts the short circuit current waveform Isc(t) as a function of the input Vi(t). The short circuit current peak Iscmaxf occurs when the PMOS transistor goes from the linear region to saturation. It follows that for longer input rise/fall times the average short circuit current is higher. The short circuit current model used for estimation will be discussed in Section 3.6.

2.2.3 Leakage power dissipation

Until recently, leakage power was neglected as insignificant compared to dynamic power.
However, for the nanoscale CMOS technologies now in use, leakage power has become a significant component of the total power. Leakage power can be attributed mainly to reverse biased pn junctions, subthreshold leakage current and gate tunneling (Figure 2.3). The reverse biased pn junction current is the static dissipation due to reverse biased diode leakage between the diffusion regions, wells and substrate [97]. In a submicron process, this component is quite small compared to subthreshold and gate leakages.

Even when a transistor is in the OFF state, a weak inversion current still flows. This is known as subthreshold leakage, and it depends exponentially on the threshold voltage. As technology scales down, the feature size and the supply voltage decrease, and the threshold voltage is scaled down with them. The result is higher subthreshold current. The subthreshold leakage current is given by:

$$I_{sub} = \mu_0 C_{ox} \frac{W}{L} V_t^2 \, e^{\frac{V_{gs} - V_{th}}{n V_t}} \left(1 - e^{-\frac{V_{ds}}{V_t}}\right)$$

where $\mu_0$ is the effective mobility, $C_{ox}$ is the gate oxide capacitance per unit area, $L$ is the channel length, and $W$ is the gate width. $V_{gs}$ and $V_{ds}$ are the gate-to-source and drain-to-source voltages, respectively, and $V_{th}$ is the threshold voltage. $V_t = kT/q$ is the thermal voltage and $n$ is a technology parameter.

Gate leakage is the oxide tunneling current caused by the thin gate oxide and the resulting high electric field. The tunneling current becomes important for technologies where the gate oxide thickness is less than 20 Å [97]. This current flows irrespective of the transistor state.

Figure 2.3: Leakage current components for an inverter.

Leakage power is highly pattern dependent [24] and is dissipated as long as the supply voltage is on. The input vectors applied to the circuit determine the states of the transistors in the steady state condition. Temperature is another important factor.
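The subthreshold expression above can be evaluated directly. Doing so shows its exponential sensitivity to $V_{th}$ and its temperature dependence through $V_t = kT/q$. The parameter values in this sketch are hypothetical placeholders, not data for any particular process. The sketch also holds $V_{th}$ fixed when temperature changes, so it captures only the thermal-voltage part of the temperature effect. The decrease of $V_{th}$ with temperature would increase the leakage further.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant (J/K)
Q_E = 1.602176634e-19  # electron charge (C)

def thermal_voltage(temp_k):
    """V_t = kT/q in volts."""
    return K_B * temp_k / Q_E

def subthreshold_current(mu0, cox, w, l, vgs, vds, vth, n, temp_k=300.0):
    """I_sub = mu0*Cox*(W/L)*Vt^2 * exp((Vgs-Vth)/(n*Vt)) * (1 - exp(-Vds/Vt)).
    SI units: mu0 in m^2/(V*s), cox in F/m^2, w and l in m; result in A."""
    vt = thermal_voltage(temp_k)
    return (mu0 * cox * (w / l) * vt ** 2
            * math.exp((vgs - vth) / (n * vt))
            * (1.0 - math.exp(-vds / vt)))

if __name__ == "__main__":
    # Hypothetical OFF-state NMOS: Vgs = 0, Vds = 1 V, W/L = 2, n = 1.5.
    params = dict(mu0=0.04, cox=0.015, w=2e-7, l=1e-7, vgs=0.0, vds=1.0, n=1.5)
    base = subthreshold_current(vth=0.3, **params)
    lower_vth = subthreshold_current(vth=0.2, **params)
    hot = subthreshold_current(vth=0.3, temp_k=373.15, **params)
    print(f"Vth 0.3 -> 0.2 V multiplies I_sub by {lower_vth / base:.1f}")
    print(f"27 C -> 100 C (Vth held fixed) multiplies I_sub by {hot / base:.1f}")
```

Note that setting $V_{ds} = 0$ makes the last factor vanish. The expression therefore reflects the fact that subthreshold current needs a drain-to-source voltage across the OFF transistor.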
Most devices operate at temperatures higher than room temperature. At these temperatures leakage power increases substantially, which can be attributed to the increased thermal voltage $V_t$ and the decreased threshold voltage [50].

2.3 Existing Power Estimation Techniques

Power estimation is the process of determining the average or maximum power consumed by a design. Instantaneous power, in contrast, is regarded as a voltage drop problem [64]. Without power estimates, a circuit may have to be redesigned if it is found to consume more power than expected. In this section, we discuss the prevalent power estimation techniques. Power estimation techniques can be broadly classified as dynamic or static [73]. Dynamic methods simulate designs with specific input vector sets and estimate power. Though these techniques are accurate, they are highly time consuming. Static techniques use analytical methods to speed up the estimation process at the cost of reduced accuracy.

2.3.1 Simulation based techniques

The most accurate and straightforward approach to power estimation is circuit simulation using a set of input vectors. The transitions that occur at each gate can be observed directly and averaged over the set of input vectors to give an average power estimate. The advantages of this technique are its accuracy and the fact that it can be used irrespective of circuit, technology or design style. However, it is highly pattern dependent [46, 100], and hence suffers from two major drawbacks. First, it requires extensive computing time, especially for large circuits. Second, a designer may not know the set of input patterns when computing the power of a particular block embedded in a large system. The calculated power may then be erroneous, as some of the input patterns used for estimation may never occur during normal operation.
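The vector-based averaging described above can be sketched at the gate level with a small event-driven simulator. This is a minimal illustration, not the tool developed in this thesis. It assumes the following:

- a hypothetical netlist format listed in topological order
- an acyclic circuit
- fixed positive gate delays
- a transport delay model with no glitch filtering
- $\frac{1}{2} C V_{dd}^2$ dissipated per transition

Because real delays are used, the hazard-prone circuit in the usage example shows a glitch that a zero-delay transition count would miss.

```python
import heapq
from collections import defaultdict
from itertools import count

# Hypothetical netlist format: gate -> (type, input nodes, delay), in topological order.
GATES = {
    "AND": lambda xs: int(all(xs)),
    "OR": lambda xs: int(any(xs)),
    "NAND": lambda xs: int(not all(xs)),
    "NOR": lambda xs: int(not any(xs)),
    "NOT": lambda xs: int(not xs[0]),
}

def settle(netlist, inputs):
    """Zero-delay evaluation of the steady state reached for one input vector."""
    values = dict(inputs)
    for g, (typ, ins, _) in netlist.items():
        values[g] = GATES[typ]([values[i] for i in ins])
    return values

def simulate(netlist, values, new_inputs):
    """Event-driven simulation with transport delays; returns transitions per node.
    values: steady state for the previous vector (updated in place)."""
    fanout = defaultdict(list)
    for g, (_, ins, _) in netlist.items():
        for i in ins:
            fanout[i].append(g)
    seq = count()
    queue = [(0, next(seq), pi, v) for pi, v in new_inputs.items()]
    heapq.heapify(queue)
    toggles = defaultdict(int)
    while queue:
        t = queue[0][0]
        touched = set()
        while queue and queue[0][0] == t:  # apply all events at time t first
            _, _, node, v = heapq.heappop(queue)
            if values[node] != v:
                values[node] = v
                toggles[node] += 1
                touched.update(fanout[node])
        for g in sorted(touched):  # then re-evaluate each affected gate once
            typ, ins, delay = netlist[g]
            out = GATES[typ]([values[i] for i in ins])
            heapq.heappush(queue, (t + delay, next(seq), g, out))
    return dict(toggles)

def average_power(netlist, cap, vectors, vdd, period):
    """Average over consecutive vector-pairs; each transition costs 0.5*C*Vdd^2."""
    energy = 0.0
    for old, new in zip(vectors, vectors[1:]):
        toggles = simulate(netlist, settle(netlist, old), new)
        energy += sum(k * 0.5 * cap.get(node, 0.0) * vdd ** 2
                      for node, k in toggles.items())
    return energy / ((len(vectors) - 1) * period)

if __name__ == "__main__":
    # Static hazard: y = AND(a, NOT a) glitches when a rises.
    net = {"n1": ("NOT", ["a"], 1), "y": ("AND", ["a", "n1"], 1)}
    print(simulate(net, settle(net, {"a": 0}), {"a": 1}))
    # prints {'a': 1, 'n1': 1, 'y': 2}
```

All events at one time instant are applied before any gate is re-evaluated. As a result, simultaneous input changes do not create a spurious zero-width pulse at a gate output.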
Circuit simulators like SPICE [62] solve large matrix systems built from Kirchhoff's current law (KCL) equations to determine node voltages and currents at the transistor level. Basic components are combined with higher level device models of diodes and transistors to accurately estimate the current, the voltage drop and even the non-linear capacitances present at transistor nodes. These basic components are resistors, capacitors, inductors, current sources and voltage sources. From this information, highly accurate power analysis is possible. However, the complex device modeling needed for high accuracy raises the computational complexity so much that SPICE is no longer a feasible option for larger circuits. To improve speed, another transistor level simulator called PowerMill [44] uses piecewise linear transistor modeling to capture the transistor characteristics in lookup tables. It also uses an event driven timing algorithm to obtain speeds comparable to logic simulators. Unlike logic simulators, however, it tracks changes in node voltages rather than logic transitions. The use of lookup tables leads to inaccuracies but results in a speed up of 2 to 3 times compared to SPICE.

Switch level simulators like MOSSIM [14, 17, 38] and IRSIM [81] view transistors as bidirectional switches and circuit nodes as charge storage nodes. When a transistor is ON, the switch closes, creating a conduction path between the drain and source nodes of the transistor. In this model, simulation can be performed with an approximate RC calculation, making it faster than normal transistor level analysis. Switch level simulators can be extended for power analysis by calculating the approximate switching capacitance for dynamic power estimation [89]. Other components like leakage and short circuit power can also be estimated, but these estimates are not very accurate compared to transistor level analysis.
For example, short-circuit power must be estimated from the time during which the switches form a path from power to ground, but a switch-level simulator does not model timing accurately. Moreover, the model does not consider the output load capacitance, which leads to further inaccuracies. Gate-level simulation uses logic components such as NAND/NOR gates, latches, flip flops and interconnection nets. The most common analysis method is event-driven simulation [17]. When a transition, or event, occurs at an input of a gate, it may trigger an output event after a certain time delay. Power consumption is estimated from the switching capacitance at the gate's output node and the number of events that occur at that node. However, in this type of analysis each gate is modeled as a black box and its internal structure is not considered, so short-circuit current and internal capacitances are ignored [11]. Cell-based power estimation techniques follow a similar method [10]: libraries of cells are characterized by electrical (SPICE-level) simulation for all possible input combinations and fan-in/fan-out configurations, and logic simulation uses this information to estimate power. The accuracy depends on the type of macro-models used and on the accuracy of the capacitances provided at the cell level [33].

2.3.2 Probabilistic techniques

Probabilistic methods model the transitions occurring at a gate as probability functions. The probabilities that nodes change their logic state are propagated through the circuit. Because probabilities are used instead of input vectors, the computational effort is reduced, and these techniques are considered pattern independent [15]. Combining the switching activity, expressed as probabilities, with the node capacitances gives an estimate of dynamic power.
Assumptions about signal independence and the treatment of spatial and temporal signal correlations determine the accuracy and complexity of these techniques [83]. Moreover, most of them focus on dynamic power and disregard other components, which can become significant in submicron and nanoscale technologies. In an early work on probabilistic estimation [27], a zero-delay model was assumed for all gates. Signal probabilities (denoted $P_s$) were propagated through a circuit using simple probability expressions. For example, a two-input AND gate $y = \mathrm{AND}(x_1, x_2)$ has output signal probability $P_s(y) = P_s(x_1)\,P_s(x_2)$, assuming that the input signals $x_1$ and $x_2$ are independent. The transition probability $P_t$, the probability of a signal transition, is calculated under the assumption of temporal independence as $P_t(y) = 2P_s(y)\,(1 - P_s(y))$. This method has two disadvantages: the zero-delay model neglects glitches, and the temporal and spatial independence assumptions introduce further inaccuracies. Probability waveforms were used in simulators such as CREST and in other techniques [15, 66, 67]. A signal probability waveform gives the probabilities of signal values and transitions during a vector period; it is a sequence of signal probability values for the intervals between the instants of rising and falling transitions. The probability waveforms are propagated through the circuit with an algorithm similar to event-driven simulation, except that the probability of a transition is propagated instead of an exact transition. Spatial correlations are not considered in this approach; they have been addressed in work on tagged probabilistic simulation (TPS) [29, 90]. Hu and Agrawal [41, 42] developed a glitch filtering method using dual transition probabilities and observed that combining it with tagged probabilistic simulation gave more consistent power estimates.
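The zero-delay propagation rules of [27] described above can be sketched in a few lines of Python. This is an illustrative sketch under the stated independence assumptions, not the implementation of [27]; the function names are hypothetical. The last lines show how the spatial independence assumption fails at reconvergent fanout.

```python
from math import prod


def p_and(*ps):
    """Output signal probability of an AND gate with independent inputs."""
    return prod(ps)


def p_or(*ps):
    """Output signal probability of an OR gate with independent inputs."""
    return 1.0 - prod(1.0 - p for p in ps)


def p_transition(ps):
    """Transition probability under temporal independence: 2*Ps*(1 - Ps)."""
    return 2.0 * ps * (1.0 - ps)


ps_y = p_and(0.5, 0.5)
print(ps_y, p_transition(ps_y))  # 0.25 0.375

# Reconvergent fanout: y = AND(x, x) equals x, so Ps(y) = Ps(x) = 0.5,
# but treating the two branches as independent inputs gives 0.25.
x = 0.5
naive = p_and(x, x)
exact = x
print(naive, exact)  # 0.25 0.5
```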
A useful concept is the transition density proposed by Najm [63], defined as the average number of transitions at a particular node per clock period. Transition densities are propagated using the Boolean difference [17]. If y is a Boolean function that depends on x, the Boolean difference is

$$\frac{\partial y}{\partial x} = y|_{x=1} \oplus y|_{x=0}$$

where $\oplus$ denotes the exclusive-OR function. If the inputs $x_i$ of a Boolean module are assumed to be spatially independent, the transition density of its output y is

$$D(y) = \sum_{i=1}^{n} P\left(\frac{\partial y}{\partial x_i}\right) D(x_i)$$

where D denotes transition density. However, this propagation assumes that no two transitions occur at the same time. Also, the Boolean difference calculations become more complex for signals nearer the primary outputs (POs) of a circuit. The techniques above use only deterministic delay models: the transition probability waveforms give the probabilities of transitions at discrete instants determined by fixed gate delays. With technology scaling, delay variations and uncertainties have become important considerations. In recent work [32], the authors proposed an algorithm that propagates transition probability waveforms while accounting for delay uncertainty; describing the waveforms as continuous functions of time gave more accurate results than fixed-delay techniques.

2.3.3 Statistical methods

In statistical techniques, randomly generated input patterns are applied to the circuit and its response is monitored with a simulator until the power estimate reaches a desired accuracy. This is known as Monte Carlo analysis: the circuit is simulated repeatedly with a logic simulator, and the power is monitored until it converges to an average. Various statistical mean estimation techniques are used to select the input patterns so that the power estimate converges.
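The transition density rule D(y) = Σ P(∂y/∂x_i) D(x_i) can be sketched by brute-force enumeration of the input space. This Python sketch assumes spatially independent inputs, as the formula does, and evaluates P(∂y/∂x_i) by weighting every input assignment with its probability; the function names are hypothetical, and exhaustive enumeration is only practical for small gates.

```python
from itertools import product


def prob_boolean_difference(f, i, probs):
    """P(dy/dx_i) = P(f|x_i=1 XOR f|x_i=0) for independent inputs.

    f:     Boolean function of len(probs) arguments returning 0 or 1
    probs: signal probabilities P(x_j = 1)
    x_i itself is summed out, since the Boolean difference does not depend on it.
    """
    total = 0.0
    for bits in product((0, 1), repeat=len(probs)):
        weight = 1.0
        for b, p in zip(bits, probs):
            weight *= p if b else 1.0 - p
        hi = list(bits)
        lo = list(bits)
        hi[i], lo[i] = 1, 0
        if f(*hi) ^ f(*lo):
            total += weight
    return total


def transition_density(f, probs, densities):
    """D(y) = sum_i P(dy/dx_i) * D(x_i)."""
    return sum(prob_boolean_difference(f, i, probs) * d
               for i, d in enumerate(densities))


def and2(a, b):
    return a & b


def xor2(a, b):
    return a ^ b


print(transition_density(and2, [0.5, 0.5], [1.0, 1.0]))  # 1.0
print(transition_density(xor2, [0.5, 0.5], [1.0, 1.0]))  # 2.0
```

For the AND gate, a transition at x1 propagates only when x2 = 1, so each input contributes half its density; for XOR, every input transition propagates.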
McPOWER [16] is the first known Monte Carlo power estimator. It is a variable-delay simulator that measures power for random sets of input vectors and determines the average from the resulting data, continuing until the mean is considered stable over a large set of patterns. This method is time consuming but more accurate than probabilistic methods. Xakellis and Najm [99] improved upon McPOWER by considering transition densities, proposing the mean estimator of density (MED), which statistically estimates individual node transition densities. Letting the statistical estimates converge for all nodes would be inefficient, because nodes that switch infrequently need a large number of vectors to converge. The authors therefore classified nodes as low-density or regular-density nodes based on a threshold value. A major advantage of this technique is that users can specify error tolerances up front, and the nodes with the least tolerance can also be identified. The method was later improved [70] by allowing different error bounds for different nodes instead of a single constant user-specified estimation error, so that nodes with higher switching activity are estimated more accurately. Other statistical work estimates power under uncertainty in the input vector specifications. Since power depends strongly on the vector patterns applied, any uncertainty in their specification makes estimation difficult. This problem has been addressed by treating average power as a range (minimum average power, maximum average power) and statistically estimating the sensitivity of average power dissipation to the uncertainty in the input vectors [26].

2.3.4 Other work

We have so far discussed notable techniques for power estimation.
Other works concentrate on specific power components. PowerPlay [49] is a fast algorithm for estimating dynamic power; it is essentially a logic simulator that calculates the instantaneous power waveform at the gate (or cell) level. Other works [60, 79] emphasize leakage power estimation, using both pattern-dependent and pattern-independent techniques based on characterization of the device state. Some works model leakage current analytically [61], considering variations in process parameters such as doping profile, flat-band voltage and supply voltage. In [25], the authors discuss accurate transistor stack modeling that improves efficiency by reducing the need to characterize the device state for all possible input combinations. Significant work has also been done on accurate short-circuit models [8, 39, 93, 96], which have been extended to logic gates [94] to improve estimation accuracy for gates other than standard inverters.

2.4 Process Variation and Power Estimation

Process variations are variations in the semiconductor fabrication process that cause changes in threshold voltage, oxide thickness, channel length, interconnect wire width and thickness, etc. They are divided into inter-die and intra-die variations. Inter-die variations are variations among dies on the same wafer (or wafer lot), while the parameters remain constant within a die. Intra-die variations are differences within the same die; that is, devices on the same die differ in certain parameters. Intra-die variations have been observed to be spatially correlated: devices in close proximity on a die are more likely to be alike than devices that are physically far apart. In the present research, we consider only process variations in gate delays and their effect on dynamic power consumption.
In reality, both inter-die and intra-die variations can affect load capacitances. However, since capacitance may increase on some nodes and decrease on others, this source of variation may not, on average, increase power dissipation in an optimized circuit. Changes in gate delays, in contrast, can unbalance paths or change the inertial delay behavior of gates. Overall, this can change the switching activity of a circuit and hence its total power dissipation.

2.4.1 Delay variation

Circuit delay is particularly sensitive to process variations because it depends on a number of variation-sensitive parameters. Delay variation also changes the dynamic power dissipation because, depending on the variation and the circuit topology, it may or may not increase the number of transitions. This is why its effect is important in power estimation. A simple model of gate delay is ([21], p. 89)

$$T_d = \frac{C_L V_{dd}}{I} = \frac{C_L V_{dd}}{\frac{\mu C_{ox}}{2}\,\frac{W}{L}\,(V_{dd} - V_t)^2}$$

where $T_d$ is the gate delay, $C_L$ is the node capacitance, $V_{dd}$ is the supply voltage, $V_t$ is the threshold voltage, $I$ is the saturation drain current of the switching transistor, $\mu$ is the carrier mobility, $W$ and $L$ are the channel width and length, and $C_{ox} = \varepsilon_{ox}/T_{ox}$ is the gate oxide capacitance per unit area ($\varepsilon_{ox}$ being the oxide permittivity and $T_{ox}$ the oxide thickness). This equation shows that even when each parameter changes only slightly, the combined effect of all the variations can cause a significant change in delay. Temperature and supply voltage variation can also change the delay; in this work, we consider only process variations and neglect these other effects.

2.4.2 Existing work

The effects of process variation on power have been discussed [22, 28, 61, 85, 86, 87] mainly in the context of the analysis, optimization and reduction of leakage power. Another way to deal with process variation is a Monte Carlo approach [95], in which the variations between individual dies are taken into account and the delay is treated as a random variable.
Monte Carlo simulations are then run to obtain a high-quality sample of simulated maximum power, from which the exact and mean maximum power are estimated by statistical techniques. Because of the number of simulations needed for a reasonably accurate estimate, this technique is quite time consuming. It has also been used in a system-level power analysis methodology [20], where the authors combined several techniques, including Monte Carlo sampling and power-state (idle, sleep, active) leakage modeling, to estimate system-level power under process variations. Process variation has been considered in logic-level simulation, critical path delay and timing analysis through delay modeling of gates, using both statistical and bounded delay models [7, 52, 87]. In the bounded delay model, each gate is assigned lower and upper bounds on its delay, also called the min-max delay specification [12, 13, 18, 19, 37, 54, 80, 82]. In recent work [12, 13, 34, 35], the correlations at the inputs of reconvergent gates were considered to improve the accuracy of bounded gate delay fault simulation. Ignoring reconvergent fanout gives pessimistic results; by accounting for these correlations, the authors improved the quality of gate delay tests.

2.5 Summary

In this chapter, we have discussed various existing techniques for power estimation. In general, they can be classified as simulation-based and static (non-simulation) approaches. Most of these techniques focus on estimating total or average power consumption, and other notable works focus on particular power components. However, owing to the size of the problem, few works estimate and separate each component with notable accuracy and efficiency. We also discussed the effect of process variation and of delay uncertainties on power estimation.
We also briefly described existing approaches in this area, including current techniques based on bounded gate delays.

Chapter 3

Power Estimation

3.1 Introduction

In the previous chapter, we discussed existing power estimation techniques. As observed there, the major hurdle in power estimation is the conflict between accuracy and efficiency. Simulation-based techniques, though accurate, consume much time and memory, while probabilistic and statistical approaches reduce accuracy. Much work has been done on this problem, but most tools concentrate on estimating average power efficiently, and few address specific components. In the following sections, we present a tool that accurately and efficiently separates each power component. The motivation is a single tool that performs both simulation and power analysis, with emphasis on obtaining as much information as possible about each power component. The subsequent sections discuss the main methods and techniques we used to achieve this.

3.2 Event Driven Logic Simulation

The first step in any power analysis is to simulate the circuit to obtain as much information as possible about its activity and its response to the given input vectors. Simple logic simulation gives accurate switching activity information. However, accurate average power estimation by simulation requires a representative set of input vectors, which can be very large for big circuits. The time- and memory-consuming nature of this method has led to alternative probabilistic and statistical methods that improve speed at the cost of reduced accuracy.
This may be acceptable for dynamic power estimation, but because leakage power also depends on the input state, the overall loss of accuracy is significant. Thus, even though simulation may be time consuming, most industrial tools simulate functional vectors and then dump the waveforms or switching activity information for power analysis.

We simulate input vectors with event-driven simulation, an efficient gate-level method that is the standard technique for logic simulation of circuits. In event-driven simulation, events at gate outputs are processed in the order of their occurrence in time. An event at time t at the output of a gate may or may not cause events at its fanout gates. If it causes an event at a fanout gate at time t + Δ, where Δ is the delay of that fanout gate, this event is scheduled to be processed at that time. In this way, only gates that have output events are processed, each at the time of its event. Event-driven simulation therefore depends on the gate delays, since they determine when the next events are scheduled. Two delay models are used in standard logic simulation: transport delay and inertial delay. The transport delay model simply delays every output change by the specified delay; this is useful for elements such as buses, which may not absorb short pulses. Real gates, however, have inertia that prevents pulses narrower than the gate delay from propagating, and this behavior is modeled accurately by the inertial delay model. Rather than processing an event in the next event-processing step, the simulator schedules it at the current time plus the delay, so it must maintain the current time value.
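The difference between the two delay models can be illustrated for a single-input buffer. The Python sketch below uses a generic textbook formulation: an input change schedules an output change one delay later, and a pending change is cancelled if the input reverts before it matures. It is not the glitch filtering algorithm of our tool, and the function names and the event format, a time-ordered list of (time, new value) input transitions, are assumptions made for illustration.

```python
def transport_delay(events, d):
    """Every input transition reappears at the output d time units later."""
    return [(t + d, v) for t, v in events]


def inertial_delay(events, d, initial=0):
    """Buffer with inertial delay d: pulses narrower than d are absorbed.

    events: (time, value) input transitions in increasing time order.
    """
    out = []
    current = initial   # output value already committed
    pending = None      # (time, value) scheduled but not yet committed
    for t, v in events:
        if pending is not None and pending[0] <= t:
            out.append(pending)        # pending change matured before t
            current = pending[1]
            pending = None
        if pending is not None:
            if pending[1] == v:
                continue               # repeated value: keep the pending change
            pending = None             # input reverted within d: cancel it
        if v != current:
            pending = (t + d, v)
    if pending is not None:
        out.append(pending)
    return out


glitch = [(10, 1), (12, 0)]                   # 2-unit pulse, delay 5
print(transport_delay(glitch, 5))             # [(15, 1), (17, 0)]
print(inertial_delay(glitch, 5))              # []
print(inertial_delay([(10, 1), (20, 0)], 5))  # [(15, 1), (25, 0)]
```

The transport model passes the 2-unit glitch unchanged, while the inertial model suppresses it and passes only pulses at least as wide as the delay.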
When no events remain to be processed at the current time, time is advanced to the earliest pending event and all events scheduled for that time are processed. We use the inertial delay model for gates in our logic simulation, with delays characterized by a wire-load model that uses the fanout of each gate. To reduce memory usage, we implemented a circular stack for event-driven simulation: events for future time intervals wrap around to the beginning of the stack, overwriting events that have already been processed. This avoids the need for a very large stack, especially for large circuits. The analysis stops when all events on the stack have been processed, indicating that the circuit has reached a steady state.

3.3 Glitch Filtering

Event-driven simulation processes every event that occurs at a gate output. Thus, if a gate produces glitches, i.e., extra unnecessary transitions before reaching its final logic state, they are processed as events in order of their occurrence. Because we account for the inertial delay of each gate, we filter out glitches whose pulse width is too short compared with the gate delay [4]. Without this filtering, the switching activity would be overestimated, giving wrong information for power analysis. Our tool implements a glitch filtering algorithm that addresses this and accurately gives the actual transitions at each gate output.

3.4 Dynamic Power Estimation

Our approach is similar to commercial cell- or library-based approaches. During logic simulation, whenever a gate output node has an event, we dynamically build SPICE-characterized capacitance libraries for that cell. For accurate characterization, we include the transistor sizing for each technology and the region of operation of each transistor.
We include both the gate oxide capacitances and the internal diffusion capacitances when estimating the overall node capacitance. For every transition, we then estimate the energy spent in charging or discharging this capacitance using the dynamic power dissipation equation of Section 2.2.1, and we average the power over the input vectors to obtain the average dynamic power. Our tool separates logic and glitch power, as illustrated in our results. We neglect the energy dissipated in charging internal parasitic capacitances; our results indicate that its contribution to the total power is small.

3.5 Static Power Dissipation

Subthreshold leakage currents can be estimated with BSIM models [88, 84]. Leakage dissipation depends strongly on the input state of the transistors, but estimating the leakage current of a gate for all possible input combinations is time consuming. For a faster yet accurate estimate, we implemented the transistor stacking model of [24, 36]. When OFF transistors are connected in series, we estimate the leakage current by SPICE characterization. This is unavoidable because series transistors have different drain-source voltages and therefore conduct different currents (see the transistor stacking model [24, 36]). When OFF transistors are connected in parallel, however, we need only estimate the current of one equivalent transistor and multiply it by the number of parallel transistors. If any of the parallel transistors is ON, it shorts the network: the current flows through the ON transistor and bypasses the OFF transistors, so they contribute no leakage.
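As a hedged illustration of the stacking rules above, the Python sketch below computes the leakage of a 2-input NAND gate for each input state and then forms a time-weighted average, in the spirit of Eq. (3.1) given next. All current values are hypothetical placeholders standing in for SPICE characterization, the function names are hypothetical, and each state's leakage power is simplified to I_leak × Vdd, which assumes the full supply voltage appears across the OFF network.

```python
VDD = 2.5            # supply voltage (V), hypothetical
I_P_OFF = 20e-12     # OFF current of one PMOS device (A), hypothetical

# Stand-in for SPICE-characterized series-stack leakage of the NMOS pull-down
# network (A), keyed by the gate inputs (a, b). Values are hypothetical.
NMOS_STACK_LEAK = {(0, 0): 2e-12, (0, 1): 9e-12, (1, 0): 7e-12}


def nand2_leakage_current(a, b):
    """Leakage current of a 2-input NAND in input state (a, b)."""
    if a and b:
        # Pull-down stack conducts; the two PMOS devices are OFF in
        # parallel, so leakage = number of devices * single-device current.
        return 2 * I_P_OFF
    # At least one PMOS is ON and bypasses the parallel pull-up network;
    # the leakage is that of the (partly) OFF series NMOS stack.
    return NMOS_STACK_LEAK[(a, b)]


def average_leakage_power(state_times, t_analysis):
    """Time-weighted average leakage power (W) over the analysis period.

    state_times: list of ((a, b), seconds spent in that input state).
    Each state's leakage power is approximated as I_leak * VDD.
    """
    energy = sum(nand2_leakage_current(a, b) * VDD * t
                 for (a, b), t in state_times)
    return energy / t_analysis


p = average_leakage_power([((1, 1), 30e-9), ((0, 0), 20e-9)], 50e-9)
print(f"{p * 1e12:.1f} pW")  # 62.0 pW
```

Caching the per-state values, as the table lookup does here, mirrors the idea that a gate's leakage for a given input state needs to be characterized only once.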
Following [51], the average leakage power of a gate can be formulated as

$$P_{leak} = \frac{\sum_i I_{DSq_i} \cdot V_{DSq_i} \cdot t_{inp}}{T_{analysis}} \tag{3.1}$$

where $P_{leak}$ is the average leakage estimate for the gate, $I_{DSq_i}$ is the quiescent leakage current of the ith OFF transistor, $V_{DSq_i}$ is its drain-source voltage, $t_{inp}$ is the time the gate spends in the corresponding input state and $T_{analysis}$ is the total analysis period. As pointed out in [9, 51], leakage estimation is vector dependent and may therefore be time consuming for large circuits. In [51], the authors proposed a leakage regression model based on the number of gate cells, derived from experimental data of different smaller circuits. However, this approach is not practical, because the experimental data must be acquired again for each technology. In our implementation, leakage libraries are created dynamically for each new technology. Once the leakage of a particular gate in a particular input state has been characterized, it is not created again but simply read in when required. This is useful because the same cell may be used in different circuits.

3.6 Short Circuit Power Dissipation

The estimation of short-circuit dissipation has been a difficult problem. Most tools couple it with the dynamic dissipation and report the total switching energy. Though this is correct, we separate the two components, because the short-circuit current has its own characteristics that merit study. A number of short-circuit models have been proposed and studied [8, 39, 93, 96]. An accurate model accounts for the load capacitance as well as the input rise/fall times. The rise/fall time determines the overall current flow, while the load capacitance determines how much of the short-circuit current flows during the charging or discharging of the node capacitance.
Thus, a larger output load reduces the short-circuit current. This is illustrated in Figure 3.1, which depicts a rising input waveform applied to an inverter. During the period from t1 to t2, the NMOS transistor is in the saturation region while the PMOS transistor is in the triode region. The short-circuit current peaks when the PMOS transistor moves from the triode region into saturation. By estimating this peak current and calculating the area under the short-circuit current curve, we obtain an estimate of the total short-circuit current flowing during this period.

Figure 3.1: Short-circuit scenario: Vi(t) is a rising waveform applied to the input of an inverter and Vo(t) is the corresponding output waveform. Isc(t) is the short-circuit current waveform, which peaks at time t2 when the PMOS transistor goes from the linear to the saturation region.

The peak short-circuit current is calculated using the model described in [96]. Consider again the period from t1 to t2 in Figure 3.1. The current discharging the output through the NMOS transistor in saturation can be modeled approximately by the differential equation

$$-C_L V_{dd}\,\frac{dv_{out}}{dt} = 0.5\,K_n V_{dd}^2\,(v_{in} - n)^2 \tag{3.2}$$

for $v_{in} - n < v_{out}$, where $C_L$ is the load capacitance, $V_{dd}$ is the supply voltage, $v_{out}$ is the normalized output voltage, $K_n$ is the transconductance factor of the NMOS transistor, $v_{in}$ is the normalized input voltage and $n$ is the normalized threshold voltage. Upon integration, the output voltage as a function of time t is [39]

$$v_{out} = 1 - \frac{V_{dd} K_n t_0}{6 C_L}\left(\frac{t}{t_0} - n\right)^3 \tag{3.3}$$

where

$$v_{in}(t) = \frac{t}{t_0} \tag{3.4}$$

$t_0$ is the rise time of the input signal and $t \in [0, t_0]$. Note that these equations neglect the current through the PMOS transistor, an important component, to make them easy to solve.
However, this creates inaccuracies in the estimated short-circuit current, because the PMOS current is not negligible compared with the current flowing through the NMOS transistor. Neglecting it underestimates the output voltage and therefore overestimates the short-circuit current. To correct this, a correction factor obtained by fitting the results of Eq. (3.3) to actual SPICE simulations is used, and the equation is rewritten as

$$v_{out} = 1 - \frac{1}{6}\left(\frac{V_{dd} K_n t_0}{C_L}\right)^{\alpha}\left(\frac{t}{t_0} - n\right)^3 \tag{3.5}$$

where $\alpha$ is the correction factor [96]. Our analysis shows that $\alpha$ varies with the input rise time $t_0$, the output load capacitance $C_L$ and the contribution of the PMOS transistor to the short-circuit current, which we capture through the PMOS transconductance factor $K_p$. Thus $\alpha$ can be modeled as

$$\alpha = \alpha_0 K_p + \alpha_1 t_0 + \alpha_2 C_L + \alpha_3 \tag{3.6}$$

where $\alpha_0$, $\alpha_1$, $\alpha_2$ and $\alpha_3$ are technology parameters that can be determined from SPICE simulations. We now estimate the peak current by finding the output voltage at time t2 (see Figure 3.1) from these equations. We first calculate t2 by substituting $v_{out} = v_{in} - p$ into Eq. (3.5), where p is the normalized PMOS threshold voltage, since this is the output voltage at which the PMOS transistor begins to enter saturation. From the resulting third-order polynomial we obtain one real solution for t2. From the output voltage at t2, the drain current equations give the current flowing through the PMOS transistor, and the area under the short-circuit current curve then gives the total short-circuit current flowing during this period. The short-circuit energy is therefore

$$E_{scf} = \int_{t_1}^{t_3} V_{dd}\, I_{sc}(t)\,dt \approx \frac{(t_3 - t_1)\, I_{scfmax}\, V_{dd}}{2} \tag{3.7}$$

where $V_{dd}$ is the supply voltage, $I_{sc}(t)$ is the short-circuit current at time t and $I_{scfmax}$ is the peak short-circuit current; the second form approximates the current waveform between t1 and t3 as a triangle. Gates other than inverters, such as NAND and NOR gates, are converted into equivalently sized inverters for short-circuit estimation.
Though this is an approximation, it gives reasonable results, since parallel transistors can be modeled as an equivalently sized transistor without any loss in accuracy. For series transistors, the complexity increases because of the multiple internal nodes and the different regions of operation of the transistors. However, since the series-connected transistors are not considered during the charging or discharging phase in short-circuit estimation, this parasitic behavior can be modeled with sufficient accuracy by an equivalently sized transistor, as discussed in [69]. More complex gates are broken down into simpler gates, which are then modeled as equivalently sized inverters [48, 69] to estimate the short-circuit current.

3.7 Clock Power and Flip Flop Cell Power

Clock power is considered one of the largest components of power dissipation. The clock typically accounts for 40% or more of total processor dissipation [23, 31], rivaled only by the power dissipated in memory structures. This is because the clock is distributed throughout the circuit, which results in high wiring capacitance; it drives the most blocks and thus sees a large load; and it makes two transitions every clock cycle. At the gate level, the total load driven by the clock, including the wiring capacitance, together with the clock frequency gives an adequate clock power estimate. Our implementation also estimates the power dissipated in the flip flop cells [65]. In a flip flop, power is dissipated by internal switching within the cell even when the Q output does not change; this switching occurs whenever the D input changes, and even when only the clock is switching.
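The gate-level clock power estimate and the flip flop cell power reduce to short calculations. In the hypothetical Python sketch below, the clock net capacitance is taken as the wiring capacitance plus the clock-pin capacitance of every flip flop; since the clock makes two transitions per cycle, each assumed to dissipate 0.5·C·Vdd², the energy per cycle is C·Vdd². The per-cycle internal flip flop energy stands in for a characterized value; all names and numbers are assumptions for illustration.

```python
def clock_power(wire_cap, pin_caps, vdd, period):
    """Clock power (W): two transitions per cycle, 0.5*C*Vdd**2 each."""
    c_total = wire_cap + sum(pin_caps)
    energy_per_cycle = 2 * 0.5 * c_total * vdd ** 2
    return energy_per_cycle / period


def flip_flop_cell_power(energy_per_cycle, n_flip_flops, period):
    """Internal flip flop power (W) from a characterized energy per cycle."""
    return n_flip_flops * energy_per_cycle / period


# Hypothetical clock net: 2 pF of wiring plus 100 flip flops with 5 fF pins.
p_clk = clock_power(2e-12, [5e-15] * 100, vdd=2.5, period=50e-9)
print(f"{p_clk * 1e6:.1f} uW")  # 312.5 uW
```

Because the clock toggles every cycle regardless of the data, this term does not depend on the input vectors, unlike the flip flop cell power, whose characterized energy depends on whether D changes.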
We estimate the clock power by calculating the total capacitance seen by the clock and then the energy spent in charging and discharging this capacitance in each clock cycle. Similarly, for flip flop cell power, we characterize the energy spent in internal switching within the flip flops.

3.8 Saving Flip Flop Cell Power

Our observations show that a significant amount of the power of a sequential circuit is consumed as clock power and in the flip flop cells, as described in the previous section. One way to reduce these components is to avoid clocking a flip flop when its D input and Q output have the same value. This can be done with a simple clock gating approach in which an XOR-AND gate is added to the D flip flop circuitry, as shown in Figure 3.2 [71, 101].

Figure 3.2: (a) A standard D flip flop and (b) D flip flop with clock gating.

We simulated the ISCAS'89 benchmark circuit s5378, which has 179 flip flops, to observe the effect of clock gating on the various power dissipation components. Table 3.1 compares the power dissipated by the circuit with normal D flip flops (Figure 3.2 (a)) and with clock-gated D flip flops (Figure 3.2 (b)), shown as s5378 and s5378 clk, respectively. The results are for 1000 random vectors with a clock period of 50 ns in TSMC025 technology, and each vector is applied only after the circuit has reached steady state. Because there is no unnecessary clocking when the inputs and outputs of the flip flops are identical, the flip flop cell power decreases significantly, from 751.6 μW to 32.5 μW. The dynamic and short-circuit power increase, and the leakage power increases slightly, which can be attributed to the overhead of the extra gates.
The increase in combinational power arises because each change at a flip flop input can cause up to six transitions within the XOR-AND gate combination. Overall, the last column of Table 3.1 shows a 71.5% reduction in total power dissipation.

Table 3.1: Power dissipation results for ISCAS'89 benchmark circuit s5378 (all power values in μW).

Circuit          No. of  Logic   Glitch  Dynamic  Short ckt.  Leakage  Clock   Flip flop  Total
name             gates   power   power   power    power       power    power   power      power
s5378            2958    77.9    17.46   95.40    14.09       0.1291   220.26  751.60     1081.47
s5378 clk        3316    79.23   54.22   133.46   23.06       0.1329   118.88  32.50      308.02

3.9 Test Power

In this section, we examine power dissipation during scan testing, a popular method for testing sequential circuits [17]. This subject has received much attention in recent years [68]. Test power has two components: shift power and capture power. Shift power is the power consumed while vectors are shifted through the scan cells. Because the scan cell outputs feed the combinational circuit, every shift changes the inputs of the combinational logic, causing a large number of transitions during test vector shifting. Capture power is the power consumed when the normal-mode capture clock is applied to capture the output response to a test vector that was scanned in.

With scan cells, a major power component is the unnecessary combinational transitions that occur during test vector shifts. These can be reduced by gating the scan cell outputs with the scan enable signal, as shown in Figure 3.3. When SE = 0, the circuit is in test mode and shifts the test vector sequence through the scan cells. During this time the scan cell outputs are irrelevant to the combinational logic, so they can be gated to avoid unnecessary transitions.
Additionally, the clock gating approach of the previous section can be used to reduce the clock and flip flop cell power dissipated during test shifts (Figure 3.4).

Figure 3.3: (a) A standard scan cell (SFF) and (b) a scan cell with Q output gated with the scan enable signal (SE) (SE = 1 is normal mode).

Figure 3.4: A scan cell with output Q gated with the scan enable signal (SE) (SE = 1 is normal mode) and with clock gating.

Table 3.2: Power dissipation in ISCAS'89 benchmark circuit s5378 with scan cells of various types when the circuit is operated in normal mode. All power values are in µW.

Circuit           No. of  Logic  Glitch  Dynamic  Short ckt.  Leakage  Clock   Flip flop  Total
name              gates
s5378 sff         3137    81.76  19.5    101.28   13.92       0.13     220.26  751.7      1087.29
s5378 sff g       3317    85.1   19.8    104.9    14.95       0.132    220.26  751.7      1091.94
s5378 sff g clk   3675    89.9   56.8    146.7    23.85       0.136    118.8   33.2       322.65

Table 3.3: Power dissipation in ISCAS'89 benchmark circuit s5378 with scan cells of various types when the circuit is being tested. All power values are in µW.

Circuit           No. of  Logic   Glitch  Dynamic  Short ckt.  Leakage  Clock   Flip flop  Total
name              gates
s5378 sff         3137    356.82  60.37   417.19   26.22       0.1459   220.26  848.53     1512.35
s5378 sff g       3317    93.53   33.63   127.16   7.74        0.1504   220.26  850.69     1206.0
s5378 sff g clk   3675    146.78  241.89  388.67   61.9        0.1537   118.88  164.08     733.68

We simulated the ISCAS'89 benchmark circuit s5378 with scan circuitry to observe the effect of the low power scan cell designs discussed above on the various power dissipation components. We report results for two modes: (a) normal mode, in Table 3.2, and (b) test mode, in Table 3.3.
The comparison is among sequential circuit elements designed as a normal scan cell (Figure 3.3 (a)), a scan cell with gated Q output (Figure 3.3 (b)) and a scan cell with gated Q output and clock gating (Figure 3.4). In Tables 3.2 and 3.3, these three cases are listed as s5378 sff, s5378 sff g and s5378 sff g clk, respectively.

Table 3.2 shows that in normal mode the circuit behaves similarly to the circuit of Table 3.1. The simulation is for 1000 random vectors with a clock period of 50 ns, with the scan enable signal fixed for normal mode (SE = 1). As expected, the dynamic and leakage power components increase because of the added logic gates, while the flip flop cell power decreases because of clock gating. The effect of the gated low power scan cell is not visible because the circuit is in normal mode.

The test power results are given in Table 3.3. The test patterns were generated by an ATPG program. Full scan design was done using Mentor Graphics FastScan, and test patterns with a fault coverage of 98.87% were obtained. With the gated scan cell design, the dynamic power during test drops from 417.19 µW to 127.16 µW, because combinational transitions during shifting are suppressed. Adding clock gating decreases the clock and flip flop cell power during test shifts. However, combinational power increases with the clock gating approach because of the added dissipation in the clock gating logic. Thus, in circuits with large sequential depth and shallow combinational logic, the increase in combinational power due to extra transitions in the clock gating circuitry can be significant. In our example, the scan cell design with both output gating and clock gating reduces the total test power by about 50%, from 1512.35 µW to 733.68 µW.

3.10 Summary

In this chapter, we have discussed the techniques implemented in our gate level power estimation tool. The tool is capable of separating and estimating the different power dissipation components.
We have focused on the dynamic, static (leakage), short circuit, clock and flip flop cell power components, and we have also discussed test power. In the coming chapters, our results will demonstrate that these techniques estimate power dissipation accurately while remaining efficient compared to SPICE.

Chapter 4
Power Estimation Results

4.1 Introduction

In this chapter we discuss the power analysis results obtained with our gate level power estimation tool. We first walk through the steps involved in setting up the program for power estimation. We then present the results, citing our underlying assumptions and technological constraints. We compare our results against HSPICE [88] to verify the tool's accuracy and efficiency.

4.2 Experimental Procedure

Our analysis is done at the gate level. The algorithm was implemented as a C/C++ program with the following steps.
• We read in the circuit as a simple netlist in a format called rutmod. This is a gate level description of the circuit that gives information such as primary inputs, primary outputs, gate number, gate type and fan-in list. We used a flattened netlist in our analysis.
• We read in a vector file that contains the input vectors to the circuit. For power analysis, we used randomly generated vectors produced by a random number generator written in C. We also read in a technology file that contains technology parameters such as threshold voltage, oxide thickness and overlap capacitances. The user enters certain constraints, such as the supply voltage, the vector period and the kind of delay model to be used. These give the user some control over the power analysis procedure.
• After analyzing the circuit, the power estimation tool outputs the following information:
1. circuit information such as the number of gates, the total analysis period used and the worst case delay for the particular series of vectors applied;
2. the average power dissipation, including the total power, dynamic power (logic power and glitch power), short circuit power and leakage power, and, if the circuit is sequential, the clock power and the flip flop cell power;
3. other information, such as the vectors causing the maximum and minimum leakage, the vector pairs causing the maximum and minimum power dissipation, and the maximum number of glitches caused by a vector pair.

4.3 Experimental Results

We first compare our estimation tool against SPICE to verify its accuracy. Table 4.1 shows the results of our power analysis tool for a simple inverter driving a NAND and a NOR load. The power results are for the inverter only, not for the whole circuit. An input signal with a rise/fall time of 1 ns was applied to the inverter input, with a total analysis period of 100 ns. The circuit was implemented in two technologies: TSMC025 with a supply voltage of 2.5 V, and the Berkeley predictive 90 nm technology [3] with a supply voltage of 1.0 V. The SPICE simulator gave us the total switching power, the leakage power and the short circuit current, which are shown in columns 3, 4 and 5. We compare them with our estimates (columns 6 through 10), in which we have separated the switching energy into its dynamic and short circuit components. The results show that our estimates are sufficiently accurate compared to SPICE.

Table 4.1: Comparison with SPICE using an inverter with a NAND and NOR load.
Columns: (1) technology; (2) input transition; SPICE simulation: (3) total power (µW), (4) short circuit current (µA), (5) leakage power (pW); our gate level estimator: (6) total power (µW), (7) short circuit current (µA), (8) dynamic power (µW), (9) short circuit power (µW), (10) leakage power (pW).

250 nm   0 - 1   1.08     17.10   24.01   1.031    16.32   0.6232   0.4077   22.65
250 nm   1 - 0   1.247    18.32   23.85   1.183    21.66   0.6424   0.5415   22.65
90 nm    0 - 1   0.098    6.37    6180    0.099    7.33    0.0258   0.0733   5890
90 nm    1 - 0   0.0978   6.16    6190    0.1436   11.25   0.0311   0.1125   5890

We have also plotted the output voltage waveforms of the inverter, as shown in Figure 4.1. Our estimate closely follows the SPICE waveforms. This helps us estimate the output rise/fall times accurately, which in turn is important for accurate short circuit power estimation.

Figure 4.1: Output voltage waveforms for an inverter with a NAND and NOR load in (a) 0.25 micron and (b) 90 nm technologies.

For further verification, we simulated the 1-bit adder circuit of Table 4.2, implemented in 0.25 micron technology. All possible input combinations were applied to the three inputs (A, B and Carry-in), with a vector period of 100 ns and a rise time of 1 ns. SPICE took 180 s, while our estimator completed the simulation in 5.4 s with reasonable accuracy (a total power of 2.58 µW against 4.09 µW from SPICE).

Table 4.2: 1-bit adder simulation.
Columns: SPICE simulation: total power (µW), CPU time (s); our gate level estimator: total power (µW), dynamic power (µW), short circuit power (µW), leakage power (pW), CPU time (s).

4.09   180   2.58   1.23   1.35   451   5.4

We ran our power analysis tool on the ISCAS benchmark circuits. The results shown in Table 4.3 are for a set of 1000 random vectors. Each input vector represents a pulse width of 100 ns with a rise/fall time of 1 ns.
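The sketch below shows one way to split the dynamic power into the logic and glitch components reported in Table 4.3. It is an illustration under assumptions, not the tool's implementation:
- A node's logic transitions are those implied by its zero-delay initial and final values: one if they differ, none otherwise.
- Every additional transition observed in event-driven simulation counts as a glitch.
- Each transition dissipates ½·C·Vdd².

The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeActivity:
    capacitance: float  # node capacitance in farads
    initial_value: int  # IV from zero-delay simulation
    final_value: int    # FV from zero-delay simulation
    transitions: int    # transitions observed in event-driven simulation


def logic_and_glitch_power(nodes, vdd, period):
    """Return (logic_power, glitch_power) in watts for one vector period."""
    logic_w = glitch_w = 0.0
    for node in nodes:
        energy = 0.5 * node.capacitance * vdd ** 2  # joules per transition
        logic = 1 if node.initial_value != node.final_value else 0
        glitches = max(node.transitions - logic, 0)
        logic_w += logic * energy / period
        glitch_w += glitches * energy / period
    return logic_w, glitch_w


nodes = [NodeActivity(10e-15, 0, 1, 3),  # one logic transition plus two glitches
         NodeActivity(20e-15, 1, 1, 2)]  # a static hazard: two glitches
print(logic_and_glitch_power(nodes, vdd=2.5, period=100e-9))
```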
The CPU times in Table 4.3 are for a Sun Sparc Ultra 10 system with 4 GB of shared memory.

Table 4.3: Average power dissipation for simulation of ISCAS benchmark circuits using 1000 random vectors in 0.25 micron technology at a supply voltage of 2.5 V. Power values are in µW.

Circuit  No. of  Logic    Glitch    Dynamic   Short ckt.  Leakage  Total     CPU
name     gates   power    power     power     power       power    power     (s)
c880     383     38.26    24.89     63.16     49.8        0.0149   112.99    195.84
c1355    546     70.39    37.1      107.49    81.75       0.0180   189.26    252.77
c1908    880     125.5    101.06    226.57    52.04       0.0285   278.64    570.7
c2670    1193    160.87   177.86    338.74    116.88      0.0477   455.68    1028.7
c3540    1669    198.01   250.77    448.78    125.83      0.0651   574.69    1347.6
c5315    2307    384.65   391.44    776.09    238.12      0.0950   1014.31   1921.1
c6288    2416    298.88   3841.54   4140.41   146.68      0.08277  4288.05   7564.8
c7552    3512    533.80   659.63    1193.44   230.32      0.133    1423.91   3047.5

The tool readily identifies the vectors that give the minimum, maximum and average power dissipation for each component. Figure 4.2 shows the leakage power dissipation of the c880 benchmark circuit over the sequence of vectors. From this, we obtain a vector set that gives the lowest leakage among the simulated vectors for this circuit. Such a vector set is useful for reducing leakage when the circuit is in standby mode.

Figure 4.2: Leakage power of c880 in 90 nm technology for 1000 random vectors.

We conducted an experiment to study short circuit power dissipation. We analyzed a chain of six inverters, the last of which serves as the output load. In the first case, all inverters had the same size, the smallest available in the 0.25 micron technology. In the second case, we progressively doubled the sizes along the chain, and in the third case we progressively decreased them in the same manner.

Figure 4.3: Effect of gate sizing on short circuit power dissipation.

The input vector signal
applied was a 0→1→0 pulse with a width of 100 ns and a rise time of 1 ns. The effect of the rise and fall times was nullified, because an increase or decrease in the driver resistance was compensated by a corresponding decrease or increase in the load capacitance. The load capacitance therefore became the determining factor for short circuit dissipation. Figure 4.3 shows the average short circuit dissipation of each inverter for each sizing scenario. Progressively decreasing the sizes gave the highest overall short circuit power. This can be explained by the decreasing output load capacitances combined with the high drive strength of the inverters. This analysis suggests that gate sizes could be optimized to decrease the overall short circuit power.

4.4 Summary

In this chapter we have described our experimental results using our gate level power estimation tool. We have compared our results against SPICE and found our estimates to be reasonably accurate. We have obtained results for the ISCAS benchmark circuits and presented experimental analyses of a few circuits to show the usefulness of such a tool.

Chapter 5
Bounded Delay and Dynamic Power Estimation

5.1 Introduction

In this chapter, we discuss bounded delay principles and their usefulness in handling uncertainties in gate delay values due to process variation. We define and discuss ambiguity regions, putting forward several theorems to determine them accurately. We give a novel technique that uses bounded delay information to estimate dynamic power under variation in delay values.

5.2 Background and Definitions

To deal with variability, significant advances have been made in logic simulation, timing analysis and delay testing through the use of the bounded delay [82] model. Both statistical and bounded delay models have been used [7, 52, 87].
In the bounded delay model, each gate is assigned lower and upper bounds on its delay, also called the min-max delay specification [12, 13, 18, 19, 37, 54, 80, 82]. To represent signals under bounded delays, we use the term ambiguity region for an interval of signal uncertainty within which we cannot tell deterministically when the signal makes its transitions. The ambiguity region is depicted in Figure 5.1 and is described by:
• EA, the earliest arrival time of a signal;
• LS, the latest stabilization time of a signal;
• IV, the initial value of a signal;
• FV, the final value of a signal.

Figure 5.1: Ambiguity regions of (a) a rising signal and (b) a steady-state signal.

In the following sections we describe theorems that determine the ambiguity regions, that is, the EA and LS values, from the bounded gate delays.

5.3 Signal Transition Analysis of Bounded Delay Gates

In this work we aim to estimate the number of transitions that the output of a gate makes for each input vector pair. The steady-state values are assumed to be known; they can be predetermined by zero-delay logic simulation. In the next sections we describe our techniques to statically determine the ambiguity intervals (transition periods) and the bounds on the number of transitions that the gate output can make.

5.3.1 Ambiguity intervals

Our analysis follows the events that occur at gate outputs in a circuit after a change occurs at the primary inputs, typically on application of a new vector. We determine the ambiguity (transient) interval for the signal at the output of a gate from:
1. the bounded delays of the gate, and
2. the steady-state signal values, that is, the initial value (IV) and the final value (FV).

Figure 5.2: A four-input AND gate with delay bounds (2, 4).
Shaded regions are ambiguity intervals.

We assume that the primary inputs change at deterministic times, synchronized with a clock. In general, however, input change ambiguities can be specified and treated in a similar way. All times are measured from the start of the clock period. Because all gates are assumed to have delays within their respective (min, max) bounds, a typical gate's inputs go through transient intervals before settling to their final values (FV). In the bounded delay formulation, we do not know precisely how many transitions each signal makes. As explained earlier, the ambiguity region of a signal is defined by its earliest arrival time (EA) and its latest stabilization time (LS). From (EA, LS) we derive certain variables that depend on (IV, FV) and on the type of gate the signal drives. For clarity, we use the four-input AND gate of Figure 5.2 as an example. Borrowing from the literature [12], we define:
• EAdv, the earliest arrival time of an input signal changing from the controlling value (e.g., 0 for an AND gate) to the non-controlling value (1 for an AND gate);
• LSdv, the latest stabilization time of an input signal changing from the controlling value to the non-controlling value;
• EAsv, the earliest arrival time of an input signal changing from the non-controlling value to the controlling value;
• LSsv, the latest stabilization time of an input signal changing from the non-controlling value to the controlling value.

Note that EA = ∞ and LS = −∞ denote a signal with no ambiguity region, that is, a signal in steady state. Theorem 1 below determines the output ambiguity interval from these quantities.
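The four quantities can be assigned to each gate input as in the Python sketch below. The ±∞ placeholders for transition types that an input does not make are an assumption of this sketch. They are chosen so that they never decide the maxima and minima taken in Theorem 1. For instance, an input held at the controlling value never becomes non-controlling (EAdv = ∞) and is settled at the controlling value from the start (LSsv = −∞). The sketch does not handle inputs whose initial and final values are equal but which may still glitch. The function name is hypothetical.

```python
import math

INF = math.inf


def classify_input(iv, fv, controlling, ea=INF, ls=-INF):
    """Return (EAdv, LSdv, EAsv, LSsv) for one gate input.

    iv, fv: initial and final steady-state values of the input.
    controlling: the gate's controlling input value (0 for AND/NAND, 1 for OR/NOR).
    ea, ls: the input's ambiguity interval (ignored for steady inputs).
    """
    if iv == fv == controlling:  # held at the controlling value
        return (INF, -INF, INF, -INF)
    if iv == fv:                 # held at the non-controlling value
        return (-INF, -INF, INF, INF)
    if iv == controlling:        # controlling -> non-controlling
        return (ea, ls, INF, INF)
    return (-INF, -INF, ea, ls)  # non-controlling -> controlling


# Inputs of a hypothetical AND gate (controlling value 0):
print(classify_input(0, 1, 0, ea=2, ls=5))  # (2, 5, inf, inf)
print(classify_input(1, 0, 0, ea=3, ls=7))  # (-inf, -inf, 3, 7)
```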
Theorem 1: The ambiguity interval (EA, LS) of the output signal of a logic gate is determined from the ambiguity intervals of the input signals, their pre-transition and post-transition steady-state values, and the minimum and maximum gate delays, as follows. Considering all inputs i of the gate, we define:

$$E_1 = \max_i \{EA_{dv}(i)\} \tag{5.1}$$
$$E_2 = \min_i \{EA_{sv}(i)\} \tag{5.2}$$
$$L_1 = \min_i \{LS_{sv}(i)\} \tag{5.3}$$
$$L_2 = \max_i \{LS_{dv}(i)\} \tag{5.4}$$
$$EA' = \max\{E_1, E_2\} \tag{5.5}$$
$$LS' = \min\{L_1, L_2\} \tag{5.6}$$

Then

$$(EA, LS) = \begin{cases} \{EA' + \mathit{mindel},\; LS' + \mathit{maxdel}\} & \text{if } LS' - EA' \ge \mathit{maxdel} \\ \{\infty, -\infty\} & \text{if } LS' - EA' < \mathit{mindel} \end{cases}$$

where the inertial delay of the gate is bounded as (mindel, maxdel).

In general, the output of a gate can have multiple ambiguity regions separated by deterministic signal values, as we demonstrate in the next section. In that case, the inertial filtering caused by mindel affects each ambiguity region and each deterministic interval. For simplicity, Theorem 1 takes a pessimistic view by combining all possible ambiguity regions into one. Once the ambiguity interval (EA, LS) is determined according to Theorem 1, the steady-state values allow a straightforward conversion to the detailed signal specification. For the example of Figure 5.2, at the output of the AND gate, EA = EAdv and LS = LSsv. Also note that when IV takes the dominant (non-dominant) value, EAsv (EAdv) = −∞. Similarly, when FV takes the dominant (non-dominant) value, LSdv (LSsv) = ∞. (EA, LS) = {∞, −∞} means that gate inertia completely suppresses the output pulse.

5.3.2 Multiple ambiguity regions

In certain cases, multiple ambiguity regions separated by regions of deterministic signal states may arise at gate outputs.
Unlike timing analysis, where only the outer bounds of the ambiguity region matter, power estimation must address this issue. If we ignore the multiple regions within an ambiguity period, we may overestimate or underestimate the power. We therefore follow a simple procedure. We first arrange all input EA and LS values in order of their temporal occurrence. After calculating the bounds (EA′, LS′) from Theorem 1, we examine the EA and LS values of the inputs that fall within these bounds. If an LS value is immediately followed by an EA value, a multiple ambiguity region occurs. We propagate it to the output only if the two consecutive bound values are at least the gate inertial delay apart.

Figure 5.3: A three-input AND gate depicting multiple ambiguity intervals.

The example in Figure 5.3 illustrates this. In (a), when we do not consider multiple intervals, we get an output ambiguity region of (EA1+d1, LS3+d2). Panel (b) shows the difference. Because LS2 occurs earlier than EA3, the deterministic interval between them passes through the AND gate if the bound difference exceeds the inertial delay d. The corrected output ambiguity region therefore consists of the intervals (EA1+d1, LS2+d2) and (EA3+d1, LS3+d2). The algorithm is given below:

1: for all gates G with multi-ambiguity inputs do
2:   EAmin = EA(G) − mindel(G)
3:   LSmax = LS(G) − maxdel(G)
4:   for all gates g in fanin(G) do
5:     if EA(g) ≠ ∞ and LS(g) ≠ −∞ then
6:       for each bound value v in {EA(g), LS(g)} do
7:         if EAmin < v < LSmax then addtoBoundsList(v)
8:       end for
9:     end if
10:  end for
11:  sort the values t(i) in addtoBoundsList in increasing order
12:  for all consecutive values t(i), t(i+1) in addtoBoundsList do
13:    if t(i) is of type "LS" and t(i+1) is of type "EA" then
14:      if t(i+1) − t(i) ≥ maxdel(G) then
15:        addtoGateBoundsList(t(i) + maxdel(G), t(i+1) + mindel(G))
16:      end if
17:    end if
18:  end for
19: end for

5.4 Problem Depiction

A simple estimate of the output transitions would be the sum of all fan-in transitions, assuming they all propagate through the gate. However, such a prediction is too pessimistic, and we will show that it gives a very high bound on the maximum power.

Figure 5.4: Two-input AND gate transitions.

Similarly, using only the steady-state values to predict the minimum power (zero transitions if IV = FV and one transition if IV ≠ FV) gives a very low bound on the minimum power. Although these bounds are theoretically correct, our aim is to obtain tighter bounds with a fast and efficient method.

The two-input AND gate example in Figure 5.4 depicts the problem we aim to solve. In (a), the gate has a deterministic delay (2), so we know that the output will have 4 transitions for the given fan-in signals. In (b), however, we have two fan-in ambiguity intervals, (EA, LS) = (3, 14) and (5, 17), and the gate has delay bounds (1, 3). Let us assume that, from the analysis of previous gates, the minimum and maximum numbers of transitions (mintran, maxtran) for the two fan-ins are (0, 2) and (0, 4), respectively. With this information, we aim to obtain an upper bound maxtran and a lower bound mintran on the number of transitions at the gate output. The next sections explain our techniques for determining these.

5.5 Maximum Number of Transitions

Assuming that the primary inputs are glitch-free, the number of transitions (0 or 1) at each primary input for a vector pair is known. By applying the algorithms of this and the next section iteratively, we can determine the bounds on the numbers of transitions for all signals. Consider a gate.
The given data consist of the ambiguity intervals of the fan-ins and their minimum and maximum numbers of transitions, mintran and maxtran. We estimate the value of maxtran at the output. We consider two things: (1) cause: an output transition must be caused by an input transition; and (2) filtering: gate inertia can filter out transitions that are closer to each other than the gate delay. In the absence of detailed information, we assume that the transitions at a fan-in are evenly spaced within its ambiguity interval. Agrawal et al. [5] derived two upper bounds on the number of events possible at the output of a gate. However, their derivation considers neither ambiguity regions nor the initial (IV) and final (FV) output values. We improve upon those bounds by considering these factors in the following theorem.

Theorem 2: The maximum number of transitions is the minimum of two upper bounds:

$$\mathit{maxtran} = \min(N_d, N) \tag{5.7}$$

where Nd is the maximum number of transitions permitted by the gate inertial delay and N is the corrected sum of all transitions present at the gate inputs.

Proof: We derive two upper bounds on the number of transitions and take the lower of the two as a tighter upper bound. This analysis improves on a previously reported result [5].

First upper bound (Nd): We calculate the maximum number of transitions that can be accommodated in the output ambiguity interval given by the gate delay bounds and the (IV, FV) output values, taking into account the filtering of glitches by gate inertia. The largest number of transitions fits when they are evenly spaced over the output ambiguity interval with a spacing equal to the inertial delay. We consider the following cases:
1. If the output has a static hazard, we allow an even number of transitions, 2n, where n is the largest integer satisfying
$$t_{pd}\,(2n - 1) \le LS - EA.$$
Here tpd is the gate delay given by the minimum delay bound, n is the number of hazards that can be accommodated in the ambiguity interval, and LS and EA are the latest stabilization time and the earliest arrival time of the output signal, as given by Theorem 1.
2. Similarly, for an output signal with a dynamic hazard, we obtain an odd number of transitions, 2n − 1, where n is the largest integer satisfying
$$t_{pd} \cdot 2n \le LS - EA.$$

Second upper bound (N): We modify the sum Nsum of the input transitions as:

$$N = N_{sum} - k \tag{5.8}$$

where k = 0, 1 or 2 for a 2-input gate and is determined by the ambiguity regions and the (IV, FV) values of the inputs. The example of Figure 5.5 explains the procedure. In (a), with input transitions n1 = 6 and n2 = 4, the output cannot have the full sum of the input transitions. This is because, when two signals both go from a controlling value to a non-controlling value, only one of the two transitions should be counted. Thus, a correction factor k is required, which gives maxtran = 6 + 4 − 2 = 8 for the example in (a). The example in (b) is a possible deterministic signal representation of the same case.

Figure 5.5: Effect of modification factor k on the second upper bound.

Figure 5.6 shows an application of the two upper bounds. In (a), the gate is assumed to have zero delay. By the above principles, the first upper bound gives Nd = ∞ and the second gives N = 8. The minimum of the two gives maxtran = 8. However, when we consider the gate delay bounds (3, 5), as in (b), 8 transitions can never occur within the output ambiguity bounds of (6, 23). Applying the same principles, we get Nd = 6 and N = 8.
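The sketch below evaluates the two upper bounds of Theorem 2, following the inequalities as stated above:
- tpd is the minimum gate delay.
- A static-hazard output admits 2n transitions when tpd(2n − 1) ≤ LS − EA.
- A dynamic-hazard output admits 2n − 1 transitions when tpd·2n ≤ LS − EA.

The caller supplies the correction k rather than having the sketch derive it, because its rule is given above only by example. Two further choices are assumptions of this sketch, as are the function names: a dynamic-hazard output keeps one transition when its interval is too short for n ≥ 1, and the output of Figure 5.6 (b) is treated as a static-hazard case, which is consistent with the even Nd = 6 quoted there.

```python
import math


def first_upper_bound(ea, ls, tpd, static_hazard):
    """Nd: the most output transitions gate inertia allows inside (EA, LS)."""
    if tpd <= 0:
        return math.inf  # a zero-delay gate filters nothing
    width = ls - ea
    if static_hazard:
        # largest n with tpd * (2n - 1) <= width gives 2n transitions
        n = math.floor((width / tpd + 1) / 2)
        return max(2 * n, 0)
    # largest n with tpd * 2n <= width gives 2n - 1 transitions;
    # an output with IV != FV keeps at least one transition (assumption)
    n = math.floor(width / (2 * tpd))
    return max(2 * n - 1, 1)


def second_upper_bound(input_transitions, k):
    """N = Nsum - k, where the correction k is supplied by the caller."""
    return sum(input_transitions) - k


def max_transitions(nd, n):
    """Theorem 2: maxtran = minimum(Nd, N)."""
    return min(nd, n)


print(second_upper_bound([6, 4], k=2))                        # 8 (Figure 5.5)
print(max_transitions(math.inf, 8))                           # 8 (zero delay)
print(max_transitions(first_upper_bound(6, 23, 3, True), 8))  # 6 (delays 3, 5)
```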
Hence maxtran = minimum(6, 8) = 6 in this case.

Figure 5.6: Filtering of transitions in a two-input AND gate: (a) zero gate delay; (b) gate delay bounds (3, 5).

5.6 Minimum Number of Transitions

A rather pessimistic lower bound on the minimum number of transitions, mintran, can be found from the steady-state values at the gate output. This bound is 0 or 1, depending on whether the output values before and after the transients, IV and FV, are the same or different. This has been used as the condition for minimum glitch power design [5]. When there are split ambiguity regions, we can obtain a tighter lower bound.

Theorem 3: The minimum number of transitions is the higher of two lower bounds:

$$\mathit{mintran} = \max(N_s, N_{det}) \tag{5.9}$$

where Ns is the number of transitions required by the steady-state signal values and Ndet is the number needed to produce the deterministic signal values separating any non-overlapping ambiguity intervals.

Proof: First lower bound (Ns): This bound is obtained from the steady-state values, without considering any details of the ambiguity region. Logic changes 0→0 and 1→1 need not have any transition, while 0→1 and 1→0 must have at least one transition. Thus Ns = 1 if IV ≠ FV, and Ns = 0 if IV = FV [5].

Figure 5.7: Estimating the lower bound on output transitions of a 2-input AND gate.

Second lower bound (Ndet): The number of definite transitions in the output ambiguity region is the number of deterministic signal changes within that region that are spaced at time intervals greater than or equal to the inertial delay of the gate. The example of Figure 5.7 shows the effect of the second lower bound. At least two essential signal changes must occur within the output ambiguity region.
Thus, there will always be a hazard at the output as long as

$$EA_{sv} - LS_{dv} \ge \mathit{maxdel} \tag{5.10}$$

where maxdel is the maximum delay of the gate producing the transient. In this case mintran is not zero, as the first (steady-state) lower bound gives, but 2. Detailed analysis of split ambiguity regions is possible with the help of transient output functions (TOF) [58] or timed Boolean functions (TBF) [53, 59]. In general, Ndet can be higher than 2, depending on how many ambiguity and deterministic regions are produced and on their widths relative to the gate delay d. It can be estimated with the technique used to determine multiple ambiguity regions in Section 5.3.2.

5.7 Dynamic Power Estimation

The inputs to the analysis are a set of vectors and a gate-level combinational circuit netlist in which each gate has two delay bounds (mindel, maxdel) and a node capacitance. Capacitances may be extracted from the layout or estimated using a wire-load model. Nominal delays for all gate types are precharacterized with a SPICE simulator, which also determines the input-dependent leakage current data and saves them in a library. Manufacturing technology data on the percentage process variation (x%) are used to determine the bounded delay specification: mindel = nominal delay − x% and maxdel = nominal delay + x%.

Dynamic power estimation is a three-pass procedure for each input vector, performed in level order for all gates:
1. The first pass is a zero-delay logic simulation that determines the initial and final values, IV and FV, of all signals.
2. The second pass determines the earliest arrival (EA) and latest stabilization (LS) times of all signals according to Theorem 1, using the precalculated IV and FV values and the gate delay bounds.
3. The third pass determines the upper and lower bounds, maxtran and mintran, for all gates according to Sections 5.5 and 5.6.
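Pass 2 applies Theorem 1 at each gate. The sketch below does this for a single gate. It takes the per-input quantities (EAdv, LSdv, EAsv, LSsv), where ±∞ stands for a transition type that an input does not make; for example, EAdv = ∞ for an input held at the controlling value. It then applies the IV/FV rule stated after Theorem 1 and the inertial filtering. Theorem 1 leaves the case mindel ≤ LS′ − EA′ < maxdel unspecified. This sketch conservatively keeps the interval in that case, which is an assumption. The function name and argument layout are hypothetical.

```python
import math

INF = math.inf


def output_ambiguity(inputs, iv_dominant, fv_dominant, mindel, maxdel):
    """Theorem 1 for one gate.

    inputs: list of (EAdv, LSdv, EAsv, LSsv) tuples, one per gate input.
    iv_dominant, fv_dominant: True if the output's IV (FV) is the gate's
    dominant value, e.g. 0 for an AND gate.
    Returns (EA, LS); (inf, -inf) means the output has no ambiguity region.
    """
    e1 = max(t[0] for t in inputs)  # (5.1) E1 = max EAdv
    e2 = min(t[2] for t in inputs)  # (5.2) E2 = min EAsv
    l1 = min(t[3] for t in inputs)  # (5.3) L1 = min LSsv
    l2 = max(t[1] for t in inputs)  # (5.4) L2 = max LSdv
    # Dominant IV forces EAsv = -inf, non-dominant IV forces EAdv = -inf;
    # dominant FV forces LSdv = +inf, non-dominant FV forces LSsv = +inf.
    if iv_dominant:
        e2 = -INF
    else:
        e1 = -INF
    if fv_dominant:
        l2 = INF
    else:
        l1 = INF
    ea_p = max(e1, e2)  # (5.5)
    ls_p = min(l1, l2)  # (5.6)
    if ls_p - ea_p < mindel:
        return (INF, -INF)  # pulse suppressed by gate inertia
    return (ea_p + mindel, ls_p + maxdel)


# AND gate with delays (1, 3): one input rises in (2, 5), another falls in (4, 9).
print(output_ambiguity([(2, 5, INF, INF), (-INF, -INF, 4, 9)], True, True, 1, 3))  # (3, 12)
```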
For each primary input, we assume maxtran = mintran = 0 if the present and previous vectors have the same value, and maxtran = mintran = 1 if they have different values.

5.8 Summary

In this chapter, we have discussed our bounded delay analysis method for estimating dynamic power under uncertainties in gate delays. Such uncertainties can be caused by process variations and other parameter changes. We discussed principles such as ambiguity regions and put forward theorems for determining the ambiguity intervals as well as the maximum and minimum numbers of transitions possible under delay uncertainties. In the results chapter, we will show that our technique is far faster than traditional Monte Carlo simulation without any loss in accuracy.

Chapter 6
Bounded Delay Analysis Results

6.1 Introduction

In this chapter we discuss the experimental results obtained with our bounded delay analysis algorithm for dynamic power estimation. We first describe the steps involved in setting up the program for our experiments. We then present the results, citing our underlying assumptions and technological constraints.

6.2 Experimental Procedure

Our analysis is done at the gate level. The algorithm was implemented as a C program with the following steps.
• We read in the circuit as a simple netlist in the rutmod (Rutgers modeling language) format, as described in Chapter 4.
• We read in a vector file that contains the input vectors to the circuit. A capacitance library file lists each gate number and the corresponding fan-out load capacitance. It is precomputed by SPICE characterization specific to a technology file. Last, we read in a delay file that lists the gate number, the nominal delay (computed by a wire load delay model), and the minimum and maximum bounded gate delays.
• The outputs of the program are two result files.
The first contains the gate number, initial output value, final output value, minimum number of transitions and maximum number of transitions. The second gives the maximum, minimum and mean power consumption of the circuit for each vector pair.

Table 6.1: Per-vector energy consumption in picojoule (pJ) in benchmark circuits for 1000 random vectors, by Monte Carlo simulation of 1000 sample circuits (MC) and by bounded delay analysis (BDA). CPU times are in seconds.

Circuit | MC min | MC max | MC average | MC CPU s | BDA min | BDA max | BDA average | BDA CPU s
c880 | 1.086 | 10.847 | 4.340 | 298.26 | 1.080 | 11.140 | 4.240 | 0.34
c1355 | 3.606 | 13.577 | 7.310 | 423.69 | 3.600 | 20.150 | 10.928 | 0.59
c1908 | 4.870 | 29.470 | 15.580 | 840.85 | 4.590 | 57.050 | 17.750 | 0.69
c2670 | 8.470 | 51.190 | 24.390 | 1452.24 | 8.390 | 59.010 | 23.200 | 1.09
c3540 | 6.036 | 66.660 | 30.770 | 1810.18 | 5.970 | 96.180 | 35.100 | 1.39
c5315 | 29.810 | 91.100 | 56.41 | 3435.53 | 23.030 | 113.200 | 55.610 | 2.14
c6288 | 45.360 | 194.860 | 129.700 | 20944.53 | 11.840 | 406.340 | 153.710 | 2.60
c7552 | 35.050 | 146.120 | 82.790 | 5834.87 | 29.470 | 196.310 | 82.180 | 3.34

6.3 Benchmark Results
Our first results are the power analysis of the ISCAS'85 benchmark circuits for 1000 random vectors. The circuits were implemented using the TSMC025 2.5V CMOS library. Process variation can be modeled by assuming 15% intra-die and 5% inter-die variation [47]. For illustrative purposes, a standard size gate delay of 10ps and a wire-load delay model were used to determine the nominal gate delays, from which the bounds (mindel, maxdel) were obtained by assuming a ±20% variation. Note that any model of variation can be used with our method. As mentioned earlier, node capacitances for the circuits were obtained from the SPICE modeling files. Table 6.1 gives two sets of data. The first set (columns 2-5) is from a Monte Carlo event-driven simulation. These results are for 1000 circuit samples.
Each gate in a sample circuit was assigned a delay using a random number uniformly distributed in its (mindel, maxdel) range. From the simulation of 1000 random vectors, we have listed the energy consumption in picojoule (pJ) for the two vectors consuming the least and the most energy, and the average over all 1000 vectors. We observe that the bounded delay analysis always gives a lower minimum energy and a higher maximum energy. This is expected, because the bounds cover every delay assignment within the (mindel, maxdel) ranges, whereas the Monte Carlo simulation samples only a finite number of them. As we simulate more vectors, the event-driven numbers drop on the minimum side and increase on the maximum side. Besides, to take the variability into account, the event-driven simulator has to be used in a Monte Carlo experiment mode, which takes far more computing resources (for c880, 298.26 s versus 0.34 s in Table 6.1).

[Figure 6.1 plot: Monte Carlo power (mW) against MIN-MAX (bounded delay) power (mW) for (a) maximum power, regression R² = 0.9511, and (b) minimum power, regression R² = 0.924.]
Figure 6.1: Monte Carlo simulation versus bounded delay analysis for c880. Each point represents one vector-pair. One hundred sample circuits with nominal ±20% delay variation were simulated and for each vector-pair (a) maximum and (b) minimum power was determined.

Figure 6.2: Monte Carlo simulation versus bounded delay analysis for c880. Regression graph for average power.

The CPU times in Table 6.1 are for simulation runs on a UNIX system with an Intel dual-core processor and 2GB RAM.

We next conducted a Monte Carlo analysis using the event-driven simulator. As described before, 100 samples were generated for c880. Each sample circuit was simulated by the event-driven simulator for the same set of 100 random vectors. For each vector-pair, we obtained the minimum, average and maximum power. The maximum and minimum values are plotted in Figure 6.1 against the corresponding vector-pair results from the bounded delay analysis. Here the power was calculated using a vector application period of 1ns.
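The Monte Carlo setup and the bounded-delay energy and power figures can be sketched as below. This is a minimal illustration under assumptions: the helper names are hypothetical, the event-driven simulation that turns each sampled circuit into transition counts is not shown, and the energy model of C·VDD²/2 per output transition is an assumption about how transition counts are converted to energy. The bounded-delay average is the arithmetic mean of the minimum and maximum, and power is energy divided by the 1ns vector period. The r_squared helper computes the coefficient of determination of a least-squares straight line, assuming the regression graphs use a linear trendline with an intercept.

```python
import random
from typing import List, Sequence, Tuple

VDD = 2.5       # supply voltage (V) of the TSMC025 2.5 V library
PERIOD = 1e-9   # vector application period (s)


def delay_bounds(nominal: float, variation: float = 0.20) -> Tuple[float, float]:
    """(mindel, maxdel) = nominal delay minus/plus the fractional variation."""
    return nominal * (1.0 - variation), nominal * (1.0 + variation)


def sample_circuit(bounds: Sequence[Tuple[float, float]], rng: random.Random) -> List[float]:
    """One Monte Carlo sample circuit: each gate delay uniform in (mindel, maxdel)."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]


def vector_energy(caps: Sequence[float], transitions: Sequence[int]) -> float:
    """Energy (J) of one vector pair, assuming C * VDD^2 / 2 per output transition."""
    return sum(0.5 * c * VDD ** 2 * n for c, n in zip(caps, transitions))


def bounded_energy(caps: Sequence[float], mintran: Sequence[int],
                   maxtran: Sequence[int]) -> Tuple[float, float, float]:
    """Minimum, maximum and mid-point (arithmetic mean) energy for one vector pair."""
    e_min = vector_energy(caps, mintran)
    e_max = vector_energy(caps, maxtran)
    return e_min, e_max, 0.5 * (e_min + e_max)


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Coefficient of determination of the least-squares line y = slope*x + icpt."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    icpt = my - slope * mx
    ss_res = sum((b - (slope * a + icpt)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical example: a 10ps nominal gate and two nodes of 10 fF and 20 fF.
bounds = [delay_bounds(10e-12)]
delays = sample_circuit(bounds, random.Random(0))
caps = [10e-15, 20e-15]
e_min, e_max, e_avg = bounded_energy(caps, mintran=[1, 0], maxtran=[3, 2])
p_max = e_max / PERIOD   # average power (W) over one vector period
```

For these example capacitances the bounded energies are 0.03125 pJ and 0.21875 pJ, with a mid-point of 0.125 pJ.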
R² shown on the regression graphs is the coefficient of determination computed by Microsoft Excel; its value for an ideal fit is 1.0. For the similar regression graph for average power in Figure 6.2, R² = 0.9527, showing reasonable accuracy. Note that the statistical distribution in Figure 1.1 is skewed. Our average power is the simple arithmetic mean of the minimum and maximum and assumes a symmetric distribution; an improvement may be possible.

[Figure 6.3 plot: two histograms of frequency versus number of transitions, panels (a) and (b).]
Figure 6.3: Transition statistics for high-activity gate 1407 in c2670 for a random vector-pair. Bounded delay analysis: (a) delay bounds (7ps, 12ps), mintran = 0, maxtran = 8, (b) delay bounds (11ps, 33ps), mintran = 0, maxtran = 4. Histograms were obtained by Monte Carlo simulation.

Figure 6.3 shows the transition statistics for gate 1407 in c2670. This is a high-activity gate, which made up to 8 transitions on some vector-pairs. The left histogram (a) shows the number of transitions on this gate for one vector-pair applied to 100 sample circuits. The delay bounds of the gate were (7ps, 12ps), so in each sample its delay was randomly selected from this range. The number of transitions on the gate ranges between 0 and 8, and bounded delay analysis gave mintran = 0 and maxtran = 8. Leaving all other gate delays as before, when the delay bounds of gate 1407 were changed to (11ps, 33ps), the analysis computed mintran = 0 and maxtran = 4. The corresponding histogram from Monte Carlo simulation is shown in Figure 6.3(b). Figure 6.4 depicts the maximum power distribution obtained from simulation of two ISCAS'85 benchmark circuits, c880 and c5315.
It shows comparative histograms of the Monte Carlo simulations (light bars) and our bounded delay analysis technique (darker bars).

Figure 6.4: A comparison of the maximum power distribution for a vector-pair obtained by bounded delay analysis and Monte Carlo simulation for ISCAS'85 benchmark circuits (a) c880 and (b) c5315. The maximum power values are for 1000 random vector pairs. The Monte Carlo simulation used 1000 circuit samples with random delays to find the maximum power for each vector pair.

The horizontal axis gives the maximum power dissipation of a particular vector pair. The gray bars give the maximum value obtained by Monte Carlo simulation of 1000 delay samples with a delay variation of 20% for each of the 1000 vector pairs. As can be seen, the bounded delay estimate closely follows the Monte Carlo analysis. Note also that the Monte Carlo maximum power distribution would move closer to the bounded delay result as the number of delay samples increases. The average of the maximum power tends to remain unchanged as the number of vector pairs increases, and the peak maximum power also tends to converge for a large number of simulated vectors. One can employ the statistical method of extreme order statistics to determine the peak maximum power from a random vector set, as discussed in the literature [6, 72, 98]. Qiu et al. [72, 98] propose a technique that estimates the maximum power with a confidence level of 90% for 5% error while simulating only 2500 vector pairs. In their method, a random sample of n vector pairs is simulated m times, and the maximum power of each of the m samples tends to follow a generalized Weibull distribution. The problem is then equivalent to estimating the location parameter of that Weibull distribution from random samples, which the authors do with a maximum likelihood estimator that converges to a normal distribution.
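The sampling step of the extreme-order-statistics approach can be sketched as follows. This is only an illustration of drawing m random samples of n vector pairs and recording each sample maximum; the function name is hypothetical, the per-vector-pair power values are synthetic placeholders standing in for simulation results, and the maximum likelihood fit of the Weibull location parameter used by Qiu et al. is not shown.

```python
import random
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def sample_maxima(pairs: Sequence[T], power: Callable[[T], float],
                  n: int, m: int, rng: random.Random) -> List[float]:
    """Draw m random samples of n vector pairs (with replacement) and
    return the maximum power observed in each sample."""
    maxima = []
    for _ in range(m):
        sample = rng.choices(pairs, k=n)
        maxima.append(max(power(v) for v in sample))
    return maxima


# Hypothetical stand-in: 1000 vector pairs whose power is a stored random number.
rng = random.Random(2008)
powers = [rng.uniform(1.0, 10.0) for _ in range(1000)]
maxima = sample_maxima(range(1000), lambda i: powers[i], n=30, m=20, rng=rng)
```

The list of m sample maxima is the data to which a limiting (Weibull-type) distribution would then be fitted.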
6.4 Summary
In this chapter we have described experimental results on the ISCAS benchmark circuits using our bounded delay analysis algorithm. We have verified both the accuracy and the speed of our approach. The new technique is shown to be far more efficient than traditional Monte Carlo simulation and can serve as a useful alternative.

Chapter 7 Conclusion
Low power design has become an important issue in solving the four-fold design problem of area, performance, power and testability. Improved CAD systems and other power estimation and optimization tools are necessary to give the designer adequate support. From our observations, a power estimation tool can give useful information on the impact of clock power consumption, short-circuit dependencies and the effects of variations on power estimation. In this thesis we studied the effects of all these factors on power consumption, providing our own implementations for estimating the various components. We have implemented a gate-level power estimation tool that separates the different power dissipation components and informs the designer about their effects on the total power consumption. The tool has been applied to combinational benchmark circuits, and the experimental results were validated against SPICE.

We also developed an efficient power estimation method that considers process variations. Our present target is dynamic power. We have used the min-max (bounded) delay model and developed new algorithms to determine bounds on gate transitions. This analysis has linear time complexity in the number of gates and is an efficient alternative to Monte Carlo analysis. Presently, this method can include leakage based on the signal states obtained from its inherent zero-delay simulation. In the future, we expect to consider process variation in leakage as well.
Besides, node capacitances, which are considered fixed here, can also have process-dependent variation. We hope to investigate that in the future.

For digital CMOS circuit technologies, process variation and leakage power will continue to present significant design challenges. In low-power design, combined optimization over multiple power components, such as leakage reduction by dual-threshold design and glitch reduction by path delay balancing, has been attempted [55, 56, 57]. Components of power are not independent, and reducing one component may affect the others, so an analysis tool of the type discussed here is useful. Similarly, process variation can wipe out the benefits of a power optimization technique unless variation is considered in the design [40]. It is expected that the bounded delay methods discussed in the present research will be adopted in power optimization procedures.

The analysis techniques discussed in the present work are simulation based. When a selected vector set defines the application domain, it can be used for vector-specific power optimization [43]. A simulation-based power analysis method is then a very effective evaluation tool. In many cases, however, the defining vector set may be either too large or too difficult to find. In those cases, static or vector-less power analysis may be sufficient. Although vector-independent approaches are, in general, less accurate, they are significantly more efficient than dynamic (simulation-based) analysis. The bounded delay power analysis algorithms of the present work can be modified for static analysis.

Bibliography
[1] http://www.mosis.com/.
[2] http://www.eecs.umich.edu/~jhayes/iscas.restore/benchmark.html.
[3] http://www.eas.asu.edu/~ptm/.
[4] V. D. Agrawal, "Low-Power Design by Hazard Filtering," in Proc. Tenth International Conf. on VLSI Design, Jan. 1997, pp. 193–197.
[5] V. D. Agrawal, M. L. Bushnell, G. Parthasarathy, and R. Ramadoss, "Digital Circuit Design for Minimum Transient Energy and a Linear Programming Method," in Proc. Twelfth International Conference on VLSI Design, Jan. 1999, pp. 434–439.
[6] V. Bartkute and L. Sakalauskas, "Three Parameter Estimation of the Weibull Distribution by Order Statistics," in C. H. Skiadas, editor, Recent Advances in Stochastic Modeling and Data Analysis, pp. 91–100, World Scientific, 2007.
[7] J. W. Bierbauer, J. A. Eiseman, F. A. Fazal, and J. J. Kulikowski, "System Simulation With MIDAS," AT&T Tech. J., vol. 70, no. 1, pp. 36–51, Jan. 1991.
[8] L. Bisdounis, S. Nikolaidis, and O. Loufopavlou, "Propagation Delay and Short-Circuit Power Dissipation Modeling of the CMOS Inverter," IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, vol. 45, no. 3, pp. 259–270, Mar. 1998.
[9] S. Bobba and I. N. Hajj, "Maximum Leakage Power Estimation for CMOS Circuits," in Proc. 15th International Conf. on VLSI Design and 7th Asia and South Pacific Design Automation Conf., Jan. 1999, pp. 116–124.
[10] A. Bogliolo, L. Benini, and B. Riccò, "Power Estimation of Cell-Based CMOS Circuits," in Proc. Design Automation Conf., 1996, pp. 433–438.
[11] A. Bogliolo, L. Benini, G. de Micheli, and B. Riccò, "Gate-Level Power and Current Simulation of CMOS Integrated Circuits," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 5, pp. 473–488, Dec. 1997.
[12] S. Bose and V. D. Agrawal, "Delay Test Quality Evaluation Using Bounded Gate Delays," in Proc. 25th IEEE VLSI Test Symp., May 2007, pp. 23–28.
[13] S. Bose, H. Grimes, and V. D. Agrawal, "Delay Fault Simulation With Bounded Gate Delay Model," in Proc. of International Test Conf., 2007, pp. 23–28.
[14] R. E. Bryant, "MOSSIM: A Switch-level Simulator for MOS VLSI," in Proc. 18th Design Automation Conference, July 1981, pp. 786–790.
[15] R. Burch, F. Najm, P. Yang, and D. Hocevar, "Pattern-Independent Current Estimation for Reliability Analysis of CMOS Circuits," in Proc. 25th ACM/IEEE Design Automation Conf., June 1988, pp. 294–299.
[16] R. Burch, F. Najm, P. Yang, and T. Trick, "McPOWER: A Monte Carlo Approach to Power Estimation," Proc. IEEE/ACM International Conference on Computer-Aided Design, pp. 90–97, Nov. 1992.
[17] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Boston: Springer, 2000.
[18] S. Chakraborty and D. L. Dill, "More Accurate Polynomial-Time Min-Max Timing Simulation," in Proc. Third International Symp. Advanced Research in Asynchronous Circuits and Systems, Apr. 1997, pp. 112–123.
[19] S. Chakraborty, D. L. Dill, and K. Y. Yun, "Min-Max Timing Analysis and an Application to Asynchronous Circuits," Proc. of IEEE, vol. 87, no. 2, pp. 332–346, Feb. 1999.
[20] S. Chandra, K. Lahiri, A. Raghunathan, and S. Dey, "Considering Process Variations During System-level Power Analysis," in Proc. International Symposium on Low Power Electronics and Design, 2006.
[21] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design. Springer, 1995.
[22] H. Chang and S. S. Sapatnekar, "Full-Chip Analysis of Leakage Power Under Process Variations, Including Spatial Correlations," in Proc. 42nd Design Automation Conf., June 2005, pp. 523–526.
[23] R. Y. Chen, N. Vijaykrishnan, and M. J. Irwin, "Clock Power Issues in System-on-a-Chip Designs," in Proceedings IEEE Computer Society Workshop, 1999, pp. 48–53.
[24] Z. Chen, M. Johnson, L. Wei, and K. Roy, "Estimation of Standby Leakage Power in CMOS Circuits Considering Accurate Modeling of Transistor Stacks," in Proceedings of the 1998 International Symposium on Low Power Electronics and Design, Aug. 1998, pp. 239–244.
[25] Z. Chen, M. Johnson, L. Wei, and K. Roy, "Estimation of Standby Leakage Power in CMOS Circuit Considering Accurate Modeling of Transistor Stacks," Proc. International Symp. on Low Power Electronics and Design, pp. 239–244, Aug. 1998.
[26] Z. Chen, K. Roy, and T.-L. Chou, "Efficient Statistical Approach to Estimate Power Considering Uncertain Properties of Primary Inputs," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 6, no. 3, pp. 484–492, Sept. 1998.
[27] M. A. Cirit, "Estimating Dynamic Power Consumption of CMOS Circuits," in Proceedings of IEEE International Conference on Computer-Aided Design, Nov. 1987, pp. 534–537.
[28] A. Davoodi and A. Srivastava, "Probabilistic Dual-Vth Leakage Optimization Under Variability," in Proc. International Symp. Low Power Electronics and Design, 2005, pp. 143–168.
[29] C.-S. Ding, C.-Y. Tsui, and M. Pedram, "Gate-Level Power Estimation Using Tagged Probabilistic Simulation," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, no. 11, pp. 1099–1107, Nov. 1998.
[30] M. S. Elrabaa, I. S. Abu-Khater, and M. I. Elmasry, Advanced Low-Power Digital Circuit Techniques. Boston: Springer, 1997.
[31] J. Frenkil, "A Multi-Level Approach to Low-Power IC Design," IEEE Spectrum, vol. 35, no. 2, pp. 54–60, 1998.
[32] S. Garg, S. Tata, and R. Arunachalam, "Static Transition Probability Analysis Under Uncertainty," in Proc. IEEE International Conference on Computer Design, 2004, pp. 380–386.
[33] B. J. George, D. Gossain, S. C. Tyler, M. G. Wloka, and G. K. Yeap, "Power Analysis and Characterization for Semi-Custom Design," in Proc. Int. Workshop on Low Power Design, Apr. 1994, pp. 215–218.
[34] H. Grimes, "Reconvergent Fanout Analysis of Bounded Gate Delay Faults," Master's thesis, Auburn University, Aug. 2008. Dept. of ECE.
[35] H. Grimes and V. D. Agrawal, "Analyzing Reconvergent Fanouts in Gate Delay Fault Simulation," in Proc. 17th IEEE North Atlantic Test Workshop, May 2008, pp. 98–103.
[36] R. X. Gu and M. I. Elmasry, "Power Dissipation Analysis and Optimization of Deep Submicron Circuits," IEEE Journal of Solid-State Circuits, pp. 707–713, May 1996.
[37] S. Hassoun, "Critical Path Analysis Using a Dynamically Bounded Delay Model," in Proc. 37th Design Automation Conf., 2000, pp. 260–265.
[38] J. Hayes, "An Introduction to Switch-Level Modeling," IEEE Design and Test of Computers, vol. 4, no. 4, pp. 18–25, 1987.
[39] N. Hedenstierna and K. O. Jeppson, "CMOS Circuit Speed and Buffer Optimization," IEEE Transactions on CAD, vol. 6, no. 2, pp. 270–281, Mar. 1987.
[40] F. Hu, Process-Variation-Resistant Dynamic Power Optimization for VLSI Circuits. PhD thesis, Auburn University, May 2006. Dept. of ECE.
[41] F. Hu and V. D. Agrawal, "Dual-Transition Glitch Filtering in Probabilistic Waveform Power Estimation," in Proc. 15th IEEE Great Lakes Symp. on VLSI, Apr. 2005, pp. 357–360.
[42] F. Hu and V. D. Agrawal, "Enhanced Dual-Transition Probabilistic Power Estimation With Selective Supergate Analysis," Proc. IEEE International Conference on Computer Design, pp. 366–369, Oct. 2005.
[43] F. Hu and V. D. Agrawal, "Input-Specific Dynamic Power Optimization for VLSI Circuits," in Proc. Int. Symp. on Low Power Electronics and Design (ISLPED'06), Oct. 2006, pp. 232–237.
[44] C. X. Huang, B. Zhang, A.-C. Deng, and B. Swirski, "The Design and Implementation of PowerMill," in Proc. Int. Workshop Low Power Design, Apr. 1995, pp. 105–110.
[45] R. C. Jaeger and T. N. Blalock, Microelectronic Circuit Design. McGraw Hill, second edition, 1997.
[46] S. M. Kang, "Accurate Simulation of Power Dissipation in VLSI Circuits," IEEE Journal of Solid-State Circuits, vol. 21, no. 5, pp. 889–891, Oct. 1986.
[47] S. P. Khatri, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, Cross-Talk Noise Immune VLSI Design Using Regular Layout Fabrics. Boston: Springer, 2001.
[48] J.-T. Kong, S. Z. Hussain, and D. Overhauser, "Performance Estimation of Complex MOS Gates," IEEE Trans. Circuits and Systems I: Fundamental Theory and Applications, vol. 44, no. 9, pp. 785–795, Sept. 1997.
[49] T. H. Krodel, "Power Play-Fast Dynamic Power Estimation Based on Logic Simulation," Proc. IEEE International Conference on Computer Design, pp. 96–100, Oct. 1991.
[50] J. C. Ku, M. Ghoneima, and Y. Ismail, "The Importance of Including Thermal Effects in Estimating the Effectiveness of Power Reduction Techniques," in Proc. IEEE Custom Integrated Circuits Conference, Sept. 2005, pp. 301–304.
[51] R. Kumar and C. P. Ravikumar, "Leakage Power Estimation for Deep Submicron Circuits in an ASIC Design Environment," in Proc. IEEE Alessandro Volta Memorial Workshop, 2002, pp. 45–50.
[52] K. N. Lalgudi, D. Bhattacharya, and P. Agrawal, "Architecture of a Min-Max Simulator on MARS," in Proc. International Conf. VLSI Design, Jan. 1993, pp. 246–249.
[53] W. K. C. Lam and R. K. Brayton, Timed Boolean Functions: A Unified Formalism for Exact Timing. Springer, 1994.
[54] M. Linderman and M. Leeser, "Simulation of Digital Circuits in the Presence of Uncertainty," in Proc. of International Conf. Computer-Aided Design, 1994, pp. 248–251.
[55] Y. Lu, Power and Performance Optimization of Static CMOS Circuits with Process Variation. PhD thesis, Auburn University, Aug. 2007. Dept. of ECE.
[56] Y. Lu and V. D. Agrawal, "Leakage and Dynamic Glitch Power Minimization Using Integer Linear Programming for Vth Assignment and Path Balancing," in Proc. Power and Timing Modeling, Optimization and Simulation Workshop (PATMOS'05), Sept. 2005, pp. 217–226.
[57] Y. Lu and V. D. Agrawal, "CMOS Leakage and Glitch Minimization for Power-Performance Tradeoff," Journal of Low Power Electronics, vol. 2, no. 3, pp. 378–387, Dec. 2006.
[58] E. J. McCluskey, "Transients in Combinational Logic Circuits," in R. H. Wilcox and W. C. Mann, editors, Redundancy Techniques for Computing Systems, pp. 9–46, Washington, D.C.: Spartan Books, 1962.
[59] P. C. McGeer and R. K. Brayton, Integrating Functional and Temporal Domains in Logic Design. Springer, 1991.
[60] G. Merrett and B. M. Al-Hashimi, "Leakage Power Analysis and Comparison of Deep Submicron Logic Gates," in Proc. 14th International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Sept. 2004.
[61] S. Mukhopadhyay and K. Roy, "Modeling and Estimation of Total Leakage Current in Nano-Scaled CMOS Devices Considering the Effect of Parameter Variation," in Proc. International Symp. Low Power Electronics and Design, Aug. 2003, pp. 172–175.
[62] L. W. Nagel, SPICE2, A Computer Program to Simulate Semiconductor Circuits. PhD thesis, University of California, Electronics Research Laboratory, Berkeley, California, May 1975. Dept. of EECS.
[63] F. N. Najm, "Transition Density: A New Measure of Activity in Digital Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 12, no. 2, pp. 310–323, Feb. 1993.
[64] F. N. Najm, "A Survey of Power Estimation Techniques in VLSI Circuits," IEEE Trans. VLSI Systems, vol. 2, no. 4, pp. 446–455, Dec. 1994.
[65] F. N. Najm, "Power Estimation Techniques for Integrated Circuits," Proc. IEEE/ACM International Conf. on Computer-Aided Design, pp. 492–499, Nov. 1995.
[66] F. N. Najm, R. Burch, P. Yang, and I. N. Hajj, "CREST - A Current Estimator for CMOS Circuits," in Proceedings of IEEE International Conference on Computer-Aided Design, Nov. 1988, pp. 204–207.
[67] F. N. Najm, R. Burch, P. Yang, and I. N. Hajj, "Probabilistic Simulation for Reliability Analysis of CMOS VLSI Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 9, no. 4, pp. 439–450, Apr. 1990.
[68] N. Nicolici and B. M. Al-Hashimi, Power-Constrained Testing of VLSI Circuits. Springer, 2003.
[69] S. Nikolaidis and A. Chatzigeorgiou, "Analytical Estimation of Propagation Delay and Short-Circuit Power Dissipation in CMOS Gates," International Journal of Circuit Theory and Applications, vol. 27, pp. 39–2, 1999.
[70] Y. Park and E. Park, "Statistical Power Estimation of CMOS Logic Circuits with Variable Errors," Electronics Letters, vol. 34, no. 11, pp. 1054–1056, May 1998.
[71] C. Piguet, "Circuit and Logic Level Design," in W. Nebel and J. Mermet, editors, Low Power Design in Deep Submicron Electronics, pp. 105–133, Springer, 1997.
[72] Q. Qiu, Q. Wu, and M. Pedram, "Maximum Power Estimation Using the Limiting Distributions of Extreme Order Statistics," in Proc. Design Automation Conference, June 1998, pp. 684–689.
[73] R. Radjassamy and J. D. Carothers, "Simulation-based Power Estimation for Low Power Designs: A Fractal Approach," Simulation, vol. 72, no. 5, pp. 320–326, 1999.
[74] T. Raja, "A Reduced Constraint Set Linear Program for Low-Power Design of Digital Circuits," Master's thesis, Rutgers University, Mar. 2002. Dept. of ECE.
[75] T. Raja, Minimum Dynamic Power CMOS Design with Variable Input Delay Logic. PhD thesis, Rutgers University, May 2004. Dept. of ECE.
[76] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Minimum Dynamic Power CMOS Circuit Design by a Reduced Constraint Set Linear Program," in Proc. 16th International Conf. VLSI Design, Jan. 2003, pp. 527–532.
[77] T. Raja, V. D. Agrawal, and M. L. Bushnell, "CMOS Circuit Design for Minimum Dynamic Power and Highest Speed," in Proc. 17th International Conf. VLSI Design, Jan. 2004, pp. 1035–1040.
[78] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Variable Input Delay CMOS Logic Design for Low Dynamic Power Circuits," in Proc. Power and Timing Modeling, Optimization and Simulation Workshop (PATMOS'05), Sept. 2005, pp. 436–445.
[79] R. M. Rao, J. L. Burns, A. Devgan, and R. B. Brown, "Efficient Techniques for Gate Leakage Estimation," Proc. International Symp. on Low Power Electronics and Design, pp. 100–103, Aug. 2003.
[80] S. Roy, P. P. Chakrabarti, and P. Dasgupta, "Bounded Delay Timing Analysis Using Boolean Satisfiability," in Proc. International Conf. VLSI Design, Jan. 2007, pp. 295–302.
[81] A. Salz and M. A. Horowitz, "IRSIM: An Incremental MOS Switch-Level Simulator," in Proc. 26th Design Automation Conf., June 1989, pp. 173–178.
[82] C. J. Seger, "A Bounded Delay Race Model," in Proc. of the IEEE International Conf. Computer Aided Design, Nov. 1989, pp. 130–133.
[83] S. C. Seth and V. D. Agrawal, "A New Model for Computation of Probabilistic Testability in Combinational Circuits," INTEGRATION, The VLSI Journal, vol. 7, pp. 49–75, 1989.
[84] J. Sheu, "BSIM: Berkeley Short-Channel IGFET Model for MOS Transistors," IEEE Journal of Solid-State Circuits, vol. 22, no. 4, pp. 558–566, Aug. 1987.
[85] A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, "Modeling and Analysis of Leakage Power Considering Within-Die Process Variations," in Proc. International Symp. Low Power Electronics and Design, Aug. 2002, pp. 64–67.
[86] A. Srivastava, D. Sylvester, and D. Blaauw, "Statistical Optimization of Leakage Power Considering Process Variations Using Dual-Vth and Sizing," in Proc. 41st Design Automation Conf., June 2004, pp. 783–787.
[87] A. Srivastava, D. Sylvester, and D. Blaauw, Statistical Analysis and Optimization for VLSI: Timing and Power. Boston: Springer, 2005.
[88] Synopsys, HSPICE User's Manual, w 2005.03 edition, 2005.
[89] R. Tjarnstrom, "Power Dissipation Estimate by Switch Level Simulation of CMOS Circuits," Proc. IEEE International Symposium on Circuits and Systems, vol. 2, pp. 881–884, May 1989.
[90] C.-Y. Tsui, M. Pedram, and A. Despain, "Efficient Estimation of Dynamic Power Consumption Under a Real Delay Model," Proc. IEEE/ACM International Conference on Computer-Aided Design, pp. 224–228, Nov. 1993.
[91] S. Uppalapati, "Low Power Design of Standard Cell Digital VLSI Circuits," Master's thesis, Rutgers University, Mar. 2004. Dept. of ECE.
[92] S. Uppalapati, M. L. Bushnell, and V. D. Agrawal, "Glitch-Free Design of Low Power ASICs Using Customized Resistive Feedthrough Cells," in Proc. 9th VLSI Design & Test Symp. (VDAT'05), Aug. 2005, pp. 41–49.
[93] H. J. M. Veendrick, "Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits," IEEE Journal of Solid-State Circuits, vol. 19, no. 4, pp. 468–473, Aug. 1984.
[94] S. R. Vemuru and N. Scheinberg, "Short-Circuit Power Dissipation Estimation for CMOS Logic Gates," IEEE Trans. on Circuits and Systems I: Fundamental Theory and Applications, vol. 41, no. 11, pp. 762–765, Nov. 1994.
[95] C.-Y. Wang, T.-L. Chou, and K. Roy, "Maximum Power Estimation for CMOS Circuits Under Arbitrary Delay Model," in Proc. IEEE International Symp. Circuits and Systems, May 1996, pp. 763–766.
[96] Q. Wang and S. B. K. Vrudhula, "On Short Circuit Power Estimation of CMOS Inverters," Proc. IEEE International Conference on Computer Design, pp. 70–75, Oct. 1998.
[97] N. H. E. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective. Addison Wesley, third edition, 2004.
[98] Q. Wu, Q. Qiu, and M. Pedram, "Estimation of Peak Power Dissipation in VLSI Circuits Using the Limiting Distributions of Extreme Order Statistics," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 20, no. 8, p. 942.
[99] M. G. Xakellis and F. N. Najm, "Statistical Estimation of the Switching Activity in Digital Circuitry," Proc. 31st Design Automation Conf., pp. 728–733, June 1994.
[100] G. Y. Yacoub and W. H. Ku, "An Accurate Simulation Technique for Short Circuit Power Dissipation Based on Current Component Isolation," in Proc. of the International Symposium on Circuits and Systems, 1989, pp. 1157–1161.
[101] G. K. Yeap, Practical Low Power Digital VLSI Design. Springer, 1998.