POWER AND PERFORMANCE OPTIMIZATION OF STATIC CMOS CIRCUITS
WITH PROCESS VARIATION
Except where reference is made to the work of others, the work described in this
dissertation is my own or was done in collaboration with my advisory committee.
This dissertation does not include proprietary or classified information.
Yuanlin Lu
Certificate of Approval:
Vishwani D. Agrawal, Chair, James J. Danaher Professor, Electrical & Computer Engineering
Fa Foster Dai, Associate Professor, Electrical & Computer Engineering
Charles E. Stroud, Professor, Electrical & Computer Engineering
Joe F. Pittman, Interim Dean, Graduate School
POWER AND PERFORMANCE OPTIMIZATION OF STATIC CMOS CIRCUITS
WITH PROCESS VARIATION
Yuanlin Lu
A Dissertation
Submitted to
the Graduate Faculty of
Auburn University
in Partial Fulfillment of the
Requirements for the
Degree of
Doctor of Philosophy
Auburn, Alabama
August 4, 2007
POWER AND PERFORMANCE OPTIMIZATION OF STATIC CMOS CIRCUITS
WITH PROCESS VARIATION
Yuanlin Lu
Permission is granted to Auburn University to make copies of this dissertation at its
discretion, upon the request of individuals or institutions and at their expense.
The author reserves all publication rights.
Signature of Author
Date of Graduation
VITA
Yuanlin Lu, daughter of Rongchang Lu and Afeng Kong, was born in Nanjing,
P. R. China. She attended Southeast University in 1995 and graduated with a
Bachelor of Engineering degree in Electronic Information Engineering in 1999. She
entered the Graduate School at Southeast University in 1999 and received the
Master of Science degree in Circuit and System in 2002. In January 2004, she joined
the Ph.D. program of the Department of Electrical and Computer Engineering,
Auburn University.
DISSERTATION ABSTRACT
POWER AND PERFORMANCE OPTIMIZATION OF STATIC CMOS CIRCUITS
WITH PROCESS VARIATION
Yuanlin Lu
Doctor of Philosophy, August 4, 2007
(M.S., Southeast University, 2002)
(B.S., Southeast University, 1999)
142 Typed Pages
Directed by Vishwani D. Agrawal
With the continuing trend of technology scaling, leakage power has become a main
contributor to power consumption. Dual threshold (dual-Vth) assignment has emerged as
an efficient technique for decreasing leakage power. In this work, a mixed integer linear
programming (MILP) technique simultaneously minimizes the leakage and glitch power
consumption of a static CMOS (Complementary Metal Oxide Semiconductor) circuit for
any specified input-to-output critical path delay. Using dual-threshold devices, the
number of high-threshold devices is maximized and a minimum number of delay
elements is inserted to reduce the differential path delays below the inertial delays of the
incident gates. The key features of the method are that the constraint set size for the
MILP model is linear in the circuit size and a power-performance tradeoff is allowed.
Experimental results show 96%, 28% and 64% reductions of leakage power,
dynamic power and total power, respectively, for the benchmark circuit C7552
implemented in BPTM 70nm CMOS technology.
Due to the exponential relation between subthreshold current and process parameters,
such as the effective gate length, oxide thickness and doping concentration, process
variations can severely affect both power and timing yields of the designs obtained by the
MILP formulation. We propose a statistical mixed integer linear programming method
for dual-Vth design that minimizes the leakage power and circuit delay in a statistical
sense such that the impact of process variation on the respective yields is minimized.
Experimental results show that 30% more leakage power reduction can be achieved by
using a statistical approach when compared with the deterministic approach that has to
consider the worst case in the presence of process variations.
Compared to subthreshold leakage, dynamic power is less sensitive to process
variation because of its linear dependence on the process parameters. However,
deterministic techniques that use path balancing to eliminate glitches become ineffective
when process variation is considered, because the perfect hazard filtering conditions can
easily be destroyed by even a small variation in some process parameters. We present a
statistical MILP formulation to achieve a process-variation-resistant glitch-free circuit.
Experimental results on an example circuit demonstrate the effectiveness of this method.
ACKNOWLEDGMENTS
I would like to express my appreciation and sincere thanks to my advisor, Dr.
Vishwani D. Agrawal, who guided and encouraged me throughout my studies. His advice
and research attitude have provided me with a model for my entire future career. I also
wish to thank my advisory committee members, Dr. Fa Foster Dai and Dr. Charles E.
Stroud for their guidance and advice on this work.
Appreciation is expressed to Badhri Uppiliappan, who gave me great help during my
internship at Analog Devices, Inc.
I also appreciate those who have made contributions to my research. Thanks to Jins
Alexander, Hillary Grimes, Kyungseok Kim, Khushboobenumesh Sheth, Fan Wang and
Nitin Yogi for their cooperation and helpful discussions throughout the course of this
research.
Finally, I would like to thank, although this is too weak a word, my parents and sister,
all the other family members and my friends for their continual encouragement and
support throughout this work.
Style manual or journal used: The bibliography follows the style of the Transactions of the
Institute of Electrical and Electronics Engineers and is sorted in alphabetical order.
Computer software used: Microsoft Word 2003.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1 INTRODUCTION
1.1 Motivation
1.1.1 Leakage Power
1.1.2 Glitch Power
1.1.3 Process Variation
1.2 Problem Statement
1.3 Original Contributions
1.4 Organization of the Dissertation
CHAPTER 2 PRIOR WORK: TECHNIQUES FOR LOW POWER DESIGN
2.1 Components of Power Consumption
2.1.1 Dynamic Power
2.1.2 Leakage Power
2.2 Techniques for Leakage Reduction
2.2.1 Dual-Vth Assignment
2.2.2 Multi-Threshold-Voltage CMOS
2.2.3 Adaptive Body Bias
2.2.4 Transistor Stacking
2.2.5 Optimal Standby Input Vectors
2.2.6 Power Cutoff
2.3 Techniques for Dynamic Power Reduction
2.3.1 Logic Switching Power Reduction
2.3.2 Glitch Power Elimination
2.4 Power Optimization with Process Variation
2.4.1 Leakage Minimization with Process Variation
2.4.2 Glitch Power Optimization with Process Variation
2.5 Summary
CHAPTER 3 DETERMINISTIC MILP FOR LEAKAGE AND GLITCH MINIMIZATION
3.1 Leakage and Delay
3.2 A Deterministic MILP for Power Minimization
3.2.2 Objective Function
3.2.3 Constraints
3.3 Delay Element Implementation
3.3.1 Delay Element Comparison
3.3.2 Capacitances of a Transmission-Gate Delay Element
3.4 MILP and Heuristic Algorithms
3.5 Summary
CHAPTER 4 STATISTICAL MILP FOR LEAKAGE OPTIMIZATION UNDER PROCESS VARIATION
4.1 Effects of Process Variation on Leakage Power
4.2 Overview of Deterministic Dual-Vth Assignment by MILP
4.3 Statistical Dual-Vth Assignment
4.3.1 Statistical Subthreshold Leakage Modeling
4.3.2 Statistical Delay Modeling
4.3.3 MILP for Statistical Dual-Vth Assignment
4.4 Linear Approximations
4.5 Summary
CHAPTER 5 TOTAL POWER MINIMIZATION WITH PROCESS VARIATION BY DUAL-THRESHOLD DESIGN, PATH BALANCING AND GATE SIZING
5.1 Deterministic MILP for Total Power Optimization by Dual-Vth, Path Balancing and Gate Sizing
5.1.1 Gate Sizing for Dynamic Power Reduction
5.1.2 Deterministic MILP for Total Power Reduction
5.1.3 Results
5.2 Statistical MILP for Total Power Optimization
5.2.1 The Impact of Process Variation on Dynamic Power
5.2.2 Statistical MILP for Power Optimization with Process Variation
5.2.3 Minimizing Impact of Process Variation on Leakage or Glitch Power
5.3 Summary
CHAPTER 6 RESULTS
6.1 Results of Deterministic MILP (Chapter 3) for Total Power Optimization
6.1.1 Leakage Power Reduction
6.1.2 Leakage, Dynamic Glitch and Total Power Reduction
6.1.3 Tradeoff Between Glitch Power Reduction and Area/Power Overhead Contributed by the Delay Elements
6.2 Results of Statistical MILP (Chapter 4) for Leakage Optimization
6.3 Run Time of MILP Algorithms
6.4 Summary
CHAPTER 7 CONCLUSION AND FUTURE WORK
7.1 Conclusion
7.2 Future Work
7.2.1 Gate Leakage
7.2.2 Techniques for Glitch Elimination with Process Variation
7.2.3 Improvement of the MILP Formulation
7.2.4 Complexity of the MILP Formulation
BIBLIOGRAPHY
LIST OF FIGURES
Figure 2.1 Leakage currents in an inverter.
Figure 2.2 An example dual-Vth circuit.
Figure 2.3 Schematic of MTCMOS, (a) original MTCMOS, (b) PMOS insertion MTCMOS, (c) NMOS insertion MTCMOS.
Figure 2.4 Scheme of an adaptive body biased inverter.
Figure 2.5 Comparison of leakage for (a) one single off transistor in an inverter and (b) two serially-connected off transistors in a 2-input NAND gate.
Figure 2.6 Scheme of cluster voltage scaling.
Figure 2.7 Example circuit for illustrating ECVS.
Figure 2.8 Timing window for an n-input NAND gate.
Figure 2.9 Glitch elimination methods, (a) glitches at the output of a NAND gate, (b) glitch elimination by hazard filtering, and (c) glitch elimination by path delay balancing.
Figure 2.10 Using a redundant implicant to eliminate hazards, (a) a multiplexer with hazards, and (b) a redundant implementation of the multiplexer free from certain hazards.
Figure 3.1 Circuit for explaining MILP constraints.
Figure 3.2 (a) An unoptimized circuit with high leakage and potential glitches, and (b) its corresponding optimized glitch-free circuit with low leakage.
Figure 3.3 A full adder circuit with all gates assigned low Vth (Ileak = 161 nA).
Figure 3.4 (a) Dual-Vth assignment and delay element insertion for Tmax = Tc (Ileak = 73 nA), and (b) dual-Vth assignment and delay element insertion for Tmax = 1.25Tc (Ileak = 16 nA).
Figure 3.5 Delay elements: (a) CMOS transmission gate and (b) cascaded inverters.
Figure 3.6 Capacitances in a MOS transistor.
Figure 3.7 (a) Distributed and (b) lumped RC models of an NMOS transmission gate.
Figure 3.8 Comparison of MILP with heuristic backtracking algorithm.
Figure 4.1 Leakage power distribution of un-optimized C432 under local effective gate length variation.
Figure 4.2 Leakage power distributions of the deterministically optimized dual-Vth C432 due to process parameter variations, (a) global variations, (b) local variations, (c) effective gate length variations, and (d) threshold voltage variations.
Figure 4.3 Basic idea of using MILP to optimize leakage.
Figure 4.4 Detailed deterministic MILP formulation for leakage minimization.
Figure 4.5 Monte Carlo Spice simulation for leakage distribution of one MUX cell in TSMC 90nm CMOS technology.
Figure 4.6 Basic MILP for statistical dual-Vth assignment.
Figure 4.7 Detailed formulation of statistical dual-Vth assignment MILP.
Figure 5.1 Extended cell library with 6 corners for gate sizing.
Figure 5.2 Comparison of dynamic power optimization of circuits implemented by 2-corner and 6-corner cell libraries with different weight factors.
Figure 5.3 Optimization space comparison between leakage and dynamic power of C432 @ 90°C.
Figure 5.4 Achieving the minimum total power by adjusting the weight factor (W).
Figure 5.5 Three possible glitch filtering conditions.
Figure 5.6 Three possible glitch filtering conditions under process variation.
Figure 5.7 Dynamic power distribution of un-optimized (with-glitch) C432 under local delay variation.
Figure 5.8 Dynamic power distribution of optimized (glitch-free) C432 under local delay variation.
Figure 5.9 Comparison of the impacts of 15% local process variation on the dynamic power in C432 optimized by the statistical MILP with the emphasis on the resistance of dynamic power to process variation in Section 5.2.3.1, or by the deterministic MILP in Section 5.1.2. (N = 1 is the expected normalized minimum dynamic power in the optimized glitch-free C432.)
Figure 5.10 Comparison of the impacts of 15% local Leff process variation on the leakage power in C432 optimized by the statistical MILP with the emphasis on the resistance of dynamic power to process variation in Section 5.2.3.1, or by the deterministic MILP in Section 5.1.2. (N1 and N2 are the normalized nominal leakage power in the optimized glitch-free C432.)
Figure 5.11 Flowchart for deciding which one, leakage or dynamic power, should be optimized with process variation.
Figure 6.1 Tradeoffs between leakage power and performance.
Figure 6.2 (a) Dynamic power reduction by delay elements with a certain delay D, and (b) cumulative dynamic power reduction by delay elements with delay 0 to D.
Figure 6.3 The relation between the number of inserted delay elements (sorted by their contribution to the dynamic power reduction) and the corresponding percentage of glitch power reduction.
Figure 6.4 Power-delay curves of deterministic and statistical approaches for C432.
Figure 6.5 Leakage power distribution of dual-Vth C7552 optimized by the deterministic method and by statistical methods with 99% and 95% timing yields, respectively.
Figure 7.1 An example circuit used for illustrating the timing violation.
Figure 7.2 Flowchart of an iterative power optimization procedure.
LIST OF TABLES
Table 3.1 Leakage currents for low and high Vth NAND gates.
Table 3.2 Delays of low and high Vth NAND gates.
Table 4.1 Leakage power distribution of un-optimized C432 under local effective gate length variation.
Table 4.2 Comparison of leakage power of deterministically optimized dual-Vth C432.
Table 5.1 Extended cell library with 6 corners for gate sizing.
Table 5.2 Comparison of dynamic power optimization of C432 implemented by 2-corner and 6-corner cell libraries, respectively.
Table 5.3 Normalized dynamic power distribution of un-optimized C432 under local delay variation.
Table 5.4 Normalized dynamic power distribution of optimized C432 under local delay variation.
Table 6.1 Leakage reduction alone due to dual-Vth assignment (27°C).
Table 6.2 Comparison of the percentage of glitches in unoptimized circuits with the real percentage of dynamic power reduction achieved by path balancing, considering the additional load capacitances contributed by delay elements.
Table 6.3 Leakage, glitch and total power reduction for ISCAS'85 benchmark circuits (90°C).
Table 6.4 Number of delay elements for optimization.
Table 6.5 Comparison of leakage power saving due to statistical modeling with two different timing yields.
Table 6.6 Monte Carlo Spice simulation results for the mean and the standard deviation of the leakage distributions of ISCAS'85 circuits optimized by the deterministic method and by statistical methods with 99% and 95% timing yields, respectively.
CHAPTER 1 INTRODUCTION
The primary contribution of this work is a new design methodology to minimize the
total power consumption in a static CMOS (Complementary Metal Oxide Semiconductor)
circuit. A mixed integer linear programming (MILP) formulation is proposed to optimize
leakage power and dynamic glitch power, without reducing circuit performance, by dual-
Vth assignment, path balancing and gate sizing. To account for process variation,
statistical delay and leakage models are adopted to optimize power consumption in a
statistical sense such that the impact of process variation on the power and timing yields
is minimized.
1.1 Motivation
With the continuing increase in the density and performance of integrated circuits
due to the scaling down of CMOS technology, reducing power dissipation has become a
serious problem that every circuit designer must face.
1.1.1 Leakage Power
In the past, dynamic power dominated the total power dissipation of a CMOS
device. Since dynamic power is proportional to the square of the power supply voltage,
lowering the supply voltage reduces the power dissipation. However, to maintain or
increase the performance of a circuit, the threshold voltage must be decreased by the same
factor. This causes the subthreshold leakage current of transistors to increase
exponentially and makes leakage a major contributor to power consumption.
To reduce leakage power, many techniques have been proposed, including transistor
sizing [45, 72], multi-Vth [12, 19, 103], dual-Vth [31, 45, 70, 72, 96-101], optimal standby
input vector selection [69, 84], transistor stacking [64, 65, 106], body bias [10, 91], etc.
As the threshold voltage (Vth) of transistors in a CMOS logic gate is increased, the
leakage current is reduced but the gate slows down. Dual-Vth assignment is an efficient
technique for leakage reduction. The basic idea is to use the timing slack on non-critical
paths to minimize leakage power by assigning high Vth to some or all gates on those
paths.
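The slack-based idea can be illustrated on a tiny, hypothetical four-gate netlist. The sketch enumerates every low/high Vth choice and keeps the lowest-leakage assignment whose critical-path delay stays within Tmax. Exhaustive search stands in here for the MILP developed later in this work, and it is only practical for a handful of gates. The gate names, delays and leakage values are made up for illustration.

```python
from itertools import product

# Hypothetical netlist: gate -> list of fanin gates (primary inputs omitted).
FANIN = {"g1": [], "g2": [], "g3": ["g1"], "g4": ["g3", "g2"]}
ORDER = ["g1", "g2", "g3", "g4"]          # topological order
OUTPUTS = ["g4"]
# (delay, leakage) for a low-Vth ("L") or high-Vth ("H") version of every gate.
LIB = {"L": (1.0, 10.0), "H": (1.5, 1.0)}

def critical_delay(assign):
    """Longest input-to-output delay for a given Vth assignment."""
    arrival = {}
    for g in ORDER:
        start = max((arrival[f] for f in FANIN[g]), default=0.0)
        arrival[g] = start + LIB[assign[g]][0]
    return max(arrival[o] for o in OUTPUTS)

def best_assignment(t_max):
    """Exhaustively minimize total leakage subject to critical delay <= t_max.
    Returns (leakage, assignment), or None if no assignment meets t_max."""
    best = None
    for choice in product("LH", repeat=len(ORDER)):
        assign = dict(zip(ORDER, choice))
        if critical_delay(assign) > t_max:
            continue
        leak = sum(LIB[v][1] for v in assign.values())
        if best is None or leak < best[0]:
            best = (leak, assign)
    return best

print(best_assignment(t_max=3.0))
```

With Tmax equal to the all-low-Vth critical delay (3.0), only g2 can move to high Vth, because it lies off the critical path g1 to g3 to g4. Relaxing Tmax to 4.0 lets three of the four gates go high, which illustrates the power-performance tradeoff in miniature.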
1.1.2 Glitch Power
Glitches, i.e., unnecessary signal transitions, account for 20%-70% of the dynamic
switching power [20]. To eliminate glitches, a designer can adopt hazard filtering
[7, 38, 46, 83, 104] or path balancing [8, 46, 74]. In hazard filtering, gate sizing or
transistor sizing is used to increase a gate's inertial delay so that the gate filters out the
glitches. An obvious disadvantage of hazard filtering, when used alone, is that it may
increase the circuit delay because the gate delays increase. Alternatively, path delay
balancing can maintain any given performance, although the area overhead and additional
power consumption of the inserted delay elements can become a major concern. The best
way to eliminate glitches is to combine these two techniques [8].
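Both techniques rest on one condition: a gate filters a glitch when the spread between the earliest and latest arrivals at its inputs is smaller than its inertial delay. The sketch below is a simplified illustration and not the formulation used later in this work. It propagates earliest/latest arrival times through a hypothetical netlist and flags gates whose differential input delay reaches the inertial delay, which it assumes equals the gate delay. It then shows that inserting delay elements on the early paths balances them.

```python
# Hypothetical netlist: gate -> (delay, fanins); "a", "b", "c" are primary inputs.
NET = {
    "n1": (1.0, ["a", "b"]),
    "n2": (1.0, ["n1", "c"]),
    "n3": (1.0, ["n2", "a"]),
}
ORDER = ["n1", "n2", "n3"]
PRIMARY_INPUTS = ["a", "b", "c"]

def glitch_report(net, order, pis):
    """Return {gate: differential input delay} for gates whose input arrival
    window is at least as wide as the gate's inertial delay (assumed here to
    equal its propagation delay), i.e. gates that may produce glitches."""
    early = {p: 0.0 for p in pis}
    late = {p: 0.0 for p in pis}
    risky = {}
    for g in order:
        d, fanins = net[g]
        lo = min(early[f] for f in fanins)
        hi = max(late[f] for f in fanins)
        if hi - lo >= d:
            risky[g] = hi - lo
        early[g], late[g] = lo + d, hi + d
    return risky

print(glitch_report(NET, ORDER, PRIMARY_INPUTS))  # {'n2': 1.0, 'n3': 2.0}

# Path balancing: hypothetical delay elements d1 and d2 pad the early paths.
BALANCED = dict(NET)
BALANCED["d1"] = (1.0, ["c"])            # delay element on input c
BALANCED["d2"] = (2.0, ["a"])            # delay element on the a -> n3 branch
BALANCED["n2"] = (1.0, ["n1", "d1"])
BALANCED["n3"] = (1.0, ["n2", "d2"])
BAL_ORDER = ["d1", "d2", "n1", "n2", "n3"]
print(glitch_report(BALANCED, BAL_ORDER, PRIMARY_INPUTS))  # {}
```

The padding keeps the critical delay at 3.0 in this example, so performance is unchanged. The cost is two extra delay elements, the area and power overhead mentioned above.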
1.1.3 Process Variation
The increase in variability of several key process parameters can significantly affect
the design and optimization of low power circuits in the nanometer regime [61]. Due to
the exponential relation of leakage current with some process parameters, such as the
effective gate length, oxide thickness and doping concentration, process variations can
cause a significant increase in the leakage current. There are two principal components of
leakage current. Gate leakage is most sensitive to the variation in oxide thickness (Tox),
while the subthreshold current is extremely sensitive to the variation in effective gate
length (Leff), oxide thickness (Tox) and doping concentration (Ndop). Compared to gate
leakage, subthreshold leakage is more sensitive to parameter variations [66].
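The exponential dependence does more than spread the leakage. It also raises the average. If only Vth varies and is Gaussian with standard deviation σ, then I_sub ∝ exp(-Vth/(n VT)) is lognormal. Its mean exceeds the nominal value by the factor exp(σ²/(2(n VT)²)). The Monte Carlo sketch below checks this closed form. The value σ = 30 mV and the constants n = 1.5 and VT = 26 mV are illustrative assumptions, not data from this work.

```python
import math
import random

def leakage_mean_ratio(sigma_vth, n=1.5, vt=0.026, samples=200_000, seed=1):
    """Monte Carlo estimate of E[I_sub] / I_sub(nominal) when Vth ~ N(nominal, sigma).
    Only the exp(-Vth / (n VT)) dependence of Eq. (2.2) is modeled (assumption)."""
    rng = random.Random(seed)
    k = 1.0 / (n * vt)
    # I_sub(Vth) / I_sub(nominal) = exp(-k * dVth) with dVth ~ N(0, sigma).
    total = sum(math.exp(-k * rng.gauss(0.0, sigma_vth)) for _ in range(samples))
    return total / samples

sigma = 0.030
mc = leakage_mean_ratio(sigma)
analytic = math.exp((sigma / (1.5 * 0.026)) ** 2 / 2)
print(f"Monte Carlo {mc:.3f} vs lognormal closed form {analytic:.3f}")
```

Under these assumptions, a Vth spread of 30 mV with zero mean shift raises the expected leakage to about 1.34 times its nominal value.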
Dynamic power is normally much less sensitive to the process variation because of
its approximately linear dependency on the process parameters. However, any
deterministic path balancing technique used for eliminating glitches becomes less
effective under process variation, since the perfect hazard filtering conditions can be
easily corrupted even with a small variation in some process parameters. To make the
glitch-free circuits optimized by path balancing resistant to process variations, a statistical
delay model is developed in this work.
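A small Monte Carlo experiment shows why perfect balancing is fragile. Take two paths into a gate whose nominal delays are exactly equal. Assume that each path delay is an independent Gaussian whose σ is a fixed fraction of its nominal delay. A glitch is assumed to pass whenever the differential delay reaches the gate's inertial delay. The sketch estimates that probability and compares it with the closed form erfc(d_in / (σ_diff √2)). The path delays, the 5% σ and the unit inertial delay are hypothetical numbers.

```python
import math
import random

def glitch_probability(t1, t2, inertial, rel_sigma, samples=100_000, seed=7):
    """Estimate P(|D1 - D2| >= inertial) when the two path delays arriving at a
    gate are independent Gaussians with sigma = rel_sigma * nominal delay.
    Simplified filtering condition: a glitch passes once the differential delay
    reaches the gate's inertial delay."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        d1 = rng.gauss(t1, rel_sigma * t1)
        d2 = rng.gauss(t2, rel_sigma * t2)
        if abs(d1 - d2) >= inertial:
            hits += 1
    return hits / samples

# Two perfectly balanced paths (10 units each) into a gate with inertial delay 1.
p_mc = glitch_probability(10.0, 10.0, inertial=1.0, rel_sigma=0.05)
sigma_diff = math.hypot(0.05 * 10.0, 0.05 * 10.0)
p_exact = math.erfc(1.0 / (sigma_diff * math.sqrt(2)))
print(f"glitch probability: Monte Carlo {p_mc:.3f}, closed form {p_exact:.3f}")
```

Under these assumptions, a gate that is glitch-free at nominal delays lets a glitch through about 16% of the time (erfc(1) ≈ 0.157). This motivates treating delays statistically when balancing paths.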
1.2 Problem Statement
The problem solved in this work is: Find a deterministic mixed integer linear
programming (MILP) formulation to optimize the total power consumption by dual
threshold voltage (dual-Vth) assignment, path balancing and gate sizing. Further, derive
a statistical mixed integer linear programming formulation to minimize the impact of
process variations on the optimal leakage and dynamic glitch power.
1.3 Original Contributions
In this dissertation, we first propose a deterministic mixed integer linear
programming (MILP) formulation to minimize the leakage and dynamic power
consumption of a static CMOS circuit for a given performance. In a dual-threshold circuit
this method maximizes the number of high-threshold devices and simultaneously
eliminates glitches by balancing paths with the smallest number of delay elements. Gate
sizing is also considered to further minimize the dynamic switching power by reducing
the loading capacitances of gates.
Since leakage exponentially depends on some key process parameters, it is very
sensitive to process variations. We treat gate delay and leakage current as random
variables to reflect the impact of process variation. A mixed integer linear programming
(MILP) method for dual-Vth design is proposed to minimize the leakage power and circuit
delay in a statistical sense such that the effect of process variation on the respective yields
is minimized. Two types of yields are considered. Leakage yield refers to the probability
of an optimized circuit retaining the leakage current below the specified value in the
presence of random process variations. Similarly, timing yield is the probability of the
critical path delay staying below the specification. The experimental results show that
30% more leakage power reduction can be achieved by using the statistical approach,
referred to as statistical MILP, when compared with the deterministic approach.
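Both yields are probabilities, so simple distributions turn them into closed forms. The sketch below assumes that the critical path delay is Gaussian and that the logarithm of the circuit leakage is Gaussian, i.e., that leakage is lognormal. These are common modeling assumptions, not necessarily the exact models adopted in Chapter 4. The numbers in the example are hypothetical.

```python
import math

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def timing_yield(t_spec, mu_delay, sigma_delay):
    """P(critical path delay <= t_spec), assuming a Gaussian delay."""
    return normal_cdf((t_spec - mu_delay) / sigma_delay)

def leakage_yield(i_spec, mu_ln, sigma_ln):
    """P(leakage <= i_spec), assuming ln(leakage) ~ N(mu_ln, sigma_ln)."""
    return normal_cdf((math.log(i_spec) - mu_ln) / sigma_ln)

# Hypothetical circuit: delay 1.0 ns +/- 0.05 ns, spec 1.1 ns;
# median leakage 100 nA, sigma of ln(leakage) = 0.5, spec 200 nA.
print(f"timing yield  {timing_yield(1.1, 1.0, 0.05):.4f}")
print(f"leakage yield {leakage_yield(200e-9, math.log(100e-9), 0.5):.4f}")
```

Under the Gaussian assumption, requiring a 99% timing yield is equivalent to requiring μ + 2.33σ ≤ Tspec. A deterministic worst-case design instead guards against a fixed corner regardless of how probable that corner is.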
Glitch-free circuits optimized by path balancing are also quite sensitive to process
variations. We further extend the statistical MILP formulation to optimize the dynamic
switching power considering process variation and achieve process-variation-resistant
glitch-free circuits.
1.4 Organization of the Dissertation
In Chapter 2, the basic components of power consumption in a static CMOS circuit
are first discussed, followed by a survey of the relevant published literature on low power
design techniques at the gate level. Chapter 3 proposes an original mixed integer linear
programming (MILP) method for total power minimization by dual-Vth assignment and
path balancing. To account for process variation, statistical MILP formulations for
optimizing leakage power and dynamic glitch power are presented in Chapters 4 and 5, respectively.
In Chapter 6, experimental results are presented. Finally, a conclusion and
recommendations for future work are given in Chapter 7.
CHAPTER 2 PRIOR WORK: TECHNIQUES FOR LOW POWER DESIGN
2.1 Components of Power Consumption
Power consumption in a static CMOS circuit basically comprises three components:
dynamic switching power, short circuit power and static power. Compared to the other
two components, short circuit power normally can be ignored in submicron technology.
2.1.1 Dynamic Power
Dynamic power is due to charging and discharging the loading capacitances. It can
be expressed by the following equation [73]:
$$P_{dyn} = \frac{1}{2} C_L V_{dd}^2 A F \qquad (2.1)$$
where
• CL is the load capacitance, including the gate capacitance of the driven gates, the
diffusion capacitance of the driving gate and the wire capacitance;
• Vdd is the power supply voltage;
• A is the switching activity;
• F is the circuit operating frequency.
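Equation (2.1) can be evaluated directly, node by node, and summed over a circuit. The sketch counts A as signal transitions per clock cycle, so each transition dissipates ½ CL Vdd², and glitches raise A. That interpretation and all numbers below are illustrative assumptions.

```python
def dynamic_power(c_load, vdd, activity, freq):
    """Eq. (2.1): P_dyn = 1/2 * C_L * Vdd^2 * A * F, in SI units.
    activity is taken as transitions per clock cycle (assumption)."""
    return 0.5 * c_load * vdd ** 2 * activity * freq

def circuit_dynamic_power(nodes, vdd, freq):
    """Sum Eq. (2.1) over (C_L, A) pairs, one pair per circuit node."""
    return sum(dynamic_power(c, vdd, a, freq) for c, a in nodes)

# Hypothetical node: 10 fF load, 1.0 V supply, 0.2 transitions/cycle, 1 GHz.
print(dynamic_power(10e-15, 1.0, 0.2, 1e9))           # about 1e-06 W
# Glitches raising the same node's activity to 0.3 add 50% to its power.
print(circuit_dynamic_power([(10e-15, 0.3)], 1.0, 1e9))
```
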
Equation (2.1) shows that dynamic switching power is directly proportional to the
switching activity, A, or the number of signal transitions: the more signal transitions,
the higher the dynamic power consumption. After a transition is applied at the input, the
output of a gate may have multiple transitions before reaching a steady state (see Figure
2.9(a)). Among these transitions, at most one is the essential transition; all others are
unnecessary transitions called glitches or hazards. Hence, dynamic power is composed of
two parts: logic switching power, contributed by the signal transitions necessary for the
logic function, and glitch power, caused by glitches or hazards.
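The split between essential and glitch transitions follows from the initial and final values alone. There is one essential transition when they differ and none when they are equal. Every other transition is a glitch. The sketch below applies that rule to a gate output waveform. It assumes the waveform is given as the successive logic values seen after an input change, which is an illustrative representation.

```python
def split_transitions(waveform):
    """Split a gate-output waveform into (essential, glitch) transition counts.
    waveform: successive logic values at the output after an input change,
    starting with the initial value, e.g. [1, 0, 1, 0]."""
    transitions = sum(1 for a, b in zip(waveform, waveform[1:]) if a != b)
    essential = 1 if waveform[0] != waveform[-1] else 0
    return essential, transitions - essential

print(split_transitions([1, 0, 1, 0]))  # (1, 2): one essential, two glitches
print(split_transitions([0, 1, 0]))     # (0, 2): a static hazard, all glitches
```
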
2.1.2 Leakage Power
The leakage current of a transistor is mainly the result of reverse-biased PN junction
leakage, subthreshold leakage and gate leakage as illustrated in Figure 2.1.
[Figure 2.1: inverter connected to Vdd, annotated with subthreshold leakage, gate leakage, and reverse-biased PN-junction leakage paths]
Figure 2.1 Leakage currents in an inverter.
In submicron technology, the reverse-biased PN junction leakage is much smaller
than subthreshold and gate leakage and hence can be ignored. The subthreshold leakage
is the weak inversion current between source and drain of an MOS transistor when the
gate voltage is less than the threshold voltage [99]. It is given by [42]:
$$I_{sub} = \mu_0 C_{ox} \frac{W}{L_{eff}} e^{1.8} V_T^{2} \exp\left(\frac{V_{gs} - V_{th}}{n V_T}\right) \left(1 - \exp\left(-\frac{V_{ds}}{V_T}\right)\right) \qquad (2.2)$$
where μ0 is the zero-bias electron mobility, Cox is the oxide capacitance per unit area, n is
the subthreshold slope coefficient, Vgs and Vds are the gate-to-source and drain-to-source
voltages, respectively, VT is the thermal voltage, Vth is the threshold voltage, W is the
channel width and Leff is the effective channel length. Due to the exponential relation
between Isub and Vth, an increase in Vth sharply reduces the subthreshold current.
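The exponential sensitivity can be checked numerically. The Python sketch below evaluates Equation (2.2) for an off NMOS device at the two NMOS thresholds used in Chapter 6 (0.20 V and 0.32 V); the mobility, oxide capacitance, slope coefficient and device dimensions are assumed illustrative values, not BPTM 70nm parameters.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant (J/K)
Q = 1.602176634e-19   # elementary charge (C)

def subthreshold_current(vgs, vds, vth, w, l_eff, mu0=0.04, cox=0.0173, n=1.5, temp_k=300.0):
    """Equation (2.2) in SI units. mu0 (m^2/V/s), cox (F/m^2) and n are assumed
    illustrative values, not the BPTM 70nm model parameters."""
    vt = K_B * temp_k / Q                                   # thermal voltage kT/q (V)
    prefactor = mu0 * cox * (w / l_eff) * math.exp(1.8) * vt ** 2
    return prefactor * math.exp((vgs - vth) / (n * vt)) * (1.0 - math.exp(-vds / vt))

# Off NMOS (Vgs = 0, Vds = 1 V), W = 0.2 um, Leff = 70 nm.
i_low = subthreshold_current(0.0, 1.0, 0.20, 0.2e-6, 70e-9)
i_high = subthreshold_current(0.0, 1.0, 0.32, 0.2e-6, 70e-9)
ratio = i_low / i_high
print(f"I_sub(low Vth) / I_sub(high Vth) = {ratio:.1f}")
```

Because every other factor cancels, the ratio reduces to exp(ΔVth / (n·VT)); with these assumptions the 0.12 V threshold difference changes the leakage by roughly 22 times.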
Gate leakage is the oxide tunneling current: a thin gate oxide and the resulting high
electric field increase the probability that carriers tunnel through the gate oxide.
Tunneling current becomes significant, and may even be comparable to subthreshold
leakage, when the oxide thickness is less than 15-20 Å [102]. Unlike subthreshold
leakage, which exists only in weakly turned-off transistors, gate leakage exists whether
the transistor is turned on or off [100]. Equation (2.3) gives the expression of the gate
leakage [64].
$$I_{gate} = W_{eff} L_{eff} A \left(\frac{V_{ox}}{T_{ox}}\right)^{2} \exp\left(-\frac{B\left[1 - \left(1 - \frac{V_{ox}}{\phi_{ox}}\right)^{3/2}\right]}{V_{ox}/T_{ox}}\right) \qquad (2.3)$$
where Vox is the potential drop across the thin oxide, φox is the barrier height for the
tunneling particle (electron or hole), and Tox is the oxide thickness. A and B are physical
parameters given by [64]:
$$A = \frac{q^{3}}{16 \pi^{2} h \phi_{ox}}, \qquad B = \frac{4 \sqrt{2m}\, \phi_{ox}^{3/2}}{3 h q}$$
where m is the effective mass of the tunneling particle, q is the electronic charge, and h is
the reduced Planck constant. The oxide thickness Tox decreases with technology scaling
in order to suppress short-channel effects. Equation (2.3) shows that gate leakage
increases significantly as Tox decreases.
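The Tox dependence of Equation (2.3) can be illustrated with a short Python sketch. The barrier height (3.1 eV) and effective mass (0.4 m0) are assumed values for electron tunneling through SiO2, the device size is hypothetical, and the physical constants are standard CODATA values; φox enters A and B as an energy and the bracketed term as a voltage.

```python
import math

Q = 1.602176634e-19      # elementary charge (C)
HBAR = 1.054571817e-34   # reduced Planck constant (J*s)
M0 = 9.1093837015e-31    # free-electron mass (kg)

def gate_leakage(v_ox, t_ox, w_eff, l_eff, phi_ox_ev=3.1, m_rel=0.4):
    """Equation (2.3) in SI units, valid for direct tunneling (v_ox < phi_ox).
    phi_ox_ev (barrier height, eV) and m_rel (effective mass / m0) are assumed."""
    phi_j = phi_ox_ev * Q                                      # barrier height as an energy (J)
    m = m_rel * M0
    a = Q ** 3 / (16 * math.pi ** 2 * HBAR * phi_j)            # parameter A (A/V^2)
    b = 4 * math.sqrt(2 * m) * phi_j ** 1.5 / (3 * HBAR * Q)   # parameter B (V/m)
    e_ox = v_ox / t_ox                                         # oxide field (V/m)
    shape = 1 - (1 - v_ox / phi_ox_ev) ** 1.5                  # phi_ox in volts here
    return w_eff * l_eff * a * e_ox ** 2 * math.exp(-b * shape / e_ox)

# Same hypothetical device, 1 V across the oxide, oxide thinned from 1.5 nm to 1.2 nm.
i_15 = gate_leakage(1.0, 1.5e-9, 0.2e-6, 70e-9)
i_12 = gate_leakage(1.0, 1.2e-9, 0.2e-6, 70e-9)
print(f"thinning Tox from 1.5 nm to 1.2 nm raises I_gate by {i_12 / i_15:.0f}x")
```

Under these assumptions a 0.3 nm reduction in Tox raises the gate current by more than an order of magnitude, mainly through the exponential term.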
In this work, we use BPTM (Berkeley Predictive Technology Models) 70nm
technology [1] to implement our designs. Since BPTM 70nm technology is characterized
by BSIM3.5.2, which cannot correctly model gate leakage, gate leakage is omitted in this
work, and all the techniques discussed in Section 2.2 aim at subthreshold leakage
reduction.
2.2 Techniques for Leakage Reduction
Leakage is becoming comparable to dynamic switching power with the continuous
scaling down of CMOS technology. To reduce leakage power, many techniques have
been proposed, including dual-Vth, multi-Vth, optimal standby input vector selection,
transistor stacking, and body bias.
2.2.1 Dual-Vth Assignment
Dual-Vth assignment is an efficient technique for leakage reduction. In this method,
each cell in the standard cell library has two versions, low Vth and high Vth. Gates with
low Vth are fast but have high subthreshold leakage, whereas gates with high Vth are
slower but have much reduced subthreshold leakage. Traditional deterministic
approaches for dual-threshold assignment utilize the timing slack of non-critical paths to
assign high Vth to some or all gates on those non-critical paths to minimize the leakage
power.
[Figure: example circuit with inputs A, B, C and outputs S and Co; critical paths drawn in bold]
Figure 2.2 An example dual-Vth circuit.
Figure 2.2 gives an example dual-Vth circuit. The bold lines represent the critical
paths. To keep the highest circuit performance, all gates on the critical paths are assigned
low Vth (white gates), while some gates on those non-critical paths can be assigned high
Vth (black gates) to reduce the leakage since there are timing slacks left on those non-
critical paths. Based on the techniques used for determining which gates on non-critical
paths should be assigned high Vth, the dual-Vth approaches can be basically divided into
two groups: heuristic algorithms [45, 72, 96-101] and linear programming algorithms [31,
70]. Among heuristic algorithms, the backtracking algorithm [97, 98] used to determine
the dual-Vth assignment only gives a possible solution, not usually an optimal one (see
example in Figure 3.8 in Section 3.4). Because the backtracking search direction for non-
critical paths is always from primary outputs to primary inputs, the gates close to the
primary outputs have a higher priority for high Vth assignment, even though their leakage
power savings may be smaller than those of gates close to the primary inputs. In [96],
dual-Vth assignment is formulated as a constrained 0-1 programming problem with
non-linear constraint functions. Wang et al. use a heuristic algorithm based on circuit graph
enumeration to solve this problem. Although their swapping algorithm tries to avoid
local optima, a global optimum still cannot be guaranteed. Unlike a heuristic algorithm,
which can only guarantee a locally optimal solution, a linear programming (LP)
formulation ensures a globally optimal solution by describing both the objective function
and the constraints as linear functions. Nguyen et al. [70] use LP to minimize the leakage
and dynamic power by gate sizing and dual-Vth device assignment. The optimization is
separated into several steps: an LP is first used to distribute slack to gates with the
objective of maximizing total power reduction, and then an independent algorithm resizes
gates and assigns threshold levels. Thus, in [70] the LP still needs the assistance of a
heuristic algorithm to complete the optimization. The method of [31] also uses mixed
integer linear programming (MILP) to optimize the total power consumption by
dual-threshold assignment and gate sizing.
Dual-Vth assignment can reduce leakage in both active and standby modes, since
some gates remain idle even when the whole circuit or system is in the active mode.
However, the effectiveness of this method depends on the circuit structure: a symmetric
circuit with many critical paths leaves much less room for leakage reduction.
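The slack-driven idea can be illustrated with a small example. The Python sketch below is a simple greedy heuristic written for illustration; it is not the backtracking [97, 98], enumeration [96] or LP/MILP [31, 70] methods discussed above, and the netlist, delays and leakage values are hypothetical. Gates are tried in order of decreasing leakage saving, and a high-Vth swap is kept only if the critical path delay still meets the bound.

```python
# Hypothetical netlist: gate -> (fanins, delay_low_vth, delay_high_vth, leak_low_vth, leak_high_vth).
# Names not in the dict are primary inputs arriving at t = 0. Delays in ps, leakage in nA.
CIRCUIT = {
    "g1": (("a", "b"), 100, 130, 10, 1),
    "g2": (("g1", "c"), 100, 130, 10, 1),
    "g4": (("c", "d"), 100, 130, 20, 2),
    "g3": (("g2", "g4"), 100, 130, 10, 1),
}

def critical_delay(circuit, high_vth):
    """Longest-path arrival time when the gates in high_vth use their high-Vth delay."""
    arrival = {}

    def arrive(node):
        if node not in circuit:                 # primary input
            return 0
        if node not in arrival:
            fanins, d_low, d_high, _, _ = circuit[node]
            delay = d_high if node in high_vth else d_low
            arrival[node] = max(arrive(f) for f in fanins) + delay
        return arrival[node]

    return max(arrive(g) for g in circuit)

def greedy_dual_vth(circuit, t_max):
    """Visit gates by decreasing leakage saving; keep a high-Vth swap only if the
    critical delay still meets t_max. Heuristic: no optimality guarantee."""
    high_vth = set()
    order = sorted(circuit, key=lambda g: circuit[g][3] - circuit[g][4], reverse=True)
    for g in order:
        if critical_delay(circuit, high_vth | {g}) <= t_max:
            high_vth.add(g)
    leakage = sum(c[4] if g in high_vth else c[3] for g, c in circuit.items())
    return high_vth, leakage

t_c = critical_delay(CIRCUIT, set())                  # all gates low Vth
print(greedy_dual_vth(CIRCUIT, t_c))                  # no performance sacrifice
print(greedy_dual_vth(CIRCUIT, int(1.1 * t_c)))       # 10% delay relaxation
```

With no performance sacrifice only g4, which is off the critical path g1-g2-g3, can move to high Vth; relaxing the delay bound by 10% lets g1 move as well, mirroring the leakage/performance tradeoff discussed in Chapter 6.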
2.2.2 Multi-Threshold-Voltage CMOS
A Multi-Threshold-Voltage CMOS (MTCMOS) circuit [12, 19, 103] is implemented
by inserting high-Vth transistors between the power supply rails and the original
transistors of the circuit [68]. Figure 2.3(a) shows a schematic of an MTCMOS NAND
gate. The original transistors are assigned low Vth to enhance performance, while
high-Vth transistors are used as sleep controllers. In the active mode, SL is set low and the
high-Vth sleep control transistors (MP and MN) are turned on. Their on-resistance is so
small that VSSV and VDDV can be treated as almost equal to the real power supply. In
the standby mode, SL is set high, MN and MP are turned off, and the leakage current is
low: the large leakage current of the low-Vth transistors is suppressed by the small
leakage of the high-Vth transistors. By utilizing the high-Vth sleep control transistors, the
requirements of high performance in the active mode and low static power consumption
in the standby mode can both be satisfied.
To reduce the area, power and speed overheads contributed by the high-Vth sleep
control transistors, a single high-Vth transistor can be used instead of two. Figures 2.3(b)
and 2.3(c) show PMOS insertion MTCMOS and NMOS insertion MTCMOS, respectively.
NMOS insertion MTCMOS is preferred because, for a given size, an NMOS transistor has
a smaller on-resistance than a PMOS transistor [100].
Compared to the dual-Vth technique, MTCMOS can only reduce leakage in the
standby mode and incurs additional area, power and speed overheads.
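[Figure: (a) low-Vth logic between virtual rails VDDV and VSSV, with high-Vth sleep transistors MP (to VDD) and MN (to VSS) controlled by SL; (b) PMOS sleep transistor MP only; (c) NMOS sleep transistor MN only]
Figure 2.3 Schematic of MTCMOS: (a) original MTCMOS, (b) PMOS insertion
MTCMOS, (c) NMOS insertion MTCMOS.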
2.2.3 Adaptive Body Bias
The threshold voltage of a short-channel NMOSFET can be expressed by the
following equation [47].
$$V_{th} = V_{th0} + \gamma \left( \sqrt{\Phi_S - V_{bs}} - \sqrt{\Phi_S} \right) - \theta_{DIBL} V_{dd} + \Delta V_{NW} \qquad (2.4)$$
where Vth0 is the threshold voltage with zero body bias, ΦS, γ and θDIBL are constants for
a given technology, Vbs is the voltage applied between the body and source of the
transistor, ΔVNW is a constant that models the narrow width effect, and Vdd is the supply
voltage. Equation (2.4) shows that a reverse body bias increases the threshold voltage and
a forward body bias decreases it.
Leakage power reduction can be achieved by dynamically adjusting the threshold
voltage through adaptive body bias according to the different operation modes. In the
active mode, forward body (or zero) bias is used to reduce the threshold voltage, which
results in a higher performance. In the standby mode, leakage power is greatly reduced by
the optimal reverse body bias, which increases threshold voltages. The basic scheme of
an adaptive-body-biased inverter is shown in Figure 2.4 [100].
Similar to the MTCMOS, adaptive body bias [11, 13, 28, 54, 63, 90] only reduces the
leakage power in the standby mode. With the continuous technology scaling, the optimal
reverse body bias becomes closer to the zero body bias and thus the technique of adaptive
body bias becomes less effective [44].
[Figure: inverter with PMOS body bias Vbp and NMOS body bias Vbn, each switched between an active and a standby value]
Figure 2.4 Scheme of an adaptive-body-biased inverter.
2.2.4 Transistor Stacking
Two serially connected devices in the off state have significantly lower leakage
current than a single off device. This is called the stacking effect [64, 65, 106]. In Figure
2.5(b), when both M1 and M2 are turned off, Vm has a positive value due to the leakage
current flowing through M1 and M2. Assuming the bodies of M1 and M2 are both
connected to ground, Vbs of M1 becomes negative, which increases M1's threshold
voltage. At the same time, Vgs and Vds of M1 are both reduced. According to Equation
(2.2), the subthreshold leakage in M1 decreases sharply and suppresses the relatively
larger leakage current in M2. In contrast, Vm in Figure 2.5(a) is always equal to zero and
has no effect on Vbs, Vgs and Vds of M, and hence on its subthreshold leakage.
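[Figure: (a) inverter with a single off transistor M, Vdd = Vds; (b) 2-input NAND gate with two series off transistors M1 and M2 and intermediate node voltage Vm, Vdd = Vds1 + Vds2; all gate inputs at 0]
Figure 2.5 Comparison of leakage for (a) one single off transistor in an inverter and (b)
two serially-connected off transistors in a 2-input NAND gate.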
With transistor stacking [40, 51, 55], by replacing one single off transistor with a
stack of serially-connected off transistors, leakage can be significantly reduced. The
disadvantages of this technique are also obvious. Such a stack of transistors causes either
performance degradation or more dynamic power consumption.
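The stacking mechanism described above can be reproduced numerically by finding the node voltage Vm at which the currents of M1 (top) and M2 (bottom) are equal. The Python sketch below combines Equation (2.2) with the body-effect term of Equation (2.4) and solves for Vm by bisection. All device constants are assumed illustrative values, and DIBL is omitted, so the sketch likely understates the reduction seen in real short-channel devices.

```python
import math

# Assumed, illustrative device constants (not BPTM 70nm parameters).
VT = 0.0259              # thermal voltage near 300 K (V)
N_SLOPE = 1.5            # subthreshold slope coefficient n
GAMMA, PHI_S = 0.4, 0.8  # body-effect constants of Equation (2.4)
VTH0 = 0.20              # zero-bias threshold voltage (V)
I0 = 1e-7                # lumps mu0*Cox*(W/Leff)*e^1.8*VT^2 of Equation (2.2) (A)

def i_sub(vgs, vds, vbs):
    """Equation (2.2) with the body-effect term of Equation (2.4); DIBL ignored."""
    vth = VTH0 + GAMMA * (math.sqrt(PHI_S - vbs) - math.sqrt(PHI_S))
    return I0 * math.exp((vgs - vth) / (N_SLOPE * VT)) * (1.0 - math.exp(-vds / VT))

def stack_node_voltage(vdd, iterations=60):
    """Bisection for Vm at which top device M1 and bottom device M2 carry equal current."""
    lo, hi = 0.0, vdd
    for _ in range(iterations):
        vm = 0.5 * (lo + hi)
        i_top = i_sub(vgs=-vm, vds=vdd - vm, vbs=-vm)   # M1: gate and body at 0 V, source at Vm
        i_bottom = i_sub(vgs=0.0, vds=vm, vbs=0.0)      # M2: source and body at ground
        if i_top > i_bottom:
            lo = vm      # M1 still supplies more current, so Vm settles higher
        else:
            hi = vm
    return 0.5 * (lo + hi)

VDD = 1.0
vm = stack_node_voltage(VDD)
single = i_sub(vgs=0.0, vds=VDD, vbs=0.0)    # Figure 2.5(a): one off transistor
stack = i_sub(vgs=0.0, vds=vm, vbs=0.0)      # Figure 2.5(b): current through the stack
print(f"Vm = {vm * 1000:.1f} mV, stack/single leakage = {stack / single:.2f}")
```

Bisection is safe here because the top-device current falls and the bottom-device current rises monotonically as Vm increases.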
2.2.5 Optimal Standby Input Vectors
Subthreshold leakage current depends on the vector applied to the gate inputs,
because different vectors turn off different transistors. As illustrated in Section 2.2.4, a
2-input NAND gate has the smallest subthreshold leakage, due to the stacking effect,
when the input vector is "00". When a circuit is in the standby mode, one can carefully
choose an input vector that minimizes the total leakage of the whole circuit [6, 22, 32, 52,
69, 84]. Gao et al. [32] model leakage current by means of linearized pseudo-Boolean
functions. An exact ILP model is first discussed to minimize leakage with respect to a
circuit's input vector; a fast heuristic MILP is then proposed that selectively relaxes some
binary constraints of the ILP model to trade off runtime against optimality.
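For a very small circuit, the optimal standby vector can be found by exhaustive search, which shows what the ILP formulations above solve at scale. The Python sketch below assumes a hypothetical two-gate NAND circuit and a hypothetical per-pattern leakage table; it is not the method of [32], and it is only feasible for a handful of inputs because the search space grows as 2^n.

```python
from itertools import product

# Hypothetical per-pattern leakage (nA) of a 2-input NAND; "00" is lowest
# because both series NMOS devices are off (stacking effect, Section 2.2.4).
NAND2_LEAK = {(0, 0): 10, (0, 1): 30, (1, 0): 25, (1, 1): 60}

def nand2(x, y):
    return 1 - (x & y)

def circuit_leakage(a, b, c):
    """Toy circuit: g1 = NAND(a, b), g2 = NAND(g1, c); sum of gate leakages."""
    g1 = nand2(a, b)
    return NAND2_LEAK[(a, b)] + NAND2_LEAK[(g1, c)]

def best_standby_vector(num_inputs=3):
    """Exhaustive search over all 2^n input vectors (only feasible for small n)."""
    return min(((v, circuit_leakage(*v)) for v in product((0, 1), repeat=num_inputs)),
               key=lambda item: item[1])

print(best_standby_vector())   # ((0, 0, 0), 35)
```

Note that "000" does not put the second gate in its own best state ("10" rather than "00"), since internal node values are fixed by the primary inputs; this coupling is what makes the problem combinatorial.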
2.2.6 Power Cutoff
Yu and Bushnell [108, 109] present a novel active leakage power reduction method
called the dynamic power cutoff technique (DPCT). The power supply to each gate is
only connected in its switching window, during which the gate makes its transition within
a clock cycle. The circuit is optimally partitioned into groups based on the minimal
switching window (MSW) of gates and power cutoff transistors are inserted into each
group to control the power connection of that group. Since the power supply of each gate
is only turned on during a small timing window within a clock cycle, significant active
leakage reduction can be achieved. A key element of this technique is the
implementation of the cutoff transistors, which can be implemented either by high-Vth
transistors, as discussed in Section 2.2.2, or by low-Vth transistors that are overdriven by a
supply voltage higher than Vdd for PMOS cutoff transistors or lower than Vss for NMOS
cutoff transistors.
2.3 Techniques for Dynamic Power Reduction
Dynamic power comprises logic switching power and glitch power, and can be
expressed by Equation (2.1) [73]:
$$P_{dyn} = \frac{1}{2} C_L V_{dd}^{2} A F \qquad (2.1)$$
To reduce dynamic power at a given operating frequency F, we can either reduce the
energy consumed per logic transition, which is determined by the load capacitance CL
and the supply voltage Vdd, or reduce the number of logic transitions in the circuit,
represented by the switching activity A.
2.3.1 Logic Switching Power Reduction
2.3.1.1 Dual power supply
Reducing the supply voltage, or voltage scaling [15, 23, 27, 29, 107], is the most
effective technique for dynamic power reduction because dynamic power is proportional
to the square of the supply voltage. Similar to the dual-Vth approach, the dual-Vdd
technique assigns high Vdd to all the gates on the critical paths and low Vdd to some of
the gates on the non-critical paths. When a gate operating at a lower Vdd directly drives a
higher-Vdd gate, a level converter is required to avoid undesirable short circuit power in
the higher-Vdd gate, caused by the large DC current that the low-voltage fanin may
induce. Since the level converters consume additional power, minimizing the number of
level converters is also important in voltage scaling [9].
[Figure: flip-flops (FFs) feeding combinational logic split into a high-Vdd cluster and a low-Vdd cluster, with level converters between the low-Vdd cluster and the FFs]
Figure 2.6 Scheme of clustered voltage scaling.
Clustered voltage scaling (CVS) [94] is an effective voltage scaling technique. Its
basic idea is shown in Figure 2.6 [9]. Low-Vdd gates are not allowed to drive high-Vdd
gates, and level converters are used only to convert low-voltage signals to high voltage at
the inputs of flip-flops (FFs), so that the total number of level converters is minimized.
In contrast to CVS, extended clustered voltage scaling (ECVS) [95] allows level
conversion anywhere, which makes the supply voltage assignment much more flexible.
Thus, greater dynamic power savings can be achieved than with CVS. The ECVS
algorithm is, however, more complicated than that of CVS, since CVS can use a
backtracking algorithm to determine just two clusters: one high-Vdd cluster and one
low-Vdd cluster. Figure 2.7 gives an example circuit whose dynamic power is optimized
by ECVS. The bold lines represent the critical paths.
[Figure legend: high-Vdd gate, low-Vdd gate, level converter, FF]
Figure 2.7 Example circuit for illustrating ECVS.
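The CVS rule can be sketched as a backward growth of the low-Vdd cluster starting from the flip-flop side. The Python code below is one simple reading of that rule, not the algorithm of [94]; the netlist, delays and timing bound are hypothetical. A gate joins the low-Vdd cluster only if all of its fanouts are already low-Vdd (so no low-Vdd gate ever drives a high-Vdd gate) and the critical path delay still meets the bound. Level converters are counted only where low-Vdd gates drive FFs, and each low-Vdd gate's switching power would scale by (VddL/VddH)^2 according to Equation (2.1).

```python
# Hypothetical netlist: gate -> (fanins, delay_at_high_vdd, delay_at_low_vdd), delays in ps.
# Names not in the dict are flip-flop outputs / primary inputs arriving at t = 0.
NETLIST = {
    "g1": (("a", "b"), 100, 130),
    "g2": (("g1", "c"), 100, 130),
    "g4": (("c", "d"), 100, 130),
    "g3": (("g2", "g4"), 100, 130),
}
OUTPUTS = {"g3"}   # gates that drive flip-flops

def critical_delay(netlist, low_vdd):
    arrival = {}

    def arrive(node):
        if node not in netlist:
            return 0
        if node not in arrival:
            fanins, d_high, d_low = netlist[node]
            delay = d_low if node in low_vdd else d_high
            arrival[node] = max(arrive(f) for f in fanins) + delay
        return arrival[node]

    return max(arrive(g) for g in netlist)

def clustered_voltage_scaling(netlist, outputs, t_max):
    """Grow the low-Vdd cluster backward: a gate may join only if all its fanouts
    are already low-Vdd and timing still holds. Returns (cluster, #level converters)."""
    fanouts = {g: [] for g in netlist}
    for g, (fanins, _, _) in netlist.items():
        for f in fanins:
            if f in fanouts:
                fanouts[f].append(g)
    low_vdd = set()
    changed = True
    while changed:
        changed = False
        for g in netlist:
            if g in low_vdd or not all(s in low_vdd for s in fanouts[g]):
                continue
            if critical_delay(netlist, low_vdd | {g}) <= t_max:
                low_vdd.add(g)
                changed = True
    return low_vdd, len(low_vdd & outputs)

print(clustered_voltage_scaling(NETLIST, OUTPUTS, t_max=330))
```

With a 330 ps bound, g3 and then g4 join the low-Vdd cluster, needing one level converter at the FF driven by g3; g2 cannot join because the path g1-g2-g3 would then exceed the bound.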
2.3.1.2 Gate sizing
Non-critical paths have timing slack and the delays of some gates on these paths can
be increased without affecting the performance. Since the lengths of devices (transistors)
in a gate are usually minimal for a high speed application, the gate delay can be increased
by reducing the device width. As a result, the dynamic power is accordingly decreased
due to smaller loading capacitance CL, which is proportional to the device size.
Gate sizing is a technique that determines device widths for gates. Traditional gate
sizing approaches use Elmore delay models in a polynomial formulation. Heuristics-
based greedy approaches [23-25, 67, 78, 86, 101] can be used to solve such a polynomial
problem. In general, a heuristic algorithm is relatively fast but cannot guarantee a global
optimum.
The gate delay with respect to its device size, used in [23-25, 67, 78, 101], is
generally given by the following equation,
$$d_i = gd_i + \frac{C_i \, C_{out_i}}{GS_i} \qquad (2.5)$$
where di is the delay of gate i, gdi is the intrinsic gate delay of gate i, Ci is a constant,
Couti is the fanout load of gate i and GSi is the width of gate i. The total load
capacitance Couti is determined by the fanout of the gate and is given as [78]:
$$C_{out_i} = \sum_{j \in FO(i)} \left( C_{wire_{ij}} + C \cdot GS_j \right) \qquad (2.6)$$
where FO(i) is the set of gates that form the fanouts of gate i, Cwireij is the capacitance
of the wire connecting gates i and j, and C is a constant. Ignoring the wiring capacitance,
Equation (2.5) can be rewritten as (2.7):
$$d_i = gd_i + k_i \frac{\sum_{j \in FO(i)} GS_j}{GS_i} \qquad (2.7)$$
where ki = C·Ci.
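To see the tradeoff that Equation (2.7) captures, the Python sketch below evaluates it for a hypothetical gate A driving gates B and C. Delays are in ps, sizes are multiples of a minimum width, and the gd and k values are assumed. Enlarging A shortens its own delay, while enlarging its fanouts lengthens it.

```python
def gate_delays(gd, k, gs, fanouts):
    """Equation (2.7): d_i = gd_i + k_i * sum(GS_j for j in FO(i)) / GS_i,
    with wire capacitance ignored."""
    return {i: gd[i] + k[i] * sum(gs[j] for j in fanouts[i]) / gs[i] for i in gs}

# Hypothetical example: gate A drives B and C.
gd = {"A": 10.0, "B": 10.0, "C": 10.0}
k = {"A": 5.0, "B": 5.0, "C": 5.0}
fanouts = {"A": ["B", "C"], "B": [], "C": []}

print(gate_delays(gd, k, {"A": 2.0, "B": 1.0, "C": 1.0}, fanouts)["A"])  # 15.0
print(gate_delays(gd, k, {"A": 1.0, "B": 1.0, "C": 1.0}, fanouts)["A"])  # 20.0
```

Since CL, and hence dynamic power, grows with GS, a sizing method can shrink gates on non-critical paths until their delay absorbs the available slack.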
A linear programming method is proposed [14] in which a piecewise linear delay
model is adopted to achieve a global optimal solution. A non-linear programming
approach [59] gives the most accurate optimal solution but at a cost of long run times.
2.3.1.3 Transistor sizing
The basic idea of transistor sizing is exactly the same as that of gate sizing except
that in gate sizing all the transistors in one gate are sized together with the same factor
but in transistor sizing each transistor can be sized independently.
Gate intrinsic delay actually depends on the current and previous input vectors which
determine the internal IO path (from the gate inputs to gate output). Different internal IO
paths have different on-resistances that cause distinct path delays (gate intrinsic delays).
For a gate on a critical path, only some of its transistors contribute the largest intrinsic
gate delay, so the remaining transistors can still be sized down to reduce capacitance. In
gate sizing, gdi, the intrinsic gate delay of gate i in Equations (2.5) and (2.7), is a fixed
value, which makes it impossible to differentiate among the internal IO paths. In contrast,
transistor sizing [16, 43, 85, 105] explores the maximum possible optimization space by
sizing transistors independently.
2.3.2 Glitch Power Elimination
When transitions are applied at inputs of a gate, the output may have multiple
transitions before reaching a steady state (Figure 2.9(a)). Among these, at most one is the
essential transition, and all others are unnecessary transitions often called glitches or
hazards. Because the switching power consumed by a gate is directly proportional to the
number of its output transitions, glitches reportedly account for 20%-70% of dynamic
power [20].
Agrawal et al. [8] prove that a combinational circuit is a minimum transient energy
design, i.e., there is no glitch at the output of any gate, if the difference between the signal
arrival times at every gate's inputs remains smaller than the inertial delay of the gate,
which is the time interval that elapses after a primary input change before the gate can
produce a change at its output. This condition is expressed by the following inequality:
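$$\left| t_i - t_n \right| < d$$
where ti and tn are the signal arrival times at any two inputs of a gate and d is the inertial
delay of that gate.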
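This condition can be checked mechanically on a netlist. The Python sketch below makes a simplifying assumption, a single (latest) arrival time per signal, and uses hypothetical gate delays. It reports every gate whose input arrival-time spread is not smaller than its inertial delay, i.e., where a glitch may appear.

```python
# Hypothetical netlist: gate -> (fanins, inertial_delay). Primary inputs switch at t = 0.
NETLIST = {
    "g1": (("a", "b"), 100),
    "g2": (("g1", "c"), 150),
    "g3": (("g2", "a"), 100),
}

def arrival_times(netlist):
    """Latest arrival time of each gate output (single-arrival-time simplification)."""
    t = {}

    def arrive(node):
        if node not in netlist:
            return 0
        if node not in t:
            fanins, delay = netlist[node]
            t[node] = max(arrive(f) for f in fanins) + delay
        return t[node]

    for g in netlist:
        arrive(g)
    return t

def glitch_violations(netlist):
    """Gates whose input arrival-time spread is not smaller than their inertial delay."""
    t = arrival_times(netlist)
    violations = []
    for g, (fanins, delay) in netlist.items():
        times = [t.get(f, 0) for f in fanins]
        spread = max(times) - min(times)
        if not spread < delay:
            violations.append((g, spread))
    return violations

print(glitch_violations(NETLIST))   # [('g3', 250)]
```

Here input a of g3 arrives 250 ps before g2's output, so a delay element of more than 150 ps on that input would satisfy the condition; this is the path balancing idea behind the delay elements used in later chapters.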
iWjidisizeWiIWMin ][3,21
3
? (5.37)
Although the MILP tries to minimize the sum of the relaxation variables of constraint
(5.34), the relaxation variable of some gate i may still be positive, because constraint
(5.34) is too tight to be satisfied without it. Every positive relaxation variable may lead to
glitch generation at the output of gate i. From Table 5.4, we can also see that the average
dynamic power increases approximately linearly with the process variation. This increase
comes from the glitch power generated under process variation. To counteract the
increase in average dynamic power due to these glitches, i.e., to keep the actual average
dynamic power under process variation close to that achieved by the deterministic MILP
formulation, we have to sacrifice some leakage power to obtain a smaller logic switching
power in advance. This can be achieved by setting W1 and W2 both equal to 1 in the
MILP objective function (5.38) and adding a new constraint (5.39) to the statistical MILP
formulation.
[Equation (5.38): the objective of (5.37) with W1 = W2 = 1, in which the leakage, gate-size and delay-element terms carry coefficients C1, C2 and C3, respectively]
$$C_2 \sum_i size[i] + C_3 \sum_{i,j} \Delta d[i,j] < \frac{P_{dyn\_opt}}{\kappa}, \qquad \kappa > 1 \qquad (5.39)$$
Pdyn_opt is the optimal dynamic power obtained by the deterministic MILP in Section 5.1.2,
and κ is a constant determined by the process variation. With κ larger than 1, the
statistical MILP formulation yields an optimal circuit with less dynamic power.
[Figure: probability distributions of normalized dynamic power (0.95 to 1.23). Statistical MILP: μ = 1.04, 3σ/μ = 2.82%, (μ − N)/N = 3.63%. Deterministic MILP: μ = 1.14, 3σ/μ = 5.13%, (μ − N)/N = 13.53%.]
Figure 5.9 Comparison of the impacts of 15% local process variation on the dynamic
power in C432 optimized either by the statistical MILP with emphasis on the resistance of
dynamic power to process variation (Section 5.2.3.1) or by the deterministic MILP
(Section 5.1.2). (N = 1 is the expected normalized minimum dynamic power in the
optimized glitch-free C432.)
In C432 optimized by the deterministic MILP formulation of Section 5.1.2, the
optimized total power comprises 59.3 μW dynamic power and 5.5 μW leakage power, as
shown in Figure 5.4. The data in Table 5.4 show that with 15% local process variation, its
average dynamic power increases by 13.53%, with a 5.34% standard deviation. To
reduce the impact of process variation on its dynamic power, the objective function (5.38)
and constraint (5.39) (with Pdyn_opt = 59.3 μW and the constant in (5.39) set to 1.10) are
adopted in the statistical MILP formulation. The two curves in Figure 5.9 show that when
15% local process variation is applied to the optimized glitch-free C432, the average
dynamic power increases by only 3.63% instead of 13.53%, and the standard deviation is
reduced from 5.13% to 2.82%. This comes at the cost of a 94% increase in average
leakage power (from 1.0 to 1.94) and a slightly wider spread of the leakage power
distribution, as shown in Figure 5.10.
[Figure: probability distributions of normalized leakage (0.50 to 2.90). Statistical MILP: N2 = 1.94, μ = 2.25, σ/μ = 10.24%, (μ − N2)/N2 = 15.22%. Deterministic MILP: N1 = 1.00, μ = 1.17, σ/μ = 6.64%, (μ − N1)/N1 = 16.97%.]
Figure 5.10 Comparison of the impacts of 15% local Leff process variation on the leakage
power in C432 optimized either by the statistical MILP with emphasis on the resistance of
dynamic power to process variation (Section 5.2.3.1) or by the deterministic MILP
(Section 5.1.2). (N1 and N2 are the normalized nominal leakage power of the optimized
glitch-free C432.)
5.2.3.2 Minimizing the impact of process variation on leakage
In case 2, leakage is almost equal to, or even larger than, the dynamic power. Because
leakage is so sensitive to process variation, we can no longer minimize the effect of
process variation on the dynamic power by sacrificing leakage. The technique of using
path balancing to eliminate glitches has to be discarded, since the increase in the average
dynamic power under process variation may be close to, or even larger than, the glitch
power eliminated by path balancing. To make the leakage of the optimized circuits
resistant to process variation, we can still use the MILP proposed in Chapter 4, except that
every gate has six possible choices instead of just two.
5.3 Summary
This chapter first introduces the technique of using gate sizing to reduce dynamic
power. Then a deterministic MILP formulation is proposed to optimize the total power
consumption by dual-Vth assignment, path balancing and gate sizing without considering
any process variation. The impact of process variation on dynamic power is analyzed and
a statistical MILP formulation is presented to minimize the impact of process variation on
the dynamic power by giving up some leakage power if the dynamic power is still the
dominant one under process variation. Figure 5.11 gives the flowchart of how to make a
decision as to which one, leakage or dynamic power, should be optimized considering
process variation.
[Flowchart: use the deterministic MILP to get the optimal power; simulate the optimized circuit with the given process variation to obtain the mean and standard deviation of leakage; decide "Is the circuit in standby mode most of the time?" and "Can leakage still be ignored under the given process variation?"; depending on the answers, use the statistical MILP to minimize the process variation impact on either dynamic power or leakage power]
Figure 5.11 Flowchart of making a decision as to which one, leakage or dynamic power,
should be optimized with process variation.
CHAPTER 6 RESULTS
To study the increasingly dominant effect of leakage power, we use the BPTM 70nm
CMOS technology [1]. The low Vth values for NMOS and PMOS devices are 0.20 V and
−0.22 V, respectively, and the high Vth values are 0.32 V and −0.34 V, respectively. We
regenerated the netlists of the ISCAS'85 benchmark circuits using a 2-corner cell library in
which the maximum gate fanin is 5. Two look-up tables, one for gate delays and one for
leakage currents, were constructed for each type of cell using Spice simulation. A C
program parses the netlist and generates the constraint set for the CPLEX LP solver in the
AMPL software package [30]. CPLEX then gives the optimal Vth assignment as well as
the value and position of every delay element. The dynamic power is estimated by an
event-driven logic simulator that incorporates an inertial delay glitch filtering analysis.
6.1 Results of Deterministic MILP (Chapter 3) for Total Power Optimization
6.1.1 Leakage Power Reduction
The results of leakage power reduction for the ISCAS'85 benchmark circuits are
shown in Table 6.1. Here the objective of the MILP in Section 3.2 was set to minimize
the leakage alone. All Δd[i,j] variables were forced to be 0 and constraints (3.9) and (3.10)
were suppressed. The numbers of gates in column 2 are for our gate library and differ
from those in the original benchmark netlists. Tc in column 3 is the minimum delay of the
critical path when all gates have low Vth. This was determined by the LP discussed in
Section 3.2 in the paragraph following Equation (3.16). Column 4 shows the total leakage
current with all gates assigned low Vth. Column 5 shows the optimized circuit leakage
current with gate Vth reassigned according to the MILP optimization. Column 6 shows the
leakage reduction (%) for optimization without sacrificing any performance. Column 9
shows the leakage reduction with 25% performance sacrifice.
Table 6.1 Leakage reduction alone due to dual-Vth assignment (27°C).
                                           | Optimized (Tmax = Tc)           | Optimized (Tmax = 1.25Tc)
Circuit  # gates  Tc (ns)  Unopt. Ileak(μA) | Ileak (μA)  Reduction  CPU s   | Ileak (μA)  Reduction  CPU s
C432     160      0.751    2.620            | 1.022       61.0%      0.42    | 0.132       95.0%      0.3
C499     182      0.391    4.293            | 3.464       19.3%      0.08    | 0.225       94.8%      1.8
C880     328      0.672    4.406            | 0.524       88.1%      0.24    | 0.153       96.5%      0.3
C1355    214      0.403    4.388            | 3.290       25.0%      0.1     | 0.294       93.3%      2.1
C1908    319      0.573    6.023            | 2.023       66.4%      59      | 0.204       96.6%      1.3
C2670    362      1.263    5.925            | 0.659       90.4%      0.38    | 0.125       97.9%      0.16
C3540    1097     1.748    15.622           | 0.972       93.8%      3.9     | 0.319       98.0%      0.74
C5315    1165     1.589    19.332           | 2.505       87.1%      140     | 0.395       98.0%      0.71
C6288    1177     2.177    23.142           | 6.075       73.8%      277     | 0.678       97.1%      7.48
C7552    1046     1.915    22.043           | 0.872       96.0%      1.1     | 0.445       98.0%      0.58
CPU s: CPU time in seconds (Sun OS 5.7).
From Table 6.1, we see that with Vth reassignment, the leakage current of most
benchmark circuits is reduced by more than 60% without any performance sacrifice
(column 6). For several large benchmarks, leakage is reduced by about 90% because a
smaller percentage of gates lies on critical paths. However, for highly symmetrical
circuits with many critical paths, such as C499 and C1355, the leakage reduction is
smaller. Column 9 shows that, with some performance sacrifice, the leakage reduction
reaches its highest level, around 98%.
The curves in Figure 6.1 show the relation between normalized leakage power and
normalized critical path delay in a dual-Vth process. Unoptimized circuits with all low-Vth
gates are at point (1, 1) and have the largest leakage power and smallest delay. With
optimal Vth assignment, leakage power can be reduced sharply, by 61% (from point (1, 1)
to point (1, 0.4)) for C432 or by 88% (from point (1, 1) to point (1, 0.1)) for C880,
depending on the circuit, without sacrificing any performance. When the normalized Tmax
becomes greater than 1, i.e., when some performance is sacrificed, leakage power
decreases further, but more slowly. When the delay increase is more than 30%, the
leakage reduction saturates at about 98%. Thus, Figure 6.1 provides a guide for trading
off leakage power against performance.
[Figure: normalized leakage power (0 to 1) versus normalized critical path delay (1 to 1.5) for C432, C880 and C1908]
Figure 6.1 Tradeoffs between leakage power and performance.
6.1.2 Leakage, Dynamic Glitch and Total Power Reduction
The leakage current increases with temperature because both VT (the thermal voltage,
kT/q) and Vth depend on temperature. Our Spice simulation shows that for a 2-input
NAND gate with low Vth, the leakage current increases by a factor of 10 when the
temperature increases from 27°C to 90°C. For a 2-input NAND gate with high Vth, this
factor is 20.
The leakage in our look-up table is from simulation at 27°C. To show the dominant
effect of leakage power, we estimate the leakage currents at 90°C by multiplying the total
leakage current obtained from the CPLEX LP solver [30] by a factor between 10 and 20,
determined by the proportion of low- to high-threshold transistors.
The dynamic power is estimated by a glitch-filtering event-driven simulator and is
given by
$$P_{dyn} = \frac{E_{dyn}}{T} = \frac{0.5\, C_{inv} V_{dd}^{2} \sum_i \left( T_i \cdot FO_i \right)}{1000 \times 1.2\, T_c} \qquad (6.1)$$
where Cinv is the gate capacitance of an inverter, Ti is the number of transitions at the
output of gate i when 1,000 random vectors are applied at the primary inputs (PIs), and FOi
is the number of fanouts of gate i. The vector period is assumed to be 20% greater than
the critical path delay Tc. By simulating the number of transitions of each gate, we can
estimate the glitch power reduction.
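Equation (6.1) translates directly into code. The Python sketch below uses hypothetical transition counts for a two-gate circuit, with and without glitches, and assumed values of Cinv, Vdd and Tc. It approximates each gate's load as FOi·Cinv, as the equation does, and it ignores the extra load of any inserted delay elements.

```python
def dynamic_power(transitions, fanouts, c_inv, vdd, t_c, n_vectors=1000):
    """Equation (6.1): switched energy 0.5*C_inv*Vdd^2*sum(T_i*FO_i), divided by the
    simulated time n_vectors * 1.2 * Tc (vector period 20% above Tc)."""
    energy = 0.5 * c_inv * vdd ** 2 * sum(transitions[g] * fanouts[g] for g in transitions)
    return energy / (n_vectors * 1.2 * t_c)

# Hypothetical counts from a 1,000-vector simulation of a two-gate circuit.
fanouts = {"g1": 2, "g2": 1}
with_glitches = {"g1": 500, "g2": 800}
glitch_free = {"g1": 400, "g2": 600}      # essential transitions only

p_before = dynamic_power(with_glitches, fanouts, c_inv=1e-15, vdd=1.0, t_c=1e-9)
p_after = dynamic_power(glitch_free, fanouts, c_inv=1e-15, vdd=1.0, t_c=1e-9)
print(f"{p_before * 1e6:.3f} uW -> {p_after * 1e6:.3f} uW "
      f"({100 * (1 - p_after / p_before):.1f}% glitch power removed)")
```

Because Cinv, Vdd and Tc cancel in the ratio, the glitch power fraction depends only on the fanout-weighted transition counts.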
When path balancing is used to eliminate glitches, the additional load capacitances
contributed by the inserted delay elements consume extra dynamic power. Whether path
balancing is effective depends on the ratio of this dynamic power overhead to the
eliminated glitch power. The data in column 3 of Table 6.2 show that less than 10%
dynamic power reduction is achieved for some circuits, for instance, C432, C1908 and
C2670, when the load capacitances of the delay elements are considered. This is mainly
because we use a 2-corner cell library, which has a limited optimization space. As
discussed and illustrated in Section 5.1, a 6-corner cell library normally yields more
dynamic power reduction, since it makes it possible to eliminate glitches by path
balancing and to reduce the load capacitance of each logic transition by gate sizing
simultaneously.
Table 6.2 Comparison of the percentage of glitches in unoptimized circuits with the actual
percentage of dynamic power reduction achieved by path balancing, considering the
additional load capacitances contributed by the delay elements.
Circuit   Glitch % in unoptimized circuit   Dynamic power reduction with CL of delay elements
C432      27.4%                              8.63%
C499      29.0%                              18.13%
C880      27.8%                              16.23%
C1355     43.5%                              35.79%
C1908     22.4%                              8.39%
C2670     21.6%                              7.42%
C3540     31.5%                              14.04%
C5315     34.6%                              12.08%
C6288     76.0%                              68.73%
C7552     40.2%                              27.74%
To demonstrate the projected dominant effect of leakage power in sub-micron CMOS
technology, we compare the leakage power and dynamic power at 90°C in Table 6.3. "All
low Vth" denotes the unoptimized circuit in which all gates have low threshold, and "Dual
Vth" denotes the optimized circuit whose Vth has been optimally assigned for minimum
leakage. Column 6 gives the dynamic power of the optimized design, which is further
reduced, as shown in column 7, when glitches are eliminated by path balancing and the
power overhead contributed by the delay elements is considered. We observe that for
70nm BPTM CMOS technology at 90°C, the unoptimized leakage power (column 3) of
some large ISCAS'85 benchmark circuits accounts for about one half or more of the total
power consumption (column 9). With Vth reassignment, the optimized leakage power of
most benchmark circuits is reduced to around 10%. With further glitch (dynamic) power
reduction, the average total power reduction for the ISCAS'85 benchmarks is over 40%,
and some circuits achieve a total reduction of up to 70%.
Table 6.3 Leakage, glitch and total power reduction for ISCAS'85 benchmark circuits (90°C).
Columns 3-5: leakage power (µW); columns 6-8: dynamic power (µW); columns 9-11: total power, leakage + dynamic (µW).
Circuit  # gates  All low Vth  Dual Vth  Reduc. %  Dual Vth  Delay Opt.  Reduc. %  All low Vth  Dual Vth + Delay Opt.  Reduc. %
C432     160      35.77        11.87     66.8%     101.0     73.3        8.63%     136.8        104.15                 23.86%
C499     182      50.36        39.94     20.7%     225.7     160.3       18.13%    276.1        224.72                 18.61%
C880     328      85.21        11.05     87.0%     177.3     128.0       16.23%    262.5        159.57                 39.21%
C1355    214      54.12        39.96     26.3%     293.3     165.7       35.79%    347.4        228.29                 34.29%
C1908    319      92.17        29.69     67.8%     254.9     197.7       8.39%     347.1        263.20                 24.17%
C2670    362      115.4        11.32     90.2%     128.6     100.8       7.42%     244.0        130.38                 46.57%
C3540    1097     302.8        17.98     94.1%     333.2     228.1       14.04%    636.0        304.40                 52.14%
C5315    1165     421.1        49.79     88.2%     465.5     304.3       12.08%    886.6        459.06                 48.22%
C6288    1177     388.5        97.17     75.0%     1691      405.6       68.73%    2079.7       625.95                 69.90%
C7552    1046     444.4        18.75     95.8%     380.9     227.8       27.74%    825.3        293.99                 64.38%
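The total-power columns of Table 6.3 appear to follow from the other columns. The optimized total (column 10) equals the dual-Vth leakage (column 4) plus the dual-Vth dynamic power (column 6) scaled by (1 - column 8), that is, with the delay-element overhead included. This relation is inferred from the table values rather than stated in the text. The sketch below checks it for three rows; the function names are illustrative.

```python
# Selected rows of Table 6.3 (power in uW, dynamic reduction as a fraction):
# (dual-Vth leakage [col 4], dual-Vth dynamic [col 6],
#  dynamic reduction with delay elements [col 8], all-low-Vth total [col 9])
TABLE_6_3 = {
    "C432": (11.87, 101.0, 0.0863, 136.8),
    "C6288": (97.17, 1691.0, 0.6873, 2079.7),
    "C7552": (18.75, 380.9, 0.2774, 825.3),
}


def optimized_total(leak_dual, dyn_dual, dyn_reduction):
    """Total power after dual-Vth assignment and path balancing (col 10)."""
    return leak_dual + dyn_dual * (1.0 - dyn_reduction)


def total_reduction(total_unopt, total_opt):
    """Fractional total power reduction (col 11)."""
    return (total_unopt - total_opt) / total_unopt


for name, (leak, dyn, red, total) in TABLE_6_3.items():
    opt = optimized_total(leak, dyn, red)
    print(f"{name}: total = {opt:.2f} uW, reduction = {100 * total_reduction(total, opt):.2f}%")
```

For these three rows the sketch reproduces columns 10 and 11 (104.15, 625.95 and 293.99 µW; 23.86%, 69.90% and 64.38%).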
6.1.3 Tradeoff Between Glitch Power Reduction and Area/Power Overhead Contributed by the Delay Elements
The area overhead due to the inserted delay elements is fairly large. Table 6.4 shows that the number of delay elements (Δdi #) is almost equal to the number of gates (Gates #), except for C1355. Assume that a gate has 4 transistors on average (e.g., a 2-input NAND gate) and that each delay element, implemented as a CMOS transmission gate, has 2 transistors. Under these assumptions, delay element insertion adds roughly 50% to the area. The main reason for the large number of delay elements is that our cell library contains some complex gates. For example, AOI (AND-OR-INVERT) gates can have as many as 5 inputs, and some NAND or NOR gates can have as many as 4 inputs. As a result, more than one delay element is often inserted for a single gate. One solution, which will be adopted in our future research, is to use a simpler and smaller cell library.
Table 6.4 Number of delay elements for optimization.
Circuit  Gates #  Δdi #
C432     160      160
C499     182      128
C880     328      303
C1355    214      112
C1908    319      313
C2670    362      330
C3540    1097     1258
C5315    1165     1198
C6288    1177     1307
C7552    1046     845
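The rough 50% estimate can be reproduced per circuit from Table 6.4. The sketch below uses the text's own assumptions: 4 transistors per gate and 2 per transmission-gate delay element. It counts transistors only, so it ignores cell sizes and routing. For example, C1355, with 112 delay elements for 214 gates, comes out near 26% rather than 50%.

```python
# Gate and delay-element counts from Table 6.4: name -> (gates, delay elements).
TABLE_6_4 = {
    "C432": (160, 160), "C499": (182, 128), "C880": (328, 303),
    "C1355": (214, 112), "C1908": (319, 313), "C2670": (362, 330),
    "C3540": (1097, 1258), "C5315": (1165, 1198), "C6288": (1177, 1307),
    "C7552": (1046, 845),
}
TRANSISTORS_PER_GATE = 4    # text's assumption (e.g., a 2-input NAND gate)
TRANSISTORS_PER_DELAY = 2   # CMOS transmission-gate delay element


def transistor_overhead(gates, delay_elements):
    """Added transistors as a fraction of the original gate transistors."""
    return (delay_elements * TRANSISTORS_PER_DELAY) / (gates * TRANSISTORS_PER_GATE)


for name, (gates, delays) in TABLE_6_4.items():
    print(f"{name}: {100 * transistor_overhead(gates, delays):.1f}%")
```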
Considering the usually large routing area in an ASIC chip, and the fact that a large
percentage of delay elements have quite small delays (see the following discussion in this
section) and hence small sizes, the actual area overhead should be much less than 50%.
We also applied the path balancing technique to an ADI (Analog Devices Inc.) RFID chip. The chip is implemented in TSMC 0.35µm CMOS technology and has 46,000 placeable cells (39,000 combinational cells and 7,000 sequential cells). Power simulation with PrimePower [5] shows that 11.8% of the logic transitions are glitches, and these glitches consume 8% of the dynamic power. Logic transitions internal to a standard cell are not counted here. Although this RFID chip does not consume much glitch power, it is still instructive to analyze the values and the number of the delay elements.
Figure 6.2 (a) Dynamic power reduction by delay elements with a certain delay D, and (b) cumulative dynamic power reduction by delay elements with delay 0~D.
Figure 6.3 The relation between the number of inserted delay elements (sorted by their
contribution to the dynamic power reduction) and the corresponding percentage of glitch
power reduction
Figure 6.2(a) gives the PDF (probability density function) of the delay elements, i.e., the dynamic power reduction by delay elements with a certain delay D. It shows that most of the delay elements inserted for glitch elimination have small delays. This is consistent with the circuit structure of a high-speed ASIC design. In a high-speed ASIC chip, the logic depth of any combinational logic between two flip-flops cannot be very large, so the timing window that determines the value of a delay element is not wide. Figure 6.2(b) gives the CDF (cumulative distribution function) of the delay elements, i.e., the cumulative dynamic power reduction by delay elements with delay 0~D. Delay elements whose delays exceed 5ns (best case) or 10ns (worst case) contribute very little to the dynamic power reduction. Figure 6.2 therefore provides guidance for selecting delay values when a standard cell library of delay elements is constructed.
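Figure 6.2(b) suggests a simple rule for choosing the delay values that a delay-element cell library should cover: keep delays up to the point where the cumulative dynamic power reduction saturates. The sketch below applies this rule to a hypothetical list of (delay in ns, power reduction) pairs. The 95% coverage target is an arbitrary choice, and the RFID data behind Figure 6.2 are not reproduced.

```python
def cumulative_reduction(elements):
    """Return (D, fraction) pairs: the fraction of the total dynamic power
    reduction contributed by delay elements with delay <= D (a CDF as in Fig. 6.2(b))."""
    total = sum(p for _, p in elements)
    acc, cdf = 0.0, []
    for delay, reduction in sorted(elements):
        acc += reduction
        cdf.append((delay, acc / total))
    return cdf


def delay_cutoff(elements, target=0.95):
    """Smallest delay D whose cumulative reduction reaches the target fraction."""
    for delay, frac in cumulative_reduction(elements):
        if frac >= target:
            return delay
    return max(d for d, _ in elements)


# Hypothetical (delay in ns, power reduction in uW) pairs for inserted delay elements.
ELEMENTS = [(0.1, 5.0), (0.2, 4.0), (0.5, 3.0), (1.0, 2.0),
            (2.0, 0.5), (6.0, 0.3), (12.0, 0.2)]
print(delay_cutoff(ELEMENTS, 0.95))
```

Here the elements with delays up to 2ns already account for more than 95% of the reduction. A library could therefore omit the longer delays at a small cost in glitch reduction.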
Figure 6.3 shows the relation between the number of inserted delay elements and the corresponding percentage of glitch power reduction, with the delay elements sorted by their contribution to the dynamic power reduction. The first 10,000 delay elements play a much more important role in glitch elimination, while the contribution of the remaining 4,000 is very small. Figure 6.3 therefore guides circuit designers in trading off glitch reduction against the power and area overhead introduced by the delay elements. Note that glitches propagated to the outputs of buffers and inverters disappear automatically when all the paths are balanced. In this RFID chip, such glitches consume 25% and 39% of the total glitch power for the worst case and the best case, respectively. Therefore, the remaining glitches contribute at most 75% and 61% of the total glitch power for the worst case and the best case, respectively.
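The tradeoff read from Figure 6.3 can be expressed as a small calculation. Rank the delay elements by their contribution, then find how many of them are needed to reach a chosen fraction of the total glitch power reduction. The sketch below uses hypothetical per-element contributions and an illustrative function name.

```python
def elements_needed(contributions, target):
    """Number of delay elements, taken in decreasing order of their contribution
    to glitch power reduction, needed to reach `target` (a fraction) of the
    total reduction."""
    ranked = sorted(contributions, reverse=True)
    total = sum(ranked)
    acc = 0.0
    for k, c in enumerate(ranked, start=1):
        acc += c
        if acc >= target * total:
            return k
    return len(ranked)


# Hypothetical per-element glitch power reductions (uW).
CONTRIB = [0.3, 5.0, 2.0, 0.2, 4.0, 0.5, 3.0]
print(elements_needed(CONTRIB, 0.9))
```

With these numbers, 4 of the 7 elements give 90% of the reduction. The remaining 3 elements would add area and power overhead for little benefit.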
6.2 Results of Statistical MILP (Chapter 4) for Leakage Optimization
To compare the power optimization results of the statistical MILP with those of the deterministic approach, we assume that all the gates have the same ci1 and ci2 (sensitivities of gate delay to the variations of different process parameters) in equation (4.9). Therefore, each gate has the same ri, and we assume that 3σ/μ of ri is 15%. This assumption is made only for simplicity and does not change the efficacy of the statistical approach.
In the deterministic method, the worst case is applied, which means all gate delays
increase 15% and hence Tmax increases 15% accordingly. To make the comparison
between the statistical method and the deterministic approach reasonable, Tmax in the
statistical approach is also 115% of the original value.
Table 6.5 Comparison of leakage power saving due to statistical modeling with two different timing yields.
Columns 3-5: deterministic optimization (timing yield = 100%); columns 6-8: statistical optimization (timing yield = 99%); columns 9-11: statistical optimization (timing yield = 95%).
Circuit # gates  Unopt. Leak.  Opt. Leak.  Run Time  Opt. Leak.  Extra Power  Run Time  Opt. Leak.  Extra Power  Run Time
                 Power (µW)    Power (µW)  (s)       Power (µW)  Saving       (s)       Power (µW)  Saving       (s)
C432    160      2.620         1.003       0.00      0.662       33.9%        0.44      0.589       41.3%        0.32
C499    182      4.293         3.396       0.02      3.396       0.0%         0.22      2.323       31.6%        1.47
C880    328      4.406         0.526       0.02      0.367       30.2%        0.18      0.340       35.4%        0.18
C1355   214      4.388         3.153       0.00      3.044       3.5%         0.17      2.158       31.6%        0.48
C1908   319      6.023         1.179       0.03      1.392       21.7%        11.21     1.169       34.3%        17.5
C2670   362      5.925         0.565       0.03      0.298       47.2%        0.35      0.283       49.8%        0.43
C3540   1097     15.622        0.957       0.13      0.475       50.4%        0.24      0.435       54.5%        1.17
C5315   1165     19.332        2.716       1.88      1.194       56.0%        67.63     0.956       64.8%        19.7
C7552   1046     22.043        0.938       0.44      0.751       20.0%        0.88      0.677       27.9%        0.58
Average of ISCAS'85 benchmarks             0.24                  29.2%        9.04                  41.3%        4.64
ARM7    15.5k    686.56        495.12      15.69     425.44      14.07%       36.79     425.44      14.07%       36.4
In Table 6.5, columns 4, 6 and 9 give the optimized leakage power obtained by the deterministic MILP, by the statistical MILP with 99% timing yield and by the statistical MILP with 95% timing yield, respectively. The deterministic method uses fixed values. When statistical models are used instead for gate delay and subthreshold leakage current, the ISCAS'85 benchmarks achieve on average 29% greater leakage power saving with 99% timing yield and 41% greater saving with 95% timing yield. The reason is that the statistical model provides a more flexible optimization space, while the deterministic approach assumes the worst case. C499 and C1355 have many critical paths because of their highly symmetrical circuit structures. For these circuits the optimization space is limited, so the additional power saving contributed by statistical optimization is much smaller, especially with the higher timing yield (99%). As expected, a lower timing yield gives a higher power saving, because the relaxed timing constraints result in a larger optimization space.
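A simple path-delay calculation illustrates the pessimism of the worst-case corner. The sketch below does not use the statistical model of equation (4.9). For illustration only, it assumes that the gate delays on a path are independent Gaussian variables with 3σ/μ = 15%, so the path delay has mean Σμi and standard deviation sqrt(Σσi²). The deterministic worst case adds the full 15% to every gate. The statistical estimate needs only the path-delay quantile for the required timing yield, which leaves more slack under the same relaxed Tmax. The 10-gate path with unit delays is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def worst_case_delay(nominal, rel_3sigma):
    """Deterministic corner: every gate delay is increased by its 3-sigma value (here 15%)."""
    return sum(d * (1.0 + rel_3sigma) for d in nominal)


def statistical_delay(nominal, rel_3sigma, timing_yield):
    """Path-delay quantile for the requested timing yield, assuming independent
    Gaussian gate delays (an illustrative assumption, not equation (4.9))."""
    mu = sum(nominal)
    sigma = sqrt(sum((d * rel_3sigma / 3.0) ** 2 for d in nominal))
    return mu + NormalDist().inv_cdf(timing_yield) * sigma


path = [1.0] * 10  # hypothetical 10-gate path with unit nominal delays
print(f"worst case     : {worst_case_delay(path, 0.15):.3f}")
print(f"statistical 99%: {statistical_delay(path, 0.15, 0.99):.3f}")
print(f"statistical 95%: {statistical_delay(path, 0.15, 0.95):.3f}")
```

The three values are roughly 11.5, 10.37 and 10.26. Under this assumed model, the lower timing yield needs the least margin, which matches the trend in Table 6.5.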
[Plot: normalized leakage power (0 to 1) versus normalized timing (1 to 1.5); curves: Deterministic LP, Statistical LP (99% Timing Yield), Statistical LP (95% Timing Yield).]
Figure 6.4 Power-delay curves of deterministic and statistical approaches for C432.
Figure 6.4 shows the power-delay curves for C432's leakage optimization by the deterministic and statistical approaches. The three curves start at (1, 1), (1, 0.66) and (1, 0.59). Thus, if the deterministic approach reduces the leakage power to 1 unit, the statistical approach reduces it to 0.66 unit with 99% timing yield and to 0.59 unit with 95% timing yield. The lower the timing yield, the higher the power saving. With a further relaxed Tmax, all three curves give more leakage power reduction because more gates can be assigned high Vth.
[Plot: probability (0 to 0.25) versus leakage power; curves C7552_d (deterministic), C7552_p99 and C7552_p95 (statistical, 99% and 95% timing yields).]
Figure 6.5 Leakage power distribution of dual-Vth C7552 optimized by deterministic
method, statistical methods with 99% and 95% timing yields, respectively.
Figure 6.5 compares the leakage power distributions of dual-Vth C7552 optimized by the deterministic method and by the statistical methods with 99% and 95% timing yields. Compared with the deterministic method, the statistical approaches reduce both the mean and the standard deviation of C7552's leakage distribution. Although the difference is not very obvious, leakage optimization with 95% timing yield does have a smaller spread than that with 99% timing yield.
The narrower distribution and the lower average leakage arise because the statistical method can assign more gates high Vth than the deterministic method. When the deterministic approach optimizes leakage under process variation, it has to analyze the worst case, which is too pessimistic. Leakage in high-Vth gates is less sensitive to process variation. High-Vth gates may have the same percentage of leakage variation as low-Vth gates, but their absolute variation is much smaller. Therefore, a higher percentage of high-Vth gates in a dual-Vth circuit ensures a narrower spread and a lower mean of leakage power.
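A small Monte Carlo experiment on a hypothetical 100-gate circuit illustrates the last argument. Each gate's leakage is its nominal value multiplied by an independent lognormal factor with the same relative spread for low- and high-Vth gates. This lognormal model is an assumption, motivated by the exponential dependence of subthreshold current on the varying parameters. The nominal leakages and σ are made-up values, not BPTM data. Raising the fraction of high-Vth gates from 30% to 60% lowers both the mean and the absolute standard deviation of the total leakage.

```python
import random
from statistics import mean, stdev

LOW_VTH_LEAK = 10.0   # hypothetical nominal leakage of a low-Vth gate (arbitrary units)
HIGH_VTH_LEAK = 1.0   # hypothetical nominal leakage of a high-Vth gate
SIGMA_LN = 0.3        # hypothetical std. dev. of ln(leakage factor) from local Leff variation


def leakage_samples(n_gates, frac_high_vth, n_samples, rng):
    """Monte Carlo samples of total circuit leakage.

    Every gate gets an independent lognormal factor with the same relative
    spread, so low- and high-Vth gates have equal *percentage* variation.
    """
    n_high = round(n_gates * frac_high_vth)
    nominal = [HIGH_VTH_LEAK] * n_high + [LOW_VTH_LEAK] * (n_gates - n_high)
    return [
        sum(i0 * rng.lognormvariate(0.0, SIGMA_LN) for i0 in nominal)
        for _ in range(n_samples)
    ]


rng = random.Random(2007)
for frac in (0.3, 0.6):
    s = leakage_samples(100, frac, 2000, rng)
    print(f"{frac:.0%} high-Vth gates: mean = {mean(s):.1f}, std = {stdev(s):.1f}")
```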
Under global process variation, all gate delays vary by the same percentage. This has no effect on the timing-window constraints in the statistical MILP, so the dual-Vth assignment remains unchanged. Subthreshold current, on the other hand, is most sensitive to the variation of Leff. Therefore, for Table 6.6 we use Spice to simulate the leakage distributions of all the deterministically and statistically optimized ISCAS'85 benchmark circuits under local Leff variation (3σ/μ = 15%). As expected, the statistical approaches reduce the mean and the standard deviation of the leakage distribution for almost all circuits. The statistical method with 95% timing yield generally achieves a narrower spread and a lower mean than that with 99% timing yield.
Table 6.6 Monte Carlo Spice simulation results for the mean and the standard deviation of the leakage distributions of ISCAS'85 circuits optimized by the deterministic method and by the statistical methods with 99% and 95% timing yields, respectively.
Columns 3-5: deterministic optimization (timing yield = 100%); columns 6-8: statistical optimization (timing yield = 99%); columns 9-11: statistical optimization (timing yield = 95%). Nom. = nominal leakage; S.D. = standard deviation; all values in nW.
Circuit # gates  Nom. Leak.  Mean Leak.  S.D.   Nom. Leak.  Mean Leak.  S.D.   Nom. Leak.  Mean Leak.  S.D.
C432    160      0.907       1.059       0.104  0.603       0.709       0.074  0.522       0.614       0.069
C499    182      3.592       4.283       0.255  3.592       4.283       0.255  2.464       2.905       0.197
C880    328      0.551       0.645       0.086  0.430       0.509       0.080  0.415       0.491       0.079
C1355   214      3.198       3.744       0.200  3.090       3.606       0.202  2.199       2.610       0.175
C1908   319      1.803       2.123       0.170  1.356       1.601       0.116  1.140       1.341       0.127
C2670   362      0.635       0.750       0.078  0.405       0.473       0.046  0.395       0.461       0.043
C3540   1097     1.055       1.243       0.119  0.527       0.611       0.032  0.493       0.575       0.031
C5315   1165     2.688       3.128       0.165  1.229       1.420       0.088  1.034       1.188       0.067
C7552   1045     0.924       1.073       0.069  0.774       0.903       0.049  0.701       0.823       0.045
Average of ISCAS'85 benchmarks           0.138                          0.105                          0.093
6.3 Run Time of MILP Algorithms
The run time of MILP is always a big concern, since in the worst case its complexity is exponential in the number of variables and constraints. However, our experimental results show that the actual computing time depends on the circuit structure, logic depth, etc., and need not grow exponentially.
The CPU times shown in columns 7 and 10 of Table 6.1 are for the deterministic MILP in Chapter 3. The data in Table 6.1 show no clear relation between the CPU time and the problem size, such as the number of gates in the circuit. For example, the MILP solution time for the 1046-gate C7552 is only 1.1 CPU seconds, which is much less than the 140 CPU seconds used for the 1165-gate C5315. Even for problems of the same size, different constraints require different solution times. Consider the 1177-gate C6288 circuit as an example. When the timing constraints for primary outputs (POs) are relaxed by 25%, the CPU time decreases from 277 CPU seconds to 7.48 CPU seconds. As a result, the MILP formulation may still solve some very large circuits and provide a possibly better solution to the dual-Vth assignment problem through global optimization.
On a 2.4GHz AMD Opteron 150 processor with 3GB memory, the statistical MILP problem (Chapter 4) was solved in less than one CPU second for many circuits (columns 5, 8 and 11 in Table 6.5). This is an advantage over other techniques [61], because we achieve 30% more leakage reduction with 99% timing yield in much less CPU time.
Besides the ISCAS'85 benchmark circuits, we also optimized the leakage of an ARM7 IP core implemented in TSMC 90nm CMOS process, which has 15,500 combinational cells and 2,400 sequential cells. The last row of Table 6.5 shows that 14% more leakage reduction is achieved with 37 seconds of run time. This result partly demonstrates the feasibility of applying our MILP approach to real circuits.
Although today's SOC may have over one million gates, it has a hierarchical structure. MILP constraints can be generated for submodules at a lower level, and the run times will be determined by the number of gates in the individual submodules. Such a technique may not guarantee a global optimum, but it would still obtain a reasonable result within acceptable run time.
6.4 Summary
Experimental results are presented and discussed in this chapter. The results show that the deterministic MILP formulation proposed in Chapter 3 for total power reduction by path balancing and dual-Vth assignment achieves on average 40% total power reduction. Combining it with the gate sizing technique discussed in Chapter 5 yields further power reduction. The statistical MILP proposed in Chapter 4 minimizes the impact of process variation on leakage power. It achieves 30% more leakage power reduction than the deterministic MILP formulation. Whether it is necessary to minimize the impact of process variation on dynamic power depends on the circuit application and on which power component dominates in the optimized circuit. Therefore, we only propose the corresponding statistical MILP formulation in Chapter 5 and do not give more detailed results in this chapter.
CHAPTER 7 CONCLUSION AND FUTURE WORK
In this chapter, we summarize the entire work of this dissertation and provide some
suggestions for future research.
7.1 Conclusion
With the continuing trend of technology scaling, leakage power has become a main
contributor to power consumption. Dual-Vth assignment has emerged as an efficient
technique for decreasing leakage power. In Chapter 3, a mixed integer linear
programming (MILP) technique simultaneously minimizes the leakage and glitch power
consumption of a static CMOS circuit for any specified input to output critical path delay.
Using dual-threshold devices, the number of high-threshold devices is maximized and a
minimum number of delay elements are inserted to reduce the differential path delays
below the inertial delays of the incident gates. The key features of the method are that the
constraint set size for the MILP model is linear in the circuit size and a power-
performance tradeoff is allowed. Experimental results show 96%, 28% and 64%
reductions of leakage power, dynamic power and total power, respectively, for the
benchmark circuit C7552 implemented in 70nm BPTM CMOS technology.
Due to the exponential relation between subthreshold current and process parameters,
such as the effective gate length, oxide thickness and doping concentration, process
variations can severely affect both power and timing yields of the designs obtained by the
MILP formulation. In Chapter 4, we propose a statistical mixed integer linear
programming method for dual-Vth design that minimizes the leakage power and circuit
delay in a statistical sense such that the impact of process variation on the respective
yields is minimized. Experimental results show that 30% more leakage power reduction
can be achieved by using the statistical approach when compared with the deterministic
approach that has to consider the worst case in the presence of process variations.
Compared to subthreshold leakage, dynamic power is less sensitive to the process
variation due to its linear dependency on the process parameters. However, the
deterministic technique discussed in Chapter 3, which uses path balancing to eliminate
glitches, becomes ineffective when process variation is considered. This is because the
perfect hazard filtering conditions can easily be destroyed even by a small variation in
some process parameters. In Chapter 5, we present a statistical MILP formulation to achieve a process-variation-resistant glitch-free circuit. Experimental results on an example circuit demonstrate the effectiveness of this method.
7.2 Future Work
Some ideas and suggestions for future work are given in this section.
7.2.1 Gate Leakage
In this work, the contribution of the gate-tunneling effect to the total leakage is not considered. Neglecting this effect can result in an underestimation of the total leakage. Our examples use BPTM 70nm technology, which is characterized by BSIM 3.5.2 and may not correctly model gate leakage. However, with appropriate design, the gate leakage of transmission-gate delay elements can be kept small. For example, high-threshold transistors can be used in the delay elements, because these transistors are always on and their switching speed is not important. These transistors have a thicker gate oxide layer and hence a lower gate leakage than low-threshold transistors. In general, however, the problem of gate leakage is left to future research.
7.2.2 Techniques for Glitch Elimination with Process Variation
With continued technology scaling, leakage has become a dominant contributor to total power consumption. However, once the circuit is optimized by efficient techniques such as dual-Vth assignment and adaptive body bias [11, 13, 28, 54, 63, 90], leakage drops well below the dynamic power. Elimination of glitches in a high-activity circuit therefore remains imperative. Path balancing is not preferred because of its sensitivity to process variation. Hazard filtering (gate sizing) is somewhat resistant to process variation but has its own limitation. It cannot guarantee a 100% glitch reduction, because gate delays on critical paths cannot be increased [8]. In addition, any specific technology imposes an upper bound on the achievable gate delay. Combining the two methods to achieve both complete glitch reduction and a process-variation-resistant circuit is a challenging topic. In Chapter 5, we propose such a combined technique, but at the cost of increased leakage. More efficient algorithms should be developed.
7.2.3 Improvement of the MILP formulation
We have applied our MILP formulation of dual-Vth assignment to some industry
circuits. (This work was done in the CAD group, Analog Devices Inc., during the
summer of 2006).
The basic steps were as follows.
1. Assign low Vth to all cells in the circuit. Then extract the LVT (low Vth) delay and leakage of each cell from this LVT design with PrimeTime [4].
2. Similarly, acquire the HVT (high Vth) delay and leakage for each cell from a
HVT design.
3. Extract timing (slack, specified clock period, input-delay, output-delay),
primary inputs and primary outputs for each timing group by PrimeTime [4].
4. Construct an MILP model based on the above information.
5. Solve this MILP problem and give the optimal dual-Vth solution.
6. Update the circuit from the original LVT design to the dual-Vth design according
to the CPLEX solution.
7. Check timing and power of the new dual-Vth design by PrimeTime [4] and
PrimePower [5] respectively.
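Step 4 can be illustrated with a small generator that writes a dual-Vth assignment MILP in the CPLEX LP file format. For brevity it uses a generic arrival-time formulation, which is not necessarily the exact Chapter 3 model. The binary variable x_g selects high Vth for gate g, and t_g is the arrival time at the output of gate g. Each gate's delay grows from its LVT value to its HVT value when x_g = 1. The function name, circuit and numbers are hypothetical; in the flow above, they would come from the PrimeTime extraction in steps 1-3.

```python
def build_dual_vth_lp(fanins, d_lvt, d_hvt, saving, outputs, t_max):
    """Write a small dual-Vth assignment MILP in CPLEX LP file format.

    x_g = 1 means gate g is assigned high Vth; t_g is the latest arrival time
    at the output of gate g. Primary inputs are assumed to arrive at time 0.
    """
    lines = ["\\ Hypothetical dual-Vth assignment model (arrival-time form)",
             "Maximize",
             " obj: " + " + ".join(f"{saving[g]:g} x_{g}" for g in fanins),
             "Subject To"]
    k = 0
    for g, preds in fanins.items():
        extra = d_hvt[g] - d_lvt[g]          # delay penalty if g becomes HVT
        for p in preds or [None]:            # None: gate driven only by primary inputs
            k += 1
            lhs = f"t_{g}" if p is None else f"t_{g} - t_{p}"
            lines.append(f" c{k}: {lhs} - {extra:g} x_{g} >= {d_lvt[g]:g}")
    for g in outputs:                        # timing constraint at primary outputs
        k += 1
        lines.append(f" c{k}: t_{g} <= {t_max:g}")
    lines += ["Binary", " " + " ".join(f"x_{g}" for g in fanins), "End"]
    return "\n".join(lines)


# Hypothetical 3-gate path in the spirit of Figure 7.1 (delays in ns).
FANINS = {"g1": [], "g2": ["g1"], "g3": ["g2"]}
D_LVT = {"g1": 2.0, "g2": 2.0, "g3": 3.0}
D_HVT = {"g1": 2.8, "g2": 3.0, "g3": 4.0}
SAVING = {"g1": 1.0, "g2": 5.0, "g3": 2.0}   # leakage saved by going HVT
print(build_dual_vth_lp(FANINS, D_LVT, D_HVT, SAVING, ["g3"], 8.0))
```

The continuous variables t_g keep the default LP-format bounds [0, +∞). One constraint per fanin edge keeps the model size linear in the circuit size.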
Experimental results show that our dual-Vth assignment MILP model achieves twice the leakage power reduction of a design produced by the commercial tools Physical Compiler [3] and Astro [2]. About 42% of the 15,500 combinational cells were assigned high Vth. Solving this MILP took only several minutes. In such an ASIC design, only small combinational logic clouds (sub-circuits) are inserted between registers, between primary inputs and registers, and between registers and primary outputs. Thus, the runtime of an MILP actually depends on the circuit structure of the most complicated or deepest combinational cloud rather than on the total number of cells in the circuit.
[Figure graphic: gates 1, 2 and 3 in series between two flip-flops (FF). LVT design: 2 + 2 + 3 = 7ns; dual-Vth design: 2 + 3 + 3.2 = 8.2ns; clock period 8ns.]
Figure 7.1 An example circuit used for illustrating the timing violation.
However, the dual-Vth design optimized by our MILP formulation has some timing violations, in which the actual path delay exceeds the timing specification. A possible reason is that the delays of LVT cells extracted in step 1 are not accurate. We use the example circuit in Figure 7.1 to briefly explain the cause of a timing violation. In the LVT design, the path delay is 7ns (2+2+3), which is less than the specified clock period of 8ns. CPLEX finds that gate 2 can be assigned high Vth to reduce leakage without a timing violation. Gate 2's HVT delay is 3ns, so the new path delay should be 8ns (2+3+3). However, in the dual-Vth design, gate 2's output transition time increases, which increases gate 3's input transition time. As a result, the LVT delay of gate 3 actually changes to 3.2ns. The real path delay is therefore 8.2ns (2+3+3.2), which exceeds the specified clock period of 8ns. The cause of this phenomenon is the interdependency of gate delays, which our MILP formulation neglects for simplicity.
The iterative method shown in Figure 7.2 may be adopted to obtain accurate delays and hence avoid the timing violation problem. If any timing violation is found, the new delays of all LVT cells are extracted from the current dual-Vth design, and the MILP formulation is updated accordingly. The CPLEX solver then gives a different optimal solution with fewer timing violations. The iterations continue until all timing violations are eliminated.
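The iteration can be mimicked on the path of Figure 7.1 with a toy model. In this sketch, a brute-force search stands in for CPLEX. A made-up rule stands in for the PrimeTime timing check: a gate slows by 0.2ns when its driver is HVT, as gate 3 does in Figure 7.1. Gate 2's HVT delay of 3ns comes from the text; the other HVT delays and the leakage savings are hypothetical. After each violation, the delays of the cells that are still LVT are re-extracted from the current design and the model is solved again.

```python
from itertools import combinations

# Hypothetical data for the 3-gate path of Figure 7.1 (ns); index 0 = gate 1.
LVT_DELAY = [2.0, 2.0, 3.0]      # from the all-LVT design (step 1)
HVT_DELAY = [2.8, 3.0, 4.0]      # gate 2's 3 ns is from the text; others assumed
LEAK_SAVING = [1.0, 5.0, 2.0]    # assumed leakage saved by making a gate HVT
SLEW_PENALTY = 0.2               # assumed extra delay when a gate's driver is HVT
T_MAX = 8.0
EPS = 1e-9


def solve(lvt):
    """Brute-force stand-in for the MILP: the HVT set with the largest leakage
    saving whose path delay, using the *modeled* delays, meets T_MAX."""
    n, best, best_saving = len(lvt), frozenset(), 0.0
    for k in range(1, n + 1):
        for combo in combinations(range(n), k):
            hvt = frozenset(combo)
            delay = sum(HVT_DELAY[i] if i in hvt else lvt[i] for i in range(n))
            saving = sum(LEAK_SAVING[i] for i in hvt)
            if delay <= T_MAX + EPS and saving > best_saving:
                best, best_saving = hvt, saving
    return best


def timing_check(hvt):
    """Stand-in for PrimeTime: actual gate delays including the slew effect."""
    delays = []
    for i in range(len(LVT_DELAY)):
        d = HVT_DELAY[i] if i in hvt else LVT_DELAY[i]
        if i > 0 and (i - 1) in hvt:       # HVT driver -> slower input transition
            d += SLEW_PENALTY
        delays.append(d)
    return delays


def iterative_dual_vth(max_iter=10):
    lvt = list(LVT_DELAY)
    for it in range(1, max_iter + 1):
        hvt = solve(lvt)
        delays = timing_check(hvt)
        if sum(delays) <= T_MAX + EPS:
            return hvt, sum(delays), it
        for i in range(len(lvt)):          # re-extract delays of cells still LVT
            if i not in hvt:
                lvt[i] = delays[i]
    raise RuntimeError("timing violations remain after max_iter iterations")


hvt, delay, iterations = iterative_dual_vth()
print(sorted(g + 1 for g in hvt), round(delay, 3), iterations)  # gate numbers as in Figure 7.1
```

In this toy run, the first solution (gate 2 HVT) violates timing at 8.2ns, as in Figure 7.1. After re-extraction, the second solution assigns high Vth to gate 3 instead and meets the 8ns period, at the cost of a smaller leakage saving.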
7.2.4 Complexity of the MILP formulation
As discussed in Section 6.3, for a several-million-gate SOC, MILP constraints can be generated for its submodules at a lower level, and the run times will be determined by the number of gates in the individual submodules. Such a technique may not guarantee a global optimum, but it would still obtain a reasonable result within acceptable run time.
To further reduce the runtime of an MILP or ILP formulation, we may also solve a relaxed LP and use its solution as the starting point, rounding off the variables so that they satisfy the (M)ILP. Kompella et al. [48, 49] use branch-and-bound methods to perform an exhaustive search in the integer space. Given enough computing time, these methods can find an optimal solution, and within an acceptable run time they can still produce feasible, though non-optimal, solutions. In [41], the authors propose a recursive rounding approach that can produce solutions close to optimal and, most importantly, has polynomial complexity.
[Flowchart: START. Extract the delays of all LVT cells by PrimeTime. According to the current dual-Vth solution, update the delays of all LVT cells in the MIP problem. CPLEX solves the updated MIP problem and gives a new dual-Vth solution. Update the dual-Vth netlist. Check the timing of the new netlist by PrimeTime. Is there any timing violation? Y: return to the extraction step. N: STOP.]
Figure 7.2 Flowchart of an iterative power optimization procedure.
BIBLIOGRAPHY
[1] BPTM: Berkeley Predictive Technology Model. http://www-device.eecs.berkeley.edu/~ptm/.
[2] http://www.synopsys.com/products/astro/astro.html.
[3] http://www.synopsys.com/products/unified_synthesis/unified_synthesis.html.
[4] http://www.synopsys.com/products/analysis/primetime_ds.html.
[5] http://www.synopsys.com/products/solutions/galaxy/power/power.html.
[6] A. Abdollahi, F. Fallah, and M. Pedram, "Leakage Current Reduction in CMOS
VLSI Circuits by Input Vector Control," IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 12, no. 2, pp. 140-154, 2004.
[7] V. D. Agrawal, "Low Power Design by Hazard Filtering," in Proc. 10th
International Conference on VLSI Design, 1997, pp. 193-197.
[8] V. D. Agrawal, M. L. Bushnell, G. Parthasarathy, and R. Ramadoss, "Digital
Circuit Design for Minimum Transient Energy and a Linear Programming
Method," in Proc. of the 12th International Conference on VLSI Design, 1999, pp.
434-439.
[9] B. Amelifard, A. Afzali-Kusha, and A. Khademzadeh, "Enhancing the Efficiency
of Cluster Voltage Scaling Technique for Low-Power Application," in IEEE
International Symposium on Circuits and Systems, 2005, pp. 1666-1669.
[10] H. Ananthan, C. H. Kim, and K. Roy, "Larger-than-Vdd Forward Body Bias in
Sub-0.5V Nanoscale CMOS," in Proc. of the International Symposium on Low
Power Electronics and Design, 2004, pp. 8-13.
[11] H. Ananthan, C. H. Kim, and K. Roy, "Larger-than-Vdd Forward Body Bias in
Sub-0.5V Nanoscale CMOS," in Proc. of the 2004 International Symposium on
Low Power Electronics and Design, 2004, pp. 8-13.
[12] M. H. Anis, M. K. Mahmoud, and M. I. Elmasry, "Efficient Gate Clustering for
MTCMOS Circuits," in Proc. of the 14th IEEE International Conference on
ASIC/SOC, 2001, pp. 34-38.
[13] V. K. Arnim, E. Borinski, P. Seegebrecht, H. Fiedler, R. Brederlow, R. Thewes, J.
Berthold, and C. Pacha, "Efficiency of Body Biasing in 90-nm CMOS for Low-
Power Digital Circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 7, pp.
1549-1556, 2005.
[14] M. R. C. M. Berkelaar and J. A. G. Jess, "Gate Sizing in MOS Digital Circuits with
Linear Programming," in Proc. European Design Automation Conference, 1990, pp.
217-221.
[15] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, "The Limit of Dynamic Voltage
Scaling and Insomniac Dynamic Voltage Scaling," IEEE Transactions on Very
Large Scale Integration (VLSI) Systems, vol. 13, no. 11, pp. 1239-1252, 2005.
[16] M. Borah, R. M. Owens, and M. J. Irwin, "Transistor Sizing for Low Power CMOS
Circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 15, no. 6, pp. 665-671, 1996.
[17] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De,
"Parameter Variations and Impact on Circuits and Microarchitecture," in Proc.
Design Automation Conference, 2003, pp. 338-342.
[18] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital,
Memory and Mixed-Signal VLSI Circuits. Boston: Springer, 2000.
[19] B. H. Calhoun, F. A. Honore, and A. Chandrakasan, "Design Methodology for
Fine-Grained Leakage Control in MTCMOS," in Proc. of the International
Symposium on Low Power Electronics and Design, 2003, pp. 104-109.
[20] A. P. Chandrakasan and R. W. Brodersen, Low Power Digital CMOS Design.
Boston: Kluwer Academic Publishers, 1995.
[21] H. Chang and S. S. Sapatnekar, "Full-Chip Analysis of Leakage Power under
Process Variations, Including Spatial Correlations," in Proc. Design Automation
Conference, 2005, pp. 523-528.
[22] X. Chang, D. Fan, Y. Han, Z. Zhang, and X. Li, "Fast Algorithm for Leakage
Power Reduction by Input Vector Control," in Proc. of the 6th International
Conference on ASIC, 2005, pp. 14-18.
[23] C. Chen and M. Sarrafzadeh, "Simultaneous Voltage Scaling and Gate Sizing for
Low-Power Design," IEEE Transactions on Circuits and Systems II: Analog and
Digital Signal Processing, vol. 49, no. 6, pp. 400-408, 2002.
[24] O. Coudert, "Gate Sizing for Constrained Delay/Power/Area Optimization," IEEE
Transactions on VLSI Systems, vol. 5, no. 4, pp. 465-472, 1997.
[25] O. Coudert, R. Haddad, and S. Manne, "New Algorithms for Gate Sizing: A
Comparative Study," in Proc. Design Automation Conference, 1996, pp. 734-739.
[26] A. Davoodi and A. Srivastava, "Probabilistic Dual-Vth Optimization Under
Variability," in Proc. of the International Symposium on Low Power Electronics
and Design, 2005, pp. 143-147.
[27] M. Elgebaly and M. Sachdev, "Efficient Adaptive Voltage Scaling System through
On-Chip Critical Path Emulation," in Proc. of the 2004 International Symposium on
Low Power Electronics and Design, 2004, pp. 375-380.
[28] W. Elgharbawy, P. Golconda, A. Kumar, and M. Bayoumi, "A New Gate-Level
Body Biasing Technique for PMOS Transistors in Subthreshold CMOS Circuits,"
in IEEE International Symposium on Circuits and Systems, 2005, pp. 4697-4700.
[29] A. Forestier and M. R. Stan, "Limits to Voltage Scaling from the Low Power
Perspective," in Proc. of the 13th Symposium on Integrated Circuits and Systems
Design, 2000, pp. 365-370.
[30] R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A Modeling Language for
Mathematical Programming. South San Francisco, California: The Scientific Press,
1993.
[31] F. Gao and J. P. Hayes, "Total Power Reduction in CMOS Circuits via Gate Sizing
and Multiple Threshold Voltages," in Proc. Design Automation Conference, 2005,
pp. 31-36.
[32] F. Gao and J. P. Hayes, "Exact and Heuristic Approaches to Input Vector Control
for Leakage Power Reduction," IEEE Transactions on Computer-Aided Design of
Integrated Circuits and Systems, vol. 25, no. 11, pp. 2564-2571, 2006.
[33] M. Hashimoto, H. Onodera, and K. Tamaru, "A Power Optimization Method
Considering Glitch Reduction by Gate Sizing," in Proc. of the International
Symposium on Low Power Electronics and Design, 1998, pp. 221-226.
[34] F. Hu, "Process-Variation-Resistant Dynamic Power Optimization for VLSI
Circuits," PhD Thesis, Auburn, Alabama: Auburn University, May 2006.
[35] F. Hu and V. D. Agrawal, "Dual-Transition Glitch Filtering in Probabilistic
Waveform Power Estimation," in Proc. of the 15th Great Lakes Symposium on
VLSI, 2005, pp. 357-360.
[36] F. Hu and V. D. Agrawal, "Enhanced Dual-Transition Probabilistic Power
Estimation with Selective Supergate Analysis," in Proc. of the 23rd International
Conference on Computer Design, 2005, pp. 366-369.
[37] F. Hu and V. D. Agrawal, "Input-Specific Dynamic Power Optimization for VLSI
Circuits," in Proc. of the International Symposium on Low Power Electronics and
Design, 2006, pp. 232-237.
[38] E. Jacobs and M. Berkelaar, "Using Gate Sizing to Reduce Glitch Power," in Proc.
of the PRORISC/IEEE Workshop on Circuits, Systems and Signal Processing, 1996,
pp. 183-188.
[39] E. Jacobs and M. Berkelaar, "Gate Sizing Using A Statistical Delay Model," in
Proc. Design, Automation and Test in Europe Conference and Exhibition, 2000, pp.
283-290.
[40] M. C. Johnson, D. Somasekhar, L.-Y. Chiou, and K. Roy, "Leakage Control With
Efficient Use of Transistor Stacks in Single Threshold CMOS," IEEE Transactions
on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 1, pp. 1-5, 2002.
[41] K. R. Kantipudi and V. D. Agrawal, "A Reduced Complexity Algorithm for
Minimizing N-Detect Tests," in Proc. of the 20th International Conference on VLSI
Design, 2007, pp. 492-497.
[42] J. T. Kao and A. P. Chandrakasan, "Dual-Threshold Voltage Techniques for Low-
Power Digital Circuits," IEEE Journal of Solid-State Circuits, vol. 35, no. 7, pp.
1009-1018, July 2000.
[43] W. H. Kao, N. Fathi, and C.-H. Lee, "Algorithms for Automatic Transistor
Sizing in CMOS Digital Circuits," in Proc. Design Automation Conference, 1985,
pp. 781-784.
[44] A. Keshavarzi, S. Ma, S. Narendra, B. Bloechel, K. Mistry, T. Ghani, S. Borkar,
and V. De, "Effectiveness of Reverse Body Bias for Leakage Control in Scaled
Dual Vt CMOS ICs," in Proc. of the International Symposium on Low Power
Electronics and Design, 2001, pp. 207-212.
[45] M. Ketkar and S. S. Sapatnekar, "Standby Power Optimization via Transistor
Sizing and Dual Threshold Voltage Assignment," in Proc. International
Conference on Computer-Aided Design, 2002, pp. 375-378.
[46] S. Kim, J. Kim, and S. Y. Hwang, "New Path Balancing Algorithm for Glitch
Power Reduction," IEE Proc. of Circuits, Devices and Systems, vol. 148, no. 3, pp.
151-156, 2001.
[47] P. Ko, J. Huang, Z. Liu, and C. Hu, "BSIM3 for Analog and Digital Circuit
Simulation," in Proc. IEEE Symposium on VLSI Technology CAD, 1993, pp. 400-
429.
[48] S. Kompella, S. Mao, Y. T. Hou, and H. D. Sherali, "Path Selection and Rate
Allocation for Video Streaming in Multihop Wireless Networks," in Proc. Military
Communications Conference, 2006, pp. 1-7.
[49] S. Kompella, S. Mao, Y. T. Hou, and H. D. Sherali, "Cross-Layer Optimized
Multipath Routing for Video Communications in Wireless Networks," IEEE
Journal on Selected Areas in Communications, Special Issue on Cross-Layer
Optimized Wireless Multimedia Communications, May 2007.
[50] H. W. Kuhn and A. W. Tucker, "Nonlinear Programming," in Proc. 2nd Berkeley
Symposium on Mathematical Statistics and Probability, Berkeley, 1951, pp.
481-492.
[51] V. Liberali, E. Malavasi, and D. Pandini, "Automatic Generation of Transistor
Stacks for CMOS Analog Layout," in Proc. IEEE International Symposium on
Circuits and Systems, 1993, pp. 2098-2101.
[52] L. Yuan and G. Qu, "A Combined Gate Replacement and Input Vector Control
Approach for Leakage Current Reduction," IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 14, no. 2, pp. 173-182, 2006.
[53] M. Liu, W.-S. Wang, and M. Orshansky, "Leakage Power Reduction by Dual-Vth
Designs Under Probabilistic Analysis of Vth Variation," in Proc. of the
International Symposium on Low Power Electronics and Design, 2004, pp. 2-7.
[54] X. Liu and S. Mourad, "Performance of Submicron CMOS Devices and Gates with
Substrate Biasing," in Proc. of the 2000 IEEE International Symposium on Circuits
and Systems, 2000, pp. 9-12.
[55] Y. Liu and Z. Gao, "Timing Analysis of Transistor Stack for Leakage Power
Saving," in Proc. of the 9th International Conference on Electronics, Circuits and
Systems, 2002, pp. 41-44.
[56] Y. Lu and V. D. Agrawal, "Leakage and Dynamic Glitch Power Minimization
Using Integer Linear Programming for Vth Assignment and Path Balancing," in
Proc. of the International Workshop on Power and Timing Modeling, Optimization
and Simulation, 2005, pp. 217-226.
[57] Y. Lu and V. D. Agrawal, "CMOS Leakage and Glitch Power Minimization for
Power-Performance Tradeoff," Journal of Low Power Electronics, vol. 2, no. 3, pp.
378-387, Dec. 2006.
[58] Y. Lu and V. D. Agrawal, "Statistical Leakage and Timing Optimization for
Submicron Process Variation," in Proc. of the 20th International Conference on
VLSI Design, 2007, pp. 439-444.
[59] V. Mahalingam and N. Ranganathan, "A Nonlinear Programming Based Power
Optimization Methodology for Gate Sizing and Voltage Selection," in Proc. IEEE
Computer Society Annual Symposium on VLSI, 2005, pp. 180-185.
[60] N. R. Mahapatra, S. V. Garimella, and A. Tarbeen, "An Empirical and Analytical
Comparison of Delay Elements and a New Delay Element Design," in Proc. IEEE
Computer Society Workshop on VLSI, 2000, pp. 81-86.
[61] M. Mani, A. Devgan, and M. Orshansky, "An Efficient Algorithm for Statistical
Minimization of Total Power Under Timing Yield Constraints," in Proc. Design
Automation Conference, 2005, pp. 309-314.
[62] M. Mani and M. Orshansky, "A New Statistical Optimization Algorithm for Gate
Sizing," in Proc. International Conference on Computer Design, 2004, pp. 272-277.
[63] M. Miyazaki, G. Ono, and T. Kawahara, "Optimum Threshold-Voltage Tuning for
Low-Power, High-Performance Microprocessor," in Proc. IEEE International
Symposium on Circuits and Systems, 2005, pp. 17-20.
[64] S. Mukhopadhyay, C. Neau, R. T. Cakici, A. Agarwal, C. H. Kim, and K. Roy,
"Gate Leakage Reduction for Scaled Devices Using Transistor Stacking," IEEE
Transactions on VLSI Systems, vol. 11, no. 4, pp. 716-730, 2003.
[65] S. Mukhopadhyay and K. Roy, "Accurate Modeling of Transistor Stacks to
Effectively Reduce Total Standby Leakage in Nano-Scale CMOS Circuits," in Proc.
Symposium on VLSI Circuits, 2003, pp. 53-56.
[66] S. Mukhopadhyay and K. Roy, "Modeling and Estimation of Total Leakage Current
in Nano-Scaled CMOS Devices Considering the Effect of Parameter Variation," in
Proc. of the International Symposium on Low Power Electronics and Design, 2003,
pp. 172-175.
[67] A. K. Murugavel and N. Ranganathan, "Gate Sizing and Buffer Insertion Using
Economic Models for Power Optimization," in Proc. of the 17th International
Conference on VLSI Design, 2004, pp. 195-200.
[68] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, "1-V
Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage
CMOS," IEEE Journal of Solid-State Circuits, vol. 30, no. 8, pp. 847-854, 1995.
[69] R. Naidu and E. T. A. F. Jacobs, "Minimizing Standby Leakage Power in Static
CMOS Circuits," in Proc. Design, Automation and Test in Europe, 2001, pp. 370-
376.
[70] D. Nguyen, A. Davare, M. Orshansky, D. Chinnery, B. Thompson, and K. Keutzer,
"Minimization of Dynamic and Static Power Through Joint Assignment of
Threshold Voltages and Sizing Optimization," in Proc. of the International
Symposium on Low Power Electronics and Design, 2003, pp. 158-163.
[71] S. M. Nowick and D. L. Dill, "Exact Two-Level Minimization of Hazard-Free
Logic with Multiple-Input Changes," in IEEE/ACM International Conference on
Computer-Aided Design, 1992, pp. 626-630.
[72] P. Pant, R. K. Roy, and A. Chatterjee, "Dual-Threshold Voltage Assignment with
Transistor Sizing for Low Power CMOS Circuits," IEEE Transactions on VLSI
Systems, vol. 9, no. 2, pp. 390-394, April 2001.
[73] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits: A
Design Perspective. Upper Saddle River, NJ: Prentice Hall, 2003.
[74] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Minimum Dynamic Power CMOS
Circuit Design by a Reduced Constraint Set Linear Program," in Proc. of the 16th
International Conference on VLSI Design, 2003, pp. 527-532.
[75] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Design of Variable Input Delay Gates
for Low Dynamic Power Circuits," in Proc. of the International Workshop on
Power and Timing Modeling, Optimization and Simulation, 2005, pp. 436-445.
[76] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Variable Input Delay CMOS Logic
for Low Power Design," in Proc. of the 18th International Conference on VLSI
Design, 2005, pp. 596-604.
[77] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Transistor Sizing of Logic Gates to
Maximize Input Delay Variability," Journal of Low Power Electronics, vol. 2, no. 1,
pp. 121-128, Apr. 2006.
[78] N. Ranganathan and A. K. Murugavel, "A Microeconomic Model for Simultaneous
Gate Sizing and Voltage Scaling for Power Optimization," in Proc. International
Conference on Computer Design, 2003, pp. 276-281.
[79] R. Rao, A. Devgan, D. Blaauw, and D. Sylvester, "Parametric Yield Estimation
Considering Leakage Variability," in Proc. Design Automation Conference, 2004,
pp. 442-447.
[80] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, "Statistical Estimation of
Leakage Current Considering Inter- and Intra-Die Process Variation," in Proc. of
the International Symposium on Low Power Electronics and Design, 2003, pp. 84-
89.
[81] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, "Statistical Analysis of
Subthreshold Leakage Current for VLSI Circuits," IEEE Transactions on VLSI
Systems, vol. 12, no. 2, pp. 131-139, Feb. 2004.
[82] T. Sakurai and A. R. Newton, "Alpha-Power Law MOSFET Model and its
Applications to CMOS Inverter Delay and Other Formulas," IEEE Journal of Solid-
State Circuits, vol. 25, no. 2, pp. 584-594, Feb. 1990.
[83] C. V. Schimpfle, A. Wroblewski, and J. A. Nossek, "Transistor Sizing for
Switching Activity Reduction in Digital Circuits," in Proc. European Conference
on Circuit Theory and Design, 1999.
[84] W.-T. Shiue, "Leakage Power Estimation and Minimization in VLSI Circuits," in
Proc. of the International Symposium on Circuits and Systems, 2001, pp. 178-181.
[85] J. M. Shyu, A. Sangiovanni-Vincentelli, J. P. Fishburn, and A. E. Dunlop,
"Optimization-Based Transistor Sizing," IEEE Journal of Solid-State Circuits, vol.
23, no. 2, pp. 400-409, 1988.
[86] J. Singh, V. Nookala, Z.-Q. Luo, and S. Sapatnekar, "Robust Gate Sizing by
Geometric Programming," in Proc. Design Automation Conference, 2005, pp. 315-
320.
[87] A. Srivastava, R. Bai, D. Blaauw, and D. Sylvester, "Modeling and Analysis of
Leakage Power Considering Within-Die Process Variations," in Proc. International
Symposium on Low Power Electronics and Design, 2002, pp. 64-67.
[88] A. Srivastava, S. Shah, D. Sylvester, D. Blaauw, and S. Director, "Accurate and
Efficient Gate-Level Parametric Yield Estimation Considering Correlated
Variations in Leakage Power and Performance," in Proc. Design Automation
Conference, 2005, pp. 535-540.
[89] A. Srivastava, D. Sylvester, and D. Blaauw, "Statistical Optimization of Leakage
Power Considering Process Variations Using Dual-Vth and Sizing," in Proc.
Design Automation Conference, 2004, pp. 773-778.
[90] M. Sumita, S. Sakiyama, M. Kinoshita, Y. Araki, Y. Ikeda, and K. Fukuoka,
"Mixed Body Bias Techniques with Fixed Vt and Ids Generation Circuits," IEEE
Journal of Solid-State Circuits, vol. 40, no. 1, pp. 60-66, 2005.
[91] J. W. Tschanz, S. G. Narendra, Y. Ye, B. A. Bloechel, S. Borkar, and V. De,
"Dynamic Sleep Transistor and Body Bias for Active Leakage Power Control of
Microprocessors," IEEE Journal of Solid-State Circuits, vol. 38, no. 11, pp. 1838-
1845, 2003.
[92] S. Uppalapati, "Low Power Design of Standard Cell Digital VLSI Circuits,"
Master's Thesis, New Brunswick, New Jersey: Rutgers University, Oct. 2004.
[93] S. Uppalapati, M. L. Bushnell, and V. D. Agrawal, "Glitch-Free Design of Low
Power ASICs Using Customized Resistive Feedthrough Cells," in Proc. of the 9th
VLSI Design and Test Symposium, 2005, pp. 41-48.
[94] K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power
Design," in Proc. International Symposium on Low Power Electronics and Design,
1995, pp. 3-8.
[95] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanzawa, M. Ichida, and K.
Nogami, "Automated Low-Power Technique Exploiting Multiple Supply Voltages
Applied to A Media Processor," IEEE Journal of Solid-State Circuits, vol. 33, no. 3,
pp. 463-472, 1998.
[96] Q. Wang and S. B. K. Vrudhula, "Static Power Optimization of Deep Submicron
CMOS Circuits for Dual VT Technology," in Proc. International Conference on
Computer-Aided Design, 1998, pp. 490-496.
[97] L. Wei, Z. Chen, M. Johnson, and K. Roy, "Design and Optimization of Low
Voltage High Performance Dual Threshold CMOS Circuits," in Proc. Design
Automation Conference, 1998, pp. 489-494.
[98] L. Wei, Z. Chen, K. Roy, M. C. Johnson, Y. Ye, and V. K. De, "Design and
Optimization of Dual-Threshold Circuits for Low-Voltage Low-Power
Applications," IEEE Transactions on VLSI Systems, vol. 7, no. 1, pp. 16-24, Mar.
1999.
[99] L. Wei, Z. Chen, K. Roy, Y. Ye, and V. De, "Mixed-Vth (MVT) CMOS Circuit
Design Methodology for Low Power Applications," in Proc. Design Automation
Conference, 1999, pp. 430-435.
[100] L. Wei, K. Roy, and V. K. De, "Low Voltage Low Power CMOS Design
Techniques for Deep Submicron ICs," in Proc. of the 13th International
Conference on VLSI Design, 2000, pp. 24-29.
[101] L. Wei, K. Roy, and C.-K. Koh, "Power Minimization by Simultaneous Dual-Vth
Assignment and Gate-Sizing," in Proc. of the IEEE Custom Integrated Circuits
Conference, 2000, pp. 413-416.
[102] N. H. E. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems
Perspective, 3rd ed. Addison Wesley, 2004.
[103] H.-S. Won, K.-S. Kim, and K.-O. Jeong, "An MTCMOS Design Methodology and
Its Application to Mobile Computing," in Proc. of the International Symposium on
Low Power Electronics and Design, 2003, pp. 110-115.
[104] A. Wroblewski, C. V. Schimpfle, and J. A. Nossek, "Automated Transistor Sizing
Algorithm for Minimizing Spurious Switching Activities in CMOS Circuits," in
Proc. IEEE International Symposium on Circuits and Systems, 2000, pp. 291-294.
[105] A. C. H. Wu, N. Vander Zanden, and D. Gajski, "A New Algorithm for Transistor
Sizing in CMOS Circuits," in Proc. European Design Automation Conference,
1990, pp. 589-593.
[106] Y. Ye, S. Borkar, and V. De, "A New Technique for Standby Leakage Reduction in
High-Performance Circuits," in Proc. Symposium on VLSI Circuits, 1998, pp. 40-41.
[107] C. Yeh and Y.-S. Kang, "Cell-Based Layout Techniques Supporting Gate-Level
Voltage Scaling for Low Power," IEEE Transactions on Very Large Scale
Integration (VLSI) Systems, vol. 9, no. 6, pp. 983-986, 2001.
[108] B. Yu, "A Novel Dynamic Power Cutoff Technology (DPCT) for Active Leakage
Reduction in Deep Submicron VLSI CMOS Circuits," PhD Thesis, New Brunswick,
New Jersey: Rutgers, The State University of New Jersey, October 2007.
[109] B. Yu and M. L. Bushnell, "A Novel Dynamic Power Cutoff Technique (DPCT) for
Active Leakage Reduction in Deep Submicron CMOS Circuits," in Proc. of the
International Symposium on Low Power Electronics and Design, 2006.