Ultra Low Power CMOS Design
by
Kyungseok Kim
A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy
Auburn, Alabama
May 9, 2011
Keywords: Ultra-Low Power Design, Subthreshold Circuits, Dual Voltage Design, Mixed
Integer Linear Program, Gate Slack Analysis
Copyright 2011 by Kyungseok Kim
Approved by
Vishwani D. Agrawal, Chair, James J. Danaher Professor of
Electrical and Computer Engineering
Victor P. Nelson, Professor of Electrical and Computer Engineering
Fa Foster Dai, Professor of Electrical and Computer Engineering
Abstract
The proliferation of portable devices makes long battery lifetime a
primary design goal. Subthreshold circuit design can reduce energy per cycle by an order
of magnitude relative to nominal-voltage operation by scaling the power supply voltage (Vdd) below
the device threshold voltage, but at the cost of significantly lower circuit performance.
The stringent energy budgets and moderate speed requirements of ultra-low power systems in the
market may not be best satisfied by scaling a single supply voltage alone. Circuits optimized
with dual supply voltages provide an opportunity to meet these demands.
Exploiting time slack through dual-Vdd assignment is a well-known technique for reducing
the power consumption of circuits operating at nominal Vdd, at a small extra cost in physical
design. Most previous work on subthreshold circuit design, however, scaled down only a single
supply voltage to reduce energy consumption and did not consider the time slack.
We propose a method for minimum energy digital CMOS (Complementary Metal Ox-
ide Semiconductor) circuit design using dual subthreshold supply. The delay penalty of a
traditional level converter is unacceptably high when the voltages are in the subthreshold
range. In this work, level converters are either not used at all, or special multiple logic-level
gates are used only when, after accounting for their cost, they offer an advantage. Starting from
a lowest energy per cycle design whose single supply voltage is in the subthreshold range,
a new mixed integer linear program (MILP) finds a second lower supply voltage optimally
assigned to gates with time slack. The MILP accounts for the energy and delay character-
istics of logic gates interfacing two different signal levels. New types of linearized AND and
OR constraints are used in this MILP. We show energy savings of up to 24.5% over the best
available designs of ISCAS'85 benchmark circuits.
For modern large VLSI systems, the MILP may suffer from unacceptable run time because the
MILP algorithm for dual voltage design has exponential time complexity. Gate slack analysis
offers an opportunity to reduce the time complexity to linear when assigning the optimal lower
supply voltage (VDDL) to gates of a single-Vdd circuit in which all gates initially use the higher
supply voltage (VDDH). The slack of a gate in a digital circuit is the difference between the critical path
delay and the delay of the longest path through that gate. Using the previous work on static
timing analysis, we have developed a linear-time algorithm for computing the slack for all
gates in a circuit.
We propose a new slack-time based algorithm for dual-Vdd design to achieve maximum
energy saving. For a given lower supply voltage, we first compute slacks for all gates of the
circuit and then partition them into three groups. In one group, all gates can be uncondi-
tionally assigned the low voltage. In the second group, no gate can be assigned low voltage.
In the third group, assigning the low voltage to any single gate does not violate the critical
path timing, but assigning it to several gates at once may; therefore, the low voltage must be
assigned to these gates sequentially, one at a time. Because all steps of the voltage assignment algorithm rely on linear-time analysis,
the overall complexity of this energy optimization method is close to linear in the number of
gates. We apply our algorithm to optimize ISCAS'85 benchmark circuits and compare the
results with those from MILP. Energy savings from the new slack-time based algorithm are
very close to the globally optimal MILP solutions. The optimization time using gate slack
can be as low as 1/43 of that of the MILP method for dual-Vdd design. The
new slack-time based algorithm is especially beneficial for large circuits, which may contain
few critical or near-critical paths and many paths with large slack.
Acknowledgments
Without the constant encouragement, guidance, and support from my advisor, Professor
Vishwani D. Agrawal, this dissertation would not have been written. First, I am deeply
thankful to him as a very generous mentor throughout my doctoral studies. The work has
been delightful and successful under his valuable advice.
I would like to thank Professor Victor P. Nelson and Professor Fa Foster Dai for their
great suggestions as my advisory committee members and through their distinguished lec-
tures. I am grateful to Professor Allen Landers for serving as the outside reader for my
dissertation and his valuable suggestions. I am also grateful to Professor Prathima Agrawal,
the Director of Wireless Engineering Research and Education Center (WEREC), for provid-
ing financial support for my research.
I sincerely appreciate our former and current colleagues for invaluable discussion and en-
couragement. Thanks to Nitin, Jins, Lu, Hillary, Khushboo, Ashfaq, Fan, Wei, Yu, Manish,
Mridula, Priya, Rakshith, Jia, Murali, Lixing and Suraj. I would like to thank my friends
for unforgettably joyful memories at Auburn.
Finally, I would like to thank my parents for their endless love and support during my
whole life. I am grateful to my brother and his family for their encouragement. I am greatly
thankful to my wife and lovely daughter for their patience and support.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Contribution of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Organization of the Dissertation . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Overview of Subthreshold Circuit Design . . . . . . . . . . . . . . . . . . . . . . 6
2.1 Origin of Subthreshold Circuit Design . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Minimum Voltage Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 Minimum Energy Operation . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3 True Minimum Energy Design Using Dual Below-Threshold Supply Voltages . . 18
3.1 Subthreshold Circuits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3.1.1 Minimum Operating Voltage . . . . . . . . . . . . . . . . . . . . . . . 19
3.1.2 Delay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.1.3 Energy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.2 Dual-Vdd Scheme for Subthreshold Operation . . . . . . . . . . . . . . . . . 21
3.3 MILP for VDDL Assignment . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.5 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Minimum Energy CMOS Design with Dual Subthreshold Supply and Multiple
Logic-Level Gates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.1 Operation of Conventional Level Converters in Subthreshold Regime . . . . 39
4.2 MILP for Dual Voltage Design with Multiple Logic-Level Gates . . . . . . . 44
4.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5 Process Variation Effect on Minimum Energy Design Using Dual Subthreshold
Supply . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.1 Multiple Supply Voltages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2 Technology Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Process Variation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
6 Dual Voltage Design for Minimum Energy Using Gate Slack . . . . . . . . . . . 65
6.1 MILP for Optimal VDDL and Dual Vdd Assignment . . . . . . . . . . . . . . . 66
6.2 New Slack-Time Based Algorithm for Dual-Vdd Design . . . . . . . . . . . . 68
6.3 Simulation Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
7 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.1 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
7.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.2.1 Minimum Energy Design with Process Variations Using Dual-Vdd . . 80
7.2.2 Level Converter for Multi-Vdd Design in Subthreshold Regime . . . . 80
7.2.3 A New Hybrid (MILP + Gate Slack Analysis) Linear-Time Algorithm
for Low Power Design Using Multi-Vdd . . . . . . . . . . . . . . . . . 81
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
List of Figures
2.1 First measurement of an MOS transistor at very low current (annotated copy of
Vittoz's notebook [75]). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 CMOS inverter voltage transfer characteristics (VTC) [66]. . . . . . . . . . . . . 8
2.3 Minimum voltage operation for 10%-90% output swing for a 0.18µm ring
oscillator [10]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Energy per cycle for an 8-bit ripple carry adder through HSPICE [27] simulation
in PTM 90nm CMOS, Emin = 3.29fJ at Vdd = 0.17V (Vth,pmos = -0.21V and
Vth,nmos = 0.29V). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.5 The delay and leakage current normalized to an inverter at Vdd = 1.2V through
HSPICE simulation in PTM 90nm CMOS. . . . . . . . . . . . . . . . . . . . . . 14
2.6 Total Energy vs. Vdd for a 16×16 multiplier [81]. . . . . . . . . . . . . . . . . . 17
3.1 HSPICE [27] simulations for the output logic levels of inverter chains normalized
to nominal supply voltage, 1.2V, with scaling Vdd in PTM 90nm CMOS (INV:
Wp = 5.5×Lg, Wn = 2.4×Lg). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Dual-Vdd schemes and level converter schematic [67, 68]. . . . . . . . . . . . . . 23
3.3 A two-inverter chain without level converter. . . . . . . . . . . . . . . . . . . . . 24
3.4 Driven gates and input swing levels. . . . . . . . . . . . . . . . . . . . . . . . . 25
3.5 Topological constraints. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.6 Simulation setup. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.7 Energy per cycle for a 16-bit ripple carry adder for single-Vdd and dual-Vdd in
subthreshold region, activity factor α = 0.21, PTM 90nm CMOS. . . . . . . . . 30
3.8 Gate slack distribution (number of gates vs. slack) of a 16-bit ripple carry adder
and a 4×4 multiplier for single-Vdd (= VDDH) and dual-Vdd (= VDDH, VDDL) at
the minimum energy point; slacks obtained by static timing analysis using gate
delays for PTM 90nm CMOS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.9 Gate slack distribution of c880 and c6288 for single-Vdd and dual-Vdd at the min-
imum energy point in PTM 90nm CMOS. . . . . . . . . . . . . . . . . . . . . . 34
3.10 Output signal waveforms of s1 and s1q in a 16-bit ripple carry adder at minimum
operating voltage, VDDL = 0.09V, in HSPICE simulation, PTM 90nm CMOS. . 35
3.11 VDDL bound for given VDDH with LH configured cells. . . . . . . . . . . . . . . 35
4.1 Energy and speed benefits of dual Vdd design in subthreshold voltage operation
for a 32-bit ripple carry adder through HSPICE simulation in PTM 90nm CMOS
(activity factor α = 0.17, number of gates = 352). . . . . . . . . . . . . . . . . . 39
4.2 Two traditional level converter schematics [40]. . . . . . . . . . . . . . . . . . . 41
4.3 Multiple logic-level NAND2 gate [17]. . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Multiple logic-level gate leakage power normalized to a standard INV (Vdd=Vin
= 300mV) in PTM 90nm CMOS. . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.5 Gate slack distribution for minimum energy per cycle for c3540. . . . . . . . . . 50
4.6 Gate slack distribution for minimum energy per cycle for c880. . . . . . . . . . . 51
5.1 Gate slack distribution (number of gates vs. slack) for c2670 at Vdd = 0.30V;
slacks obtained by static timing analysis using gate delays for PTM 90nm CMOS. 54
5.2 HSPICE simulation results of minimum energy per cycle and energy optimal
voltage for a 32-bit RCA for a single-Vdd in PTM CMOS technology (α = 0.30). 56
5.3 The optimal VDDL from MILP [35] algorithm and total energy per cycle from
HSPICE simulation of dual-Vdd design for a 32-bit RCA (Fig. 5.2) in PTM CMOS
Technology. The relationship of figure of merit (FOM) to energy saving is shown
for technology scaling trend. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.4 HSPICE simulation results of NMOS Vth variation and active current Ion vari-
ability at Vdd = 0.30V from a 1k-point Monte Carlo simulation with normally
distributed vth0 parameter in PTM CMOS technology. . . . . . . . . . . . . . . 60
5.5 HSPICE simulation results of critical path delay and minimum energy for a 32-
bit RCA (Fig. 5.3(a)) from a 1k-point Monte Carlo simulation in PTM CMOS
technology. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.6 Distribution of the output capacitance and delay variability for an inverter with
fanout of four from a 1k-point Monte Carlo simulation with normally distributed
vth0 parameter in PTM CMOS technology. . . . . . . . . . . . . . . . . . . . . 63
6.1 Procedure of slack-time based algorithm for ISCAS'85 benchmark circuit c2670
in PTM 90nm CMOS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2 Slack time distribution of an optimized c2670 with VDDH = 1.2V and VDDL =
0.69V. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Slack time distribution before and after optimization of slack-time based algo-
rithm for c880. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
List of Tables
3.1 Measurement of a gate delay with a single INV load and static leakage power
in Figure 3.4 configurations at VDDH = 250mV and VDDL = 200mV through
HSPICE simulation for PTM 90 nm CMOS. . . . . . . . . . . . . . . . . . . . . 24
3.2 Comparison of conventional LC (Figure 3.2(c)) delays normalized to INV (FO=4)
delay (VDD = VDDH) for normal and subthreshold operations through HSPICE
simulation in PTM 90 nm CMOS. . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.3 Total energy per cycle with optimal VDDL for given VDDH and maximum corre-
sponding speed. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 Energy saving with optimal VDDL for given VDDH (minimum energy operating
point) in ISCAS'85 benchmark circuits for PTM 90nm CMOS. . . . . . . . . . . 33
4.1 Delays of two optimal sized ALCs with a single INV load at VDDL = 230mV and
VDDH = 300mV in PTM 90nm CMOS. . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Multiple logic-level gate delays with a single INV load at VDDL = 230mV and
VDDH = 300mV in PTM 90nm CMOS (High PMOS Vth = 0.29V). . . . . . . . 42
4.3 Total energy per cycle with optimal VDDL for given VDDH and performance of
ISCAS'85 benchmark circuits and 32-bit ripple carry adder. . . . . . . . . . . . 49
5.1 The optimal VDDL and energy saving of c2670 at VDDH = 0.30V from MILP
solutions [35] for multiple-Vdd design without topological constraints in PTM
90nm CMOS. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
6.1 Energy saving and optimal VDDL from MILP [35] or slack-time based algorithm
for given VDDH in ISCAS'85 benchmark circuits in subthreshold region in PTM
90nm CMOS. Both algorithms produced identical result. . . . . . . . . . . . . . 74
6.2 Energy saving and optimal VDDL from MILP [35] and slack-time based algorithm
for ISCAS'85 benchmark circuit operating in nominal Vdd in PTM 90nm CMOS. 74
Chapter 1
Introduction
Ultra-low power applications such as micro-sensor networks, pacemakers, and many
portable devices operate under extreme energy constraints to achieve long battery lifetime. Subthreshold
operation presents an opportunity for such energy-constrained applications with its very
low energy consumption [32, 62, 69, 76, 77, 84]. Subthreshold circuits offer a promising
solution for implementing highly energy-constrained systems in clock ranges of low to medium
frequencies for remote or mobile applications.
As the power supply voltage (Vdd) is scaled below the device threshold voltage (Vth),
the subthreshold current charges and discharges nodes very slowly to perform the circuit's logic
function [76]. This weak driving current inherently limits the performance but minimum
energy operation of the circuit is achieved with reduced dynamic and leakage power, resulting
in long battery life [36, 37, 38].
In the past decades, subthreshold circuit design was not well recognized in the area of
digital circuits as high performance demand was a major concern. Lately, however, portabil-
ity has become a trend in the electronics marketplace. Low energy per operation is a primary
design parameter in such applications. Without the performance requirement, a subthresh-
old circuit can operate at its minimum energy operating point that is only slightly above
the absolute minimum voltage [81] that would guarantee the correct logic function. Even for
applications requiring high peak performance, ultra-dynamic voltage scaling (UDVS) [8] can
provide an opportunity for subthreshold circuit design that would switch between a nominal
voltage high performance mode and an energy efficient subthreshold mode according to the
system workload.
To support more features or long uninterrupted operation in energy constrained systems,
subthreshold circuit designers strive to further increase the performance or reduce the energy
consumption, as much as possible. These enhancements can be achieved by utilizing the
time slack in subthreshold circuits using the new design methodologies proposed in this
dissertation.
1.1 Motivation
Subthreshold circuit design is well suited to emerging portable applications
that need extremely low energy operation. The limitation of this technique is its very slow
speed of operation, caused by the extremely scaled down supply voltage. Despite its very high
energy efficiency, subthreshold design has been applied only in niche markets due to
its low performance. Depending upon the application, size, weight, and cost can be as
important as performance. Low power is especially significant for remote, portable, and mobile
applications. Reduced power consumption makes the circuits lighter, reduces or
eliminates cooling subsystems, and reduces the weight and extends the life of the energy
source.
According to the available literature, most low-power techniques exploit time slack on
non-critical paths of a circuit to reduce power consumption without performance loss. These
techniques have been applied to circuits operating with the nominal supply voltage by sizing
device widths, using multi-Vth devices, or using multiple Vdd [64, 50, 79]. For subthreshold
circuits, the technique of sizing device width affects the correct logic function of CMOS (Com-
plementary Metal Oxide Semiconductor) circuits at low supply voltage [76]. The multi-Vth
technique does not adequately utilize the time slack in the subthreshold regime [4], because
semiconductor foundries normally provide standard cell libraries with two to three fixed Vth
values, namely, high Vth, standard Vth, and low Vth, for low-power design. Gate delay expo-
nentially depends on Vth in a subthreshold circuit. Therefore, we cannot utilize all possible
time slack on non-critical paths in a subthreshold circuit without further manipulation of
these device threshold voltages.
The multi-Vdd technique has been widely implemented for two supply voltages [41]. The
dual-Vdd design is best suited for exploiting the time slack in a subthreshold circuit as well.
Although the gate delay depends exponentially on Vdd in the subthreshold region, it may be
possible to find an optimal lower supply voltage for the available time slack in the circuit.
A DC-to-DC voltage converter [57] can then be used to manage the supply voltages.
There are two scenarios for applying dual-Vdd design to subthreshold circuits in energy
constrained low-performance applications. Consider a digital circuit working in an absolute
minimum energy consumption mode. The supply voltage for such an operation is known
to be in the subthreshold range [76]. First, we can further reduce the energy consumption
without changing the performance by assigning an extra lower supply voltage. The lower
voltage is supplied to gates on non-critical paths. Alternatively, the subthreshold circuit can
be sped up several times by selecting two supply voltages, one of which is higher than
the optimal single Vdd. In this scenario, the dual-Vdd design retains the energy consumption
close to that of the minimum energy point but operates at a higher speed obtained by using
the higher supply for gates on critical paths.
1.2 Problem Statement
The aims of this dissertation are to:
• Investigate the validity of dual-Vdd design for bulk CMOS subthreshold circuits.
• Develop new mixed integer linear programs (MILP) that automatically and optimally
assign gate voltages and satisfy a wide range of speed requirements for a given circuit,
while minimizing the total energy per cycle.
• Develop new methods for dual-Vdd design using linear-time gate slack analysis to reduce
computation time for optimization.
1.3 Contribution of the Dissertation
In this dissertation, we propose a framework for finding the optimal dual-Vdd assignment
in subthreshold circuits to achieve minimum energy design. The minimum energy per cycle
operation with a very low single voltage in the subthreshold region is known [76]. We
further lower the energy per cycle below that point by using dual subthreshold supply.
Without a proper level converter for this mode, special considerations are used in the design
for eliminating or substituting the level converters that otherwise would have unacceptable
delay overhead. For a wide range of speed requirements, new mixed integer linear programs
(MILP) globally determine an energy-efficient circuit configuration by assigning an extra
supply voltage VDDL to gates on non-critical paths. This work could provide solutions for the
demands of either lower energy or higher performance in subthreshold design applications.
A subthreshold circuit is susceptible to process variation [20, 72], which affects the delay of
gates. We investigate the benefit of dual-Vdd design for reducing the delay variability of a
subthreshold circuit with process variation. To the best of our knowledge this work is the
first to present a dual-Vdd scheme for subthreshold logic circuits to achieve lower minimum
energy, which is an improvement over the known minimum energy operating point.
The new design procedure formulates mixed integer linear programs (MILP) that, given
today's computing capabilities, can deal with moderately large circuit complexity [19]. However,
the exponential time complexity of the MILP method for energy optimized circuits may not
be acceptable for modern VLSI (Very Large Scale Integration) systems. We propose a new
slack-time based algorithm that saves computation time and obtains a solution very close to
the global optimum found by an MILP. The time complexity of the basic slack analysis
algorithm is linear in the total number of gates, while the heuristic algorithms for dual-Vdd design
in the literature still have polynomial time complexity O(n^2) [13]. The proposed method
of gate slack analysis is also applicable to other low-power design techniques to quickly
classify positive slack gates available for possible power-optimization in a large circuit. This
approach reduces the optimization effort and saves run-time of the algorithms.
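As an illustration of why gate slack can be obtained in linear time, the following Python sketch computes the slack of every gate with one forward and one backward pass over a topologically ordered netlist. It takes the slack of a gate to be the critical path delay minus the delay of the longest path through that gate. The function name gate_slacks, the dictionary-based netlist format, and the four-gate example are assumptions made for this illustration; they are not the implementation used in this work.

```python
from collections import defaultdict, deque


def gate_slacks(delays, edges):
    """Return {gate: slack} for a combinational netlist.

    delays: dict mapping each gate name to its delay (hypothetical format).
    edges:  list of (driver, driven) gate pairs; the graph must be acyclic.
    """
    fanin = defaultdict(list)
    fanout = defaultdict(list)
    pending = {gate: 0 for gate in delays}  # unprocessed fanins per gate
    for driver, driven in edges:
        fanout[driver].append(driven)
        fanin[driven].append(driver)
        pending[driven] += 1

    # Topological order (Kahn's algorithm): each gate and edge is visited once.
    order = []
    queue = deque(gate for gate in delays if pending[gate] == 0)
    while queue:
        gate = queue.popleft()
        order.append(gate)
        for nxt in fanout[gate]:
            pending[nxt] -= 1
            if pending[nxt] == 0:
                queue.append(nxt)
    if len(order) != len(delays):
        raise ValueError("netlist contains a combinational loop")

    # Forward pass: longest delay from any primary input through this gate.
    arrive = {}
    for gate in order:
        arrive[gate] = delays[gate] + max(
            (arrive[d] for d in fanin[gate]), default=0)

    # Backward pass: longest delay from this gate's output to any primary output.
    remain = {}
    for gate in reversed(order):
        remain[gate] = max(
            (delays[n] + remain[n] for n in fanout[gate]), default=0)

    critical = max(arrive.values())
    # Longest path through a gate = arrive + remain; slack is what is left.
    return {gate: critical - (arrive[gate] + remain[gate]) for gate in delays}


if __name__ == "__main__":
    example_delays = {"a": 1, "b": 2, "c": 1, "d": 1}
    example_edges = [("a", "c"), ("b", "c"), ("b", "d")]
    print(gate_slacks(example_delays, example_edges))
```

In the example, gates b, c, and d lie on paths of the critical length of three delay units and have zero slack, while gate a has a slack of one unit. Each pass touches every gate and every connection once, so the run time grows linearly with circuit size.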
1.4 Organization of the Dissertation
The dissertation is organized as follows. Chapter 2 briefly provides an overview of
subthreshold circuit design with a perspective of minimum voltage and minimum energy
operation.
Chapter 3 demonstrates a new MILP algorithm for minimum energy design using dual-
Vdd in the subthreshold regime. Unacceptable delay overhead of a level converter is avoided
in the optimized circuit by using topological constraints in the MILP.
In Chapter 4, we propose another new MILP algorithm for minimum energy design
with dual subthreshold supply and multiple logic-level gates. Multiple logic-level gates that
suppress DC leakage currents are inserted to remove topological constraints and further
improve the energy saving for the optimized circuit.
Chapter 5 investigates process variation effects on minimum energy design using dual
subthreshold supply. An optimized circuit shows more immunity to process variation with
technology scaling.
In Chapter 6, we propose a new slack-time based algorithm for dual-Vdd design. Gate
slack analysis is used to reduce the time complexity of the optimization process in the
minimum energy design.
Finally, the conclusion and ideas for the future advancement of this work are given in
Chapter 7.
Chapter 2
Overview of Subthreshold Circuit Design
In this chapter, we provide the fundamental aspects of subthreshold design for ultra-low
power circuits [76]. A description of subthreshold circuit properties as given here will be
helpful to illustrate our proposed methods in this dissertation.
2.1 Origin of Subthreshold Circuit Design
The MOS (Metal Oxide Semiconductor) transistor conducts current, majority carriers,
through an inverted channel between the source and drain caused by a nominal voltage
applied to the gate. When a low voltage is applied to the gate, majority carriers in the
substrate are repelled from the surface directly below the gate. Then, a depletion charge
of immobile dopant ions forms a depletion region beneath the gate. When a voltage is applied
between the drain and source, the minority carriers in the depletion layer move by diffusion and
induce a drain current. This weak inversion current was
considered to be insignificantly small and was ignored in digital circuit design until
recently.
In connection with electronic wristwatch design [74, 75], the properties of MOS
transistors were investigated at very low current levels. The study uncovered an
unusual exponential relationship of the drain current with the gate voltage. Figure 2.1
shows the first measurement of drain current of an MOS transistor below the device threshold
voltage. This weak inversion current has been named the subthreshold current.
The early exploration of subthreshold design focused on analog circuits such as
amplitude detectors, quartz ring oscillators, bandpass amplifiers, and transconductance amplifiers
[29, 44, 73]. In recent years, subthreshold digital CMOS designs have been implemented
Figure 2.1: First measurement of an MOS transistor at very low current (annotated copy of
Vittoz's notebook [75]).
for biomedical devices, FFT processors, and SRAMs [24, 32, 62, 77, 83, 43]. This unintended
discovery provides an opportunity for meeting the demands of extremely energy-efficient
systems.
2.2 Minimum Voltage Operation
In 1972, Swanson and Meindl built a revised charge-based model for an inverter that
covers the mixed weak and strong inversion region [66]. Their earlier model [49]
considered only the weak and strong inversion currents and was discontinuous
at the point where the two regions meet. The revised model was used to analyze the voltage
transfer characteristic (VTC) of the inverter that demonstrated operation down to 100mV,
as shown in Figure 2.2. The off-currents for PMOS and NMOS transistors were equated and
the gain of the inverter was calculated in the subthreshold region for finding the minimum
Figure 2.2: CMOS inverter voltage transfer characteristics (VTC) [66].
voltage. For sufficient gain at Vdd/2, the minimum voltage was estimated to be 8kT/q, or
200mV at room temperature, based on device parameters at that time. The term kT/q is
the thermal voltage (VT).
In 2001, the ideal limit for the lowest operable voltage was estimated to be 2kT/q, or 57mV at room
temperature [6]. To achieve this ideal limit, the PMOS and NMOS device threshold
voltages in the inverter must be adjusted to ensure comparable off-currents for the two MOS
devices. Otherwise, a minimum voltage larger than 2kT/q is needed to guarantee the correct
logic function. Circuits operating at very low supply voltages were successfully fabricated in
a standard 1.5V 180nm CMOS technology.
Figure 2.3: Minimum voltage operation for 10%-90% output swing for a 0.18µm ring
oscillator [10].
Another approach to the minimum voltage limit balances the threshold
voltages of PMOS and NMOS transistors [52]. The proposed Vth matching scheme
reduces the lowest required supply voltage to 0.15V to 0.30V for SRAM and enables a
minimum supply voltage of 0.1V for CMOS LSI.
At very low supply voltage, sizing of a transistor affects the functionality of CMOS
logic circuits. The minimum voltage operation (Vmin) occurs when the currents of PMOS
and NMOS devices are the same [61]. In Figure 2.3, the shaded region is the operational
region of a ring oscillator. The maximum Wp line marks the PMOS width below which the output voltage of an
inverter for logic 0 stays below 10% of Vdd. A wider PMOS device raises the logic 0
level at the output, because its subthreshold leakage grows relative to the current of the smaller
NMOS device. Conversely, the minimum Wp line marks the width above which the output voltage of the inverter for
logic 1 always remains above 90% of Vdd. The output voltage of the inverter is reduced by
the subthreshold leakage through the larger NMOS device. The minimum voltage operation
occurs at the point where maximum Wp equals minimum Wp while the 10%
to 90% output voltage swing is maintained. The ratio of the PMOS size to the NMOS size is 12 at Vmin
in 0.18µm technology [10]. This ratio means that the subthreshold current of a unit-width
NMOS transistor is 12 times larger than that of a unit-width PMOS transistor, owing to the
imbalance between the two device types in this technology.
Process variations affect the strength of the current for both devices [9]. To find mini-
mum voltage operation considering process variations, maximum Wp should be defined at the
worst case process corner, i.e., the strong PMOS and weak NMOS corner. For minimum Wp,
the worst case corner of the weak PMOS and strong NMOS should be considered. Minimum
energy operation of a circuit always occurs above Vmin for the correct logic function.
2.3 Minimum Energy Operation
The minimum energy operation point (Emin) of a digital circuit is the point at which the circuit
consumes less energy per cycle than at any other point in the parameter space. Among the different
parameters, power supply voltage (Vdd) and device threshold voltage (Vth) are mainly
considered for the minimum energy point. The energy and delay contours for a ring oscillator
circuit with varying Vdd and Vth show that Emin occurs in the subthreshold region [78].
For given Vdd and Vth, the minimum energy point for a circuit is determined by the
relationship between energy and latency. As Vdd scales down, dynamic energy is quadratically
reduced, while the delay of a circuit exponentially increases at supply voltages below Vth.
The increased delay induces an exponential increase of leakage energy. The minimum energy
point occurs where the magnitudes of dynamic energy and leakage energy are equal, as shown
in Figure 2.4.
The switching activity of a circuit affects its minimum energy point. When the dynamic
energy is decreased by reducing switching events, the leakage energy remains constant because
it does not depend on switching activity. Thus, the leakage energy contributes substantially
more to the total energy of a circuit. In that case, the minimum energy point occurs at a higher
supply voltage than in higher activity circuits. Conversely, higher switching activity moves the
minimum energy point to lower supply voltages to suppress the dynamic energy.
[Figure 2.4 plot: energy per cycle (J) versus Vdd in volts; curves Etot, Edyn, Eleak.]
Figure 2.4: Energy per cycle for an 8-bit ripple carry adder through HSPICE [27] simulation
in PTM 90nm CMOS, Emin = 3.29fJ at Vdd = 0.17V (Vth,pmos = -0.21V and Vth,nmos =
0.29V).
There are two representative minimum energy models in the literature. First, when the
operating frequency and technology of a subthreshold circuit are given, the minimum energy
model is derived to obtain the closed forms for optimal Vdd and Vth, respectively [7, 76].
This model uses fitting parameters normalized to a characteristic inverter for the given
technology, where the minimum sized inverter, for simplicity, is a good choice. All other
gates are normalized with respect to the inverter.
The delay of a characteristic inverter with output capacitance Cg is derived in the
subthreshold region as [51],
\[ t_d = \frac{K C_g V_{dd}}{I_{o,g} \exp\left(\frac{V_{dd} - V_{th,g}}{m V_T}\right)} \tag{2.1} \]
where K is a delay fitting parameter, m is the subthreshold slope coefficient, and Io,g and
Vth,g are fitted parameters for the on-currents of a NMOS and PMOS transistor that are not
symmetrical.
The longest (critical) path delay of a circuit is obtained as,
\[ T_D = t_d L_{DP} \tag{2.2} \]
where LDP is the logic depth of the longest path normalized to the characteristic inverter
delay.
Subthreshold leakage current is not the only component of the leakage of nanometer
CMOS transistors. However, the leakage energy mainly comes from subthreshold leakage in a
circuit operating in the subthreshold region. Under this assumption, total energy per cycle
(Etot) and its components, dynamic energy (Edyn) and leakage energy (Eleak), are expressed
as,
\[
\begin{aligned}
E_{dyn} &= C_{eff} V_{dd}^2 \\
E_{leak} &= I_{leak} V_{dd} T_D \\
&= W_{eff} I_{o,g} \exp\left(\frac{-V_{th,g}}{m V_T}\right) V_{dd} t_d L_{DP} \\
&= W_{eff} K C_g L_{DP} V_{dd}^2 \exp\left(\frac{-V_{dd}}{m V_T}\right) \\
E_{tot} &= E_{dyn} + E_{leak} \\
&= V_{dd}^2 \left( C_{eff} + W_{eff} K C_g L_{DP} \exp\left(\frac{-V_{dd}}{m V_T}\right) \right)
\end{aligned}
\tag{2.3}
\]
where Ceff is the average total switched capacitance for the circuit and Weff is the average
total width that contributes to the leakage current. The derivative of total energy with
respect to Vdd is given by
\[ \frac{\partial E_{tot}}{\partial V_{dd}} = 2 C_{eff} V_{dd} + \left( 2 - \frac{V_{dd}}{m V_T} \right) W_{eff} K C_g L_{DP} V_{dd} \exp\left(\frac{-V_{dd}}{m V_T}\right) \tag{2.4} \]
To solve for the optimal voltage (Vopt) for minimum energy, Equation (2.4) is set to zero and
an analytical solution for Vopt is obtained:
\[ V_{opt} = m V_T \left( 2 - \mathrm{lambertW}\left( \frac{-2 C_{eff}}{W_{eff} K C_g L_{DP}} \exp(2) \right) \right) \tag{2.5} \]
The Lambert W function is subject to the constraint [16]:
\[ \frac{-2 C_{eff}}{W_{eff} K C_g L_{DP}} \exp(2) > -\exp(-1) \tag{2.6} \]
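Equations (2.5) and (2.6) can be evaluated numerically with a short sketch. The Python below is only an illustration under stated assumptions: the parameter values are hypothetical normalized numbers (not taken from this work), the product Weff·K·Cg·LDP is lumped into a single argument `leak_coeff`, and the Lambert W function is evaluated by bisection on its lower real branch, which is the branch where the stationary point of Equation (2.3) is a minimum (Vdd > 3mVT).

```python
import math

def lambert_w_lower(x, tol=1e-12):
    """Lower real branch of Lambert W for -1/e < x < 0, by bisection on w*exp(w) = x."""
    if not (-math.exp(-1.0) < x < 0.0):
        raise ValueError("argument must satisfy -exp(-1) < x < 0, cf. Equation (2.6)")
    lo, hi = -700.0, -1.0  # w*exp(w) decreases from about 0 to -1/e on this interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > x:
            lo = mid  # function value still above x: root lies at larger w
        else:
            hi = mid
    return 0.5 * (lo + hi)

def v_opt(c_eff, leak_coeff, m, v_t):
    """Equation (2.5); leak_coeff stands for Weff*K*Cg*LDP (lumped, hypothetical)."""
    beta = -2.0 * c_eff / leak_coeff * math.exp(2.0)
    return m * v_t * (2.0 - lambert_w_lower(beta))

def e_tot(v_dd, c_eff, leak_coeff, m, v_t):
    """Equation (2.3), total energy per cycle."""
    return v_dd ** 2 * (c_eff + leak_coeff * math.exp(-v_dd / (m * v_t)))

# Hypothetical normalized parameters, chosen only to satisfy Equation (2.6).
C_EFF, LEAK_COEFF, M_SLOPE, V_THERMAL = 1.0, 1000.0, 1.3, 0.0259
V_OPT = v_opt(C_EFF, LEAK_COEFF, M_SLOPE, V_THERMAL)
print(f"Vopt = {V_OPT:.4f} V")
```

Comparing `e_tot` at `V_OPT` with nearby voltages is a quick way to confirm that the selected branch gives a minimum rather than the local maximum on the other branch.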
For obtaining Vth,opt, the operating frequency for the circuit is given by
\[ f = \frac{1}{t_d L_{DP}} \tag{2.7} \]
and substituting Equation (2.1) for td at a given f yields:
\[ V_{th,opt} = V_{opt} - m V_T \ln\left( \frac{f K C_g L_{DP} V_{opt}}{I_{o,g}} \right) \tag{2.8} \]
When the natural log argument exceeds 1, Vth,opt < Vopt and the circuit no longer operates in
the subthreshold region. This limits the maximum operating frequency for a subthreshold circuit.
From Equations (2.5) and (2.8), the energy optimal voltage and device threshold voltage
are determined for a given performance. For a given Vth with respect to the technology, the
energy optimal voltage is still determined by Equation (2.5) and the corresponding operating
frequency is given by Equation (2.7).
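The relation between frequency and optimal threshold voltage in Equation (2.8) can be sketched as follows. All numeric values are hypothetical placeholders, and `f_limit` is an illustrative helper name for the frequency at which the logarithm argument reaches 1, the boundary of subthreshold operation described above.

```python
import math

def vth_opt(freq, v_opt, k_fit, c_g, l_dp, i_og, m, v_t):
    """Equation (2.8): threshold voltage that meets frequency freq at supply v_opt."""
    arg = freq * k_fit * c_g * l_dp * v_opt / i_og
    return v_opt - m * v_t * math.log(arg)

def f_limit(v_opt, k_fit, c_g, l_dp, i_og):
    """Frequency at which the log argument of Equation (2.8) equals 1 (Vth,opt = Vopt)."""
    return i_og / (k_fit * c_g * l_dp * v_opt)

# Hypothetical fitted parameters for illustration only.
PARAMS = dict(k_fit=1.0, c_g=1e-15, l_dp=20, i_og=1e-6)
V_OPT, M_SLOPE, V_THERMAL = 0.27, 1.3, 0.0259

F_MAX = f_limit(V_OPT, **PARAMS)
VTH_AT_LIMIT = vth_opt(F_MAX, V_OPT, m=M_SLOPE, v_t=V_THERMAL, **PARAMS)
VTH_SLOWER = vth_opt(0.1 * F_MAX, V_OPT, m=M_SLOPE, v_t=V_THERMAL, **PARAMS)
print(F_MAX, VTH_AT_LIMIT, VTH_SLOWER)
```

At one tenth of the limiting frequency the optimal threshold voltage rises above Vopt, so the circuit stays in the subthreshold region.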
When Vdd reduces, the delay and leakage current of a circuit change simultaneously. The
leakage current reduces because the drain-induced barrier lowering (DIBL) effect weakens, while
the delay increases exponentially in the subthreshold regime. The leakage energy is the product
of delay and leakage current, and the exponential delay increase dominates, so the overall
leakage energy increases. Figure 2.5 shows the trends of normalized td and Ileak for an inverter
in the Predictive Technology Model (PTM) 90nm CMOS technology [85]. The normalized leakage
energy, Eleak,nom, starts to increase at the beginning of the subthreshold region.
[Figure 2.5 plot: normalized Ileak and td versus Vdd in volts; curves td.nom, Ileak.nom, Eleak.nom.]
Figure 2.5: The delay and leakage current normalized to an inverter at Vdd = 1.2V through
HSPICE simulation in PTM 90nm CMOS.
Another minimum energy model is derived from an analytical expression for the energy
consumption of an n-stage inverter chain as a function of Vdd [81]. The total energy per cycle
of an n-stage inverter chain with switching activity α is given by:
\[
\begin{aligned}
E_{tot} &= E_{dyn} + E_{leak} \\
&= \alpha \cdot n \cdot E_{switch,inv} + P_{leak} \cdot T_d \\
&= \alpha \cdot n \cdot \left( \tfrac{1}{2} C_s V_{dd}^2 \right) + (n \cdot V_{dd} \cdot I_{leak}) \cdot (n \cdot t_d) \\
&= \tfrac{1}{2} \alpha \, n \, C_s V_{dd}^2 + n \, V_{dd} \, I_{leak} \cdot n \cdot \frac{\eta C_s V_{dd}}{2 I_{on}} \\
&= \tfrac{1}{2} n C_s V_{dd}^2 \left( \alpha + \eta \, n \, \frac{I_{leak}}{I_{on}} \right) \\
&= \tfrac{1}{2} n C_s V_{dd}^2 \left( \alpha + \eta \, n \, e^{-\frac{V_{dd}}{m V_T}} \right)
\end{aligned}
\tag{2.9}
\]
where the symbols used in these expressions are listed below:
• n: number of inverter stages.
• Eswitch,inv: switching energy of an inverter.
• Pleak: total leakage power of the inverter chain.
• Td: delay of the inverter chain.
• Cs: total switched capacitance of an inverter.
• td: delay of an inverter.
• Ion: average on-current of an inverter in subthreshold region.
• η: technology-dependent linear coefficient for the gap of inverter delay between actual
and step delay.
The energy optimal voltage is obtained by setting ∂Etot/∂Vdd = 0. With
u = η·n/α and t = Vdd/mVT, the minimum energy is achieved by the supply voltage Vdd
that satisfies the following equation:
\[ e^{t} = \frac{u}{2} \, t - u \tag{2.10} \]
Because Equation (2.10) is non-linear, it is solved by curve-fitting to get a closed-form
expression:
\[ t = 1.587 \ln u - 2.355 \tag{2.11} \]
By replacing u and t with the original variables, the energy optimal voltage is finally obtained
as:
\[ V_{opt} = \left( 1.587 \ln\left( \frac{\eta \cdot n}{\alpha} \right) - 2.355 \right) m V_T \tag{2.12} \]
The energy optimal voltage only depends on η and m for technology trends. Also, Vth
does not affect the minimum energy and energy optimal voltage as seen in Equations (2.9)
and (2.12). The leakage current and the delay depend on Vth equally but in opposite directions.
Therefore, in the subthreshold regime the leakage energy is constant across different Vth
values, unlike its dependence on Vdd shown in Figure 2.5. The minimum energy and optimal
voltage are strongly determined by α and n, which account for the relative amounts of dynamic
and leakage energies in the total energy, respectively.
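The quality of the curve fit in Equation (2.11) can be checked against a direct numerical solution of Equation (2.10). The sketch below is an illustration, not the procedure used in [81]: it finds the larger root of Equation (2.10) by bisection (the root at which the derivative of Equation (2.9) changes from negative to positive, which exists only for u > 2e^3) and compares it with the fitted expression for a few hypothetical values of u.

```python
import math

def solve_t(u, tol=1e-12):
    """Larger root of exp(t) = (u/2)*t - u, Equation (2.10), by bisection."""
    if u <= 2.0 * math.exp(3.0):
        raise ValueError("no interior energy minimum unless u > 2*exp(3)")

    def residual(t):
        return math.exp(t) - 0.5 * u * t + u

    lo, hi = 3.0, 50.0  # residual(3) < 0 for u > 2e^3; residual(50) > 0 for practical u
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def fit_t(u):
    """Curve-fitted closed form, Equation (2.11)."""
    return 1.587 * math.log(u) - 2.355

for u in (50.0, 100.0, 200.0):  # hypothetical values of u = eta*n/alpha
    print(f"u = {u:6.1f}  numeric t = {solve_t(u):.3f}  fitted t = {fit_t(u):.3f}")
```

For these sample values the fitted and numerical solutions differ by less than a tenth of a unit of t, that is, by a few millivolts of Vopt after multiplying by mVT; the agreement is not guaranteed for u far outside this range.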
For large complex circuits, Equation (2.9) is extended as follows:
\[
\begin{aligned}
E_{dyn} &= \alpha \cdot S_{HD} \cdot C_{w0} \cdot W_{tot} \cdot V_{dd}^2 \\
E_{leak} &= I_{leak} \cdot V_{dd} \cdot T_c \\
&= (\beta \cdot W_{tot} \cdot I_{leak0}) \cdot V_{dd} \cdot (n_d \cdot t_{d,FO4}) \\
E_{tot} &= E_{dyn} + E_{leak} \\
&= C_{w0} W_{tot} V_{dd}^2 \left( \alpha S_{HD} + 2 \beta \cdot n_d \cdot e^{-\frac{V_{dd}}{m V_T}} \right)
\end{aligned}
\tag{2.13}
\]
where the delay of an inverter with fanout of four (FO4) is given with Ion0, the on-current of a
unit width inverter:
\[ t_{d,FO4} = \frac{\tfrac{1}{2} \cdot (4 W_{inv} \cdot C_{w0}) \cdot V_{dd}}{W_{inv} \cdot I_{on0}} \tag{2.14} \]
where,
• SHD: switching factor to model the hamming distance of inputs [21].
• Cw0: capacitance of a unit width transistor.
• Wtot: total width of transistors in a circuit.
• Tc: critical path delay of a circuit.
• β: leaking factor to model leakage stack effect and input pattern dependency.
• Ileak0: leakage current of a unit width transistor.
• nd: logic depth in terms of inverter delay with fanout of four.
Figure 2.6: Total Energy vs. Vdd for a 16×16 multiplier [81].
As shown in Figure 2.6, the proposed total energy model is compared to SPICE simulation
results for a 16×16 multiplier circuit, where the parameters used in the SPICE
simulation are SHD ≈ 0.55, β ≈ 0.5, and nd ≈ 65. The switching activity for each block has
a different value. Thus, we should consider the switching activity difference across the entire
chip for the minimum energy point. When Vdd is scaled down to achieve Emin, a circuit with low
switching activity behaves like a circuit with normal switching activity and greater logic depth.
Chapter 3
True Minimum Energy Design Using Dual Below-Threshold Supply Voltages
This chapter investigates subthreshold voltage operation of digital circuits. Operation in
the subthreshold voltage region has been long predicted and since verified [76]. To exploit the
time slack on non-critical paths, some designs use dual voltages within a circuit. Although
dual voltage operation for above threshold Vdd has been studied [11, 39, 65, 67, 68], below-
threshold dual voltages have not been examined until the work presented here. Utilizing
the time slack for dual-Vdd assignment can give valuable energy saving with small extra cost
in physical design. This results in circuit operation below the minimum energy point for a
single-Vdd circuit. Therefore, we call this the true minimum energy point.
We provide a framework for optimizing subthreshold circuits using dual-Vdd assignments
with given speed requirements, where the design procedure formulates mixed integer linear
programs (MILP). In a dual-Vdd circuit, signal level converters are considered essential. Level
converters insert delays and consume power [54, 80]. In the absence of level converters,
certain interfaces become unsatisfactory. Especially, driving a high Vdd gate with a low
voltage signal presents problems of high leakage and long delay. We characterize the multi-
level interfaces and our MILP contains constraints to avoid the use of level converters.
3.1 Subthreshold Circuits
Before optimizing the minimum energy of subthreshold circuits by dual-Vdd assignments,
we briefly summarize the properties of subthreshold circuits in terms of functional operation
and failure, performance, and energy in this section.
[Figure 3.1 plot: Vout/Vdd versus Vdd in volts; curves for an INV with a single INV load and for
10, 100, and 1000 INV chains.]
Figure 3.1: HSPICE [27] simulations for the output logic levels of inverter chains normalized
to nominal supply voltage, 1.2V, with scaling Vdd in PTM 90nm CMOS (INV: Wp = 5.5×Lg,
Wn = 2.4×Lg).
3.1.1 Minimum Operating Voltage
For the correct functional operation of a subthreshold logic circuit, the supply voltage
Vdd should be higher than a certain minimum voltage (Vmin). For bulk CMOS technology,
the theoretical Vmin is given as [48, 81],
\[ V_{min} = 2 V_T \ln\left( 1 + \frac{S}{\ln 10 \cdot V_T} \right) \tag{3.1} \]
where VT = kT/q is the thermal voltage, k = 1.381 × 10^-23 J/K is Boltzmann's constant,
T is absolute temperature in Kelvin, q = 1.602 × 10^-19 C is electronic charge and S is
the subthreshold swing. From [23], S is degraded with the downscaling trend of CMOS
technology, which means that the reduced ratio of on-current Ion at Vgs = Vds = Vdd to
off-current Ioff at Vgs = 0 and Vds = Vdd in subthreshold region (Vdd < Vth) causes smaller
noise margins and possible functional logic failures at or below Vmin. Figure 3.1 shows that the
inverter chains work properly at low supply voltages. The minimum operating voltage
of the inverter chains, 80mV, guarantees 10% to 90% output voltage swing. The increased
number of inverters in a chain slightly degrades Vmin, but the degradation is saturated.
Basically, this means that the logic 0 and 1 levels stabilize close to ground and supply
voltages, respectively, and do not continue to degrade with the depth of the circuit.
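As a numerical illustration of Equation (3.1), the sketch below evaluates the theoretical Vmin at 300K for an ideal subthreshold swing (S = ln10·VT, roughly 60mV/decade) and for a degraded swing. The 90mV/decade value is an assumption chosen for illustration, not a parameter extracted from the PTM model used in this work.

```python
import math

K_BOLTZMANN = 1.381e-23  # J/K
Q_ELECTRON = 1.602e-19   # C

def thermal_voltage(temp_k=300.0):
    """VT = kT/q in volts."""
    return K_BOLTZMANN * temp_k / Q_ELECTRON

def v_min(swing, temp_k=300.0):
    """Equation (3.1): theoretical minimum supply voltage for subthreshold swing S (V/decade)."""
    v_t = thermal_voltage(temp_k)
    return 2.0 * v_t * math.log(1.0 + swing / (math.log(10.0) * v_t))

IDEAL_SWING = math.log(10.0) * thermal_voltage()  # ideal swing at 300K
VMIN_IDEAL = v_min(IDEAL_SWING)                   # reduces to 2*VT*ln(2)
VMIN_DEGRADED = v_min(0.090)                      # assumed 90 mV/decade swing
print(f"ideal: {VMIN_IDEAL * 1e3:.1f} mV, degraded: {VMIN_DEGRADED * 1e3:.1f} mV")
```

Both theoretical values lie below the 80mV observed in the inverter chain simulations, which additionally require a 10% to 90% output swing.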
3.1.2 Delay
The delay of a gate in a subthreshold circuit can be simply formulated from the CMOS
gate delay equation [23],
\[ t_d = \frac{K \cdot C_L \cdot V_{dd}}{I_{on}} \tag{3.2} \]
where K is a fitting parameter and CL is the load capacitance of the gate. If it is assumed
that total subthreshold current is equal to subthreshold drain current (Isub), we replace Ion
with Isub [76]
\[ I_{sub} = I_o \cdot 10^{\frac{V_{gs} - V_{th} + \eta V_{ds}}{S}} \cdot \left( 1 - e^{-\frac{V_{ds}}{V_T}} \right) \tag{3.3} \]
where η is the drain-induced barrier lowering (DIBL) coefficient and Io is the drain current
at Vgs = Vth in the weak inversion [58].
\[ I_o = \mu_o \cdot C_{ox} \cdot \frac{W}{L} \cdot (m - 1) \cdot V_T^2 \tag{3.4} \]
μo is the zero bias electron mobility, Cox is the gate oxide capacitance, and m is the
subthreshold slope coefficient.
When Vgs = Vds = Vdd ≫ VT (≈ 26mV at 300K), we get gate delay as,
\[ t_d = \frac{K \cdot C_L \cdot V_{dd}}{I_o \cdot 10^{\frac{(\eta + 1) V_{dd} - V_{th}}{S}}} \tag{3.5} \]
Thus, td is exponentially dependent on Vdd, Vth, η, and S.
3.1.3 Energy
Energy per cycle of a circuit is a key parameter for energy efficiency in ultra-low power
applications. Because computing workload is characterized in terms of clock cycles, this
measure directly relates energy consumption to the workload. Before considering the energy
consumed by a circuit, we start by examining the total energy per cycle (Etot) of a single
gate, which is composed of dynamic energy (Edyn) and leakage energy (Eleak):
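\[
\begin{aligned}
E_{dyn} &= \alpha_{0 \to 1} \cdot C_L \cdot V_{dd}^2 \\
E_{leak} &= P_{leak} \cdot t_d \\
&= I_{off} \cdot V_{dd} \cdot t_d \\
&= K \cdot C_L \cdot V_{dd}^2 \cdot 10^{-\frac{V_{dd}}{S}} \\
E_{tot} &= E_{dyn} + E_{leak} \\
&= \left( \alpha_{0 \to 1} + K \cdot 10^{-\frac{V_{dd}}{S}} \right) \cdot C_L \cdot V_{dd}^2
\end{aligned}
\tag{3.6}
\]
where α0→1 is the low to high transition activity for the gate output node and Pleak is static
leakage power. Ioff is the static leakage current, obtained from (3.3) with Vgs = 0:
\[ I_{off} = I_o \cdot 10^{\frac{-V_{th} + \eta V_{ds}}{S}}, \quad V_{ds} \gg V_T \tag{3.7} \]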
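The trade-off in Equation (3.6) can be explored with a small voltage sweep. The sketch below is illustrative only: the activity, fitting parameter K, load capacitance, and subthreshold swing are hypothetical values, and the sweep starts at 0.10V on the assumption that lower voltages are excluded by the minimum operating voltage.

```python
def gate_energy(v_dd, alpha, k_fit, c_load, swing):
    """Equation (3.6): returns (Edyn, Eleak, Etot) of a single gate in joules."""
    e_dyn = alpha * c_load * v_dd ** 2
    e_leak = k_fit * c_load * v_dd ** 2 * 10.0 ** (-v_dd / swing)
    return e_dyn, e_leak, e_dyn + e_leak

def min_energy_voltage(alpha, k_fit, c_load, swing, v_lo=0.10, v_hi=0.60, step=0.001):
    """Grid search for the supply voltage that minimizes Etot on [v_lo, v_hi]."""
    n_steps = int(round((v_hi - v_lo) / step))
    grid = [v_lo + i * step for i in range(n_steps + 1)]
    return min(grid, key=lambda v: gate_energy(v, alpha, k_fit, c_load, swing)[2])

# Hypothetical parameters: K = 100, CL = 1 fF, S = 90 mV/decade.
K_FIT, C_LOAD, SWING = 100.0, 1e-15, 0.090
V_HIGH_ACTIVITY = min_energy_voltage(0.20, K_FIT, C_LOAD, SWING)
V_LOW_ACTIVITY = min_energy_voltage(0.05, K_FIT, C_LOAD, SWING)
print(V_HIGH_ACTIVITY, V_LOW_ACTIVITY)
```

With these assumed values the lower-activity gate has its minimum energy point at a higher supply voltage, in line with the activity dependence discussed in Section 2.3.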
3.2 Dual-Vdd Scheme for Subthreshold Operation
Scaling Vdd down in circuits reduces both dynamic power and static leakage power,
but it also reduces performance. To reduce power consumption without degrading
performance, a multi-Vdd technique exploits time slacks and applies a lower voltage VDDL to
gates on non-critical paths.
As shown in Figure 3.2(a), a clustered voltage scaling (CVS) algorithm [67] does not allow
the VDDL cells to feed directly into VDDH cells and so level converting is implemented inside
the flip-flop (LCFF) [28]. This topological limitation prevents full use of the time slacks that
exist in a circuit. The extended clustered voltage scaling (ECVS) in Figure 3.2(b) eliminates
this constraint by inserting a level converter (LC) with each VDDL cell feeding into a VDDH
cell. ECVS gives better power saving than CVS but LC adds to power and delay overheads.
Without a level converter the low to high output transition delay of the second stage
inverter in Figure 3.3 is not affected by the input voltage swing VDDL from the previous stage,
because the delay of the pull-up PMOS is only dependent on its own power supply VDDH [59].
During the high to low output transition of the second inverter, the pull-down NMOS delay
is affected by both the input swing VDDL and the power supply VDDH. Therefore, lower
input swing reduces discharge current through the NMOS, which increases the pull-down
delay. Because the pull-up PMOS in the inverter could not be shut off completely by the
lower input swing level, severe DC current from the power supply VDDH induces higher static
leakage power consumption.
In subthreshold operation, the lower input swing exponentially increases the delay (3.5)
of the driven gate. We investigate the delay and leakage power penalty from lower input
swing voltage. For simplicity, we use only four types of cells, namely, INV, NAND2, NAND3
and NOR2, to synthesize example circuits. For cell characterization, all simulation results
are from HSPICE using the Predictive Technology Model (PTM) for 90 nm CMOS [85].
CMOS device threshold voltages are Vth,PMOS = 0.21V and Vth,NMOS = 0.29V at nominal
Vdd = 1.2V and room temperature (300K).
Various input and output configurations interfacing gates in dual Vdd assignments are
shown in Figure 3.4. Table 3.1 summarizes the delay and static leakage power for each case
where VDDH = 250mV and VDDL = 200mV such that the entire operation is in subthreshold
region. The difference between LL and HH delays shows that gate delay (3.5) is exponentially
sensitive to the power supply voltage, while Pleak has a smaller change.
[Figure 3.2 schematics: (a) Clustered voltage scaling (CVS), with FF, VDDH cluster, VDDL cluster,
and LCFF. (b) Extended clustered voltage scaling (ECVS), with FF, LC, and LCFF. (c) Level
converter (LC), with IN, OUT, VDDL, and VDDH.]
Figure 3.2: Dual-Vdd schemes and level converter schematic [67, 68].
[Figure 3.3 schematic: VDDL inverter driving a VDDH inverter; DC current and discharge current
paths marked.]
Figure 3.3: A two-inverter chain without level converter.
Table 3.1: Measurement of a gate delay with a single INV load and static leakage power
in Figure 3.4 configurations at VDDH = 250mV and VDDL = 200mV through HSPICE
simulation for PTM 90 nm CMOS.
        Gate delay, td (ns)                          Leakage power, Pleak (pW)
Gate    (a) LL  (b) HH  (c) HL  (d) LH  (e) L-LC-H   (a) LL  (b) HH  (c) HL  (d) LH  (e) L-LC-H
INV     2.81    0.83    2.98    2.70    255.04       30.9    46.2    22.8    126.2   260.8
NAND2   6.82    2.10    5.31    7.92    260.32       31.1    45.3    26.2    101.5   259.9
NAND3   9.72    3.04    7.31    11.17   264.16       53.1    75.6    49.0    135.5   290.2
NOR2    8.33    2.54    8.91    5.73    262.27       32.6    48.4    20.8    156.6   263.0
In Table 3.1, as expected, due to smaller discharging time constants, HL delays for
NAND2 and NAND3 gates are lower than those for the LL configuration. However, that
is not the case for INV and NOR2 gates, which are faster in the LL configuration. This
speed increase is due to a higher logic 0 level for the LL configuration in charging time. In
the case of leakage power for HL, all gates suppress the leakage current through the pull-up
PMOS (Vgs > 0) from the power supply. The severe increases of delay and power in dual-Vdd
schemes come from LH, which is prohibited in CVS methodology and is allowed in ECVS with
LC. However, the common LC used above threshold (Figure 3.2(c)) cannot be used because of its
unacceptable delay overhead, in addition to its power overhead.
From Table 3.2, the LC delay penalty in subthreshold operation is around 80 fanout-of-
four (FO4) inverter delays, which exceeds a clock cycle time of a pipelined microprocessor
(13-15 FO4 delays) or an ASIC processor (44 FO4 delays) [14]. A new LC design suitable
for subthreshold circuits may be needed but is out of the scope of the present work. In
the next section, we include additional constraints in the MILP that will not allow the LH
configuration (similar to CVS) for energy optimization.
[Figure 3.4 schematics: (a) LL: Low input swing driving a low Vdd gate. (b) HH: High input swing
driving a high Vdd gate. (c) HL: High input swing driving a low Vdd gate. (d) LH: Low input
swing driving a high Vdd gate. (e) L-LC-H: Low input swing driving a high Vdd gate through a
level converter.]
Figure 3.4: Driven gates and input swing levels.
Table 3.2: Comparison of conventional LC (Figure 3.2(c)) delays normalized to INV(FO=4)
delay (VDD = VDDH) for normal and subthreshold operations through HSPICE simulation
in PTM 90 nm CMOS.
                       Normal         Subthreshold
Gate delay             VDDH = 1.2V    VDDH = 300mV
                       VDDL = 0.8V    VDDL = 250mV
INV(FO=4)              23.64 ps       1.52 ns
LC                     112.33 ps      121.86 ns
LC norm. to INV(FO4)   4.8            80.2
3.3 MILP for VDDL Assignment
In this section, we design minimum energy circuits with dual-Vdd assignments using
mixed integer linear programming (MILP) [19]. First, the optimal (i.e., minimum energy
per cycle) supply voltage (Vopt) for a single Vdd operation is determined. The critical path
delay (or clock cycle time) of this design is used as the timing requirement for the dual
voltage design. Thus, the MILP automatically applies higher supply voltage VDDH = Vopt to
gates on critical paths to maintain the performance and finds an optimal lower supply voltage
VDDL assigned to gates on non-critical paths to reduce the total energy consumption by a
global optimization considering all possible VDDL. This differs from the backward traversal
CVS heuristic algorithms that tend to be non-optimal. Note that more paths now may have
delays that are either equal or close to the critical path delay.
Let Xi be an integer variable that is 0 for VDDH or 1 for VDDL for the power supply
assignment of gate i. Let Tc be a predetermined critical path delay for the circuit. The
optimal minimum energy voltage assignment problem is formulated as an MILP model:
Minimize
\[ \sum_{i \in \text{all gates}} \left[ E_{tot,VDDL,i} \cdot X_i + E_{tot,VDDH,i} \cdot (1 - X_i) \right] \tag{3.8} \]
Etot,i for VDDL and VDDH are given by (3.6):
\[ E_{tot,i} = \alpha_i \cdot C_{L,i} \cdot V_{dd,i}^2 + P_{leak,Vdd,i} \cdot T_c \tag{3.9} \]
Subject to timing constraints:
\[ t_{d,i} = t_{d,VDDL,i} \cdot X_i + t_{d,VDDH,i} \cdot (1 - X_i) \quad \forall i \in \text{all gates} \tag{3.10} \]
\[ T_i \ge T_j + t_{d,i} \quad \forall j \in \text{all fanin gates of gate } i \tag{3.11} \]
\[ T_i \le T_c \quad \forall i \in \text{all primary output gates} \tag{3.12} \]
Subject to topological constraints:
\[ X_i - X_j \ge 0 \quad \forall j \in \text{all fanin gates of gate } i \tag{3.13} \]
Figure 3.5: Topological constraints.
In the above constraints, Ti is the latest arrival time at the output of gate i corresponding to a
primary input event [55, 56]. As mentioned in Section 3.2, the unacceptable delay penalty
of asynchronous LC prohibits its use in a dual-Vdd scheme in the subthreshold region. The
MILP model does not allow a VDDL cell to drive a VDDH cell as its fanout gate on account
of topological constraint (3.13) as shown in Figure 3.5. Thus, the LH configuration of
Figure 3.4(d) never occurs in the optimized circuit. Within the given timing constraint Tc,
originally obtained for the best energy per cycle for single subthreshold VDDH operation, the
MILP searches for the best VDDL such that the energy per cycle is further reduced to a true
minimum.
Figure 3.6: Simulation setup.
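The structure of the formulation (3.8) to (3.13) can be illustrated on a toy netlist. The sketch below is not the MILP solver flow used in this work: it replaces the solver with exhaustive enumeration, which is only practical for a handful of gates, and the four-gate netlist, delays, and energies are hypothetical numbers chosen so that the result can be verified by hand.

```python
from itertools import product

# Hypothetical 4-gate netlist, listed in topological order: gate -> fanin gates.
FANINS = {"a": [], "b": [], "c": ["a", "b"], "d": ["b"]}
PRIMARY_OUTPUTS = ["c", "d"]
TD_H = {"a": 2.0, "b": 1.0, "c": 2.0, "d": 1.0}   # gate delays at VDDH (arbitrary units)
TD_L = {"a": 4.0, "b": 2.0, "c": 4.0, "d": 2.0}   # gate delays at VDDL
E_H = {g: 1.0 for g in FANINS}                    # energy per cycle at VDDH
E_L = {g: 0.6 for g in FANINS}                    # energy per cycle at VDDL
T_C = 4.0                                         # critical path delay with all gates at VDDH

def evaluate(x):
    """Return total energy (3.8) for assignment x, or None if a constraint is violated."""
    # Topological constraint (3.13): a VDDL gate (X = 1) may not drive a VDDH gate (X = 0).
    if any(x[i] - x[j] < 0 for i, fanin in FANINS.items() for j in fanin):
        return None
    # Timing constraints (3.10) to (3.12): propagate latest arrival times.
    arrival = {}
    for i, fanin in FANINS.items():
        t_d = TD_L[i] if x[i] else TD_H[i]
        arrival[i] = max((arrival[j] for j in fanin), default=0.0) + t_d
    if any(arrival[o] > T_C for o in PRIMARY_OUTPUTS):
        return None
    return sum(E_L[i] if x[i] else E_H[i] for i in FANINS)

def solve():
    """Enumerate all X in {0,1}^n and keep the feasible assignment with minimum energy."""
    best = None
    for bits in product((0, 1), repeat=len(FANINS)):
        x = dict(zip(FANINS, bits))
        energy = evaluate(x)
        if energy is not None and (best is None or energy < best[0]):
            best = (energy, x)
    return best

BEST_ENERGY, BEST_X = solve()
print(BEST_ENERGY, BEST_X)
```

In this toy case only gate d can move to VDDL: gates a and c lie on the critical path, and gate b drives c, so constraint (3.13) keeps b at VDDH even though its own path has slack.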
3.4 Simulation Results
As mentioned before, we use only four basic cells (INV, NAND2, NAND3 and NOR2)
for synthesizing two example circuits, a 16-bit ripple carry adder and a 4×4 multiplier,
and ISCAS'85 benchmark circuits in PTM 90nm CMOS technology. The delay, capacitance
and average leakage power of these four basic cells are characterized for the MILP model by
scaling Vdd with a 10mV resolution in HSPICE simulations.
Switching activity α is the average number of low to high transitions at circuit nodes,
which is calculated using a logic simulator with randomly generated input vectors. These
randomly generated input vectors are the same as input signal vectors to the circuit for
HSPICE simulation to measure energy consumption.
As shown in Figure 3.6, our example circuit, embedded in a test bench, is driven by
high input swing flip-flops that apply the randomly generated vectors. Two subthreshold
voltages may be provided by a DC to DC voltage converter [57, 77, 41]. The energy per cycle
measurement is for the combinational circuit, excluding flip-flops.
From Figure 3.7(a), the minimum energy point for a 16-bit ripple carry adder with
an activity factor α = 0.21 is 9.65fJ at Vdd = 0.21V. The clock frequency was found to
be 2.15MHz. With dual Vdd assignments the optimized circuit with VDDH = 0.21V and
VDDL = 0.14V reduces the energy per cycle by 23.6% while retaining the same performance.
This energy reduction is shown by the downward arrow in Figure 3.7(b).
Consider again the minimum energy per cycle (9.65fJ) operation of the 16-bit ripple-carry
adder circuit with a single subthreshold voltage 0.21V and a clock frequency of 2.15MHz.
In an alternative design, we may hold the minimum energy constant and improve the
performance.
From the MILP results in Table 3.3, we find that operation with two supply voltages
0.27V (VDDH) and 0.19V (VDDL) consumes 9.42fJ, which is just under the minimum energy
but has a clock frequency 8.41MHz. This, as shown by the right arrow in Figure 3.7(b), has
about 4X speed improvement.
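The figures quoted above follow directly from the Table 3.3 entries for the 16-bit ripple carry adder, as the short check below shows. The helper names are illustrative; the numbers are the table values for VDDH = 0.21V and VDDH = 0.27V.

```python
def energy_reduction_pct(e_single, e_dual):
    """Percentage energy saving of the dual-Vdd design relative to single-Vdd."""
    return 100.0 * (e_single - e_dual) / e_single

def speedup(f_new, f_ref):
    """Ratio of clock frequencies."""
    return f_new / f_ref

# Table 3.3, adder: VDDH = 0.21V gives 9.65 fJ (single) and 7.37 fJ (dual) at 2.15 MHz.
SAVING = energy_reduction_pct(9.65, 7.37)
# Table 3.3, adder: VDDH = 0.27V with VDDL = 0.19V gives 9.42 fJ at 8.41 MHz.
SPEEDUP = speedup(8.41, 2.15)
print(f"saving = {SAVING:.1f}%, speedup = {SPEEDUP:.2f}x")
```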
As a worst case example, a path balanced 4×4 multiplier reduces the energy per cycle
to 5% below the minimum energy point with VDDH=0.17V and VDDL=0.12V, where the
performance is not degraded. For better performance, the 4×4 multiplier can operate at
1.67MHz, up from the 1MHz clock frequency at minimum energy with single-Vdd, where two supply
voltages 0.19V (VDDH) and 0.13V (VDDL) are provided and minimum energy increases
slightly.
The two example circuits using dual-Vdd show that performance improves considerably for a
circuit with large positive slack. Figure 3.8 (a) and (b) illustrate gate slack distribution of a
16-bit ripple carry adder and a 4×4 multiplier, respectively, for single and dual Vdd (Optimized)
design at the minimum energy point.
[Figure 3.7 plots: (a) Energy per cycle for single Vdd: energy per cycle (J) versus Vdd in volts;
curves Etot, Edyn, Eleak. (b) Energy per cycle for single and dual subthreshold supply voltages:
energy per cycle (J) versus Vdd in volts; curves Single Vdd, Dual Vdd, This Work.]
Figure 3.7: Energy per cycle for a 16-bit ripple carry adder for single-Vdd and dual-Vdd in
subthreshold region, activity factor α = 0.21, PTM 90nm CMOS.
[Figure 3.8 plots: number of gates versus slack time (nsec), single-Vdd and Optimized. (a) 16-bit
ripple carry adder at VDDH = 0.21V and VDDL = 0.14V. (b) 4×4 multiplier at VDDH = 0.17V and
VDDL = 0.12V.]
Figure 3.8: Gate slack distribution (number of gates vs. slack) of a 16-bit ripple carry
adder and a 4×4 multiplier for single-Vdd (= VDDH) and dual-Vdd (= VDDH, VDDL) at the
minimum energy point; slacks obtained by static timing analysis using gate delays for PTM
90nm CMOS.
Table 3.3: Total energy per cycle with optimal VDDL for given VDDH and maximum
corresponding speed.
      16-bit ripple carry adder (α = 0.21, total gates = 176)      | 4×4 multiplier (α = 0.32, total gates = 140)
VDDH  VDDL  VDDL    Etot,single  Etot,dual  reduction  Freq.   | VDDL  VDDL    Etot,single  Etot,dual  reduction  Freq.
(V)   (V)   gate #  (fJ)         (fJ)       (%)        (MHz)   | (V)   gate #  (fJ)         (fJ)       (%)        (MHz)
0.10  0.09  108     19.40        17.52      9.7        0.13    | 0.09  18      13.78        13.35      3.1        0.16
0.11  0.09  106     17.55        14.64      16.6       0.17    | 0.09  18      12.44        11.80      5.1        0.21
0.12  0.10  106     15.83        13.38      15.5       0.22    | 0.10  18      11.41        10.85      4.9        0.27
0.13  0.10  101     14.31        11.51      19.6       0.28    | 0.10  15      10.61        10.08      5.0        0.35
0.14  0.11  101     13.00        10.58      18.6       0.37    | 0.11  15      10.04        9.56       4.8        0.46
0.15  0.11  99      11.92        9.27       22.3       0.48    | 0.11  15      9.69         9.13       5.8        0.60
0.16  0.12  99      11.14        8.73       21.6       0.62    | 0.12  15      9.51         8.98       5.6        0.78
0.17  0.12  95      10.52        7.99       24.0       0.80    | 0.12  13      9.48         8.99       5.2        1.00
0.18  0.13  95      10.04        7.73       23.0       1.02    | 0.13  13      9.59         9.11       5.0        1.30
0.19  0.13  88      9.72         7.42       23.6       1.32    | 0.13  13      9.74         9.19       5.6        1.67
0.20  0.14  88      9.66         7.45       22.9       1.68    | 0.14  13      10.21        9.65       5.5        2.14
0.21  0.14  84      9.65         7.37       23.6       2.15    | 0.15  13      10.66        10.08      5.4        2.73
0.22  0.15  84      9.73         7.49       23.1       2.72    | 0.15  12      11.06        10.60      4.2        3.46
0.23  0.16  84      10.06        7.80       22.5       3.44    | 0.16  12      11.83        11.24      5.0        4.37
0.24  0.17  84      10.40        8.14       21.8       4.33    | 0.17  12      12.53        11.93      4.8        5.50
0.25  0.18  84      10.78        8.48       21.3       5.43    | 0.18  13      13.28        12.61      5.0        6.87
0.26  0.18  78      11.31        8.91       21.2       6.77    | 0.19  13      14.14        13.43      5.0        8.55
0.27  0.19  78      11.87        9.42       20.7       8.41    | 0.19  12      15.03        14.30      4.9        10.60
0.28  0.20  78      12.49        9.97       20.2       10.39   | 0.20  12      15.98        15.22      4.8        13.06
0.29  0.22  88      13.16        10.52      20.1       12.79   | 0.21  12      16.98        16.19      4.7        16.02
0.30  0.23  88      13.88        11.16      19.6       15.65   | 0.22  12      18.03        17.21      4.5        19.54
Average                                     20.5               |                                       4.9
Table 3.3 summarizes HSPICE simulations giving the total energy per cycle for the
single voltage Vdd = VDDH reference and the optimized dual voltage Vdd = {VDDH, VDDL}
circuits. Voltages vary from 0.1V to 0.3V. Both single and dual Vdd circuits have the same
speed because all gates on critical paths have the same VDDH for either circuit.
The energy savings at minimum energy operating points using dual-Vdd are obtained
from HSPICE simulations for ISCAS'85 benchmark circuits, as shown in Table 3.4. The
optimized c880 (an 8-bit ALU) shows 22.2% energy saving as the best case. The energy
saving for c6288 (a 16×16 multiplier) is only about 2.1%. Gate slack distribution is shown
for c880 and c6288, respectively, in Figure 3.9.
Logic function failure occurs at 0.08V in NAND3, so the possible lowest VDDL assign-
ment in MILP optimization is 0.09V. This minimum operating voltage guarantees 10% to
90% output voltage swing for all four cells in the full range of operational voltages used.
Figure 3.10 shows sample signal waveforms from an optimized 16-bit ripple carry adder cir-
cuit for VDDH = 0.11V and VDDL = 0.09V. This has VDDL assigned to cells on a non-critical
path that leads to the least significant sum bit (s1). The output flip-flop (s1q) holds correct
signal values at the minimum operating voltage on positive clock edges.
Table 3.4: Energy saving with optimal VDDL for given VDDH (minimum energy operating
point) in ISCAS'85 benchmark circuits for PTM 90nm CMOS.
Benchmark  Total  Activity  VDDH  VDDL  VDDL gates  Esingle  Edual  Ereduc.  Freq.
circuit    gates  α         (V)   (V)   (%)         (fJ)     (fJ)   (%)      (MHz)
c432       154    0.19      0.25  0.23  5.2         7.9      7.8    1.1      14.4
c499       493    0.21      0.22  0.18  9.7         20.2     19.8   2.0      11.9
c880       360    0.18      0.24  0.18  46.4        14.4     11.2   22.2     13.6
c1355      469    0.21      0.21  0.18  10.2        19.5     19.0   2.5      9.8
c1908      584    0.20      0.24  0.21  24.3        26.5     25.0   5.8      11.8
c2670      901    0.16      0.25  0.21  46.4        32.8     28.0   14.8     17.4
c3540      1270   0.33      0.23  0.14  7.0         88.0     84.6   3.8      7.2
c5315      2077   0.26      0.24  0.19  47.1        116.8    98.0   16.1     9.8
c6288      2407   0.28      0.29  0.18  2.7         165.4    162.0  2.1      9.4
c7552      2823   0.20      0.25  0.21  42.3        131.7    117.1  11.1     13.6
Average                                 24.1                        8.2
When VDDH is 100mV, it is approaching the lower end of its range beyond which the
circuit would fail to operate. The MILP now has limited choices for a solution and gives a
VDDL that provides smaller energy saving. The 16-bit ripple carry adder has better energy
reduction because it can utilize more time slack from non-critical paths compared to the 4×4
multiplier with more balanced paths. The gate delay in subthreshold operation increases
exponentially with reducing supply voltage, which forces the optimal VDDL close to VDDH.
Even though the MILP model only allows HL configuration and eliminates the use of
LC for a dual Vdd circuit block, level conversion may be needed at outputs to match signal
levels across block to block connections of a system. The differential cascode voltage switch
(DCVS) based level converter of a normal standard cell library in Figure 3.2(c) is not suitable
for dual subthreshold design due to its huge delay penalty. Realizing that the design of LC
for ultra low voltage is an open problem, our design refrains from using level converters
while taking the penalty of energy saving into account. For level converting, we always
assign VDDH to primary output (PO) gates before the output flip-flops at multiple voltage
boundaries between circuit blocks. The PO gates driven by VDDL cells are found to correctly
execute their logic functions if, for a given VDDH, VDDL is bounded as shown in Figure 3.11.
Figure 3.9: Gate slack distribution of c880 and c6288 for single-Vdd and dual-Vdd at the
minimum energy point in PTM 90nm CMOS. Panels: (a) c880 at VDDH = 0.24V and VDDL = 0.18V;
(b) c6288 at VDDH = 0.29V and VDDL = 0.18V. Each panel plots number of gates versus slack
time (nsec) for the single-Vdd circuit and the optimized dual-Vdd circuit.
Figure 3.10: Output signal waveforms of s1 and s1q in a 16-bit ripple carry adder at minimum
operating voltage, VDDL = 0.09V, in HSPICE simulation, PTM 90nm CMOS. Traces: clk, s1, and
s1q in volts versus time t (sec), shown from 100u to 250u.
Figure 3.11: VDDL bound for given VDDH with LH configured cells. The plot shows the lowest
VDDL (volts) versus VDDH (volts) for INV, NAND2, NAND3, and NOR2 cells, together with the
VDDL limit.
This lowest possible VDDL raises the minimum operating voltage for the dual voltage
optimized circuit block. The optimal VDDL in the MILP model can be higher than its true
optimal value to suppress DC leakage power of the LH configured PO gates. Two small
example circuits, a 16-bit ripple-carry adder and a 4×4 multiplier, show reduced average
energy savings of 11.9% and 2.6%, respectively. The penalty of energy saving from level
converting may be negligible for a large system in which most blocks would operate at VDDL
and only a few need VDDH.
3.5 Summary
In this chapter, we first introduced dual-Vdd design for a bulk CMOS subthreshold cir-
cuit [35]. Some applications in the market may need minimum energy consumption without
a performance concern. This work could solve those design problems. For a wide range
of speed requirements, the MILP determines globally the energy optimized circuit by
assigning the optimal VDDL to gates on non-critical paths. A 16-bit ripple carry adder shows
on average 20.5% reduced energy consumption, while maintaining the same performance as the
original single Vdd circuit. The worst case example of a 4×4 multiplier still gives on average
4.9% reduction. Further, allowing a small amount of increase in the energy consumption
can significantly speed-up the subthreshold operation of a logic circuit. The methodology of
dual Vdd assignment is valid for substantial speed-up without energy increase, as well as for
energy reduction below the minimum achievable in a single voltage circuit.
The proposed MILP algorithm is not restricted to subthreshold operation alone. When
a higher performance, impossible to achieve in the subthreshold region, is required we would
then obtain two above-threshold voltages that will satisfy the performance criteria and min-
imize the energy per cycle. There may be potential for greater energy saving as circuit size
increases due to larger critical path delay leading to greater slack for many gates. The pro-
cess variation of the device threshold voltage (Vth) can seriously affect a subthreshold voltage
design and this will be studied for nanometer technologies later. Higher leakage technologies
display higher speed in the subthreshold region because the logic operation relies on leakage
currents. These aspects of dual-Vdd design in subthreshold region are worth exploring.
Chapter 4
Minimum Energy CMOS Design with Dual Subthreshold Supply and Multiple Logic-Level
Gates
Some energy constrained applications that require moderate speed may not aggressively
scale the supply voltage down to the minimum energy point to maintain the performance.
Small energy increase from the absolute minimum energy point of a subthreshold circuit can
notably improve performance. Near-threshold operating circuit design is another choice to
cover a wider range of system performances for applications with tolerable energy increase
(~2X) from Emin by scaling Vdd to near Vth [18, 47, 30]. Technology down-scaling improves
the speed of a subthreshold circuit, but greater variability may adversely affect Emin for
extremely small feature size [5].
In Chapter 3, the presented MILP limits full use of the time slack by topological con-
straints considering multiple voltage boundaries without level converters. Thus, the energy
saving of dual Vdd design is not as much as expected. We are motivated to exploit full time
slack on non-critical paths in a subthreshold circuit using multiple logic-level gates to further
reduce Emin at its original speed or alternatively have the circuit operate at a higher speed
holding the energy consumption close to Emin.
Figure 4.1 shows the benefit of dual voltage design for a 32-bit ripple carry adder in 90nm
CMOS technology operating in the subthreshold regime. Energy per cycle for the optimized
dual voltage design (Edual) is reduced to ~0.67X of Emin, which is obtained by scaling down
a single supply voltage to its minimum energy operating point at Vdd=0.31V. This 32-bit
ripple carry adder can also operate ~7X faster with the same energy as Emin in another dual
voltage design using VDDH=0.45V. Finding an optimal lower supply voltage (VDDL) for a given
higher supply voltage (VDDH) and its assignments is the main problem in dual voltage design.
Figure 4.1: Energy and speed benefits of dual Vdd design in subthreshold voltage operation
for a 32-bit ripple carry adder through HSPICE simulation in PTM 90nm CMOS (activity
factor α = 0.17, number of gates = 352). The plot shows energy per cycle (J) versus frequency
(Hz) for four design points: single-Vdd E at Vdd = 0.45V; single-Vdd Emin at Vdd = 0.31V;
dual-Vdd Edual at VDDH = 0.45V and VDDL = 0.30V; and dual-Vdd Edual at VDDH = 0.31V and
VDDL = 0.18V. Annotations mark ~7.17X in speed and ~0.67X in energy.
We formulate a mixed integer linear program (MILP) to solve this problem with multiple
logic-level gates considering multiple voltage boundaries.
4.1 Operation of Conventional Level Converters in Subthreshold Regime
In a dual-Vdd design, assigning lower supply voltage (VDDL) only to gates on non-critical
paths reduces both dynamic and static leakage power of the circuit. Higher supply voltage
(VDDH = Vdd) is assigned to gates on critical paths to maintain the overall circuit perfor-
mance. By utilizing the time slack, we ensure that there is no performance loss. But, an
asynchronous level converter (ALC) is considered essential to suppress DC leakage current
and guarantee the correct switching of a VDDH gate driven by a low voltage input signal.
Level converting cost, however, reduces the power saving of the dual-Vdd scheme.
Clustered voltage scaling (CVS) [67] assigns VDDL to gates with positive time slack
starting from primary outputs to primary inputs and so does not allow the VDDL gates to
feed directly into VDDH gates by grouping gates into VDDH and VDDL clusters. VDDH cluster
is always located upstream as signals flow. This topological constraint reduces the potential
power saving from full use of the time slack that exists inside a circuit. Asynchronous
level converters are not needed inside a combinational circuit block, but the level converting
flip-flops (LCFF) are needed in sequential elements [28]. No overheads of power and delay
from ALCs exist in CVS. For removing the topological constraint in CVS, extended clustered
voltage scaling (ECVS) [68] inserts an ALC at a point, where a VDDL gate drives a VDDH
gate, to assign VDDL to more gates with time slack. This gives more power saving than CVS.
We apply the dual voltage technique to subthreshold supply combinational circuits. To
maximize energy saving from the time slack, a level converter is still considered essential. In
Figure 4.2, two traditional ALCs, a differential cascode voltage switched (DCVS) level con-
verter and a pass gate (PG) level converter, are shown. The PG level converter consumes less
energy than the DCVS level converter due to fewer devices in it and reduced contention [40].
Compared to the delay of a circuit operating with nominal Vdd, the delay of a subthreshold
circuit increases exponentially as supply voltage Vdd reduces [76]. This means that the time
slack is consumed quickly by assigning VDDL, quite close to VDDH, to gates on non-critical
paths. With such delay characteristic, the delay overhead of the ALC is more critical for
implementing a dual-Vdd design in the subthreshold regime.
We use the HSPICE simulator [27] to size the two ALCs properly for reduced delay in the
subthreshold region. The Predictive Technology Model (PTM) for 90 nm CMOS [85] was used in
the simulations. Table 4.1 shows the delay penalty of the two optimized ALCs in a range of
28× to 60× INV(FO4) delays, where INV(FO4) is the delay of a standard inverter with fanout
of four. The normal ALC delay is considered as 2× INV(FO4) delays [17] for a nominal
supply voltage. A low voltage microprocessor has ~400× INV(FO4) delays for a single
pipeline stage. A microprocessor operating in subthreshold region would prefer a shallow
pipeline to mitigate variability and a 40× INV(FO4) delay is considered as a typical design
case [63]. To reduce the delay penalty of level converting, we need to investigate alternative
approaches to remove ALCs without topological constraints in the dual-Vdd design.

Figure 4.2: Two traditional level converter schematics [40]. (a) Differential cascode voltage
switched (DCVS) level converter. (b) Pass gate (PG) level converter.

Table 4.1: Delays of two optimal sized ALCs with a single INV load at VDDL = 230mV and
VDDH = 300mV in PTM 90nm CMOS.
ALCs    Delay      Norm. to INV(FO4)
DCVS    79.1 ns    60.4
PG      37.6 ns    28.7

Table 4.2: Multiple logic-level gate delays with a single INV load at VDDL = 230mV and
VDDH = 300mV in PTM 90nm CMOS (High PMOS Vth = 0.29V).
Multiple logic-level gates    Delay norm. to INV(FO4)
INV                           1.3
NAND2                         2.3
NAND3                         3.1
NOR2                          3.9

As discussed in the literature, two types of logic gate designs have the capability to
handle multiple logic levels. Among these the embedded logic level converting circuit [40]
may not be a good choice because the previous ALC structures, when integrated with logic
gates, will not reduce the overall delay penalty. A level-shifter free design using dual Vth [17]
places high Vth devices in the pull-up PMOS network of a logic gate to suppress DC static
leakage with low input signals, as shown in Figure 4.3. This increases the rise time of the
gate, so the overall delay of the level shifting logic gate is larger than that of a normal gate
(PMOS Vth = 0.21V). As shown in Table 4.2, the delay penalty of these multiple logic-level
gates is much less than that of standard ALCs in the subthreshold region. Within some range
of low input voltages close to Vdd, a multiple logic-level INV consumes less leakage power
than a standard INV. This leakage power increases as the low input voltage goes down, as
shown in Figure 4.4. Considering the delay and power overheads, we are compelled to use the multiple
logic-level gates instead of ALCs in our dual voltage design.
Figure 4.3: Multiple logic-level NAND2 gate [17].
Figure 4.4: Multiple logic-level gate leakage power normalized to a standard INV (Vdd=Vin
= 300mV) in PTM 90nm CMOS. The plot shows normalized leakage power versus Vin (volts) for
multiple logic-level INV, NAND2, NAND3, and NOR2 gates.
4.2 MILP for Dual Voltage Design with Multiple Logic-Level Gates
In this section, we design minimum energy circuits with dual-Vdd assignments without
ALCs using mixed integer linear programming (MILP) [19]. Multiple logic-level gates
eliminate the use of ALCs and allow VDDL gates to drive VDDH gates with affordable
overheads in terms of delay and leakage power in a combinational circuit. First, the
performance requirement (critical path delay Tc) of a system is given, and VDDH is determined
to satisfy the system speed (or clock cycle time). The MILP automatically assigns the
predetermined VDDH to gates on critical paths to maintain the performance and finds the
optimal VDDL for gates on non-critical paths to reduce the total energy consumption (i.e.,
minimum energy per cycle) by a global optimization. CVS and ECVS, by contrast, are
inherently heuristic algorithms that tend to be non-optimal because of the backward traversal
from primary outputs through gates with time slack for assigning the lower supply voltage VDDL.
Assuming that gates become active once per clock cycle, the total energy per cycle (Etot)
is given by following equations [76]:
\[
\begin{aligned}
E_{dyn} &= \alpha_{0\to1} \cdot C_{load} \cdot V_{dd}^2 = C_{sw} \cdot V_{dd}^2 \\
E_{leak} &= I_{off} \cdot V_{dd} \cdot T_c = P_{leak} \cdot T_c \\
E_{tot} &= E_{dyn} + E_{leak} = C_{sw} \cdot V_{dd}^2 + P_{leak} \cdot T_c
\end{aligned}
\tag{4.1}
\]
where α0→1 is the low to high transition activity for the gate output node and Cload is the
load capacitance of the gate. In (4.1), dynamic energy (Edyn) quadratically depends on
scaling the power supply voltage Vdd with the total switched capacitance Csw of a circuit,
while the leakage energy (Eleak) is linearly proportional to leakage power Pleak during a clock
cycle.
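As a rough numerical sketch of (4.1), the snippet below evaluates the three energy terms. The function names and all input values are hypothetical and chosen only for illustration; they are not taken from the simulations reported here.

```python
def switched_capacitance(activities, load_caps):
    """Sum alpha_i * C_load_i over gates to get C_sw (farads)."""
    return sum(a * c for a, c in zip(activities, load_caps))


def energy_per_cycle(c_sw, p_leak, v_dd, t_c):
    """Return (E_dyn, E_leak, E_tot) in joules following (4.1)."""
    e_dyn = c_sw * v_dd ** 2      # dynamic energy: quadratic in Vdd
    e_leak = p_leak * t_c         # leakage energy: leakage power over one cycle
    return e_dyn, e_leak, e_dyn + e_leak


# Hypothetical inputs (assumptions): 100 fF switched capacitance,
# 50 nW leakage power, Vdd = 0.3 V, clock period 100 ns.
e_dyn, e_leak, e_tot = energy_per_cycle(c_sw=100e-15, p_leak=50e-9, v_dd=0.3, t_c=100e-9)
print(f"E_dyn = {e_dyn:.2e} J, E_leak = {e_leak:.2e} J, E_tot = {e_tot:.2e} J")
```

With these assumed numbers the two components are of the same order, which is the regime the subthreshold discussion is concerned with.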
Before we formulate the MILP model of the optimal minimum energy VDDL assignment,
all variables and constants used in the MILP model are listed:
• Vv: supply voltage integer variable that is 1 for two selected VDDH and VDDL in a span
of scaling supply voltage v.
• Xi,v: voltage assignment integer variable that is 1 for gate i with supply voltage v.
• Fi,v: fan-in integer variable that is 1 for gate i having at least one fan-in gate that is
powered by supply voltage v.
• Pi,v: penalty integer variable that is 1 when gate i is driven by low input voltage v.
• Ti: latest arrival time variable at gate i output from primary input events.
• αi: low to high transition activity of gate i.
• Vdd,v: supply voltage value of v.
• Ci,v: load capacitance of gate i with supply voltage v.
• Pleak,i,v: leakage power of gate i with supply voltage v.
• Pleako,i,v: leakage power overhead of multiple logic-level gate i driven by low input
voltage v.
• tdi,v: gate delay of gate i with supply voltage v.
• tdoi,v: gate delay overhead of multiple logic-level gate i driven by low input voltage v.
• Ni: number of inputs for gate i.
• Tc: critical path delay of a circuit.
• Gtot: total number of gates in a circuit.
• Vnom: nominal supply voltage value (1.2V) for 90nm CMOS.
The optimal VDDL assignment for the minimum energy design is modeled by MILP
equations:
\[
\text{Minimize} \quad \left[ \sum_{i} \sum_{v \in V} \left( \alpha_i \cdot C_{i,v} \cdot V_{dd,v}^2 + P_{leak,i,v} \cdot T_c \right) \cdot X_{i,v} + \sum_{i} \sum_{v \in V_L} P_{leako,i,v} \cdot T_c \cdot P_{i,v} \right], \quad \forall i \in \text{all gates}
\]
\[
V_{min} \le V \le V_{DDH}, \quad V_{low} \le V_L < V_{DDH}
\tag{4.2}
\]
where Vmin is the minimum operating voltage for the correct logic function of a gate with
subthreshold supply voltage and Vlow is the lowest input voltage to keep 10% to 90% output
voltage swing for a logic gate when VDDH is predetermined. The timing constraints
are [55, 56]:
\[
T_i \ge T_j + \sum_{v \in V} t_{di,v} \cdot X_{i,v} + \sum_{v \in V_L} t_{doi,v} \cdot P_{i,v}
\quad \forall i \in \text{all gates},\ \forall j \in \text{all fanin gates of gate } i
\tag{4.3}
\]
\[
T_i \le T_c \quad \forall i \in \text{all primary output gates}
\tag{4.4}
\]
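Penalty condition:
\[
\begin{aligned}
\sum_{j} X_{j,v} &\le N_i \cdot F_{i,v} \\
\sum_{j} X_{j,v} &\ge N_i \cdot F_{i,v} - (N_i - 1)
\end{aligned}
\qquad \forall j \in \text{all fanin gates of gate } i,\ \forall i \in \text{all gates},\ \forall v \in V_L
\tag{4.5}
\]
\[
\begin{aligned}
F_{i,v} + X_{i,V_{DDH}} &\ge 2 \cdot P_{i,v} \\
F_{i,v} + X_{i,V_{DDH}} &\le 2 \cdot P_{i,v} + 1
\end{aligned}
\qquad \forall i \in \text{all gates},\ \forall v \in V_L
\tag{4.6}
\]
\[
\sum_{v \in V} V_{dd,v} \cdot X_{i,v} \le \sum_{v \in V} V_{dd,v} \cdot X_{j,v} + \sum_{v \in V_L} V_{nom} \cdot P_{i,v}
\quad \forall j \in \text{all fanin gates of gate } i
\tag{4.7}
\]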
Dual supply voltages selection:
\[
\sum_{v \in V} V_v = 2
\tag{4.8}
\]
\[
V_{V_{DDH}} = 1
\tag{4.9}
\]
\[
\sum_{v \in V} X_{i,v} = 1 \quad \forall i \in \text{all gates}
\tag{4.10}
\]
\[
\sum_{i} X_{i,v} \le G_{tot} \cdot V_v \quad \forall i \in \text{all gates},\ \forall v \in V
\tag{4.11}
\]
As mentioned before, Tc is given by the performance requirement. Therefore, VDDH is
selected from (4.9) in the scaling supply voltage span. In the dual power supply constraints,
the MILP chooses only two supply voltages, the given VDDH and the optimal VDDL, and each
gate in the circuit must be assigned to one of them from (4.11); we use a bin-packing
technique [1]. The penalty condition tests the existence of a VDDH gate driven by at least one
VDDL fan-in gate from (4.5) (Boolean OR) and (4.6) (Boolean AND). The non-linear Boolean
functions are expressed as linear constraints. When a penalty exists, Pi,VDDL becomes 1 and
(4.7) allows low voltage inputs to drive a VDDH gate by replacing it with a multiple
logic-level gate. When assigning VDDL to a gate with time slack, the MILP checks for timing
violations against the clock time using the timing constraints (4.3) and (4.4). The cost
function (4.2) favorably balances both delay and leakage penalties of the multiple
logic-level gates.
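The way (4.5) and (4.6) encode Boolean OR and AND with linear inequalities can be checked by exhaustive enumeration. The sketch below is only an illustration for a single gate and a single low voltage v; the function names are hypothetical and the inequality directions are those read from the penalty condition (sum ≤ N·F and sum ≥ N·F − (N − 1) for OR; F + X ≥ 2·P and F + X ≤ 2·P + 1 for AND).

```python
from itertools import product


def or_feasible(x_fanin, f):
    """Check the two inequalities of (4.5) for 0/1 fan-in flags and candidate F."""
    n = len(x_fanin)
    total = sum(x_fanin)
    return total <= n * f and total >= n * f - (n - 1)


def and_feasible(f, x_high, p):
    """Check the two inequalities of (4.6) for 0/1 values F, X_{i,VDDH} and candidate P."""
    return f + x_high >= 2 * p and f + x_high <= 2 * p + 1


def check_linearization(max_fanin=3):
    """Return True if the only feasible F is OR(x) and the only feasible P is F AND X."""
    for n in range(1, max_fanin + 1):
        for x_fanin in product((0, 1), repeat=n):
            feasible_f = [f for f in (0, 1) if or_feasible(x_fanin, f)]
            if feasible_f != [int(any(x_fanin))]:
                return False
    for f, x_high in product((0, 1), repeat=2):
        feasible_p = [p for p in (0, 1) if and_feasible(f, x_high, p)]
        if feasible_p != [f & x_high]:
            return False
    return True


print(check_linearization())
```

Because each pair of inequalities leaves exactly one feasible value for the dependent binary variable, the solver has no freedom to set the penalty flag inconsistently with the voltage assignment.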
4.3 Simulation Results
All simulation results are from HSPICE using PTM 90nm CMOS at room temperature
(300K). The CMOS device threshold voltages are Vth,pmos = 0.21V and Vth,nmos = 0.29V at
nominal Vdd = 1.2V. For simplicity, we use only four types of basic standard cells, namely,
INV, NAND2, NAND3, and NOR2, to synthesize ISCAS'85 benchmark circuits. Therefore,
only four types of multiple logic-level gates are used, with a high PMOS threshold voltage
(Vth,pmos = 0.29V) assigned to the pull-up PMOS network of the basic cells.
We assume that randomly generated input signals with high input voltage VDDH drive
all primary inputs of the circuit. Two subthreshold supply voltages, VDDH and VDDL, can be
provided by a voltage scalable DC to DC converter [57]. We also assume that combinational
benchmark circuits have no restrictions in primary output voltage level, either of VDDH or
VDDL. In reality, level converting flip-flops (LCFF) [67, 28] can be placed at low voltage primary
outputs as the sequential elements of the design.
The MILP algorithm of Section 4.2 is applied to find the optimal VDDL for the benchmark
circuits with given performance (i.e., VDDH) in the subthreshold region. Table 4.3 shows HSPICE
simulation results for single Vdd total energy per cycle as a reference and dual Vdd optimized
energy per cycle with the optimal VDDL selection. Activity α is the average number of low to
high transitions at circuit nodes and VDDL is the optimal low voltage supply corresponding to
VDDH. Multiple logic-level gates were not required for c432, c499 and c1355; there were no
VDDH gates driven by VDDL gates in the optimized circuits, so the results were the same as
in [35]. From (4.7), the MILP algorithm automatically determines whether or not a multiple
logic-level gate is to be used, based upon the benefit of energy saving. The design of c3540
shows that the energy saving of the dual-Vdd circuit improves by 15.7 percentage points over
[35] (from 3.8% to 19.5%). It is evident that the optimized circuit with multiple logic-level
gates utilizes more time slack, as shown in Figure 4.5.

Table 4.3: Total energy per cycle with optimal VDDL for given VDDH and performance of
ISCAS'85 benchmark circuits and 32-bit ripple carry adder.
Benchmark    Total   Activity  VDDH   VDDL   VDDL       Multiple logic-  Esingle  Edual   Reduc.  Reduc. [35]  Freq.
circuit      gates   α         (V)    (V)    gates (%)  level gates      (fJ)     (fJ)    (%)     (%)          (MHz)
c432         154     0.19      0.25   0.23   5.2        0                7.9      7.8     1.1     1.1          14.4
c499         493     0.21      0.22   0.18   9.7        0                20.2     19.8    2.0     2.0          11.9
c880         360     0.18      0.24   0.19   56.7       23               14.4     10.9    24.5    22.2         13.6
c1355        469     0.21      0.21   0.18   10.2       0                19.5     19.0    2.5     2.5          9.8
c1908        584     0.20      0.24   0.21   27.6       71               26.5     23.2    12.4    5.8          11.8
c2670        901     0.16      0.25   0.19   40.2       41               32.8     26.9    18.1    14.8         17.4
c3540        1270    0.33      0.23   0.16   40.8       69               88.0     70.8    19.5    3.8          7.2
c5315        2077    0.26      0.24   0.19   60.5       62               116.8    92.2    21.1    16.1         9.8
c6288        2407    0.28      0.29   0.19   4.7        20               165.4    159.1   3.8     2.1          9.4
c7552        2823    0.20      0.25   0.21   51.6       201              131.7    112.1   14.9    11.1         13.6
32-bit RCA   352     0.17      0.31   0.18   52.3       11               21.2     14.1    33.5    31.3         16.7
Average                                      32.7                                         14.0    10.2
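The comparison with [35] can be rechecked from the reduction columns of Table 4.3. The short script below is only a bookkeeping sketch: the numbers are copied from the table, while the variable and function names are hypothetical. Because the table entries are rounded, the recomputed averages agree with the reported ones only to within rounding.

```python
# (circuit, reduction % with multiple logic-level gates, reduction % in [35]),
# copied from the last reduction columns of Table 4.3.
ROWS = [
    ("c432", 1.1, 1.1), ("c499", 2.0, 2.0), ("c880", 24.5, 22.2),
    ("c1355", 2.5, 2.5), ("c1908", 12.4, 5.8), ("c2670", 18.1, 14.8),
    ("c3540", 19.5, 3.8), ("c5315", 21.1, 16.1), ("c6288", 3.8, 2.1),
    ("c7552", 14.9, 11.1), ("32-bit RCA", 33.5, 31.3),
]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Improvement over [35] in percentage points, per circuit.
improvement = {name: round(new - old, 1) for name, new, old in ROWS}
avg_new = mean([new for _, new, _ in ROWS])
avg_old = mean([old for _, _, old in ROWS])
# Best ISCAS'85 circuit (the 32-bit RCA is not part of the ISCAS'85 set).
best_iscas = max((r for r in ROWS if r[0].startswith("c")), key=lambda r: r[1])

print(improvement["c3540"], round(avg_new, 2), round(avg_old, 2), best_iscas[0])
```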
Multiple logic-level gates remove topological constraints and allow VDDL gates to drive
VDDH gates. Thus, the MILP can assign VDDL to more gates on non-critical paths and further
increase energy saving as expected. For the dual-Vdd design with multiple logic-level gates,
the best case among the ISCAS'85 circuits is about 24.5% energy reduction for c880 (an 8-bit
ALU). Another circuit, c6288 (a 16×16 multiplier), has only 3.8% reduction. There is little
benefit of dual-Vdd design for c432, c499, and c1355, where most paths are balanced. The
optimized circuits show an energy saving of 14.0% on average, even though this includes the
path-balanced circuits.
Figure 4.6 shows the gate slack distributions obtained from static timing analysis [33] of the
single-Vdd and dual-Vdd designs of c880. Clearly, it is the large number of gates with large
slack in the single-Vdd design that allows many low Vdd assignments.
The energy saving from dual voltage design depends on the time slacks of gates. In the
subthreshold region it is also affected by the number of VDDL gates driven by VDDH gates.
Leakage current of PMOS devices in a VDDL gate is suppressed by high voltage input signal
from a VDDH gate, because the source to gate voltage, Vsg, in PMOS devices is negative. The
leakage energy is comparable to dynamic energy in the subthreshold region. This leakage
reduction is another benefit of dual voltage design for low voltage circuits. The dual voltage
technique for a nominal voltage circuit is mainly applied for dynamic power saving, while
leakage power saving is considered negligible [39].

Figure 4.5: Gate slack distribution for minimum energy per cycle for c3540. Panels:
(a) Single-Vdd design at Vdd = 0.23V; (b) Dual-Vdd design without level converters at
VDDH = 0.23V and VDDL = 0.14V; (c) Dual-Vdd design with multiple logic-level gates at
VDDH = 0.23V and VDDL = 0.16V. Each panel plots number of gates versus slack time (nsec).

Figure 4.6: Gate slack distribution for minimum energy per cycle for c880. Panels:
(a) Single-Vdd design at Vdd = 0.24V; (b) Dual-Vdd design at VDDH = 0.24V and VDDL = 0.19V.
Each panel plots number of gates versus slack time (nsec).
4.4 Summary
We presented a dual-Vdd design in which special multiple logic-level gates are used in the
subthreshold regime [34]. This approach is particularly beneficial for subthreshold voltage
operation. A new MILP is devised to find an optimal low supply voltage below a given
subthreshold supply voltage. The given supply voltage is chosen for the minimum energy
per cycle for any single voltage. When paired with the lower voltage from the MILP, the
energy is further reduced. The MILP optimally selects the boundaries between the supply
voltage domains to position multiple logic-level gates. With this MILP, ISCAS'85 benchmark
circuits could save up to 24.5% energy per cycle more than the previous MILP results in
Chapter 3. Notably, the energy per cycle for these designs is always less than the absolute
minimum energy point for the circuit with single voltage operation. Alternatively, the MILP
can trade energy reduction for speed increase without letting the energy rise.
Chapter 5
Process Variation Effect on Minimum Energy Design Using Dual Subthreshold Supply
5.1 Multiple Supply Voltages
Utilizing the time slack for power reduction with multiple supply voltages has been pre-
sented with nominal operating circuits in [22]. The theoretical models assume non-crossing
parallel signal paths and are developed to determine the effective number of power supply
voltages for power saving. The power reduction effect becomes saturated as supply voltages
are added to optimize a circuit. There is no reason to use more than three supply voltages for
power reduction in above-threshold operating circuits, considering power penalties induced
by multiple-Vdd.
For subthreshold circuits, we investigate the energy reduction effect from multiple-Vdd
in a real benchmark circuit, c2670. To verify the energy saving of multiple-Vdd design from
path slack as in [22], we do not consider multiple voltage boundaries within the optimized
benchmark circuit. Thus, we eliminate the topological constraints in the MILP [35] and modify
it to allow multiple-Vdd selections while minimizing energy consumption. Figure 5.1 shows the
gate slack distribution of c2670 at a single Vdd = 0.30V. After optimizing c2670 using the MILP
with up to quadruple Vdd, we obtained the optimal VDDL values and energy savings
shown in Table 5.1. The energy reduction effect saturates more quickly with multiple-Vdd for
a subthreshold circuit. It is not promising to utilize all of the time slack inside a circuit
with multiple-Vdd, because gate delay depends exponentially on Vdd. Even with quadruple Vdd,
the optimized c2670 improves the energy saving by only 4.3 percentage points compared to the
dual-Vdd design. The energy saving will be further reduced when we consider the energy
overhead from level converting devices needed at multiple voltage boundaries in real circuit
design. Therefore, we focus on optimizing subthreshold circuits with dual-Vdd for minimum
energy design.
Figure 5.1: Gate slack distribution (number of gates vs. slack time in nsec) for c2670 at
Vdd = 0.30V; slacks obtained by static timing analysis using gate delays for PTM 90nm CMOS.

Table 5.1: The optimal VDDL and energy saving of c2670 at VDDH = 0.30V from MILP
solutions [35] for multiple-Vdd design without topological constraints in PTM 90nm CMOS.
Multiple Vdd    Optimal VDDL           Energy saving (%)
Dual            0.24V                  19.6
Triple          0.25V, 0.21V           22.8
Quadruple       0.26V, 0.22V, 0.17V    23.9
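
The saturation visible in Table 5.1 can be made explicit by computing the extra saving contributed by each additional supply. The snippet below is a small arithmetic sketch using the table values; the names are hypothetical.

```python
# Energy savings (%) for c2670 at VDDH = 0.30V, copied from Table 5.1.
SAVINGS = {"dual": 19.6, "triple": 22.8, "quadruple": 23.9}


def marginal_gains(savings_by_design):
    """Extra saving (percentage points) obtained by each added supply voltage."""
    names = list(savings_by_design)
    return {
        f"{prev}->{curr}": round(savings_by_design[curr] - savings_by_design[prev], 1)
        for prev, curr in zip(names, names[1:])
    }


gains = marginal_gains(SAVINGS)
total_gain = round(SAVINGS["quadruple"] - SAVINGS["dual"], 1)
print(gains, total_gain)
```

Each added supply contributes less than the previous one, and the total gain of quadruple over dual Vdd matches the 4.3 points quoted in the text.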
5.2 Technology Scaling
When performance is not a concern for energy constrained applications, a circuit can
operate at the energy optimal voltage (Vopt) to achieve the minimum energy per cycle (Emin)
by scaling Vdd. Vopt is theoretically independent of Vth, as reduced delay by Vth offsets
increased leakage current in Eleak. The relative significance of Edyn and Eleak determines
Vopt when scaling Vdd [76]. When Eleak is larger than Edyn in Etot, then it causes Vopt value
to move up to suppress Eleak. Conversely, larger Edyn results in lower Vopt value. Thus, Edyn
and Eleak are quite close to the same value at Vopt.
For technology scaling, Vopt is proportional to S, which is dependent on the scaling [81, 7].
Without considering the slope of input signals, Vopt can be expressed as Kopt · S, where Kopt
is a parameter that depends on the circuit structure and is independent of the scaling
effect [23]. Using Vopt = Kopt · S, the total energy components, Edyn and Eleak, in (3.6) are
expressed at the minimum energy point as [23]
\[
\begin{aligned}
E_{min,dyn} &= \alpha_{0\to1} \cdot C_L \cdot V_{opt}^2 = (\alpha_{0\to1} \cdot K_{opt}^2) \cdot C_L \cdot S^2 \\
E_{min,leak} &= K \cdot C_L \cdot V_{opt}^2 \cdot 10^{-V_{opt}/S} = (K \cdot 10^{-K_{opt}} \cdot K_{opt}^2) \cdot C_L \cdot S^2
\end{aligned}
\tag{5.1}
\]
where S increases and CL decreases with technology scaling. Figure 5.2 shows the scaling
trends of Emin and Vopt for a 32-bit RCA in PTM CMOS technology. Technology scaling
apparently raises Vopt and reduces CL · S^2. Thus, the minimum energy of a circuit is reduced
and its performance may improve at Vopt with device scaling.
Before investigating technology scaling effect on the energy saving of dual Vdd design for
a subthreshold circuit, we derive the energy consumption ratio of dual-Vdd design to single
Vdd reference in terms of Edyn and Eleak. The dynamic energy ratio is given from (3.6)
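\[
\frac{E_{dyn,dual}}{E_{dyn,single}}
= \frac{\alpha_{0\to1} \cdot (C_{VL} \cdot V_{DDL}^2 + C_{VH} \cdot V_{DDH}^2)}{\alpha_{0\to1} \cdot C_{tot} \cdot V_{DDH}^2}
= 1 - \frac{C_{VL}}{C_{tot}} \cdot \left( 1 - \left( \frac{V_{DDL}}{V_{DDH}} \right)^2 \right)
\tag{5.2}
\]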
where CVL is the sum of load capacitances in VDDL cells and CVH is the sum of those in
VDDH cells. Ctot (= CVL + CVH) is the total load capacitance of a circuit. VDDH is equal to
the single Vdd of the reference circuit.

Figure 5.2: HSPICE simulation results of minimum energy per cycle and energy optimal
voltage for a 32-bit RCA for a single-Vdd in PTM CMOS technology (α = 0.30). The plot shows
energy per cycle [fJ] and Vopt [volt] versus technology node [nm].

From (3.4), (3.6) and (3.7), the leakage energy ratio is given as follows
\[
\begin{aligned}
\frac{E_{leak,dual}}{E_{leak,single}}
&= \frac{I_{off,VL} \cdot V_{DDL} \cdot T_c + I_{off,VH} \cdot V_{DDH} \cdot T_c}{I_{off,tot} \cdot V_{DDH} \cdot T_c} \\
&= \frac{W_{VL} \cdot 10^{\frac{\eta V_{DDL}}{S}} \cdot V_{DDL} + W_{VH} \cdot 10^{\frac{\eta V_{DDH}}{S}} \cdot V_{DDH}}{W_{tot} \cdot 10^{\frac{\eta V_{DDH}}{S}} \cdot V_{DDH}} \\
&= 1 - \frac{W_{VL}}{W_{tot}} \cdot \left( 1 - \frac{V_{DDL}}{V_{DDH}} \cdot 10^{\frac{-\eta (V_{DDH} - V_{DDL})}{S}} \right)
\end{aligned}
\tag{5.3}
\]
where WVL is the sum of device widths in VDDL cells and WVH is the sum of those in VDDH
cells. Wtot (= WVL + WVH) is the total device width of a circuit. Tc is the critical path delay
for a circuit.
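The two ratios can be evaluated numerically as a sanity check on the algebra. The sketch below is illustrative only: the function names are hypothetical, the coefficient in the leakage exponent is called eta here as an assumed name, and the values of eta, S, and the VDDL fractions are assumptions rather than data from this work. The voltages echo the dual-Vdd point of Figure 4.1.

```python
def dynamic_energy_ratio(c_vl_frac, v_ddl, v_ddh):
    """E_dyn,dual / E_dyn,single per (5.2); c_vl_frac = C_VL / C_tot."""
    return 1.0 - c_vl_frac * (1.0 - (v_ddl / v_ddh) ** 2)


def leakage_energy_ratio(w_vl_frac, v_ddl, v_ddh, eta, s):
    """E_leak,dual / E_leak,single per the closed form of (5.3); w_vl_frac = W_VL / W_tot."""
    decay = 10.0 ** (-eta * (v_ddh - v_ddl) / s)
    return 1.0 - w_vl_frac * (1.0 - (v_ddl / v_ddh) * decay)


def leakage_energy_ratio_direct(w_vl, w_vh, v_ddl, v_ddh, eta, s):
    """Same ratio computed from the unsimplified width-weighted form of (5.3)."""
    low = w_vl * 10.0 ** (eta * v_ddl / s) * v_ddl
    high = w_vh * 10.0 ** (eta * v_ddh / s) * v_ddh
    reference = (w_vl + w_vh) * 10.0 ** (eta * v_ddh / s) * v_ddh
    return (low + high) / reference


V_DDH, V_DDL = 0.31, 0.18   # dual-Vdd voltages taken from Figure 4.1
ETA, S = 0.1, 0.09          # assumed exponent coefficient and swing (V/decade)
dyn = dynamic_energy_ratio(0.5, V_DDL, V_DDH)          # assume half of C_tot at VDDL
leak = leakage_energy_ratio(0.5, V_DDL, V_DDH, ETA, S)  # assume half of W_tot at VDDL
direct = leakage_energy_ratio_direct(1.0, 1.0, V_DDL, V_DDH, ETA, S)
print(f"dynamic ratio = {dyn:.3f}, leakage ratio = {leak:.3f}, direct = {direct:.3f}")
```

The closed form and the direct form agree, and both ratios reduce to 1 when no gate is assigned VDDL or when VDDL equals VDDH.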
Applying a dual-Vdd technique for a subthreshold logic circuit on Emin (Edyn ≈ Eleak),
we need to find the optimal VDDL for minimum energy consumption with given VDDH=Vopt.
We use the MILP algorithm [35] for dual-Vdd design to optimize a 32-bit RCA operating at
Vopt. The MILP model does not allow a VDDL cell to drive a VDDH cell as its fanout gate on
account of topological constraint (similar to CVS). The results are shown in Figure 5.3(a).
The minimum energy per cycle for the dual-Vdd circuit is further reduced from its minimum
energy operation, while performance remains constant.
We introduce the figure-of-merit (FOM) of energy saving as the number of VDDL gates
times (VDDH − VDDL). The FOM is well matched with the energy saving of dual-Vdd design.
Although Vopt moves to a slightly higher value with technology scaling, device scaling does not
considerably affect the energy saving in Figure 5.3(b). As seen in (5.2), the dynamic energy
ratio is independent of technology scaling parameters and the scaling of load capacitance
does not affect the ratio. The leakage energy ratio has S and η as technology parameters
in (5.3), but both parameters increase together with device scaling [3]. Thus, the term
(η/S) · (VDDH − VDDL) does not significantly affect the leakage energy saving, where the optimal
VDDL is close to VDDH owing to the exponential delay characteristic of a subthreshold logic
circuit on scaling Vdd. Therefore, the amount of total energy saving comes from circuit
structure, rather than technology choice. It means that the distribution of time slack in a
circuit structure is not changed by device scaling. For each technology, the small variation of
FOM and energy saving in Figure 5.3(b) may come from relatively different delay increments of
logic gates on scaling Vdd [81], but this does not considerably alter the time slack
distribution of a subthreshold logic circuit.
5.3 Process Variation
Subthreshold circuits are highly sensitive to Vth variation, which exponentially affects Ion
and delay. Vth variation also changes the relative strength of PMOS and NMOS devices
and can thus cause functional failure of logic gates [42]. Variability of Vth comes from global
(inter-die) and local (intra-die) process variations [3]. Global variation of Vth is induced
by manufacturing process and temporal variation, but it can be compensated through the
adaptive body biasing (ABB) technique [24]. Random dopant fluctuation (RDF) is the
dominant source of local Vth variation compared to geometric variations such as Leff in the
subthreshold region [82]. RDF variations are independent in nature and depend on
(WL)^(-1/2). Therefore, local Vth variation can be reduced by gate sizing and logic depth
choice through averaging variability [53, 82].

Figure 5.3: The optimal VDDL from MILP [35] algorithm and total energy per cycle from
HSPICE simulation of dual-Vdd design for a 32-bit RCA (Fig. 5.2) in PTM CMOS technology.
The relationship of figure of merit (FOM) to energy saving is shown for technology scaling
trend. Panels: (a) Minimum energy per cycle at Vdd = Vopt = VDDH, plotting energy per cycle
[fJ] versus technology node [nm] for the single Vdd and dual Vdd designs, with annotated
voltage pairs VDDH=0.37V/VDDL=0.24V, VDDH=0.34V/VDDL=0.21V, VDDH=0.32V/VDDL=0.21V, and
VDDH=0.30V/VDDL=0.18V; (b) Normalized FOM and energy saving [%] versus technology node [nm].

To investigate the effect of Vth variability on dual-Vdd design for a subthreshold logic
circuit, we randomize the vth0 parameter in the BSIM4 model card of PTM CMOS
technology [85] with a normal distribution in the Monte Carlo simulation. For the global
variations, we characterize the standard deviation (σvth0) as 5% variation relative to its
original vth0 value for both PMOS and NMOS devices. This presents samples of logic gates
through multiple dies as inter-die process variation. As the local variation, RDF is modeled
from an empirical expression [2, 3] through normally distributed vth0 with
\[
\sigma_{vth0,RDF} = 3.19 \times 10^{-8} \cdot \frac{T_{ox} \cdot N_{ch}^{0.4}}{\sqrt{W_{eff} \cdot L_{eff}}}
\tag{5.4}
\]
where Tox is the gate equivalent oxide thickness and Nch is the channel doping concentration.
Leff and Weff are the effective channel length and width of device, respectively. Together,
σvth0 and σvth0,RDF represent the entire Vth variation of a subthreshold circuit, which is still
normally distributed.
We ran a 1k-point Monte Carlo simulation using the HSPICE simulator [27] with global
and local vth0 variations. Figure 5.4(a) shows the simulated NMOS Vth variation
across technology nodes. For subthreshold supply voltage Vdd = 0.30V, the worst-case 3σ Vth
is 79mV higher than the typical Vth in PTM 32nm NMOS, compared to 62mV in
PTM 90nm NMOS. Therefore, Vth variation is larger at smaller feature sizes.

[Figure 5.4: HSPICE simulation results of NMOS Vth variation and active current Ion
variability at Vdd = 0.30V from a 1k-point Monte Carlo simulation with normally distributed
vth0 parameter in PTM CMOS technology. (a) NMOS threshold voltage variation: typical and
worst 3σ Vth,NMOS [V] versus technology node [nm]. (b) Active current Ion,NMOS variability
(σ/μ) versus technology node [nm].]

Under normally distributed Vth variation in the subthreshold region, the active current Ion
can be modeled as a lognormal random variable [82, 42] with

\[
\frac{\sigma_{I_{on}}}{\mu_{I_{on}}} = \sqrt{e^{\left( \frac{\sigma_{V_{th}}}{m V_T} \right)^{2}} - 1}
\tag{5.5}
\]

where m is the subthreshold slope coefficient and VT is the thermal voltage. The coefficient m
decreases as Vdd is reduced, which increases Ion variability in low voltage operation. As shown
in Figure 5.4(b), Ion variability is up to 2.64X the mean value in PTM 32nm CMOS. Since Ion
depends exponentially on Vth, Ion variability is higher than Vth variation. From (3.2), it also
induces delay variability in a subthreshold circuit.
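
As a rough numerical illustration of (5.5), the sketch below evaluates the ratio of standard deviation to mean of Ion for a given standard deviation of Vth. The function name, the value m = 1.3, and the two sample values of the Vth standard deviation are assumptions chosen for illustration only; they are not taken from the simulations reported here. The thermal voltage is taken as about 25.9mV at 300K.

```python
import math

def ion_variability(sigma_vth, m, v_thermal=0.0259):
    """Return sigma/mu of a lognormal Ion, following Eq. (5.5).

    sigma_vth : standard deviation of Vth in volts
    m         : subthreshold slope coefficient (illustrative value, assumed)
    v_thermal : thermal voltage kT/q in volts (about 25.9 mV at 300 K)
    """
    s = sigma_vth / (m * v_thermal)
    return math.sqrt(math.exp(s ** 2) - 1.0)

# Hypothetical inputs: sigma_vth of 20 mV and 26 mV with m = 1.3.
for sigma in (0.020, 0.026):
    ratio = ion_variability(sigma, 1.3)
    print(f"sigma_vth = {sigma * 1e3:.0f} mV -> sigma/mu = {ratio:.3f}")
```

Because the exponent grows with the square of the Vth deviation divided by m VT, a modest increase in Vth spread or a decrease in m raises the Ion spread sharply.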
As mentioned before, gate sizing and logic depth choice in a subthreshold circuit
reduce independent local Vth variation through averaging. Figure 5.5(a) shows the worst-case
critical path delays of the single-Vdd and dual-Vdd 32-bit RCA in Figure 5.3(a) from a 1k-point
Monte Carlo simulation. For the single-Vdd design, the worst 3σ critical delay variability is
lower than the Ion variability in Figure 5.4(b) because of averaging.
Dual-Vdd design uses two supply voltages, which gives an opportunity to reduce the critical
path delay in a circuit. In the subthreshold region, the gate capacitance of a MOS
device may decrease as Vdd goes down [26]. The critical path delay therefore decreases when
VDDH gates on the critical path drive VDDL gates as fanout. As shown in Figure 5.6(a), an
inverter (VDDH = 0.30V) driving four inverters (VDDL = 0.18V) sees a reduced output
capacitance load. Figure 5.6(b) shows that the inverter delay decreases by about 8% because of
the reduced output capacitance.
For this reason, the worst critical delay of the dual-Vdd 32-bit RCA is less than that of the
single-Vdd 32-bit RCA. The worst critical delay depends on the VDDL assignment to the fanout
gates of VDDH gates on the critical path.
We also measure minimum energy variability for the single-Vdd and dual-Vdd 32-bit RCA
using each 3σ critical delay under Vth variation, as shown in Figure 5.5(b). Compared to the
typical Emin with a single Vdd, both minimum energies increase with delay variability, which
induces more leakage energy from the extended operation time. In PTM 32nm CMOS, the worst-case
Emin for the dual-Vdd design is 1.92 times the typical Emin, while that for the single-Vdd
design is 2.96 times. This means that the worst 3σ Emin of the dual-Vdd design is 35.2% lower
than the worst-case Emin with a single Vdd. Dual-Vdd design for subthreshold circuits is more
effective at mitigating the increase of minimum energy with process variation at small feature
sizes. Thus, we expect more energy saving when variability is more of a concern. Figure 5.5(c)
shows the energy savings of the dual-Vdd 32-bit RCA with and without process variation across
technology nodes.

[Figure 5.5: HSPICE simulation results of critical path delay and minimum energy for a 32-bit
RCA (Fig. 5.3(a)) from a 1k-point Monte Carlo simulation in PTM CMOS technology. (a) Worst 3σ
critical path delay variability (Tc,3σ / Tc,typical) versus technology node [nm] for single-Vdd
and dual-Vdd designs. (b) Worst 3σ minimum energy variability (Emin,3σ / Emin,typical) versus
technology node [nm] for single-Vdd and dual-Vdd designs. (c) Energy saving [%] versus
technology node [nm], typical and with worst 3σ variability.]

[Figure 5.6: Distribution of the output capacitance and delay variability for an inverter
with fanout of four from a 1k-point Monte Carlo simulation with normally distributed vth0
parameter in PTM CMOS technology. Histograms (number of samples) compare an inverter at
Vdd = 0.30V driving four inverters at Vdd = 0.30V with one driving four inverters at
Vdd = 0.18V. (a) Output capacitance (fF). (b) Delay variability: td,3σ = 1.51ns (0.30V, 0.30V),
td,3σ = 1.39ns (0.30V, 0.18V).]

5.4 Summary
A subthreshold circuit is susceptible to process variation, which affects the delay of
gates. Dual-Vdd design may mitigate the delay variability of a circuit in the subthreshold
region when VDDL is assigned to more fanout gates of VDDH gates on the critical path. The
reduction in worst-case delay comes from the reduced gate capacitance of VDDL fanout gates.
Thus, we expect more energy saving when process variation is a greater concern. The dual-Vdd
technique is valid and beneficial for minimum energy design.
A recent study has investigated the process variation in 45nm bulk and high-k CMOS
technologies [70, 71]. As pointed out in that study, there may be some advantages for
subthreshold circuits in the 45nm high-k technology but more detailed work is needed.
Chapter 6
Dual Voltage Design for Minimum Energy Using Gate Slack
In this chapter, we present a new slack-time based algorithm for dual-Vdd design with
linear-time complexity. Although a global optimum is sought, computation time is kept
low. The slack of a gate is defined as the difference between the critical path delay for the
circuit and the delay of the longest path through that gate. Positive non-zero slack gates
are classified into two groups, one in which all gates can be unconditionally assigned low
voltage and the other where only a selected subset can be assigned low voltage without
violating the positive non-zero slack requirement. Multiple voltage boundaries are given
special consideration to avoid the use of level shifting devices. The overall complexity of
this power optimization algorithm is linear in number of gates as compared to a previously
published exponential-time exact algorithm using mixed integer linear program (MILP).
Two heuristic algorithms, CVS and ECVS, for dual-Vdd design have theoretical run-time
complexity O(n^2), where n is the total number of gates in a circuit [13]. Most research in this field
has focused on improving power saving by implementing their own greedy algorithms [11,
12, 39]. These are still heuristic approaches and provide a suboptimal solution for dual-Vdd
assignment. Mixed integer linear programs (MILP) [19] are widely used to optimize a circuit
for minimizing power or energy consumption using sizing, multiple Vdd, multiple threshold
voltage (Vth) and combinations of those [15, 34, 35, 64]. MILP searches for a global optimal
solution for an objective function, which is designed to minimize power, considering the entire
design space. Thus, it may take huge time to optimize large circuits used in modern VLSI
systems. The time complexity of MILP optimization may not be acceptable in practice.
For dual-Vdd design, we need to find the optimal VDDL and its assignments to positive
slack gates in a circuit for minimum power. If we can quickly find all positive slack gates
that can be assigned to VDDL, it reduces much optimization work of dual-Vdd design and
saves computation time.
6.1 MILP for Optimal VDDL and Dual Vdd Assignment
There are two ways to find the optimal lower supply voltage VDDL and its assignments
for dual-Vdd design in the literature. In the first approach, a VDDL assignment algorithm is
applied to a circuit with different VDDL values, and the pair of VDDL and assignment that
gives minimum power consumption is selected [12, 67, 68]. In the second approach, a
theoretical path delay model is developed to determine the optimal VDDL for maximum
power saving, and VDDL assignments are then made to achieve the lowest power consumption
considering multiple voltage boundaries [22, 39]. Most dual-Vdd techniques are based on
heuristic greedy algorithms and applied to nominal operating circuits for lowering power
consumption.
For energy constrained applications, the dual-Vdd technique is applied to a subthreshold
logic circuit for further reducing the minimum energy operating point [35], where MILP
models similar to CVS are formulated to find the optimal VDDL and its assignments
for dual-Vdd design. This global optimum algorithm is applicable to a circuit operating at
either subthreshold or nominal supply voltage, but multiple runs are needed to consider all
available VDDL values for a given VDDH when searching for the optimal VDDL. Here, we extend
the MILP models to select the optimal VDDL and its assignments automatically in a single run
by introducing new variables. We briefly explain the new variables and parameters here before
presenting the MILP models.
• Xi,v: supply voltage assignment integer variable that is 1 when gate i is assigned power
  supply voltage v.
• Vv: supply voltage integer variable that is 1 for the two selected voltages, VDDH and VDDL,
  among the available power supply voltages v.
• td,i,v: gate delay for gate i with supply voltage v.
• Vdd,v: power supply voltage value for v.
• Gtot: total number of gates in a circuit.
MILP models are reformulated from [35]:

Minimize
\[
\sum_{i} \sum_{v} E_{tot,i,v} \cdot X_{i,v}
\quad \forall i \in \text{all gates and } \forall v \in \text{power supply voltage domain } V
\tag{6.1}
\]
\[
E_{tot,i,v} = \alpha_i \cdot C_{L,i,v} \cdot V_{dd,v}^{2} + P_{leak,i,v} \cdot T_c
\tag{6.2}
\]
Subject to timing constraints:
\[
T_i \geq T_j + t_{d,i,v} \cdot X_{i,v} \quad \forall j \in \text{all fanin gates of gate } i
\tag{6.3}
\]
\[
T_i \leq T_c \quad \forall i \in \text{all primary output gates}
\tag{6.4}
\]
Subject to topological constraints:
\[
\sum_{v \in V} V_{dd,v} \cdot X_{i,v} \leq \sum_{v \in V} V_{dd,v} \cdot X_{j,v}
\quad \forall j \in \text{all fanin gates of gate } i
\tag{6.5}
\]
Subject to dual supply voltages selection:
\[ \sum_{v \in V} V_v = 2 \tag{6.6} \]
\[ V_{VDDH} = 1 \tag{6.7} \]
\[ \sum_{v \in V} X_{i,v} = 1 \quad \forall i \in \text{all gates} \tag{6.8} \]
\[ \sum_{i} X_{i,v} \leq G_{tot} \cdot V_v \quad \forall i \in \text{all gates}, \ \forall v \in V \tag{6.9} \]

The main difference between these MILP models and those of [35] is the dual-Vdd selection
conditions. Tc is the critical path delay and is given by the performance requirement. VDDH is
selected from the power supply domain V by (6.7) so that Tc is met. Using a bin-packing
technique [1], all gates must be assigned to one of the selected power supply voltages in V
by (6.8) and (6.9).
MILP always guarantees that a dual-Vdd circuit with the optimal VDDL and its assignments
achieves minimum energy consumption at the same performance. We use the optimal
results of MILP as a reference to check the accuracy of our slack-time based algorithm,
which is presented in the next section.
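
For a circuit of only a few gates, the optimum defined by (6.1) to (6.9) can be cross-checked by exhaustive enumeration. The sketch below is not an MILP solver and is not the formulation used in this work; it simply tries every supply assignment for each candidate VDDL, discards those that violate the topological or timing constraints, and keeps the lowest-energy one. All function names, the four-gate netlist, and the delay and leakage numbers are hypothetical.

```python
from itertools import product

def gate_energy(alpha, c_load, vdd, p_leak, t_c):
    """Per-gate energy per cycle as in Eq. (6.2)."""
    return alpha * c_load * vdd ** 2 + p_leak * t_c

def best_dual_vdd(order, fanin, vddh, vddl_options, delay, energy, t_c):
    """Exhaustive stand-in for the MILP on a tiny circuit.

    order        : gates in topological order (drivers before loads)
    fanin        : {gate: [driver gates]}; primary inputs are omitted
    delay[v][g]  : delay of gate g at supply v
    energy[v][g] : energy per cycle of gate g at supply v
    Returns (total energy, chosen VDDL, {gate: supply}) or None.
    """
    best = None
    for vddl in vddl_options:            # VDDH plus one VDDL, cf. (6.6)-(6.7)
        for choice in product((vddh, vddl), repeat=len(order)):
            volt = dict(zip(order, choice))   # exactly one supply per gate, (6.8)
            # Topological constraint (6.5): a gate's supply may not exceed
            # the supply of any gate driving it.
            if any(volt[g] > volt[d] for g in order for d in fanin.get(g, [])):
                continue
            # Timing constraints (6.3)-(6.4): longest arrival time within Tc.
            arrival = {}
            for g in order:
                latest = max((arrival[d] for d in fanin.get(g, [])), default=0.0)
                arrival[g] = latest + delay[volt[g]][g]
            if max(arrival.values()) > t_c:
                continue
            total = sum(energy[volt[g]][g] for g in order)   # objective (6.1)
            if best is None or total < best[0]:
                best = (total, vddl, volt)
    return best

# Hypothetical four-gate netlist: g1 and g2 drive g3; g2 also drives g4.
order = ["g1", "g2", "g3", "g4"]
fanin = {"g3": ["g1", "g2"], "g4": ["g2"]}
t_c = 5.0
base = {"g1": 2.0, "g2": 1.0, "g3": 3.0, "g4": 1.0}    # delays at VDDH (arbitrary units)
scale = {0.30: 1.0, 0.25: 1.5, 0.20: 4.0}              # assumed delay growth at lower Vdd
p_leak = {0.30: 0.002, 0.25: 0.0015, 0.20: 0.001}      # assumed leakage power
delay = {v: {g: base[g] * k for g in order} for v, k in scale.items()}
energy = {v: {g: gate_energy(0.2, 1.0, v, p_leak[v], t_c) for g in order}
          for v in scale}

result = best_dual_vdd(order, fanin, 0.30, [0.25, 0.20], delay, energy, t_c)
print(result)
```

The enumeration grows as 2^n per candidate VDDL, so it is useful only as a reference for very small circuits.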
6.2 New Slack-Time Based Algorithm for Dual-Vdd Design
In this section, we propose a new slack-time based algorithm that finds the optimal
VDDL and its assignments for dual-Vdd design. The energy saving is as much as the optimal
solution from the MILP model.
First, our algorithm generates the slack time distribution for a given circuit. We have
developed an expanded version of static timing analysis (STA) [25]. For the output of gate
i, let TPI(i) be the longest time for an event to arrive from a primary input (PI) and TPO(i)
be the longest time for an event to reach a primary output (PO). The delay of the longest
path [45, 46] through gate i is given by

\[ D_{p,i} = T_{PI}(i) + T_{PO}(i) \tag{6.10} \]

The critical path delay for the circuit is

\[ T_c = \max_{j} \{ D_{p,j} \} \quad \forall \text{ gate } j \tag{6.11} \]

The slack time for gate i is found as follows:

\[ S_i = T_c - D_{p,i} \tag{6.12} \]

The time for calculating slack time for all gates of a circuit is O(n), where n is the total
number of gates. Figure 6.1(a) shows the slack time distribution for c2670 of the ISCAS'85
benchmark circuits in PTM 90nm CMOS technology [85].
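
The slack computation of (6.10) to (6.12) can be sketched with one forward and one backward pass over the gates in topological order. This is a minimal illustration rather than the STA tool used in this work; the function name and the four-gate example are hypothetical, and the sketch assumes that TPI(i) includes the delay of gate i itself while TPO(i) does not.

```python
def gate_slacks(delay, fanin):
    """Return (Tc, {gate: slack}) for a combinational netlist.

    delay : {gate: gate delay}
    fanin : {gate: [driver gates]}; gates fed only by primary inputs are omitted
    """
    fanout = {g: [] for g in delay}
    for g, drivers in fanin.items():
        for d in drivers:
            fanout[d].append(g)

    # Topological order (Kahn's algorithm): drivers come before their loads.
    indeg = {g: len(fanin.get(g, [])) for g in delay}
    ready = [g for g in delay if indeg[g] == 0]
    order = []
    while ready:
        g = ready.pop()
        order.append(g)
        for s in fanout[g]:
            indeg[s] -= 1
            if indeg[s] == 0:
                ready.append(s)

    # Forward pass: longest arrival time at each gate output, T_PI(i).
    t_pi = {}
    for g in order:
        t_pi[g] = delay[g] + max((t_pi[d] for d in fanin.get(g, [])), default=0.0)

    # Backward pass: longest time from each gate output to a PO, T_PO(i).
    t_po = {}
    for g in reversed(order):
        t_po[g] = max((delay[s] + t_po[s] for s in fanout[g]), default=0.0)

    d_p = {g: t_pi[g] + t_po[g] for g in delay}      # Eq. (6.10)
    t_c = max(d_p.values())                          # Eq. (6.11)
    slack = {g: t_c - d_p[g] for g in delay}         # Eq. (6.12)
    return t_c, slack

# Hypothetical netlist: g1 and g2 drive g3; g2 also drives g4.
delay = {"g1": 2.0, "g2": 1.0, "g3": 3.0, "g4": 1.0}
fanin = {"g3": ["g1", "g2"], "g4": ["g2"]}
t_c, slack = gate_slacks(delay, fanin)
print(t_c, slack)
```

Each gate and each connection is visited a constant number of times, which is consistent with the O(n) cost stated above for circuits with bounded fanin.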
To quickly identify the possible VDDL gates on non-critical paths, we introduce an upper
slack time (Su) that guarantees that any gate with slack time larger than Su will be free
from timing violation, i.e., negative slack, irrespective of the voltage assignment of other
gates. The slack time of a VDDH gate whose slack equals Su becomes zero after VDDL is assigned
to all gates on the longest path through it. We find Su using (6.12). Let S'i be the slack time
of gate i after assigning VDDL to all gates on the longest path through it, and let D'p,i be
the corresponding longest path delay through gate i. Then

\[
\begin{aligned}
S'_i &= T_c - D'_{p,i} \\
     &= T_c - \beta \cdot D_{p,i} \\
     &= T_c - \beta \cdot (T_c - S_i)
\end{aligned}
\tag{6.13}
\]

where β is the ratio of D'p,i to Dp,i. It is approximated by

\[ \beta = \frac{D'_{p,i}}{D_{p,i}} \approx \frac{T'_c}{T_c} \tag{6.14} \]

T'c is the critical path delay when VDDL is supplied to the entire circuit. It is determined by
static timing analysis in the same way as in [25]. By substituting Su for Si in (6.13), S'i
becomes zero. Thus, Su is obtained as:

\[ S_u = \frac{\beta - 1}{\beta} \cdot T_c \tag{6.15} \]

[Figure 6.1: Procedure of slack-time based algorithm for ISCAS'85 benchmark circuit c2670
in PTM 90nm CMOS. Each panel plots number of gates versus slack time (psec). (a) Slack time
distribution for a single nominal Vdd = 1.2V. (b) Upper slack time Su at VDDH = 1.2V and
VDDL = 0.69V; gates with slack above Su = 239 psec are VDDL gates. (c) Lower slack time Sl at
VDDH = 1.2V and VDDL = 0.69V; gates with slack between Sl = 7 psec and Su = 239 psec are
possible VDDL gates.]

In Figure 6.1(b), any gate that has a positive slack time larger than Su, i.e., in the range
covered by the right arrow, is safely assigned to VDDL without timing violation. Su serves
as a slack threshold. Any gate with slack above this threshold is unconditionally assigned to
VDDL irrespective of the voltages of other gates on paths passing through it.
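
The threshold of (6.15) is easy to evaluate once the two critical path delays are known. The sketch below uses hypothetical delays (500 ps at VDDH and 800 ps with every gate at VDDL) and hypothetical function names; it also checks, through (6.13), that a gate whose slack equals Su is left with zero slack.

```python
def upper_slack(t_c, t_c_low):
    """Upper slack threshold Su from Eqs. (6.14) and (6.15)."""
    beta = t_c_low / t_c                  # Eq. (6.14): beta is about T'c / Tc
    return (beta - 1.0) / beta * t_c      # Eq. (6.15)

def slack_after_vddl(slack, t_c, beta):
    """Slack of a gate once its longest path runs entirely at VDDL, Eq. (6.13)."""
    return t_c - beta * (t_c - slack)

# Hypothetical numbers: Tc = 500 ps at VDDH, T'c = 800 ps with all gates at VDDL.
t_c, t_c_low = 500.0, 800.0
s_u = upper_slack(t_c, t_c_low)
print(s_u)
print(slack_after_vddl(s_u, t_c, t_c_low / t_c))
```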
The slack time of all gates on critical paths is zero. Hence, there is no room to assign
VDDL to those gates. But, if there is a gate with a positive slack time that is close to zero,
it may be possible to assign VDDL, provided other gates on paths through it remain with
VDDH, such that no path delay exceeds Tc.
Let td be a gate delay in the circuit. After assigning VDDL, td increases to
t'd. The amount t'd − td is the increase in path delay through the gate. This is
also the reduction in the slack of other gates on paths through the VDDL gate. Therefore, a
gate that has slack time larger than t'd − td can be assigned to VDDL. Let us call this slack
time the lower slack time (Sl). Because each logic gate has a different value of t'd − td, the
minimum value of t'd − td is used to define Sl:

\[
\begin{aligned}
S_l &= \min_{j} \left[ t'_{d,j} - t_{d,j} \right] \\
    &= \min_{j} \left[ (\beta - 1) \cdot t_{d,j} \right] \quad \forall j \in \text{all gates},
\end{aligned}
\quad \text{assuming } \frac{t'_{d,j}}{t_{d,j}} \approx \frac{D'_{p,j}}{D_{p,j}} = \beta
\tag{6.16}
\]

For simplicity, we assume that path delay is proportional to the delay of a gate on it. Timing
violations arising from this assumption are checked later, when the VDDL gates are finally chosen.
As shown in Figure 6.1(c), Sl can be used to search possible VDDL gates between Sl and
Su, the range shown by a double arrow. The gates with positive slack time less than Sl are
unconditionally assigned to VDDH and are located near or on critical paths.
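
The two thresholds split the gates into three groups. The following sketch computes Sl from (6.16) and performs the split; the gate delays, slacks, β, and the value used for Su are hypothetical, and placing gates whose slack exactly equals a threshold in the "possible VDDL" group is an assumption of this sketch.

```python
def lower_slack(gate_delay, beta):
    """Lower slack threshold Sl, Eq. (6.16): smallest delay increase of any gate."""
    return min((beta - 1.0) * td for td in gate_delay.values())

def classify_gates(slack, s_l, s_u):
    """Split gates into VDDH, possible-VDDL and VDDL groups by their slack."""
    groups = {"VDDH": [], "possible_VDDL": [], "VDDL": []}
    for gate, s in slack.items():
        if s > s_u:
            groups["VDDL"].append(gate)           # safe regardless of other gates
        elif s < s_l:
            groups["VDDH"].append(gate)           # on or near a critical path
        else:
            groups["possible_VDDL"].append(gate)  # needs a timing check
    return groups

# Hypothetical data in picoseconds.
gate_delay = {"g1": 20.0, "g2": 10.0, "g3": 30.0, "g4": 10.0}
slack = {"g1": 0.0, "g2": 12.0, "g3": 0.0, "g4": 40.0}
beta = 1.5
s_l = lower_slack(gate_delay, beta)
s_u = 25.0                                        # assumed value of Su for this example
groups = classify_gates(slack, s_l, s_u)
print(s_l, groups)
```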
Until now, we have demonstrated how to select gates that can be assigned to VDDL using
two simple slack times, Su and Sl. A gate with slack time larger than Su is assigned to VDDL,
while a gate with slack time less than Sl is assigned to VDDH. For a gate with slack time
between Sl and Su, we need to carefully select the power supply voltage. VDDL assignment
for these gates affects the assignment of other gates on paths if we have to hold the path
delay within Tc. The order of VDDL assignment to these gates affects the energy saving of
the dual-Vdd design, when we consider multiple voltage boundaries. Thus, we need to use a
greedy approach depending on the type of dual-Vdd design. If we allow VDDL gates to drive
VDDH gates like ECVS, the selection order should minimize the use of level converters to
maximize energy saving. Because CVS does not use level converters, there exist topological
constraints that prevent a VDDL gate from driving a VDDH gate. Therefore, the selection
order is chosen to maximize VDDL assignment to gates with this topological constraint.
In this chapter, we use the slack time distribution to implement a dual-Vdd algorithm
like CVS. The result of the algorithm is compared to MILP solution in terms of energy saving
and run-time. To maximize VDDL assignment with topological constraints, first, higher logic
depth gates between Sl and Su should be assigned to VDDL. This priority reflects the fact
that VDDL gates do not feed into VDDH gates directly. The timing violation should be checked
when a gate between Sl and Su is assigned to VDDL. We find all VDDL gates, which do not
violate the critical path timing constraint Tc. Additionally, checking topological constraints
for these VDDL gates, we ascertain that all VDDL gates satisfy both timing and topological
constraints.
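
One possible way to sketch this CVS-like selection is shown below. It is an illustration under stated assumptions, not the implementation used in this work: gates are visited in reverse topological order as a stand-in for "higher logic depth first", a gate is accepted only if all of its fanout gates are already at VDDL, and the critical path delay is recomputed from scratch for every trial, which keeps the code short but is not linear-time. All names and numbers are hypothetical.

```python
def critical_delay(order, fanin, delay):
    """Longest path delay; `order` lists gates with drivers before loads."""
    arrival = {}
    for g in order:
        latest = max((arrival[d] for d in fanin.get(g, [])), default=0.0)
        arrival[g] = latest + delay[g]
    return max(arrival.values())

def assign_vddl(order, fanin, delay_h, delay_l, candidates, t_c):
    """Greedily pick VDDL gates among `candidates` without level converters."""
    fanout = {g: [] for g in order}
    for g in order:
        for d in fanin.get(g, []):
            fanout[d].append(g)
    low = set()
    for g in reversed(order):                 # gates nearest the outputs first
        if g not in candidates:
            continue
        if any(s not in low for s in fanout[g]):
            continue                          # a VDDL gate would drive a VDDH gate
        trial = low | {g}
        delay = {x: delay_l[x] if x in trial else delay_h[x] for x in order}
        if critical_delay(order, fanin, delay) <= t_c:
            low = trial                       # timing still met, keep the assignment
    return low

# Hypothetical netlist: g1 and g2 drive g3; g2 also drives g4.
order = ["g1", "g2", "g3", "g4"]
fanin = {"g3": ["g1", "g2"], "g4": ["g2"]}
delay_h = {"g1": 2.0, "g2": 1.0, "g3": 3.0, "g4": 1.0}
delay_l = {g: 1.5 * d for g, d in delay_h.items()}   # assumed slowdown at VDDL
low_gates = assign_vddl(order, fanin, delay_h, delay_l, {"g2", "g4"}, t_c=5.0)
print(sorted(low_gates))
```

In this example g2 has positive slack but cannot move to VDDL because it drives g3, which must stay at VDDH to meet Tc.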
The final stage of the algorithm searches for the optimal VDDL value to give maximum
energy saving. We already know all VDDL gates for each available VDDL value from the previous
procedures. Thus, we simply calculate the energy saving from the VDDL gates, then select the
optimal VDDL that gives the best energy saving. Figure 6.2 shows the slack time distribution of
an optimized c2670 circuit that has the optimal VDDL = 0.69V from our algorithm. In the next
section, we show the results of optimization from the slack-time based algorithm for ISCAS'85
benchmark circuits, which operate at either subthreshold or nominal supply voltage.
6.3 Simulation Results
As example circuits, ISCAS'85 benchmark circuits are synthesized with four types of
basic standard cells, namely, INV, NAND2, NAND3, and NOR2. The average activity of a
synthesized circuit is found from logic simulation with randomly generated input vectors.
We extract gate delay, capacitance and leakage power of the basic standard cells through HSPICE
simulation by varying the power supply voltage from 0.1V to 1.2V in 10mV steps. All HSPICE
simulations were run at room temperature (300K) using the PTM 90nm CMOS process, where
CMOS device threshold voltages are Vth,pmos = 0.21V and Vth,nmos = 0.29V at nominal
Vdd = 1.2V.

[Figure 6.2: Slack time distribution of an optimized c2670 with VDDH = 1.2V and VDDL =
0.69V. Number of gates versus slack time (psec).]

To compare the algorithm of Section 6.2 with the MILP of Section 6.1, we measure the
energy consumption of benchmark circuits using HSPICE simulation [27] for a single Vdd as
a reference. Random input vectors for each circuit in HSPICE simulation are the same as
those used in logic simulation to measure the average activity. To find the optimal VDDL and
its assignments for maximum energy saving, the MILP algorithm is applied to a synthesized
circuit. From the MILP solution, the SPICE netlist of an optimized circuit is generated, where
each gate has its voltage assignment either as the given VDDH or an optimal VDDL. HSPICE
simulation runs with this netlist to measure the energy consumption of the optimized dual-Vdd
circuit. The same procedure is repeated for the design obtained by the slack-time based
algorithm.

Table 6.1: Energy saving and optimal VDDL from MILP [35] or slack-time based algorithm for
given VDDH in ISCAS'85 benchmark circuits in subthreshold region in PTM 90nm CMOS.
Both algorithms produced identical results.

Benchmark  Total  Activity  VDDH  VDDL  VDDL gates  Esingle  Edual  Ereduc.  Freq.  MILP CPU   Slack CPU
circuit    gates  α         (V)   (V)   (%)         (fJ)     (fJ)   (%)      (MHz)  time (s)*  time (s)*
c432       154    0.19      0.25  0.23  5.2         7.9      7.8    1.1      14.4   0.3        2.5
c499       493    0.21      0.22  0.18  9.7         20.2     19.8   2.0      11.9   0.3        19.2
c880       360    0.18      0.24  0.18  46.4        14.4     11.2   22.2     13.6   5.8        17.9
c1355      469    0.21      0.21  0.18  10.2        19.5     19.0   2.5      9.8    0.2        13.3
c1908      584    0.20      0.24  0.21  24.3        26.5     25.0   5.8      11.8   3.2        47.6
c2670      901    0.16      0.25  0.21  46.4        32.8     28.0   14.8     17.4   35.9       134.4
c3540      1270   0.33      0.23  0.14  7.0         88.0     84.6   3.8      7.2    3.2        256.5
c5315      2077   0.26      0.24  0.19  47.1        116.8    98.0   16.1     9.8    852.3      692.0
c6288      2407   0.28      0.29  0.18  2.7         165.4    162.0  2.1      9.4    2.6        1293.7
c7552      2823   0.20      0.25  0.21  42.3        131.7    117.1  11.1     13.6   1452.2     1408.3
Average                                 24.1                        8.2
*Intel Core 2 Duo 3.06GHz, 4GB RAM.

Table 6.2: Energy saving and optimal VDDL from MILP [35] and slack-time based algorithm
for ISCAS'85 benchmark circuits operating at nominal Vdd in PTM 90nm CMOS.

           Single Vdd            Dual Vdd: MILP                               Dual Vdd: slack-time based algorithm
Benchmark  VDDH  Esingle  Freq.  VDDL  VDDL gates  Edual   Ereduc.  CPU       VDDL  VDDL gates  Edual   Ereduc.  CPU
circuit    (V)   (fJ)     (GHz)  (V)   (%)         (fJ)    (%)      (s)*      (V)   (%)         (fJ)    (%)      (s)*
c432       1.20  160.1    1.7    0.75  5.2         153.9   3.9      0.6       0.75  5.2         153.9   3.9      15.8
c499       1.20  460.6    2.3    0.79  19.5        433.4   5.9      403.8     0.79  19.5        433.4   5.9      194.4
c880       1.20  277.6    2.0    0.59  56.9        136.1   51.0     455.0     0.60  57.5        136.6   50.8     62.1
c1355      1.20  453.0    2.3    0.69  13.6        433.6   4.3      340.2     0.69  13.6        433.6   4.3      132.0
c1908      1.20  496.5    1.5    0.67  26.9        402.4   19.0     2146.9    0.67  26.9        402.4   19.0     247.8
c2670      1.20  647.6    1.8    0.69  57.9        337.9   47.8     20848.9   0.69  57.9        337.9   47.8     480.7
c3540      1.20  1844.0   1.1    0.70  11.6        1667.0  9.6      601.0     0.70  11.6        1667.0  9.6      1243.5
c6288      1.20  3066.0   0.5    1.18  53.1        2976.0  2.9      10523.7   0.47  2.9         2985.0  2.6      6128.0
Average                                30.6                18.0                     24.4                18.0
*Intel Core 2 Duo 3.06GHz, 4GB RAM.

First, we apply both algorithms to benchmark circuits operating in the subthreshold
region. We assume that VDDH at minimum energy is given with the corresponding speed for
each benchmark circuit. Table 6.1 shows HSPICE simulation results from the two algorithms.
The results of the two algorithms exactly match each other. Using dual-Vdd design, total
energy saving for c880 (8-bit ALU) is 22.2% as the best case.
In Table 6.2, both algorithms are applied to optimize benchmark circuits operating with
the nominal supply voltage. We set 1.2V as a nominal power supply voltage for PTM 90nm
CMOS by referring to the industry standard 90nm CMOS technology. The results
from the two algorithms do not match for c880 and c6288, but energy savings are very close.
Evidently, the result of the slack-time based algorithm is very close to the global optimization,
even though it uses a greedy heuristic to select the best VDDL gates from all VDDL gates that
pass timing constraint. For c880, energy savings from MILP and our algorithm are 51.0%
and 50.8%, respectively. Compared to the energy savings in the subthreshold region, these
energy savings are much larger. This is because lower supply voltage increases the gate
delay exponentially in the subthreshold region, while the gate delay increase for the nominal
voltage operation is polynomial according to the alpha-power law model [60, 76]. This means
that the positive slack of gates in a circuit is consumed more quickly by assigning VDDL in the
subthreshold region. Thus, we obtain an optimal VDDL that is closer to VDDH and there are fewer
VDDL gates as well. Figure 6.3 shows slack time distributions before and after optimization by our
algorithm applied to c880 for both subthreshold and nominal voltage operations.
We measured the run-time of two algorithms based on CPU time in seconds. Our
algorithm is written in the Perl script language. Thus, it has inherently slower execution
than a program in the C language. The run time for MILP depends on the number of integer
variables, the complexity of inequalities that specify the linear constraints, and the size of
optimization space. From Table 6.1, MILP is mostly faster than our algorithm except for
c5315 and c7552. Both circuits have large slacks and have larger optimization spaces to
be searched. Also, available VDDL as an integer variable in MILP is limited by minimum
operating voltage that guarantees correct logic function for the lower supply voltage. It is
0.1V below the point at which the circuit function fails. This limitation reduces the size of
the optimization space for the MILP algorithm.

[Figure 6.3: Slack time distribution before and after optimization by the slack-time based
algorithm for c880. Each panel plots number of gates versus slack time for the original and
optimized circuits. (a) Subthreshold: VDDH = 0.24V and VDDL = 0.18V (slack time in nsec).
(b) Nominal: VDDH = 1.2V and VDDL = 0.60V (slack time in psec).]

In Table 6.2, our algorithm runs about 43X faster than MILP for c2670, because
MILP needs to search a larger range of VDDL in nominal operation. For the available
VDDL values in the power supply domain, our algorithm has linear time complexity O(n) for
finding the best energy saving, because the thresholds Sl and Su reduce the time spent
searching for VDDL gates. MILP recursively searches for VDDL gates among all gates in the
circuit to obtain the best energy saving. Thus, MILP displays an exponential time complexity for
some benchmark circuits. Therefore, we can use our algorithm to optimize large circuits for
dual-Vdd design within reasonable time instead of using the exponential-complexity MILP.
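
The headline figures quoted in this section follow directly from the table entries. The short sketch below reproduces three of them from values in Tables 6.1 and 6.2; the function names are illustrative only.

```python
def energy_reduction(e_single, e_dual):
    """Percentage energy reduction of dual-Vdd relative to single-Vdd."""
    return 100.0 * (e_single - e_dual) / e_single

def speedup(t_reference, t_new):
    """Ratio of reference run time to new run time."""
    return t_reference / t_new

# c880 in the subthreshold region (Table 6.1): Esingle = 14.4 fJ, Edual = 11.2 fJ.
sub_c880 = round(energy_reduction(14.4, 11.2), 1)
# c2670 at nominal Vdd (Table 6.2): Esingle = 647.6 fJ, Edual = 337.9 fJ.
nom_c2670 = round(energy_reduction(647.6, 337.9), 1)
# c2670 CPU time at nominal Vdd (Table 6.2): MILP 20848.9 s, slack-time based 480.7 s.
cpu_c2670 = round(speedup(20848.9, 480.7), 1)

print(sub_c880, nom_c2670, cpu_c2670)
```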
6.4 Summary
We present here a new slack-time based algorithm for dual-Vdd design [33]. Emphasis
is on saving computation time and effort for maximizing energy saving in a given circuit.
In a dual-Vdd design, the given performance for a circuit determines the higher supply
voltage VDDH. The method of selecting a lower supply voltage VDDL and the use of positive
slack gates are the main ideas presented in this chapter. The proposed algorithm classifies
all positive slack gates into VDDH, possible VDDL, and VDDL groups, based on
the slack time of gates. After classification, the algorithm only investigates the "possible
VDDL gates" for the available VDDL values, considering multiple voltage boundaries in the
energy optimization procedure. This reduces the complexity of the energy optimization process,
and the computation time remains tolerable for large circuits compared to the other available
MILP methods. HSPICE simulations for ISCAS'85 benchmark circuits show energy savings
up to 22.2% in subthreshold operation and 50.8% in nominal operation, which equal or closely
approach those obtained by the higher-complexity MILP method [35]. Computation time is reduced
by up to 43X compared to MILP. Our proposed algorithm has linear time complexity of O(n),
with n being the number of gates in the circuit. This novel slack-time based algorithm is
useful because the MILP method is limited by its exponential run time cost.
Chapter 7
Conclusion and Future Work
This chapter provides the summary of our contribution, the conclusions of this work,
and some suggestions for the future work.
7.1 Conclusion
With rigid energy budgets in energy constrained systems, subthreshold circuit design
has become a predominant technique in recent years. The battery life of remote or portable
devices may not meet system demands. In an extreme case, micro-sensor
networks may require energy consumption low enough to be supplied by electrical energy
converted from ambient energy, i.e., by energy harvesting or energy scavenging. These
challenges are addressed by designing systems to operate at a very low supply voltage
below Vth, but a performance penalty still remains for subthreshold circuits. Without a
performance requirement, we can focus on minimum energy operation as the primary goal. On the
other hand, some energy efficient systems have a wide range of speed requirements, so
the system may operate at a non-minimum energy point. The contribution of
this dissertation is to utilize the time slack using dual-Vdd to further lower the energy
budget for energy constrained systems, with or without a speed requirement. Using dual voltage
design for subthreshold circuits, minimum energy is always less than the absolute minimum energy
point for single voltage design when the system does not require a certain speed.
Alternatively, using dual-Vdd, energy constrained systems can operate several times faster than
single-Vdd operation without increasing their energy consumption.
We proposed the MILP algorithm of dual voltage design for minimum energy design
without level converting devices in Chapter 3. The MILP globally determines the energy
optimized circuit by assigning an extra supply voltage VDDL to gates on non-critical paths.
The topological constraints eliminate level converters, which have unacceptable delay overhead
in the subthreshold regime.
In Chapter 4, we proposed another MILP algorithm for subthreshold circuits using dual
subthreshold supplies in which level converters are eliminated and special multiple
logic-level gates are used instead. The MILP optimally substitutes multiple logic-level gates
for VDDH gates at the places where VDDL gates feed into VDDH gates, considering the benefit for
energy saving. By eliminating topological constraints through multiple logic-level gates, this
MILP improves energy saving up to 15.7% for ISCAS'85 benchmark circuits compared to
the previously proposed MILP.
We investigated the validity of dual-Vdd design for subthreshold circuits under process
variation and technology scaling in Chapter 5. Subthreshold circuits are susceptible to Vth
variation that exponentially affects delay. A subthreshold circuit using dual-Vdd is more
immune to the delay variation induced by Vth variation, because the worst delay variability is
reduced by the lower gate capacitance of VDDL gates acting as load capacitance for VDDH gates
on critical paths. Technology trends with smaller feature size improve the speed of subthreshold
circuits, but energy saving is not solely affected by technology choice. Only the leakage
energy saving component in total energy saving is dependent on technology parameters, namely
the ratio of DIBL coefficient η and subthreshold swing S. These two parameters increase together
with technology scaling, thus total energy saving remains quite similar.
The amount of time slack inside a circuit is the dominant factor in total energy saving.
Applying the proposed framework for dual-Vdd techniques to subthreshold circuits, we
can extend the eligibility of subthreshold circuit design to more energy constrained
applications in future markets.
In Chapter 6, we proposed a linear-time algorithm for dual-Vdd design using gate slack.
For an n-gate circuit, previous heuristic algorithms have theoretical time complexity O(n^2) when
utilizing time slack for low power consumption, because static timing analysis takes O(n) time to
check timing violations for each gate. Using two slack times, the upper slack time (Su) and the lower
slack time (Sl), we can unconditionally classify all gates into three groups for dual-Vdd
design: VDDL, possible VDDL, and VDDH. The optimization procedure then searches for
VDDL gates only in the possible VDDL group to minimize power or energy.
By reducing the search space, the optimization time is drastically reduced for modern
VLSI circuits. We compared our slack-time based algorithm with the MILP proposed in
Chapter 3 in terms of computation run-time and energy saving. The computation run-time of our
algorithm using gate slack is up to 43 times shorter than that of the MILP for ISCAS'85 benchmark
circuits. Also, the energy saving from our algorithm is close to the global optimal solution
from MILP. Gate slack analysis is applicable to any low power design method that
utilizes positive slack time inside a circuit.
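The classification step can be sketched as follows. This is a minimal illustration, not the algorithm of Chapter 6: it assumes that gate slack is the clock period minus the longest path delay through a gate, and it takes the two thresholds (called `s_lower` and `s_upper` here, standing in for Sl and Su) as given inputs. The function names, the boundary comparisons, and the example circuit are hypothetical.

```python
from collections import defaultdict


def gate_slacks(gates, fanin, delay, t_clock):
    """Slack of every gate; `gates` must be listed in topological order."""
    fanout = defaultdict(list)
    for g in gates:
        for p in fanin.get(g, []):
            fanout[p].append(g)
    # Forward pass: longest delay from any primary input to the output of g.
    arrive = {}
    for g in gates:
        arrive[g] = delay[g] + max((arrive[p] for p in fanin.get(g, [])),
                                   default=0.0)
    # Backward pass: longest delay from the output of g to a primary output.
    remain = {}
    for g in reversed(gates):
        remain[g] = max((delay[s] + remain[s] for s in fanout[g]), default=0.0)
    # Slack = clock period minus the longest path through the gate.
    return {g: t_clock - (arrive[g] + remain[g]) for g in gates}


def classify(slack, s_lower, s_upper):
    """Split gates into VDDL, possible VDDL, and VDDH groups by slack."""
    groups = {}
    for g, s in slack.items():
        if s >= s_upper:
            groups[g] = "VDDL"           # enough slack to lower Vdd outright
        elif s <= s_lower:
            groups[g] = "VDDH"           # too little slack to ever lower Vdd
        else:
            groups[g] = "possible VDDL"  # left to the optimization step
    return groups


# Hypothetical four-gate circuit: a and b drive c, a also drives d.
gates = ["a", "b", "c", "d"]
fanin = {"c": ["a", "b"], "d": ["a"]}
delay = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 1.0}
slack = gate_slacks(gates, fanin, delay, t_clock=6.0)
groups = classify(slack, s_lower=1.0, s_upper=4.0)
```

Both passes visit each gate and each connection once, so the slack computation is linear in circuit size. Only gate a remains in the possible VDDL group in this example.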
7.2 Future Work
7.2.1 Minimum Energy Design with Process Variations Using Dual-Vdd
In the proposed MILP algorithms, we do not take into account process variations. As
mentioned before, subthreshold circuits are highly sensitive to Vth variation. The gate delay
and leakage current exponentially depend on Vth in the subthreshold region. The proposed
MILP algorithms utilize positive time slack based on the deterministic gate delay using
dual-Vdd and find a minimum energy point considering the deterministic leakage energy. If
we consider process variations, the gate delay and leakage current should be characterized
statistically during the optimization process. The MILP with process variations will give
more reliable global solutions for minimum energy design in newer CMOS technologies with
smaller feature sizes.
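As a rough illustration of what statistical characterization would involve, the sketch below draws Monte Carlo samples of gate delay under a normally distributed Vth. It assumes the first-order subthreshold relation in which delay is inversely proportional to an on-current that varies as exp((Vdd - Vth) / (n * vT)). The function names, the slope factor, and the Vth sigma are hypothetical values chosen only for illustration.

```python
import math
import random


def normalized_delay_samples(vth_sigma, n_slope=1.5, v_thermal=0.026,
                             count=10000, seed=1):
    """Monte Carlo samples of gate delay divided by the delay at mean Vth."""
    rng = random.Random(seed)
    scale = n_slope * v_thermal
    # delay ~ 1 / I_on and I_on ~ exp((Vdd - Vth) / (n * vT)), so after
    # normalization only the Vth deviation remains: exp(dVth / (n * vT)).
    return [math.exp(rng.gauss(0.0, vth_sigma) / scale) for _ in range(count)]


def summarize(samples, quantile=0.99):
    """Return the mean and an upper quantile of the delay samples."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(quantile * len(ordered)))
    return sum(ordered) / len(ordered), ordered[index]


samples = normalized_delay_samples(vth_sigma=0.02)
mean_delay, tail_delay = summarize(samples)
```

Because delay is an exponential function of a normally distributed Vth, the samples follow a lognormal distribution whose mean exceeds the nominal delay. A deterministic delay model therefore tends to underestimate both the average and the tail of the delay.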
7.2.2 Level Converter for Multi-Vdd Design in Subthreshold Regime
Present level converters in industrial standard cell libraries are not a suitable choice
for multi-Vdd design in the subthreshold region. The main problem with these level shifting
devices is not the output voltage level for logic high "one", but a performance
overhead that is unacceptable compared to their delay overhead in nominal operation. This large
delay of level converters prevents inserting them on positive slack paths for efficient energy
saving. Lacking proper level converters for subthreshold design, we introduced topological
constraints or multiple logic-level gates to eliminate level converters in our work. In the chip
design industry, standard cell libraries are characterized with a clean and fast input that swings fully rail to
rail [31]. Without proper level converter cells, signals may experience significant rise and fall
time degradation between the driver and receiver cells in different voltage domains. This
degradation causes timing closure problems in the chip design flow. To solve these problems, new level
converter cells should be designed for subthreshold circuit blocks in multi-Vdd domains.
7.2.3 A New Hybrid (MILP + Gate Slack Analysis) Linear-Time Algorithm
for Low Power Design Using Multi-Vdd
The proposed slack-time based algorithm in Chapter 6 has linear-time complexity O(n)
to optimize a given n-gate circuit for minimum energy. This algorithm uses gate slack analysis
to classify all gates into three groups in a simple and fast way for dual-Vdd design and finds a
solution close to the global optimum. However, gates in the possible VDDL group are tested
and then assigned to VDDL according to a heuristic priority that favors higher logic depth for the
CVS structure. In CVS, a VDDL gate never feeds a VDDH gate, so the heuristic algorithm
is simple and straightforward to implement. For the ECVS structure, we must
consider the power and delay overheads of level converters during the optimization process
for low power. Heuristic algorithms may not find near-globally-optimal
solutions in this case. An MILP algorithm always guarantees the global optimum for low power, but
cannot handle very large circuits due to its exponential run-time. If we reduce the optimization
space using gate slack analysis, MILP can drastically reduce its exponential run-time and
find the global optimal solution. Combining the benefits of MILP and gate slack analysis, we
can efficiently and accurately solve the optimization problem for multi-Vdd design.
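A minimal sketch of the hybrid idea, under stated assumptions: gates already classified by slack analysis are held fixed, and only the remaining gates are searched. An exhaustive search stands in for the MILP solver purely for illustration, the CVS rule is used instead of ECVS so that no level converter cost is modeled, and all names, delays, and energies are hypothetical.

```python
from itertools import product


def critical_delay(gates, fanin, delay_of):
    """Longest path delay; `gates` must be listed in topological order."""
    arrive = {}
    for g in gates:
        arrive[g] = delay_of[g] + max((arrive[p] for p in fanin.get(g, [])),
                                      default=0.0)
    return max(arrive.values())


def best_assignment(gates, fanin, d_high, d_low, e_high, e_low, fixed, t_clock):
    """Search only the gates not in `fixed` ('H' = VDDH, 'L' = VDDL)."""
    free = [g for g in gates if g not in fixed]
    best = None
    for choice in product("HL", repeat=len(free)):
        assign = {**fixed, **dict(zip(free, choice))}
        # CVS rule: a VDDL gate must not drive a VDDH gate.
        if any(assign[g] == "H" and assign[p] == "L"
               for g in gates for p in fanin.get(g, [])):
            continue
        delay_of = {g: d_low[g] if assign[g] == "L" else d_high[g]
                    for g in gates}
        if critical_delay(gates, fanin, delay_of) > t_clock:
            continue  # timing violation
        energy = sum(e_low[g] if assign[g] == "L" else e_high[g]
                     for g in gates)
        if best is None or energy < best[0]:
            best = (energy, assign)
    return best


# Hypothetical circuit: b and d are pre-classified, a and c are searched.
gates = ["a", "b", "c", "d"]
fanin = {"c": ["a", "b"], "d": ["a"]}
d_high = {"a": 1.0, "b": 3.0, "c": 2.0, "d": 1.0}
d_low = {g: 2.0 * d for g, d in d_high.items()}
e_high = {g: 1.0 for g in gates}
e_low = {g: 0.25 for g in gates}
fixed = {"b": "H", "d": "L"}
energy, assign = best_assignment(gates, fanin, d_high, d_low,
                                 e_high, e_low, fixed, t_clock=7.0)
```

With two of the four gates fixed in advance, the search examines 4 candidate assignments instead of 16. The same reduction of free variables is what would shorten the run-time of an MILP formulation.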
Bibliography
[1] M. Anis, S. Areibi, M. Mahmoud, and M. Elmasry, "Dynamic and Leakage Power Reduction in
MTCMOS Circuits using an Automated Efficient Gate Clustering Technique," in Proceedings
of 39th Design Automation Conference, 2002, pp. 480–485.
[2] A. Asenov, A. Brown, J. Davies, S. Kaya, and G. Slavcheva, "Simulation of Intrinsic Parameter
Fluctuations in Decananometer and Nanometer-Scale MOSFETs," IEEE Trans. on Electron
Devices, vol. 50, no. 9, pp. 1837–1852, Sept. 2003.
[3] D. Bol, R. Ambroise, D. Flandre, and J.-D. Legat, "Interests and Limitations of Technology
Scaling for Subthreshold Logic," IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, vol. 17, no. 10, pp. 1508–1519, Oct. 2009.
[4] D. Bol, D. Flandre, and J.-D. Legat, "Technology Flavor Selection and Adaptive Techniques
for Timing-Constrained 45nm Subthreshold Circuits," in Proceedings of 14th ACM/IEEE In-
ternational Symposium on Low Power Electronics and Design, 2009, pp. 21–26.
[5] D. Bol, D. Kamel, D. Flandre, and J.-D. Legat, "Nanometer MOSFET Effects on the
Minimum-Energy Point of 45nm Subthreshold Logic," in Proceedings of 14th International
Symposium on Low Power Electronics and Design, 2009, pp. 3–8.
[6] A. Bryant, J. Brown, P. Cottrell, M. Ketchen, J. Ellis-Monaghan, and E. J. Nowak, "Low-
Power CMOS at Vdd = 4kT/q," in Proceedings of Device Research Conference, 2001, pp.
22–23.
[7] B. H. Calhoun and A. Chandrakasan, "Characterizing and Modeling Minimum Energy Op-
eration for Subthreshold Circuits," in Proceedings of International Symposium on Low Power
Electronics and Design, 2004, pp. 90–95.
[8] B. H. Calhoun and A. P. Chandrakasan, "Ultra-Dynamic Voltage Scaling (UDVS) Using Sub-
Threshold Operation and Local Voltage Dithering," IEEE Journal of Solid-State Circuits,
vol. 41, no. 1, pp. 238–245, 2006.
[9] B. H. Calhoun, A. Wang, and A. Chandrakasan, "Modeling and Sizing for Minimum Energy
Operation in Subthreshold Circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp.
1778–1786, Sept. 2005.
[10] B. H. Calhoun, A. Wang, N. Verma, and A. Chandrakasan, "Sub-Threshold Design: The
Challenges of Minimizing Circuit Energy," in Proceedings of International Symposium on Low
Power Electronics and Design, Oct. 2006, pp. 366–368.
[11] C. Chen, A. Srivastava, and M. Sarrafzadeh, "On Gate Level Power Optimization Using Dual-
Supply Voltages," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 9, no. 5,
pp. 616–629, 2001.
[12] J. C. Chi, H. H. Lee, S. H. Tsai, and M. C. Chi, "Gate Level Multiple Supply Voltage
Assignment Algorithm for Power Optimization Under Timing Constraint," IEEE Trans. Very
Large Scale Integration (VLSI) Systems, vol. 15, no. 6, pp. 637–648, 2007.
[13] D. Chinnery and K. Keutzer, Closing the Power Gap Between ASIC & Custom: Tools and
Techniques for Low Power Design. Springer, 2007.
[14] D. G. Chinnery and K. Keutzer, "Closing the Gap Between ASIC and Custom: An ASIC
Perspective," in Proceedings of 37th Design Automation Conference, 2000, pp. 637–642.
[15] D. G. Chinnery and K. Keutzer, "Linear Programming for Sizing, Vth and Vdd Assignment,"
in Proceedings of International Symposium on Low Power Electronics and Design, 2005, pp.
149–154.
[16] R. Corless, G. Gonnet, D. Hare, D. Jeffrey, and D. Knuth, "On the Lambert W Function,"
Advances in Computational Mathematics, vol. 5, pp. 329–359, 1996.
[17] A. U. Diril, Y. S. Dhillon, A. Chatterjee, and A. D. Singh, "Level-Shifter Free Design of Low
Power Dual Supply Voltage CMOS Circuits Using Dual Threshold Voltages," IEEE Trans. on
VLSI Systems, vol. 13, no. 9, pp. 1103–1107, Sept. 2005.
[18] R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, "Near-Threshold
Computing: Reclaiming Moore's Law Through Energy Efficient Integrated Circuits," Proceed-
ings of the IEEE, vol. 98, no. 2, pp. 253–266, Feb. 2010.
[19] R. Fourer, D. M. Gay, and B. W. Kernighan, AMPL: A Mathematical Programming Language.
Brooks/Cole-Thomson Learning, 2003.
[20] H. Fuketa, M. Hashimoto, Y. Mitsuyama, and T. Onoye, "Transistor Variability Modeling
and its Validation With Ring-Oscillation Frequencies for Body-Biased Subthreshold Circuits,"
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 18, no. 7, pp. 1118–
1129, July 2010.
[21] R. Graybill and R. Melhem, Power Aware Computing. Kluwer Academic/Plenum Publishers,
2002.
[22] M. Hamada, Y. Ootaguro, and T. Kuroda, "Utilizing Surplus Timing for Power Reduction,"
in Proceedings of IEEE Conference on Custom Integrated Circuits, 2001, pp. 89–92.
[23] S. Hanson, M. Seok, D. Sylvester, and D. Blaauw, "Nanometer Device Scaling in Subthreshold
Logic and SRAM," IEEE Trans. on Electron Devices, vol. 55, no. 1, pp. 175–185, 2008.
[24] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhan-
dali, T. Austin, D. Sylvester, and D. Blaauw, "Exploring Variability and Performance in a
Sub-200-mV Processor," IEEE Journal of Solid-State Circuits, vol. 43, no. 4, pp. 881–891,
Apr. 2008.
[25] R. B. Hitchcock, Sr., "Timing Verification and the Timing Analysis Program," in Proceedings
of 19th Design Automation Conference, 1982, pp. 594–604.
[26] http://www-device.eecs.berkeley.edu. BSIM4.6.1 MOSFET Model.
[27] http://www.synopsys.com. HSPICE User Guide: Simulation and Analysis.
[28] F. Ishihara, F. Sheikh, and B. Nikolic, "Level Conversion for Dual-Supply Systems," IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 12, no. 2, pp. 185–195, Feb.
2004.
[29] M. Jamal Deen, M. H. Kazemeini, and S. Naseh, "Ultra-Low Power VCOs - Performance
Characteristics and Modeling (invited)," in Proceedings of the Fourth IEEE International
Caracas Conference on Devices, Circuits and Systems, 2002, pp. C033-1–C033-8.
[30] M. R. Kakoee, A. Sathanur, A. Pullini, J. Huisken, and L. Benini, "Automatic Synthesis
of Near-Threshold Circuits with Fine-Grained Performance Tunability," in Proceedings of
International Symposium on Low Power Electronics and Design, Aug. 2010, pp. 401–406.
[31] M. Keating, D. Flynn, R. Aitken, A. Gibbons, and K. Shi, Low Power Methodology Manual
for System-on-Chip Design. Springer, 2007.
[32] C. H. I. Kim, H. Soeleman, and K. Roy, "Ultra-Low-Power DLMS Adaptive Filter for Hear-
ing Aid Applications," IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 11, no. 6, pp. 1058–1067, 2003.
[33] K. Kim and V. D. Agrawal, "Dual Voltage Design for Minimum Energy Using Gate Slack," in
Proceedings of IEEE International Conference on Industrial Technology & 43rd IEEE South-
eastern Symposium on System Theory, Mar. 2011, pp. 405–410.
[34] K. Kim and V. D. Agrawal, "Minimum Energy CMOS Design with Dual Subthreshold Supply
and Multiple Logic-Level Gates," in Proceedings of 12th International Symposium on Quality
Electronic Design, Mar. 2011, pp. 689–694.
[35] K. Kim and V. D. Agrawal, "True Minimum Energy Design Using Dual Below-Threshold
Supply Voltages," in Proceedings of 24th International Conference on VLSI Design, Jan. 2011.
[36] M. Kulkarni, "A Reduced Constraint Set Linear Program for Low-Power Design of Digital
Circuits," Master's thesis, Auburn University, Dept. of ECE, Auburn, Alabama, Dec. 2010.
[37] M. Kulkarni and V. D. Agrawal, "A Tutorial on Battery Simulation - Matching Power Source
to Electronic System," in Proceedings of 14th IEEE VLSI Design and Test Symposium, July
2010.
[38] M. Kulkarni and V. D. Agrawal, "Energy Source Lifetime Optimization for a Digital Sys-
tem through Power Management," in Proceedings of 43rd IEEE Southeastern Symposium on
System Theory, Mar. 2011, pp. 75–80.
[39] S. H. Kulkarni, A. N. Srivastava, and D. Sylvester, "A New Algorithm for Improved VDD
Assignment in Low Power Dual VDD Systems," in Proceedings of International Symposium
on Low Power Electronics and Design, 2004, pp. 200–205.
[40] S. H. Kulkarni and D. Sylvester, "High Performance Level Conversion for Dual VDD Design,"
IEEE Transactions on VLSI Systems, vol. 12, no. 9, pp. 926–936, 2004.
[41] V. Kursun and E. G. Friedman, Multi-Voltage CMOS Circuit Design. Wiley, 2006.
[42] J. Kwong and A. Chandrakasan, "Variation-Driven Device Sizing for Minimum Energy Sub-
threshold Circuits," in Proceedings of International Symposium on Low Power Electronics and
Design, Oct. 2006, pp. 8–13.
[43] J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65 nm Sub-Vt Microcon-
troller With Integrated SRAM and Switched Capacitor DC-DC Converter," IEEE Journal of
Solid-State Circuits, vol. 44, no. 1, pp. 115–126, Jan. 2009.
[44] R. Lyon and C. Mead, "An Analog Electronic Cochlea," IEEE Transactions on Acoustics,
Speech and Signal Processing, vol. 36, no. 7, pp. 1119–1134, July 1988.
[45] A. K. Majhi, V. D. Agrawal, J. Jacob, and L. M. Patnaik, "Line Coverage of Path Delay
Faults," IEEE Trans. VLSI Systems, vol. 8, pp. 610–614, Oct. 2000.
[46] A. K. Majhi, J. Jacob, L. M. Patnaik, and V. D. Agrawal, "On Test Coverage of Path Delay
Faults," in Proceedings of 9th International Conference on VLSI Design, Jan. 1996.
[47] D. Markovic, C. Wang, L. Alarcon, T.-T. Liu, and J. Rabaey, "Ultralow-Power Design in
Near-Threshold Region," Proceedings of the IEEE, vol. 98, no. 2, pp. 237–252, Feb. 2010.
[48] J. Meindl and J. Davis, "The Fundamental Limit on Binary Switching Energy for Terascale
Integration (TSI)," IEEE Journal of Solid-State Circuits, vol. 35, no. 10, pp. 1515–1516, 2000.
[49] J. D. Meindl and R. N. Swanson, "Potential Improvements in Power-Speed Performance of
Digital Circuits," Proceedings of the IEEE, vol. 59, no. 5, pp. 815–816, May 1971.
[50] D. Nguyen, A. Davare, M. Orshansky, D. Chinnery, B. Thompson, and K. Keutzer, "Mini-
mization of Dynamic and Static Power Through Joint Assignment of Threshold Voltages and
Sizing Optimization," in Proceedings of International Symposium on Low Power Electronics
and Design, 2003, pp. 158–163.
[51] K. Nose and T. Sakurai, "Optimization of VDD and VTH for Low-Power and High-Speed
Applications," in Proceedings of ACM/IEEE Design Automation Conference, 2000, pp. 469–
474.
[52] G. Ono and M. Miyazaki, "Threshold-Voltage Balance for Minimum Supply Operation," in
Symposium on VLSI Circuits Digest of Technical Papers, 2002, pp. 206–209.
[53] M. Pelgrom, A. Duinmaijer, and A. Welbers, "Matching Properties of MOS Transistors,"
IEEE Journal of Solid-State Circuits, vol. 24, no. 5, pp. 1433–1439, Oct. 1989.
[54] J. M. Rabaey, A. Chandrakasan, and B. Nikolic, Digital Integrated Circuits. Prentice-Hall,
second edition, 2003.
[55] T. Raja, "A Reduced Constraint Set Linear Program for Low-Power Design of Digital Cir-
cuits," Master's thesis, Rutgers University, Dept. of ECE, New Brunswick, New Jersey, Mar.
2002.
[56] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Minimum Dynamic Power CMOS Circuit
Design by a Reduced Constraint Set Linear Program," in Proceedings of 16th International
Conference on VLSI Design, Jan. 2003, pp. 527–532.
[57] Y. Ramadass and A. Chandrakasan, "Voltage Scalable Switched Capacitor DC-DC Converter for
Ultra-Low-Power On-Chip Applications," in Proceedings of Power Electronics Specialists Confer-
ence, 2007, pp. 2353–2359.
[58] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage Current Mechanisms and
Leakage Reduction Techniques in Deep-Submicrometer CMOS Circuits," Proceedings of the
IEEE, vol. 91, no. 2, pp. 305–327, 2003.
[59] K. Roy, L. Wei, and Z. Chen, "Multiple-Vdd Multiple-Vth CMOS (MVCMOS) for Low Power
Applications," in Proceedings of IEEE Int. Symposium on Circuits and Systems, 1999, pp.
366–370.
[60] T. Sakurai and A. Newton, "Alpha-Power Law MOSFET Model and Its Applications to CMOS
Inverter Delay and Other Formulas," IEEE Journal of Solid-State Circuits, vol. 25, no. 2, pp.
584–594, Apr. 1990.
[61] G. Schrom and S. Selberherr, "Ultra-Low-Power CMOS Technologies," in Proceedings of In-
ternational Semiconductor Conference, volume 1, Oct. 1996, pp. 237–246.
[62] M. Seok, S. Hanson, Y. S. Lin, Z. Foo, D. Kim, Y. Lee, N. Liu, D. Sylvester, and D. Blaauw,
"The Phoenix Processor: a 30pW Platform for Sensor Applications," in Proceedings of IEEE
Symposium on VLSI Circuits, 2008, pp. 188–189.
[63] M. Seok, D. Sylvester, and D. Blaauw, "Optimal Technology Selection for Minimizing Energy
and Variability in Low Voltage Applications," in Proceedings of 13th International Symposium
on Low Power Electronics and Design, 2008, pp. 9–14.
[64] A. Srivastava, D. Sylvester, and D. Blaauw, "Power Minimization using Simultaneous Gate
Sizing, Dual-Vdd and Dual-Vth Assignment," in Proceedings of 41st Design Automation Con-
ference, 2004, pp. 783–787.
[65] V. Sundararajan and K. K. Parhi, "Synthesis of Low Power CMOS VLSI Circuits Using Dual
Supply Voltages," in Proceedings of 36th Design Automation Conference, 1999, pp. 72–75.
[66] R. Swanson and J. Meindl, "Ion-Implanted Complementary MOS Transistors in Low-Voltage
Circuits," in IEEE International Solid-State Circuits Conference Digest of Technical Papers,
Feb. 1972, pp. 192–193.
[67] K. Usami and M. Horowitz, "Clustered Voltage Scaling Technique for Low-Power Design," in
Proceedings of International Symposium on Low Power Design, 1995, pp. 3–8.
[68] K. Usami, M. Igarashi, F. Minami, T. Ishikawa, M. Kanzawa, M. Ichida, and K. Nogami,
"Automated Low-Power Technique Exploiting Multiple Supply Voltages Applied to a Media
Processor," IEEE Journal of Solid-State Circuits, vol. 33, no. 3, pp. 463–472, 1998.
[69] R. Vaddi, S. Dasgupta, and R. P. Agarwal, "Device and Circuit Design Challenges in the
Digital Subthreshold Region for Ultralow-Power Applications," VLSI Design, vol. 2009, pp.
1–14, Jan. 2009.
[70] M. Venkatasubramanian, "Energy Efficiency and Process Variation Tolerance of 45 nm Bulk
and High-k CMOS Devices," Master's thesis, Auburn University, Dept. of ECE, Auburn,
Alabama, May 2011.
[71] M. Venkatasubramanian and V. D. Agrawal, "Subthreshold Voltage High-k CMOS Devices
Have Lowest Energy and High Process Tolerance," in Proceedings of 43rd IEEE Southeastern
Symposium on System Theory, Mar. 2011, pp. 100–105.
[72] N. Verma, J. Kwong, and A. Chandrakasan, "Nanometer MOSFET Variation in Minimum
Energy Subthreshold Circuits," IEEE Transactions on Electron Devices, vol. 55, no. 1, pp.
163–174, Jan. 2008.
[73] E. Vittoz and J. Fellrath, "CMOS Analog Integrated Circuits Based on Weak Inversion Op-
erations," IEEE Journal of Solid-State Circuits, vol. 12, no. 3, pp. 224–231, June 1977.
[74] E. Vittoz, B. Gerber, and F. Leuenberger, "Silicon-Gate CMOS Frequency Divider for Elec-
tronic Wrist Watch," IEEE Journal of Solid-State Circuits, vol. 7, no. 2, pp. 100–104, Apr.
1972.
[75] E. A. Vittoz, "The Electronic Watch and Low-Power Circuits," IEEE Solid-State Circuits
Newsletter, vol. 13, no. 3, pp. 7–23, 2008.
[76] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-Threshold Design for Ultra Low-Power
Systems. Springer, 2006.
[77] A. Wang and A. Chandrakasan, "A 180mV FFT Processor Using Subthreshold Circuit Tech-
niques," in IEEE International Solid-State Circuits Conference Digest of Technical Papers,
2004, pp. 292–529.
[78] A. Wang, A. P. Chandrakasan, and S. V. Kosonocky, "Optimal Supply and Threshold Scaling
for Subthreshold CMOS Circuits," in IEEE Computer Society Annual Symposium on VLSI,
2002, pp. 5–9.
[79] L. Wei, K. Roy, and C.-K. Koh, "Power Minimization by Simultaneous Dual Vth Assignment
and Gate-Sizing," in Proceedings of the IEEE Custom Integrated Circuits Conference, 2000,
pp. 413–416.
[80] N. H. E. Weste and D. M. Harris, CMOS VLSI Design. Boston: Addison-Wesley, fourth
edition, 2009.
[81] B. Zhai, D. Blaauw, D. Sylvester, and K. Flautner, "Theoretical and Practical Limits of
Dynamic Voltage Scaling," in Proceedings of 41st Design Automation Conference, 2004, pp.
868–873.
[82] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and Mitigation of Variability in
Subthreshold Design," in Proceedings of International Symposium on Low Power Electronics
and Design, Aug. 2005, pp. 20–25.
[83] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "A Variation-Tolerant Sub-200 mV 6-T
Subthreshold SRAM," IEEE Journal of Solid-State Circuits, vol. 43, no. 10, pp. 2338–2348,
Oct. 2008.
[84] B. Zhai, S. Pant, L. Nazhandali, S. Hanson, J. Olson, A. Reeves, M. Minuth, R. Helfand,
T. Austin, D. Sylvester, and D. Blaauw, "Energy-Efficient Subthreshold Processor Design,"
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, no. 8, pp. 1127–
1137, Aug. 2009.
[85] W. Zhao and Y. Cao, "New Generation of Predictive Technology Model for Sub-45 nm Early
Design Exploration," IEEE Trans. Electron Devices, vol. 53, no. 11, pp. 2816–2823, 2006.