Polynomial-Time Algorithms for Designing Dual-Voltage Energy Efficient Circuits
by
Mridula Allani
A thesis submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Master of Science
Auburn, Alabama
December 12, 2011
Keywords: Energy efficient design, Dual-Vdd design, Level converter, Gate slack, Static
timing analysis (STA)
Copyright 2011 by Mridula Allani
Approved by
Vishwani D. Agrawal, Chair, James J. Danaher Professor of
Electrical and Computer Engineering
Victor P. Nelson, Professor of Electrical and Computer Engineering
Adit Singh, James B. Davis Professor of Electrical and Computer Engineering
Abstract
In this work, we propose a technique to use dual supply voltages in digital designs in order to reduce energy consumption. Three new algorithms are proposed for finding and assigning the low voltage in dual voltage designs. Given a circuit and a supply voltage, the first algorithm finds an optimal lower supply voltage and the other two algorithms assign that lower voltage to individual gates. A linear time algorithm described in the literature is used for computing slacks for all gates in a circuit for a given supply voltage.
For the computed gate slacks and the lower supply voltage, the gates in the circuit are divided into three groups. No gate in the first group can be assigned the lower supply. All gates in the second group can be simultaneously set to the lower supply while maintaining positive slack for all gates. The gates in the third group are assigned the low voltage in small subgroups. The gate slacks are recalculated after each such voltage assignment. Thus, the overall complexity of this reduced power dual voltage assignment procedure is O(n²). But in practice, the computation time is observed to be close to linear in the circuit size. SPICE simulations of ISCAS'85 benchmark circuits, using the PTM model for 90-nm bulk CMOS technology, show up to 60% energy savings.
Acknowledgments
This thesis would not have been possible without the support from my advisor, Dr.
Vishwani Agrawal. I am thankful for his guidance and encouragement. It has been a
pleasure working with him.
I thank my advisory committee members, Dr. Victor Nelson and Dr. Adit Singh. Also,
I am grateful to the Auburn Transition Leadership Institute (ATLI) and the Department of
Electrical Engineering at Auburn University for supporting my graduate study.
I sincerely appreciate Mrs. G. Kanaka Durga and Mrs. B. Sarala for their guidance during my college years. I also thank all my teachers who have inspired me throughout my life.
Graduate study at Auburn University has been a learning experience. I thank my
colleagues Farhana, Jia, Kim, Manish, Nitin, Priya and Suraj for their valuable inputs when
required. Without them my graduate experience would not have been this enriching.
I am indebted to my parents, brother, sister and all friends for their love and support.
Also, I am very thankful to my husband, Abhishek Bichal, for being with me through thick and thin.
Finally, I thank everyone who directly or indirectly helped and inspired me during the
course of my graduate study.
Table of Contents
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ii
Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . iii
List of Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
List of Tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Unique Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Organization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1.1 Power Dissipation in CMOS Circuits . . . . . . . . . . . . . . . . . . 6
2.2 Power Reduction in CMOS Circuits . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Technology Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Transistor Sizing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.3 Supply Voltage Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.4 Clock gating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.5 Power gating . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.6 Dual/multi-threshold designs . . . . . . . . . . . . . . . . . . . . . . 20
2.2.7 Variable threshold CMOS (VTCMOS) . . . . . . . . . . . . . . . . . 21
2.2.8 Dynamic threshold scaling . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.9 Variable supply and threshold voltages . . . . . . . . . . . . . . . . . 22
2.2.10 Transistor stacking . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.2.11 Minimum leakage vector method . . . . . . . . . . . . . . . . . . . . 23
2.3 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3 Overview of Static Timing Analysis Algorithms . . . . . . . . . . . . . . . . . . 25
3.1 Algorithm 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4 Slack-Based Algorithm For Finding an Optimal Lower Supply Voltage . . . . . . 35
4.1 Algorithm 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5 Slack-Based Algorithm For Dual Voltage Assignment Without Using Level Converters . . 53
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
5.2 An Example of a Chain of Inverters . . . . . . . . . . . . . . . . . . . . . . . 54
5.3 Algorithm 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
6 Asynchronous Level Converters in Dual-Voltage Design . . . . . . . . . . . . . . 65
7 Slack-Based Algorithm For Dual Voltage Assignment Using Level Converters . . 77
7.1 An Example of a Chain of Inverters . . . . . . . . . . . . . . . . . . . . . . . 78
7.2 Algorithm 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
7.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
8 Conclusion and Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.1 Using the Proposed Algorithms at Higher Abstraction Levels . . . . . . . . . 87
8.2 Accommodate Level Converter Overheads . . . . . . . . . . . . . . . . . . . 88
8.3 Dual Threshold Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.4 Multiple-Supply Voltage Design . . . . . . . . . . . . . . . . . . . . . . . . . 89
8.5 Simultaneous Multiple VDD and Multiple Vth Designs . . . . . . . . . . . . . 89
8.6 Leakage Energy Reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8.7 Process Variations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
8.8 Short Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
List of Figures
2.1 Switching current in an inverter. . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.2 Short circuit current in an inverter. . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Short circuit current characteristics of an inverter. . . . . . . . . . . . . . . . . . 10
2.4 Leakage currents in an nMOS transistor. . . . . . . . . . . . . . . . . . . . . . . 11
2.5 Scaling of an nMOS transistor. . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.6 Clock gating of a flip-flop. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.7 Clock gating of a clock-tree. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.8 Power gating implementation using header and footer sleep transistors. . . . . . 18
2.9 A stack of nMOS transistors. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.1 Generalization of a synchronous circuit. . . . . . . . . . . . . . . . . . . . . . . 26
3.2 Example of a combinational circuit . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3 Traditional STA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 New STA. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1 Power reduction versus ratio of low voltage and high voltage [56, 73]. . . . . . . 36
4.2 Optimum supply voltages and minimum power dissipation for multiple VDD values [56, 73]. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3 Delay increment versus slack for gates of c880 circuit, VH = 1.2V, VL = 0.58V. . . 38
4.4 Su vs. VL for ISCAS'85 benchmark circuits. . . . . . . . . . . . . . . . . . . . . 42
4.5 Energy savings per gate vs. VL for c880 when level converters are not allowed. G and P+G show the number of gates. Energy savings per gate is (VH² − VL²)/VH². 42
4.6 Energy savings per gate vs. VL for c880 when level converters are allowed. G and P+G show the number of gates. Energy savings per gate is (VH² − VL²)/VH². 43
4.7 Energy versus voltage for c880 when level converters are not allowed. . . . . . . 51
4.8 Energy versus voltage for c880 when level converters are allowed. . . . . . . . . 51
4.9 Energy versus voltage for c1355 when level converters are not allowed. . . . . . 52
4.10 Energy versus voltage for c1355 when level converters are allowed. . . . . . . . . 52
5.1 A chain of ten inverters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.2 Energy and delay measurements at various values of V1 and V2 for the inverter
chain of Figure 5.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
5.3 Delay increment versus initial slack for gates of c880 circuit, VH = 1.2V, VL = 0.49V. . . . 59
5.4 Delay increment versus final slack for gates of c880 circuit, VH = 1.2V, VL = 0.49V. 59
5.5 Delay increment versus initial slack for gates of c880 circuit when Tc is increased by 5%, VH = 1.2V, VL = 0.67V. . . . 60
5.6 Delay increment versus final slack for gates of c880 circuit when Tc is increased by 5%, VH = 1.2V, VL = 0.67V. . . . 60
5.7 Delay increment versus initial slack for gates of c499 circuit, VH = 1.2V, VL = 0.91V. . . . 61
5.8 Delay increment versus final slack for gates of c499 circuit, VH = 1.2V, VL = 0.91V. 61
5.9 An illustration conforming to the topological constraints on gates with slacks
greater than Su. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.10 An illustration violating the topological constraints on gates with slacks greater
than Su. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
6.1 Experimental setup for level converter. . . . . . . . . . . . . . . . . . . . . . . . 66
6.2 Equivalent setup for level converter experiments. . . . . . . . . . . . . . . . . . 66
6.3 Energy per cycle versus VL. VH = 1.2V. . . . . . . . . . . . . . . . . . . . . . . 67
6.4 Dual voltage with level converter inserted before each fan-in of a high-voltage gate. 68
6.5 Dual voltage with level converter inserted after the output of a low-voltage gate. 68
6.6 Dual Cascode Voltage Switch (DCVS) level converter [75, 96]. . . . . . . . . . . 69
6.7 Pass transistor logic level converter [31, 50, 51]. . . . . . . . . . . . . . . . . . . 70
6.8 Conventional Type II level converter [49, 97]. . . . . . . . . . . . . . . . . . . . 71
6.9 Conventional single supply level converter [5, 31, 53]. . . . . . . . . . . . . . . . 72
6.10 Recently published single supply level converter [5]. . . . . . . . . . . . . . . . . 73
6.11 Contention mitigated level converter [23, 53, 89]. . . . . . . . . . . . . . . . . . 74
6.12 Level shifting NAND2 with one high-voltage and one low-voltage input. M1 has a higher threshold voltage than M2 [19]. . . . . . . . . . . . . . . . . . . . . . . 75
7.1 Energy and delay measurements at various values of V1 and V2 for the inverter
chain of Figure 7.2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7.2 A chain of ten inverters with a level converter. . . . . . . . . . . . . . . . . . . . 79
7.3 Delay increment versus initial slack for gates of c880 circuit, VH = 1.2V, VL = 0.58V. . . . 82
7.4 Delay increment versus final slack for gates of c880 circuit, VH = 1.2V, VL = 0.58V. 82
7.5 Delay increment versus initial slack for gates of c880 circuit when Tc is increased by 5%, VH = 1.2V, VL = 0.6V. . . . 83
7.6 Delay increment versus final slack for gates of c880 circuit when Tc is increased by 5%, VH = 1.2V, VL = 0.6V. . . . 83
7.7 Delay increment versus initial slack for gates of c499 circuit, VH = 1.2V, VL = 0.91V. . . . 84
7.8 Delay increment versus final slack for gates of c499 circuit, VH = 1.2V, VL = 0.91V. 84
8.1 Trends in parameter variations [70]. . . . . . . . . . . . . . . . . . . . . . . . . . 90
List of Tables
4.1 Optimal lower supply voltage values (VL), VL1, VL2, their arithmetic mean, their geometric mean and the energy savings estimate when level converters are not allowed for ISCAS'85 benchmark circuits. VH = 1.2V. . . . . . . . . . . . . . . 47
4.2 Optimal lower supply voltage values (VL), VL1, VL2, their arithmetic mean, their geometric mean and the energy savings estimate when level converters are allowed for ISCAS'85 benchmark circuits. VH = 1.2V. . . . . . . . . . . . . . . 48
4.3 Optimal lower supply voltage values (VL) and energy savings estimate using Algorithm 2 when level converters are not allowed for ISCAS'85 benchmark circuits. VH = 1.2V. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.4 Optimal lower supply voltage values (VL) and energy savings estimate using Algorithm 2 when level converters are allowed for ISCAS'85 benchmark circuits. VH = 1.2V. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.1 Optimal lower supply voltage values (VL) and energy savings using Algorithms 2 and 3 for ISCAS'85 benchmark circuits, VH = 1.2V. . . . . . . . . . . . . . . . 57
5.2 Optimal lower supply voltage values (VL) and energy savings using Algorithms 2 and 3 for ISCAS'85 benchmark circuits, when Tc is increased by 5%, VH = 1.2V. 57
6.1 DCVS level converter, VH = 1.2V. . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 Pass transistor logic level converter, VH = 1.2V. . . . . . . . . . . . . . . . . . 70
6.3 Conventional Type II level converter, VH = 1.2V. . . . . . . . . . . . . . . . . 71
6.4 Conventional single supply level converter, VH = 1.2V. . . . . . . . . . . . . . 72
6.5 Recently published single supply level converter, VH = 1.2V. . . . . . . . . . . 73
6.6 Contention mitigated level converter, VH = 1.2V. . . . . . . . . . . . . . . . . 74
7.1 Optimal lower supply voltage values (VL) and energy savings using Algorithms 2 and 4 for ISCAS'85 benchmark circuits using the Contention Mitigated Level Converter, VH = 1.2V. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.2 Optimal lower supply voltage values (VL) and energy savings using Algorithms 2 and 4 for ISCAS'85 benchmark circuits using the Contention Mitigated Level Converter, when Tc is increased by 5%, VH = 1.2V. . . . . . . . . . . . . . . . 81
Chapter 1
Introduction
Power and performance are the main design constraints in modern VLSI design. An effective design is obtained by optimizing these two parameters. Decreasing the supply voltage reduces power consumption but also decreases performance. Hence, a trade-off is necessary between power consumption and circuit delay. The use of multiple supply voltages to reduce energy consumption is a very commonly used technique in CMOS circuits. This results from the fact that the dynamic power of a CMOS circuit is directly proportional to the square of its supply voltage [9, 73]. The underlying idea of this technique is to trade timing slacks for power reduction under given timing requirements. Generally, the gates on the critical path are kept at the high supply voltage and the gates on non-critical paths are assigned the lower supply voltage, thus avoiding timing violations.
Multiple supply voltage design using timing slack is a popular technique for power reduction [8, 10, 13, 21, 27, 41, 42, 43, 63, 78, 80, 82, 84, 86, 100]. Several references [12, 27, 62, 63] discuss dual voltage designs in FPGAs. In this work we use a linear-time slack analysis algorithm [41] to calculate gate slacks.
Apart from using multiple supply voltages alone, we can combine the multiple supply voltage technique with multiple thresholds and transistor sizing to achieve additional power savings. Such work is described in [6, 18, 30, 37, 78, 83, 85, 86]. The trade-offs between energy savings and increase in delay when using gate sizing and multiple supplies are studied in [87]. The authors of [102] propose an algorithm that assigns voltage islands, generated using slack distribution and voltage partitioning algorithms, in order to reduce power consumption and peak temperature. Article [64] proposes a linear time approximation algorithm for
partitioning of circuit components into various voltage islands during high-level synthesis.
Other works [11, 15, 26, 57, 58, 98] deal with the voltage partitioning problem.
In some instances, not meeting the timing requirements of an application is not catastrophic. Such requirements are called soft deadlines. Hua and Qu [29] describe an algorithm for energy-efficient designs using dual voltage to meet such soft deadlines.
1.1 Motivation
There are many algorithms for assigning a fixed low voltage value to the gates of a circuit, but relatively few algorithms for finding that lower voltage. Many of the works related to voltage assignment have used a low voltage VL value of 70% of the high voltage VH [6, 21, 50, 56, 73]. The works described in [10, 50, 84] claim that the optimal value of VL for minimizing total power is 50% of VH. The authors of [101] describe an algorithm to find the lowest feasible supply voltages, according to gate slacks, from a set of given voltages. An algorithm to find an optimum VL value is described in [41]. The authors assign a low voltage value to a group of gates based on a modification of the CVS algorithm and then calculate the energy over a set of low voltages. The VL value resulting in minimum energy is chosen as the lower voltage. This algorithm requires the voltage assignment to be done for each voltage value and is exhaustive in nature. We understand that the lower voltage for minimum energy operation of a dual-supply design depends on the circuit topology and is not a fixed value for all circuits. These reasons motivate us to propose a new algorithm that finds VL accurately and with linear complexity.
The authors of [92, 94] propose two ways of assigning a lower voltage in a dual-voltage design. The algorithm in [92] is called Clustered Voltage Scaling (CVS). This method puts a topological restriction on the resultant dual-voltage design: a low voltage gate may not drive the input of a high voltage gate. Following this, an Enhanced Clustered Voltage Scaling (ECVS) algorithm was proposed in [94]. This algorithm allows a low voltage gate to drive a high voltage gate with the inclusion of an asynchronous level converter at the boundary. Reference [93] describes the methodology to synthesize circuits for the CVS and ECVS structures. The authors of [88] describe three algorithms for dual voltage designs based on linear programming models. Several other algorithms have been proposed that aim at power reduction using dual voltage designs; few aim at energy-efficient designs, and their computational complexity is often high. Thus, we are motivated to propose two quadratic time algorithms for dual-voltage assignment, one of which allows level converters and the other of which does not.
1.2 Unique Contributions
An algorithm is proposed that finds a lower supply voltage maximizing the energy savings from a specific group of gates in a given circuit. The run time of this algorithm is proportional to the number of gates in the circuit, as each gate slack is compared to the difference between the low voltage delay and the high voltage delay of that gate. Thus, its complexity is linear, i.e., O(n), where n is the number of gates. Two other algorithms are proposed that assign the lower supply voltage so found, using the gate slacks. The gate slacks are recalculated after each iteration of voltage assignment. Both the voltage assignment and the slack calculation algorithms are linear in time. Thus, the overall complexity of this method is quadratic, i.e., O(n²), where n is the number of gates. But in practice, the computation time is observed to be close to linear. All algorithms are implemented in the Perl programming language. Energy savings of up to 60% are seen for ISCAS'85 benchmark circuits and, when the critical timing is relaxed, savings of up to 70% are observed. Such high savings have not been reported in earlier work. Sufficient theoretical and experimental work has been done to validate these results.
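The grouping step underlying these algorithms can be sketched as follows. The thesis implements its algorithms in Perl; this is only an illustrative Python sketch, with assumed variable names, a simplified threshold test, and made-up numbers, not the thesis code.

```python
# Illustrative sketch of the slack-based grouping described in the text.
# Assumptions: slack[i] is gate i's timing slack at the high supply, and
# delta[i] is the increase in gate i's delay when moved to the lower supply.
# Following the text: a gate whose slack cannot absorb even its own delay
# increase must stay at VH (group 1); gates with slack above an upper
# bound s_u can all be lowered simultaneously (group 2); the rest must be
# lowered in small subgroups, with slacks recomputed after each step.

def classify_gates(slack, delta, s_u):
    """Split gate indices into the three groups described in the text."""
    keep_high, lower_together, iterate = [], [], []
    for i in range(len(slack)):
        if slack[i] <= delta[i]:      # group 1: must stay at VH
            keep_high.append(i)
        elif slack[i] > s_u:          # group 2: safe to lower simultaneously
            lower_together.append(i)
        else:                         # group 3: lower in small subgroups
            iterate.append(i)
    return keep_high, lower_together, iterate

# Tiny example with made-up slack values (time units are arbitrary):
slack = [0.1, 2.5, 0.8, 3.0]
delta = [0.2, 0.3, 0.3, 0.2]
g1, g2, g3 = classify_gates(slack, delta, s_u=2.0)
print(g1, g2, g3)   # -> [0] [1, 3] [2]
```

Only group 3 forces the iterative slack recomputation that makes the overall procedure O(n²); groups 1 and 2 are settled in a single linear pass.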
1.3 Problem Statement
The aim of this work is to:
1. Develop a linear time algorithm that finds the optimal lower voltage for the dual-Vdd design of CMOS circuits.
2. Develop new algorithms for voltage assignment in dual-Vdd designs, for given high and low voltages, using linear-time gate slack analysis to reduce computation time.
1.4 Organization
The proposed work is organized as follows. Chapter 2 provides background information on the various existing low power and energy-efficient design techniques. The sources of energy consumption in CMOS designs are also detailed.
Chapter 3 gives an overview of static timing analysis algorithms and illustrates the algorithms proposed in [24, 41] with examples.
In Chapter 4, we propose a new linear time algorithm to find the lower supply voltage using gate slacks for dual-voltage designs. Relevant theorems are proved and experimental data are discussed.
Chapter 5 proposes a new quadratic time algorithm for low voltage assignment based on gate slacks, which uses certain constraints that put restrictions on circuit topology as proposed in [92]. Relevant results are discussed.
In Chapter 6, various level converter designs are studied for energy and delay. The level converter with the least energy consumption is selected, as we are concentrating on energy-efficient designs.
In Chapter 7, we propose another new quadratic time algorithm for low voltage assignment based on gate slacks, which uses the selected level converter to lift the topological constraints imposed earlier.
Finally, Chapter 8 concludes the work with suggestions for future research. The need to include level converter overheads in the algorithm is realized from the results of Chapter 7. Hence, we propose this as future work.
Chapter 2
Background
2.1 Introduction
Power and performance are the main design constraints in modern VLSI design. An effective design is obtained by optimizing these two parameters. The requirement for higher performance and lower power directly results from the increasing demand for portable electronic devices. Fortunately, VLSI technology has been able to integrate millions of gates into a single chip of small area, thus contributing towards the reduction in size and increase in portability of the electronic devices incorporating these chips. Also, this massive integration on a small area has increased the speed of the devices, owing to greater current drive at the transistor level, shorter propagation paths at the gate level and several other factors. But the continuous scaling has led to increased power dissipation in CMOS circuits. Hence, we need to find an optimum design where the power dissipation is acceptable and the computational speeds are not disappointing.
The power and timing efficiency can be captured by the following metrics [73]:
1. Average power
2. Power per MIPS (million instructions per second)
3. Energy
4. Energy-delay product
5. Energy-delay squared
6. Peak power
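Metrics 3, 4 and 5 in this list follow directly from an average power and a delay value. A minimal sketch, with made-up numbers chosen purely for illustration:

```python
# Illustrative computation of metrics 3-5 above from an average power
# and a delay value (both values below are assumptions for the example).
def energy_metrics(avg_power_w, delay_s):
    energy = avg_power_w * delay_s   # metric 3: energy (joules)
    edp = energy * delay_s           # metric 4: energy-delay product
    ed2p = edp * delay_s             # metric 5: energy-delay squared
    return energy, edp, ed2p

# e.g. an operation that draws 1 mW on average and takes 2 ns:
e, edp, ed2p = energy_metrics(1e-3, 2e-9)
print(e, edp, ed2p)
```

The progression matters for design optimization: energy is voltage-sensitive but ignores speed, while energy-delay product (and its squared variant) penalizes designs that save energy only by running slowly.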
The choice of the design metric used for design optimization depends on the application and its performance specifications and power budgets. The various mechanisms causing power dissipation in CMOS circuits and the popular power reduction techniques are discussed in the following sections of this chapter.
2.1.1 Power Dissipation in CMOS Circuits
Power dissipation in CMOS digital circuits can be classified as static power and dynamic power. The static dissipation arises due to the leakage current through the resistive paths from the power supply to ground during steady state. The static power exists even in the inactive blocks of the digital circuits, which hold the logic states between the switching events [73]. The leakage current through the transistors results from substrate injection and sub-threshold effects, and is primarily determined by fabrication technology considerations. Dynamic power is due to the switching of capacitive loads and the short circuit current that results when both the pFET and nFET of a CMOS gate are ON during switching transitions.
The above-mentioned sources of power consumption are summed up in the equation

P = α C VDD ΔV f + ISC VDD + Ileak VDD   (2.1)

where α is the activity factor, C is the total capacitance, VDD is the supply voltage, f is the clock frequency, ΔV is the input voltage swing [9, 73], ISC is the short circuit current and Ileak is the leakage current. In Equation 2.1, the first term corresponds to the switching component of power, the second is due to the short circuit current and the last term defines the power dissipated by the leakage current.
When the input voltage swing is equal to VDD, the switching power becomes

PSwitching = α C VDD² f   (2.2)
This switching component of power is due to the logic transitions at the gates, as shown in Figure 2.1. When the input goes from 1-to-0, the pMOS transistor is switched ON and the capacitance at the output node, C, is charged from the supply VDD. When the input transitions from 0-to-1, the nodal capacitance is discharged to ground through the nMOS transistor. Thus, during each transition the power given by Equation 2.2 is expended.

Figure 2.1: Switching current in an inverter.
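As a quick numerical check of Equation 2.2 (the activity factor, node capacitance and frequency below are illustrative assumptions, not measured values from this work):

```python
# Numerical illustration of the switching power of Equation 2.2,
# P_switching = alpha * C * VDD^2 * f, with assumed example values.
def switching_power(alpha, c_farads, vdd, freq_hz):
    return alpha * c_farads * vdd**2 * freq_hz

# e.g. activity 0.1, a 10 fF node, a 1.2 V supply and a 1 GHz clock:
p = switching_power(0.1, 10e-15, 1.2, 1e9)
print(f"{p * 1e6:.3f} uW")   # -> 1.440 uW

# The quadratic dependence on VDD is the motivation for dual-Vdd design:
# halving the supply quarters the switching power of a gate.
print(switching_power(0.1, 10e-15, 0.6, 1e9) / p)   # -> 0.25
```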
The short circuit current flows from the supply to the ground during a logic transition, for the short duration in which both the nMOS and pMOS trees of the gate are turned ON. It is shown in Figure 2.2. The graphs in Figure 2.3 show the characteristics of the short circuit current. We see that during a rise transition at the input of an inverter, when Vin is at 0V, the pMOS transistor is ON and the nMOS transistor is OFF. Once the input rises above Vtn, the nMOS transistor turns ON. As the input keeps rising and becomes greater than VDD − |Vtp|, the pMOS transistor turns OFF. Thus, during the period when the input is between Vtn and VDD − |Vtp|, both the nMOS and pMOS transistors are ON. This causes a current to flow from the supply to the ground through the ON-resistances of the two transistors. Similarly, we can interpret the fall transition in Figure 2.3. Hence, the short circuit current depends on the rise and fall times of the inputs. The short circuit power of an unloaded inverter, shown in Figure 2.2, is found to be approximately

PSC = (β/12)(VDD − 2Vth)³ τ f   (2.3)

where β is the gain factor of the transistors, Vth is the threshold voltage (assuming both pMOS and nMOS have the same magnitude of threshold voltage), τ is the rise/fall time of the input and f is the clock frequency [72, 77].
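Equation 2.3 can be evaluated numerically; every parameter value below is an assumption chosen only to give a plausible order of magnitude, not a characterized device:

```python
# Illustrative evaluation of the short-circuit power estimate of
# Equation 2.3, P_SC = (beta / 12) * (VDD - 2*Vth)**3 * tau * f.
def short_circuit_power(beta, vdd, vth, tau, freq_hz):
    return (beta / 12.0) * (vdd - 2.0 * vth) ** 3 * tau * freq_hz

# Assumed values: beta = 1e-4 A/V^2, 1.2 V supply, 0.3 V thresholds,
# 50 ps input rise/fall time, 1 GHz clock.
p_sc = short_circuit_power(beta=1e-4, vdd=1.2, vth=0.3, tau=50e-12, freq_hz=1e9)
print(f"{p_sc * 1e9:.3f} nW")   # -> 90.000 nW
```

Note the cubic dependence on (VDD − 2Vth): lowering the supply toward twice the threshold voltage rapidly suppresses the short-circuit component.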
From Equations 2.1, 2.2 and 2.3 we observe that the dynamic power consumed by digital CMOS circuits can be reduced by adjusting the supply voltage, threshold voltages, node capacitance, activity factor and clock frequency, or by using low voltage swing signaling techniques.
Finally, the third component of power is the static power. This power is consumed to maintain the logic levels at the gate outputs when the inputs are in steady state. The main sources of leakage currents are the source/substrate or drain/substrate p-n junction band-to-band tunneling leakage current (ID), the gate direct tunneling current (IG), which is due to the tunneling of an electron or a hole from the substrate to the gate through the gate oxide, and the subthreshold leakage current from drain to source through the channel of an OFF transistor (ISUB) [35, 72, 73, 74].
Figure 2.2: Short circuit current in an inverter.
The subthreshold leakage, or weak inversion conduction, occurs when the gate voltage is below Vth. This is due to the diffusion current of the minority carriers in the channel. Consider a CMOS inverter. When the input is low, the nMOS transistor is turned OFF and the output voltage is high. Even when VGS is 0V, there is still a current flowing in the channel of the OFF nMOS device due to the VDD potential between the drain and the source. This component of leakage is the dominant one because of the low threshold voltages being used. The subthreshold current is given by the following equations

ISUB = A e^{(q/(n k T))(VGS − Vth0 − γ′ VSB + η VDS)} (1 − e^{−q VDS/(k T)})   (2.4)

A = μ0 C′ox (W/Leff) (kT/q)² e^{1.8}   (2.5)

where γ′ is the linearized body-effect coefficient, VSB is the source-to-bulk voltage of the transistor, VGS is the gate-to-source voltage of the transistor, η is the DIBL coefficient, VDS is the drain-to-source voltage of the transistor, C′ox is the gate oxide capacitance per unit area, μ0 is the zero-bias mobility, Vth0 is the zero-body-bias threshold voltage and n is the subthreshold swing coefficient of the transistor [35, 72, 73, 74].
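The exponential dependence of Equation 2.4 on threshold voltage can be illustrated with a simplified sketch that keeps only the dominant exponential term (the body-effect term γ′VSB and the DIBL term ηVDS are neglected, and A and n below are assumed, not extracted, device parameters):

```python
import math

# Simplified sketch of the subthreshold current of Equation 2.4 for an
# OFF transistor (VGS = 0), neglecting the body-effect and DIBL terms.
# A (amps) and n are assumed illustrative values, not real device data.
def i_sub(a, n, vgs, vth, vds, temp_k=300.0):
    kt_q = 8.617e-5 * temp_k   # thermal voltage kT/q in volts (~26 mV at 300 K)
    lead = a * math.exp((vgs - vth) / (n * kt_q))
    return lead * (1.0 - math.exp(-vds / kt_q))

# Exponential sensitivity to Vth: a 100 mV reduction in threshold voltage
# multiplies the OFF current by roughly an order of magnitude.
i_hi = i_sub(a=1e-6, n=1.5, vgs=0.0, vth=0.4, vds=1.2)
i_lo = i_sub(a=1e-6, n=1.5, vgs=0.0, vth=0.3, vds=1.2)
print(i_lo / i_hi)   # ratio of about 13 for n = 1.5 at room temperature
```

This is why the text calls subthreshold leakage the dominant static component at low threshold voltages: every technology-driven reduction in Vth costs an exponential increase in OFF current.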
Figure 2.3: Short circuit current characteristics of an inverter.
The gate-direct tunneling current density is given by the equation

JDT = A (Vox/tox)² e^{−B [1 − (1 − Vox/φox)^{3/2}] / (Vox/tox)}   (2.6)

where Vox is the potential drop across the thin oxide, φox is the barrier height for the tunneling electron, and tox is the oxide thickness. The various components of the gate direct tunneling current are the edge direct tunneling currents, which go into the source and drain extension regions (Igso and Igdo), the gate-to-channel current (Igc), which divides into the gate-to-source (Igsc) and gate-to-drain (Igdc) currents, and the gate-to-substrate leakage current (Igb) [72]. The gate direct tunneling current increases exponentially with the reduction in oxide thickness and supply voltage. This component of leakage is fast becoming the dominant part: in 90nm devices, gate leakage can be 1/3 of the subthreshold leakage, and it can become equal to the subthreshold leakage in 65nm devices [35]. Using a high-k dielectric for the gate oxide can help reduce the gate leakage.

Figure 2.4: Leakage currents in an nMOS transistor.
The band-to-band tunneling (BTBT) current from the drain and source to the substrate is the reverse-biased pn-junction current, consisting of the diffusion/drift current near the edge of the depletion region and the current due to electron-hole pair generation in the depletion region. In the case of an inverter, when the input is low the pMOS is ON, the nMOS is OFF, and the output is high. This high voltage at the output makes the drain-to-substrate voltage of the nMOS transistor equal to VDD. This results in leakage from the drain to the substrate. This component of leakage current increases significantly when the wells or the substrate are highly doped, or when the drain and source are at a much higher potential than the substrate. The band-to-band tunneling leakage can be reduced by forward biasing the substrate or by reducing the doping near the drain-substrate and source-substrate junctions [72, 73, 74, 76].
The BTBT current density is given by
\[
J_{BTBT} = A\,\frac{E\,V_{rev}}{E_g^{1/2}}\; e^{-B\,E_g^{3/2}/E} \qquad (2.7)
\]
\[
A = \frac{\sqrt{2m^{*}}\,q^{3}}{4\pi^{3}\hbar^{2}} \qquad (2.8)
\]
\[
B = \frac{4\sqrt{2m^{*}}}{3q\hbar} \qquad (2.9)
\]
\[
E = \sqrt{\frac{2qN_aN_d\,(V_{rev}+V_{bi})}{\varepsilon_{si}\,(N_a+N_d)}} \qquad (2.10)
\]
where Na and Nd are the doping concentrations on the p and n sides, εsi is the permittivity of silicon, m* is the effective mass of the electron, q is the charge of an electron, ℏ is Planck's constant divided by 2π, E is the electric field at the junction, Eg is the energy band gap, Vrev is the applied reverse bias, and Vbi is the built-in voltage across the junction [76].
In addition to the leakage mechanisms discussed above, other secondary effects also exist. Gate-induced drain leakage, IGIDL, is the current flowing from drain to substrate because of the high-electric-field effect in the drain region due to the high drain-to-gate voltage [35, 76]. In short-channel devices, due to the proximity of the drain and the source, the depletion regions at the drain-substrate and source-substrate junctions extend into the channel. As the channel length is reduced with the doping kept constant, the separation between the depletion region boundaries decreases. An increase in the reverse bias across the junctions also pushes the junctions closer to each other. When the combination of channel length and reverse bias leads to the merging of the depletion regions, punchthrough currents, Ipunchthrough, flow from drain to source [76].
Leakage currents increase exponentially with the scaling of CMOS; they grow as the threshold voltages and the lengths of the transistors decrease. Further, the operating temperature rises during device operation, which causes an exponential increase in the leakage currents.
The power dissipated by CMOS digital circuits can be reduced while retaining the required functionality and performance. An outline of a few of the techniques proposed for power reduction in CMOS circuits is given in the following section.
2.2 Power Reduction in CMOS Circuits
Power consumption in digital CMOS circuits can be minimized either by slightly altering the construction of the transistors at the fabrication level, or by making changes at the implementation level or in the architecture of the basic building blocks of a complex CMOS system.
At the design level, power reduction can be achieved by making low power dissipation the key objective of the design. This allows a simpler design with fewer primitive gates and fewer resistive paths connecting them. The reduction in the number of dissipating components decreases the overall power dissipation of the system while performing the same required operations.
2.2.1 Technology Scaling
This technique scales all the voltages and linear dimensions by a constant factor α (< 1), as shown in Figure 2.5 [73, 74, 76]. Since the electric field in the device remains constant, the device current and the wire capacitances are all scaled by the same factor.
Figure 2.5: Scaling of an nMOS transistor.
The energy scales by a factor α³, as both voltage and current are scaled by α. The delay improves by a factor α. Hence the energy-delay product decreases by a factor α⁴. This method, however, requires all voltages to be scaled down, including the threshold voltage, and the threshold voltage is limited by the leakage current through the OFF transistors and the allowable static power [25, 73, 74].
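The scaling factors quoted above follow directly from the switching-energy and delay expressions (the standard constant-field scaling argument, with α the scaling factor):

```latex
E \propto C V^{2} \;\longrightarrow\; (\alpha C)(\alpha V)^{2} = \alpha^{3} E,
\qquad
t \propto \frac{C V}{I} \;\longrightarrow\; \frac{(\alpha C)(\alpha V)}{\alpha I} = \alpha\, t,
\qquad
E \cdot t \;\longrightarrow\; \alpha^{4}\,(E \cdot t).
```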
In recent technologies, the supply voltage has reached 1 V, which imposes physical limitations on scaling. The built-in potentials of the device and the Si bandgap energy do not change with scaling; they can be adjusted to the required levels by increasing the doping or forward biasing the substrate. Because of thermodynamic limitations, the threshold voltage of a device cannot be scaled further while keeping the leakage currents manageable. Scaling can be pushed beyond this point by letting the electric field in the device increase by a factor ε (> 1), but this increases the power dissipation and decreases the reliability of the device. Hence, there is a limit to the scaling process due to various physical phenomena, and alternative methods must be chosen to overcome it [73].
2.2.2 Transistor Sizing
We can reduce the junction capacitance and the overall gate capacitance of the transistors, which load the inputs, by making the transistors much smaller than the load capacitance they drive [25, 72]. Reduction in size reduces the current drive of the transistors, thereby decreasing system performance. This technique provides a trade-off between energy and speed, as smaller sizing results in lower energy and greater delay. Gates on the critical paths are sized according to the performance requirements, and gates on the non-critical paths are sized to minimize area and reduce power dissipation [72].
2.2.3 Supply Voltage Scaling
This method employs a reduction in supply voltage. Decreasing the supply voltage yields large savings in power because switching power is proportional to the square of the supply voltage. Supply voltage scaling also reduces the leakage power, since the gate leakage and the GIDL and DIBL leakage components are reduced as well [72]. As the capacitance and the threshold voltage are kept constant, the performance of the system decreases. At larger voltages, reducing the supply voltage reduces energy for an acceptable change in delay; at voltages near the device threshold, small supply changes cause a large change in delay for a modest change in energy. Hence, this technique is best suited to higher voltage ranges and cannot be used for voltages comparable to the threshold voltage.
When using a lower VDD, we can use parallel and/or pipelined architectures to maintain acceptable circuit speeds [9, 25, 73]. Parallelism allows one to build N functional units and perform N operations at the same time, decreasing the time required for these N functions compared with a system having a single unit, in which the operations must be carried out sequentially. We can therefore decrease the supply voltage of each block while keeping the overall delay equal to that of the single block, and eventually decrease the energy consumption of the system. Additionally, parallelism allows us to shut down blocks when they are not being used. Hence, an energy-efficient system can be designed.
Not all systems, however, can be designed to extract parallelism. The system design, in some cases, may not yield the desired level of parallelism, i.e., the number of functions performed simultaneously might not be as large as required. Besides, designing such alternative systems requires considerable creative insight and may also extend the design time.
Different voltages can be applied to different functional blocks based on the performance requirements of those specific blocks, and many algorithms have been proposed in the literature to do this. The various voltage scaling schemes can be classified as follows [35]:
1. Static Voltage Scaling (SVS) - different blocks and subsystems are given different, fixed supply voltages. A higher supply voltage is used for blocks falling on the critical path of the circuit and a lower supply voltage on non-critical paths [72]. If two voltages are used, it is called the dual-VDD technique; when more than two supplies are used, it is called the multi-VDD technique.
2. Multi-level Voltage Scaling (MVS) - an extension of static voltage scaling where a block or subsystem is switched between two or more voltage levels. Only a few fixed, discrete levels are supported for different operating modes.
3. Dynamic Voltage and Frequency Scaling (DVFS) - an extension of MVS where a larger number of voltage levels are dynamically switched to follow changing workloads. The highest VDD is used when the functional block is required to perform at the fastest frequency; when the performance demand is lower, a lower VDD is used for the same block [72]. The voltages must be limited to the range over which delay and voltage track monotonically [35]. A minimum clock speed that meets the workload requirements is determined, and then the lowest supply voltage that will support that clock speed is chosen [35]. Articles [14, 28, 45, 46, 67, 69] describe different techniques for employing dual/multiple supply voltage scaling.
Figure 2.6: Clock gating of a flip-flop.
Figure 2.7: Clock gating of a clock-tree.
4. Adaptive Voltage Scaling (AVS) - an extension of DVFS where a control loop is used to adjust the voltage. A feedback system is implemented between the voltage-scaling power supply and a delay-sensing performance monitor [35].
Also, the supply voltage scaling used to reduce dynamic power increases the gate delay.
To mitigate this performance degradation, the threshold voltages of the transistors are also
lowered with the supply voltage. This increases the leakage currents in the CMOS devices.
2.2.4 Clock gating
In synchronous CMOS circuits, if the clock to the functional blocks is gated at the block level, the inactive blocks are effectively turned OFF when the clock pulse stops. This avoids switching of the nodes in the inactive blocks of the system while maintaining
Figure 2.8: Power gating implementation using header and footer sleep transistors.
their logic values [72, 73]. Consider the circuits in Figures 2.6 and 2.7. When the enable is 1, the circuit works as usual. When it is 0, the clock signal is cut off from the circuit and the switching activity in the flip-flop or clock-tree is stopped. Clock gating decreases the switching activity in the flip-flops, the gates in the fanout of the flip-flops, and the clock-tree, and thus decreases the power. The performance impact of clock gating is small [73].
2.2.5 Power gating
Power gating is a technique used to reduce the subthreshold leakage power of CMOS circuits; it does not reduce the gate leakage current [39]. A high-Vt pMOS transistor (header) or a high-Vt nMOS transistor (footer), controlled by a sleep-bar or sleep signal, is used to isolate the supply rail, VDD or VSS respectively, from the rest of the logic when operating in sleep mode. This prevents floating nodes at the logic outputs, thus stopping the flow of short circuit currents into the active blocks connected to the power-gated block outputs [72, 81]. The two implementations of power gating are shown in Figure 2.8. Usually, only one of the schemes is implemented; a pMOS header is used more often than an nMOS footer to reduce the leakage currents.
The sleep transistor has to be designed carefully so that the voltage drop across it is not very large during the active state. This ensures that the effective supply voltage to the logic block is maintained at VDD [73]. Also, there is a dynamic power consumption associated with turning the sleep transistor ON and OFF; the power saved during the stand-by mode of a power-gated logic block should exceed the power consumed in switching the sleep transistor [73]. The grouping of logic blocks for power gating implementation is also important: we should minimize the number of sleep transistors used while maximizing the leakage energy savings [73]. Several algorithms have been published in the literature to address this problem.
The high-Vt sleep transistors decrease the subthreshold leakage currents. In technologies below 60 nm, gate leakage is expected to become larger than the subthreshold leakage current [39, 72]. In the footer implementation, once the logic block output has settled to VDD, the footer induces a reverse-biased gate leakage current due to its bias condition. Also, nMOSFETs connected to primary inputs are another source of gate leakage (input gate leakage current). Assuming that the primary inputs are driven by logic blocks that remain active, the footer induces a large reverse-biased gate leakage current (70% of the forward-biased one [39]) due to the large voltage difference between its gate and source/drain. The reverse-biased gate leakage current of a pMOSFET is negligible, implying that a header would be less leaky. In the header implementation, however, the nMOS transistors driven by logic 1 inputs experience a forward-biased input gate leakage current, which might imply that a footer, which passes a reverse-biased current, is superior to a header [39, 60]. Though different leakage mechanisms exist in each of the power gating schemes and to different extents, when the total leakage current is considered, using a header transistor proves to be beneficial [39].
A header is recommended when the switch is sized such that only a small delay penalty can be tolerated, but a footer is superior when the delay penalty is larger [39]. When a low-Vt sleep transistor is used instead of a high-Vt one, a footer is always better than a header. This is because the subthreshold leakage current of the switch now makes up a large portion of the total leakage, due to its exponential dependence on the threshold voltage, and the subthreshold leakage current is smaller in a footer than in a header because a footer with the same delay penalty is smaller [39].
The power consumption of power-gated circuits can be further reduced by controlling the primary inputs. By applying logic 1 to all primary inputs of a power-gated circuit with a footer (and similarly logic 0 in the case of a header), input gate leakage currents can be virtually eliminated. However, the gate leakage current of the footer goes up due to the elevated voltage of the virtual ground, because the additional transistors in the input control circuits induce current that flows through the footer in addition to the current from the logic block [39]. Also, when the logic is cut off from the ground using a leaky footer transistor, the difference in potential between the virtual ground and VSS causes ground noise, leading to signal integrity problems [35, 81].
Thus, depending on the application requirements, a header or a footer sleep transistor is used. For designs with tighter area constraints, an nMOS footer is the better option. The availability of a less leaky transistor design in a given technology is important for power gating implementation. A thick-oxide transistor can be used where an increase in area can be tolerated; a high-Vt thin-oxide sleep transistor is preferred where area has higher priority, and the higher leakage of the thin oxide can be reduced by a longer gate and/or reverse body bias techniques [35].
2.2.6 Dual/multi-threshold designs
The threshold voltage of a transistor is given by
\[
V_{th} = V_{th0} + \gamma\left(\sqrt{\left|2\varphi_F + V_{SB}\right|} - \sqrt{\left|2\varphi_F\right|}\right) \qquad (2.11)
\]
where Vth0 is the zero source-bulk-bias threshold voltage, φF is the substrate Fermi potential, and γ is the body-effect coefficient. This shows that reverse body biasing a transistor increases its threshold voltage and thus decreases the leakage currents [73].
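As a numeric illustration of Eq. (2.11), with assumed (not process-specific) parameter values:

```python
# Numeric sketch of Eq. (2.11): reverse body bias (VSB > 0) raises Vth.
# The parameter values below are assumptions for illustration only.
import math

def vth(vsb, vth0=0.4, gamma=0.4, phi_f=0.45):
    """Body-effect threshold voltage per Eq. (2.11); all values in volts."""
    return vth0 + gamma * (math.sqrt(abs(2 * phi_f + vsb))
                           - math.sqrt(abs(2 * phi_f)))

# Compare zero body bias with a 0.5 V reverse body bias.
print(round(vth(0.0), 3), round(vth(0.5), 3))
```

With these assumed values, a 0.5 V reverse bias raises Vth by roughly 0.1 V, which is what VTCMOS (Section 2.2.7) exploits in standby mode.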
In this method, as with the dual-VDD technique, slower transistors with high Vth are used in non-critical paths, and low-Vth transistors are used in critical paths to achieve the desired performance. Increasing the channel length or its doping density increases the threshold voltage. A thicker oxide layer can also be used to increase the threshold voltage, but this reduces only the subthreshold leakage and not the gate direct tunneling [72].
2.2.7 Variable threshold CMOS (VTCMOS)
In this technique, a zero body bias is used in the active mode and a high reverse body bias is applied in the standby mode to increase the threshold voltage, thereby decreasing the standby leakage. The ability of the reverse body biasing technique to reduce the leakage current diminishes as the technology scales [36, 72], because of the exponential increase in the band-to-band tunneling current at the source-substrate and drain-substrate junctions in scaled technologies [36, 72].
The authors of [91] describe a technique that reduces the standby leakage by using high-Vth devices, with a forward body bias applied during the active mode to reduce Vth. This also reduces the short-channel effects that limit the reduction of the leakage current. It has been shown that forward body biasing with high Vth combined with reverse body biasing reduces leakage 20 times, while reverse body biasing alone with low Vth reduces leakage only 3 times [72].
2.2.8 Dynamic threshold scaling
Body biasing techniques can be used to change the threshold voltage according to the performance requirements. When the performance demand is low, the clock frequency is lowered and the threshold voltage is increased to reduce the run-time leakage. In standby mode, the threshold voltage is increased to its maximum limit to reduce the standby leakage. A zero body bias with the lowest possible threshold is used when the performance requirements are the highest [72].
2.2.9 Variable supply and threshold voltages
The threshold voltage and the supply voltage can be varied simultaneously to obtain the required power and timing efficiency. The ON current of a transistor is given by
\[
I_{DS} = \mu C_{ox}\,\frac{W}{L}\,\frac{(V_{GS}-V_{th})^{2}}{2} \qquad (2.12)
\]
where μ is the mobility, Cox is the gate oxide capacitance, Vth is the threshold voltage, and VGS is the gate-to-source voltage. As we reduce the supply voltage, the input voltage swing, VGS, is also reduced. To maintain the performance, the current drive, IDS, has to be maintained, and for this the threshold voltage also has to be decreased. If we use low VDD and low Vth, the power dissipation will increase, because lowering Vth results in an exponential increase in the subthreshold leakage current. Hence, we require lower VDD and higher Vth on non-critical paths to reduce power, and maintain performance on the critical paths. We can also use dynamic supply scaling and dynamic threshold scaling together to reduce power [73, 74].
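The trade-off described above can be illustrated numerically. The parameter values and the simple square-law/exponential models below are assumptions for illustration, not process data:

```python
# Illustrative sketch (not calibrated to any real process): lowering VDD
# alone reduces drive current per Eq. (2.12); restoring drive by lowering
# Vth raises subthreshold leakage exponentially.
import math

def on_current(vdd, vth, k=1.0):
    """Square-law ON current, Eq. (2.12) shape: I = k * (VDD - Vth)^2 / 2."""
    return k * (vdd - vth) ** 2 / 2

def subthreshold_leakage(vth, i0=1.0, n=1.5, vt_thermal=0.026):
    """Leakage ~ I0 * exp(-Vth / (n * kT/q)): exponential in Vth."""
    return i0 * math.exp(-vth / (n * vt_thermal))

i_nominal = on_current(vdd=1.2, vth=0.4)   # assumed nominal design point
i_scaled = on_current(vdd=0.8, vth=0.4)    # VDD lowered: drive current drops
# Restoring the nominal drive at 0.8 V requires (0.8 - vth) = (1.2 - 0.4):
vth_new = 0.8 - (1.2 - 0.4)                # ~ 0.0 V
leak_ratio = subthreshold_leakage(vth_new) / subthreshold_leakage(0.4)
print(i_nominal, i_scaled, vth_new, leak_ratio)
```

In this toy model the restored drive costs a four-orders-of-magnitude increase in subthreshold leakage, which is why low VDD is paired with high Vth only on non-critical paths.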
2.2.10 Transistor stacking
Using a stack of transistors in the OFF state reduces the leakage current because of negative gate-to-source biasing, body-effect-induced threshold voltage increase, and an increased threshold voltage due to the reduced drain-to-source voltage. The time required for the leakage current in a transistor stack to converge to its final value depends on the intermediate node capacitances and the threshold voltages of the transistors [99]. Turning off more than one transistor in the stack raises the source voltages within the stack and thus reverse biases the sources. Consider a stack of four nMOS transistors, all in the OFF state, as shown in Figure 2.9 [72].
Figure 2.9: A stack of nMOS transistors.
If only one transistor were OFF, the voltage at the source node of that transistor would be zero, as the other transistors are ON and act as a short circuit path to ground. When all the transistors are OFF, the source voltages of the OFF transistors, except the one directly connected to ground, are non-zero, causing self-reverse biasing of the transistors. This reverse bias reduces the leakage currents through the transistors [72].
2.2.11 Minimum leakage vector method
The leakage current of a logic block depends on its inputs, as these determine the number of ON and OFF transistors. The minimum leakage occurs when the resistance between the supply and ground through the transistors is maximum [73]. Hence, there is a combination of inputs for which the leakage power is least; this input combination is called the minimum leakage vector (MLV). Various algorithms have been described in the literature to find the MLV of a circuit [2, 3]. After the MLV for a logic block is determined, it is applied to the block's inputs in the standby state to reduce the standby leakage.
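The idea can be illustrated with a brute-force sketch. The leakage table below is invented for illustration; published MLV algorithms [2, 3] use heuristics, since exhaustive enumeration is exponential in the number of inputs:

```python
# Hypothetical exhaustive minimum-leakage-vector (MLV) search. The toy
# leakage model is invented; real flows use per-gate leakage tables from
# the cell library and per-vector circuit simulation.
from itertools import product

def leakage(vector):
    """Toy leakage of a 2-input gate: the all-0 input stacks both OFF nMOS
    devices (the stack effect of Section 2.2.10) and so leaks least."""
    table = {(0, 0): 1.0, (0, 1): 2.5, (1, 0): 2.3, (1, 1): 4.0}  # arbitrary units
    return table[vector]

def find_mlv(num_inputs, leakage_fn):
    """Enumerate all 2^n input vectors and return the minimum-leakage one."""
    return min(product((0, 1), repeat=num_inputs), key=leakage_fn)

mlv = find_mlv(2, leakage)
print(mlv)  # (0, 0): the stacked-OFF state leaks least in this toy model
```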
2.3 Conclusion
A reduction of power dissipation is important to save energy, but that is not the only concern. Lower power dissipation decreases the heat generated within the chip, between the individual transistors. This reduction in heat allows a further increase in the level of integration and thus contributes to a further reduction in size. Besides, lower dissipation allows packaging in light and inexpensive plastics rather than much costlier and heavier ceramic cases. Further, larger devices like PCs and servers are provided with fans to cool the system during operation, which is not practical in mobile devices like laptops and personal digital assistants (PDAs). Hence, to increase the portability of electronic devices, power dissipation must be lowered; this also allows them to be manufactured at lower cost and with smaller chip area.
Chapter 3
Overview of Static Timing Analysis Algorithms
Static timing analysis was first described in [4, 20, 24, 34, 55]. The author of [55] describes a linear-time algorithm that finds the shortest/longest paths in order of their total delay. Static Timing Analysis (STA) is a technique to verify whether a digital design meets the given timing specifications. An alternate approach used to verify the timing is timing simulation, which can verify the functionality as well as the timing of the design. The term timing analysis is used to refer to either of these two methods, static timing analysis or timing simulation [7]. Timing Analysis (TA) is defined by [24] as "a design automation program which provides an alternative to the hardware debugging of timing problems".
In STA, the analysis of the design is carried out statically and does not depend upon input vectors. In simulation-based timing analysis, a stimulus is applied to the primary inputs, the resulting behavior is observed and verified, then time is advanced with new inputs applied, and the new behavior is observed and verified [7]. This makes STA faster than the simulation-based methods for large circuits.
Any synchronous circuit can be seen as consisting of combinational and sequential blocks such that a combinational block receives data from a clocked flip-flop and outputs data to another clocked flip-flop, as in Figure 3.1. The input and output flip-flops can operate on different clocks. The timing analysis program verifies whether all combinational paths meet the required timing specifications, that is, that data signals arriving at storage elements do not cause any setup or hold-time violations. The setup time is the time taken by the data signal to arrive at a flip-flop; this should be within the given clock period. The hold time is the time for which the data should not change after a clock transition, so that there is no unexpected pass-through of data through a flip-flop. That is, the data needs to be
Figure 3.1: Generalization of a synchronous circuit.
held constant for at least a minimum time to ensure that a flip-flop captures the intended data correctly. These checks ensure that the proper data is ready and available for capture and latched in for the new state [7].
The timing analysis of a circuit produces a 'slack' at each gate to provide a measure of any timing violations. The slack is defined in [24] as the difference between the required arrival time and the actual arrival time. A negative slack indicates that the data signal takes 'slack' units longer to arrive at the storage elements than the given specifications allow. A positive slack corresponds to an early signal, which arrives at the storage element 'slack' units earlier than the specified timing. In order to meet the specifications, the data signal should not be so late that it misses the setup-time requirements, nor so fast that it violates the hold time of the previous signal. The entire design is thus analyzed and the required timing checks are performed for all possible paths of the design. STA is therefore a complete and exhaustive method for verifying the timing of a design [7], as opposed to simulation-based timing analysis, which does not verify the parts of the circuit that are not excited by the input stimulus.
The basic timing analysis algorithm is analogous to the Program Evaluation and Review Technique (PERT) [68] algorithm when run non-probabilistically. The application of PERT to logic design is described in [47]. A combinational circuit can be viewed as a Directed Acyclic Graph (DAG). In this DAG, the nodes represent the logic blocks and the edges represent the interconnects. For example, the combinational circuit shown in Figure 3.2 can be redrawn as the DAG shown in Figures 3.3 and 3.4. In these figures, each node is associated with a weight, which represents the largest delay of the gate, and the edges have no weights associated with them. The delays of the gates can be specified as minimum and maximum delays, as rise and fall delays, or in any other manner. The maximum of the delays is always considered for the longest path analysis and the minimum for the shortest path analysis. When interconnect delays are specified, the maximum/minimum of the sum of a node and its output edge weights is considered. In the proposed work, longest path analysis is used. The primary inputs (PI) and primary outputs (PO) are assumed to emerge from the source and sink nodes, respectively. The source and sink nodes have no weights associated with them.
In the static timing analysis program proposed in [24], we traverse from the primary inputs to the primary outputs through each gate, assigning to each gate the longest path delay from the primary inputs as its weight. Once the weights of all nodes are assigned, we traverse in the reverse direction, from the primary outputs to the primary inputs, to assign the slack of each node. The slack of each node is calculated as the minimum difference between the required arrival time at the gate output, such that the primary outputs meet the timing constraints, and the actual arrival time of the gate output. This is best described by Figure 3.3.
In Figure 3.3, PI represents the primary inputs, PO the primary outputs, and A through J represent logic gates. Each logic gate is characterized by a pair of delays, (mindelay, maxdelay), which represent the minimum and maximum delays through the gate, respectively. The actual arrival times, required arrival times, and slacks of each gate are computed, and the results are recorded at each node. For example, for node E, the minimum delay is 1, the maximum delay is 2, the actual arrival time is 7, the required arrival time is 11, and the slack is 4. These values are noted at the top of node E in the DAG of Figure 3.3 as {(1,2), 7, 11, 4}.
Now, consider gate E. We have to find the longest path delay through this gate. We begin with the primary inputs and assume that the start of the algorithm represents time 0. Gate E has the outputs of gates A and B as its inputs. The output of gate
Figure 3.2: Example of a combinational circuit.
A arrives 3 units earlier than that of gate B, which arrives 5 time units from the start. Hence, the longest path from the primary inputs to the inputs of gate E is 5 time units. The maximum delay of gate E is 2 units, so the maximum time from the start until the output of E stabilizes is the sum of 5 and 2, i.e., 7 units. Hence, the path delay to the output of node E from the primary inputs is 7 units. The path delays of all nodes are calculated in this way. From the output of E we can take two different paths to the primary outputs, shown in green and red. The path that goes through gates G and I, in green, yields the longest path, with a path delay of 13 units. The longest path delay of the circuit, shown in yellow, is 17 units.
Now, to find the slacks of each gate, suppose the required arrival times of both primary outputs are 17, the longest path delay of the circuit. The output of node I arrives at 13 units, i.e., its slack is 17 − 13 = 4 units. Similarly, the output of node J arrives at 17 units, so its slack is 0 units. Since the maximum delay of node I is 4, the inputs of I should
Figure 3.3: Traditional STA.
arrive at 13 units of time. The path delay of gate G is 9, i.e., its actual arrival time is 9 units. Hence, the slack of gate G is 13 − 9 = 4 units. Now, working back from the sink node, we can reach node E from node J, node I, or node G. The required arrival time at node J's input is 14 units, and node E's output arrives at 7 units; hence, by this path node E's slack is 7 units. But node I's required arrival time is 13 units, which makes the slack of node E 6 units. And node G's required arrival time is 11 units, giving node E a slack of 4 units. Thus, we obtain three values for the slack of node E. We take the minimum of these three values because all three timing requirements must be met. Hence, the actual slack of gate E is 4 units.
The slacks of all gates are found as described above. It is observed that the gates on the longest path of the circuit, also called the critical path, have 0 slack when the required time at the primary output equals the critical path delay. The two primary outputs in the example are assumed to have the same required arrival time; in practice, they can have different required arrival times, which adds minor changes to the slack calculation. If the required arrival time at node I were 15 while node J remained at 17, the slack of gate I would be 2 units instead of 4. This algorithm requires us to traverse the graph twice, once forward to calculate the path delays and once backwards to calculate the gate slacks. Thus, the time taken is proportional to the number of gates, and the complexity of the algorithm is linear, O(n), where n is the number of gates.
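The two-pass procedure above can be sketched in a few lines. The three-gate DAG below is invented for illustration, not the circuit of Figure 3.2:

```python
# A minimal sketch of the two-pass STA of [24]. Node weights are the gate
# max delays; edges carry no weight.
from collections import defaultdict, deque

def sta_slacks(delays, edges, required_time):
    """Forward pass: longest actual arrival time at each gate output.
    Backward pass: required arrival time. Slack = required - actual."""
    succs, preds = defaultdict(list), defaultdict(list)
    indeg = {n: 0 for n in delays}
    for u, v in edges:
        succs[u].append(v)
        preds[v].append(u)
        indeg[v] += 1
    order, q = [], deque(n for n in delays if indeg[n] == 0)
    while q:  # Kahn's topological sort of the DAG
        n = q.popleft()
        order.append(n)
        for s in succs[n]:
            indeg[s] -= 1
            if indeg[s] == 0:
                q.append(s)
    arrival = {}
    for n in order:  # forward: longest path delay from the primary inputs
        arrival[n] = delays[n] + max((arrival[p] for p in preds[n]), default=0)
    required = {}
    for n in reversed(order):  # backward: latest time the output may arrive
        required[n] = min((required[s] - delays[s] for s in succs[n]),
                          default=required_time)
    return {n: required[n] - arrival[n] for n in order}

# A feeds B and C; C is the slower branch, so B accumulates slack.
slacks = sta_slacks({"A": 2, "B": 1, "C": 3},
                    [("A", "B"), ("A", "C")], required_time=5)
print(slacks)  # A and C lie on the critical path (slack 0); B has slack 2
```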
3.1 Algorithm 1
We use the O(n) slack calculation algorithm proposed in [40, 41, 42, 43, 44] to find the gate slacks for a given circuit. The aim is to find the longest path delay from the primary inputs to the primary outputs through a given gate; the gate slack is then calculated as the difference between the critical path delay and the longest path delay through the gate. The step-by-step approach to implement this algorithm is described below using the example shown in Figure 3.2. The directed acyclic graph (DAG) for this circuit is shown in Figure 3.4.
The given circuit is first levelized. We start from the primary inputs and traverse to the output of each gate, always choosing the longest delay path. Simultaneously, we traverse in the opposite direction to find the longest delay path from the primary outputs to each gate output. The sum of the longest delay from the primary inputs to the gate output and the longest delay from the primary outputs to the gate output gives the longest path delay through that gate. Let TPI(i) be the longest time for the data signal to arrive from a primary input at the output of gate i, and TPO(i) the longest time for the data signal to reach a primary output from the output of gate i. The delay of the longest path [41, 42, 43] through gate i is given by
\[
D_{p,i} = T_{PI}(i) + T_{PO}(i) \qquad (3.1)
\]
The critical path delay of the circuit is given by
\[
T_c = \max_{j}\,\{D_{p,j}\} \quad \forall\ \text{gate}\ j \qquad (3.2)
\]
Figure 3.4: New STA.
The slack for gate i is then calculated as
\[
S_i = T_c - D_{p,i} \qquad (3.3)
\]
Consider Figure 3.4. Here, PI represents the primary inputs, PO the primary outputs, and A through J represent logic gates. As in Figure 3.3, for node E the minimum delay is 1, the maximum delay is 2, TPI is 7, TPO is 6, and the slack is 4. These values are noted at the top of node E in the DAG of Figure 3.4 as {(1,2), 7, 6, 4}. Note that in this case we record the path delay from the primary outputs as well. The slack of gate E is the same as the slack obtained from the algorithm described in [24].
To begin with, we levelize the circuit. The source node and the primary inputs are
considered to be in Level 0. All nodes whose inputs are primary inputs are
grouped in Level 1; thus, nodes A, B, C, D and H form Level 1. The gates driven by the
outputs of Level 1 gates are placed in Level 2; that is, nodes E, F and G
are in Level 2. Likewise, the outputs of Level 2 gates feed Level 3 gates, so nodes G,
H, I and J form Level 3. Note that G qualifies for both Levels 2 and 3. We take the maximum of the two,
so G falls in Level 3. Similarly, H also falls in Level 3. Level 4 gates are those that
have Level 3 gates as their inputs, which places nodes I and J in Level 4. Finally, the primary
outputs and the sink node form Level 5, the highest level. These levels are separated by
dashed vertical lines in Figure 3.4.
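The levelization rule described above, in which a gate's level is one more than the maximum level among its fan-in gates, can be sketched in a few lines. The gate names and fan-in map below are hypothetical, chosen only to mirror the "take the maximum" case of gate G:

```python
def levelize(gates, fanin):
    """Assign each gate a level: 1 + the maximum level of its fan-in
    gates, with primary inputs treated as Level 0. A gate reachable at
    two different levels (like G in Figure 3.4) receives the maximum.

    gates : list of gate names in topological order
    fanin : dict gate -> list of driving gates ([] = driven only by PIs)
    """
    level = {}
    for g in gates:
        level[g] = 1 + max((level[p] for p in fanin[g]), default=0)
    return level

# Hypothetical fragment: G is driven by a Level 1 gate and a Level 2 gate.
levels = levelize(['A', 'E', 'G'], {'A': [], 'E': ['A'], 'G': ['A', 'E']})
```

Because each gate is visited once after its fan-in, processing the gates in topological order yields the levels in a single O(n) pass.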
Again, as in the previous algorithm, consider gate E. To find the longest path delay
through this gate, begin at the primary inputs and let the start of the algorithm
represent time 0. Gate E has the outputs of gates A and B as its inputs. The output
of gate A arrives 3 units earlier than that of gate B, which arrives 5 time units after
the start. Hence, the longest path from the primary inputs to the inputs of gate E is
5 time units. The maximum delay of gate E is 2 units, so the latest time at which
the output of E stabilizes is the sum of 5 and 2, i.e., 7 units. Hence, the path delay
from the primary inputs to the output of node E is 7 units. The path delays of the
other nodes are calculated in the same way. From the output of E we can take two
different paths to the primary outputs, shown in green and red. The path through
gates G and I, in green, yields the longest path through E, with a path delay of 13 units.
The longest path delay of the circuit, shown in yellow, is 17 units.
As we traverse from the primary inputs toward a gate, we concurrently traverse from
the primary outputs toward it to find the longest path delay from the primary outputs
to the gate output. Thus, to find the longest path delay from the primary outputs to the
output of gate E, we start from the sink node and try to reach node E. We can reach E from
node J, from node I, or from node G. The path from node J (red line) contributes a delay of
3 units at E, the maximum delay of gate J. Similarly, the path from node I (red line) contributes
a delay of 4 units at E. The path through node G (green line) starts from the sink, goes
through node I, then through node G, and ends at node E; its delay is
6 units, the sum of the maximum delays of nodes I and G. Of these three paths leading to node
E, the longest path delay is therefore 6 units. The longest path delay from the primary inputs
to the primary outputs through gate E is thus 7 + 6 = 13 units.
Now, to find the slack of each gate, we subtract the longest path delay thus obtained
from the critical path delay of the circuit. The critical path delay is found similarly and is
17 units. Thus, the slack of gate E is 17 - 13 = 4 units.
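The two-pass computation described above can be sketched as follows. The circuit here is a small hypothetical example (not the circuit of Figure 3.4), and its maximum gate delays are assumed values chosen only for illustration:

```python
from collections import defaultdict

def gate_slacks(gates, fanin, delay):
    """Gate slacks via one forward and one backward pass over a DAG.

    gates : gate names in topological (levelized) order
    fanin : dict gate -> list of driving gates ([] = driven only by PIs)
    delay : dict gate -> maximum gate delay
    """
    # Forward pass: T_PI(i), latest signal arrival at the output of gate i.
    tpi = {}
    for g in gates:
        tpi[g] = delay[g] + max((tpi[p] for p in fanin[g]), default=0)

    # Build fan-out lists for the backward pass.
    fanout = defaultdict(list)
    for g in gates:
        for p in fanin[g]:
            fanout[p].append(g)

    # Backward pass: T_PO(i), longest delay from gate i's output to a PO.
    tpo = {}
    for g in reversed(gates):
        tpo[g] = max((tpo[s] + delay[s] for s in fanout[g]), default=0)

    # Eqs. (3.1)-(3.3): D_p,i = T_PI + T_PO; T_c = max; S_i = T_c - D_p,i.
    dp = {g: tpi[g] + tpo[g] for g in gates}
    tc = max(dp.values())
    return {g: tc - dp[g] for g in gates}, tc

# Hypothetical circuit: gates A and B drive gate C.
slacks, tc = gate_slacks(['A', 'B', 'C'],
                         {'A': [], 'B': [], 'C': ['A', 'B']},
                         {'A': 2, 'B': 5, 'C': 2})
```

In this example B and C lie on the critical path (T_c = 7 units) and get zero slack, while A gets slack 3, the margin by which its longest path falls short of the critical path.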
The slacks of the remaining gates are found similarly. Again, it is observed that the gates on the
longest path of the circuit, also called the critical path, have zero slack when the required time
at the primary outputs equals the critical path delay. Also, all gates that
share the same longest path have the same slack. This is in contrast to the slacks derived
from the previous algorithms: here, slack is calculated as the difference between the critical
path delay and the longest path delay through a gate, so gates sharing the same longest path
have the same path delay, whereas the slack defined in [24] is the difference between the
required arrival time and the actual arrival time at each gate output. From Figures 3.3 and 3.4,
we observe that the slacks derived in both ways are the same.
The algorithm described in the previous section requires complete forward and backward
traversals. The new algorithm requires only a partial traversal of the graph, up to the node
output, to calculate the slack of one gate. To find the slacks of all gates we still
require complete traversals of the graph in both directions, but to calculate the slack of an individual
gate we need not wait until the forward traversal is complete; traversals in both directions can
proceed simultaneously. Thus, this algorithm takes less time than the
previous one. Its complexity is still linear, O(n), where n is
the number of gates, as the run time remains proportional to the circuit size.
This algorithm finds the slacks corresponding to the longest path delay through a gate.
Similarly, the slacks corresponding to the shortest path delay can be calculated by using
the minimum gate delays and the paths yielding the shortest delay. The shortest delay through
gate E in the earlier example would then be 3 units starting from the source
and 1 unit from the sink. The shortest-path problem arises when we must check
for signals arriving earlier than the required time.
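The shortest-path counterpart can be sketched by swapping maximum delays and the max over fan-in for their minima; the function and its example inputs are hypothetical:

```python
def earliest_arrivals(gates, fanin, min_delay):
    """Earliest time each gate output can change: the gate's minimum
    delay plus the earliest arrival among its fan-in (0 for a gate
    driven only by primary inputs).

    gates : gate names in topological order
    fanin : dict gate -> list of driving gates ([] = driven only by PIs)
    """
    t = {}
    for g in gates:
        t[g] = min_delay[g] + min((t[p] for p in fanin[g]), default=0)
    return t

# Hypothetical circuit: A and B drive C; C's output can change as
# early as min(1, 2) + 1 = 2 time units after the start.
t = earliest_arrivals(['A', 'B', 'C'],
                      {'A': [], 'B': [], 'C': ['A', 'B']},
                      {'A': 1, 'B': 2, 'C': 1})
```

Comparing these earliest arrival times against the required time flags signals that arrive too early, the check the text refers to.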
The linear complexity of this algorithm makes it effective for large circuits, in contrast
to other techniques such as path tracing and delay simulation, the latter of which requires large
numbers of test patterns.
Chapter 4
Slack-Based Algorithm For Finding an Optimal Lower Supply Voltage
As discussed in Chapter 2, using multiple supply voltages is a widely used technique to
reduce power consumption in electronic circuits. In such designs, a level converter is required
to boost the logic '1' level at the output of a low-voltage gate to the high-voltage logic '1'
level whenever the gate has a high-voltage gate in its fan-out. If we never allow a low-voltage gate
to drive a high-voltage gate, then no level converter is required. Level converters are
discussed in detail in Chapter 7. The authors in [56, 73] describe a method to find the lower
supply voltage in a dual-voltage design. The power reduction ratio R is defined as the ratio of
the power dissipated in a dual-voltage design to that in a single-voltage design operating at the higher supply
voltage, and is expressed as
R = 1 - \frac{C_{V_L}}{C}\left\{1 - \left(\frac{V_L}{V_H}\right)^2\right\}    (4.1)
where C_{V_L} is the total capacitance of the low-voltage cells and C is the total capacitance of
the single-voltage circuit [56]. If p(t) is the number of paths whose delay is t when VL = VH,
with the path delay t normalized to the cycle time, then the power reduction
for various distribution functions p(t) is shown in Figure 4.1 [56].
Figure 4.1 shows that the minimum value of R depends on p(t), and that the minimum occurs when
VL falls between 0.6VH and 0.7VH. This suggests that VL should be kept between 0.6VH and
0.7VH to minimize power dissipation.
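Equation (4.1) can be evaluated directly, as in the sketch below; the capacitance ratio used in the example is an assumed value, not a figure from [56]:

```python
def power_reduction_ratio(cvl_over_c, vl_over_vh):
    # Eq. (4.1): R = 1 - (C_VL / C) * {1 - (VL / VH)^2}
    return 1.0 - cvl_over_c * (1.0 - vl_over_vh ** 2)

# Assumed example: half the total capacitance runs at VL = 0.6 VH,
# giving R = 1 - 0.5 * (1 - 0.36) = 0.68, a 32% power reduction.
r = power_reduction_ratio(0.5, 0.6)
```

Note that C_{V_L} itself shrinks as VL is lowered, since fewer gates can meet timing at the lower voltage; this tradeoff between the two factors is what produces the minimum of R in the 0.6VH to 0.7VH range seen in Figure 4.1, rather than R decreasing monotonically with VL.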
The same authors, in [21], derive a relationship between the optimum supply-voltage ratios in
multiple-VDD circuits. For dual supplies {V1, V2}, V2/V1 = 0.5 + 0.5 Vth/V1. For three supplies
{V1, V2, V3}, V2/V1 = V3/V2 = 0.6 + 0.4 Vth/V1. And for four supplies {V1, V2, V3, V4}, V2/V1 = V3/V2 = V4/V3 =
0.7 + 0.3 Vth/V1. The authors claim that this rule of thumb gives supplies that bring the
power to within 1% of the optimum minimum. These results show that the
power savings tend to saturate as the number of supplies is increased, and that the
savings decrease as the supply voltage is scaled down and as Vth/VDD grows.

Figure 4.1: Power reduction versus the ratio of the low and high supply voltages [56, 73].

Figure 4.2: Optimum supply voltages and minimum power dissipation for multiple VDD
values [56, 73].
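The rule of thumb from [21] can be sketched as follows; the voltage values in the example are assumptions chosen only to exercise the ratios:

```python
def optimum_supplies(v1, vth, n):
    """Supplies V1 >= V2 >= ... per the rule of thumb of [21]:
    the ratio of successive supplies is a + (1 - a) * Vth / V1,
    with a = 0.5, 0.6, 0.7 for n = 2, 3, 4 supplies respectively."""
    a = {2: 0.5, 3: 0.6, 4: 0.7}[n]
    r = a + (1.0 - a) * vth / v1
    supplies = [v1]
    for _ in range(n - 1):
        supplies.append(supplies[-1] * r)
    return supplies

# Assumed values: V1 = 1.2 V, Vth = 0.24 V, so Vth/V1 = 0.2 and the
# dual-supply ratio is V2/V1 = 0.5 + 0.5 * 0.2 = 0.6.
duals = optimum_supplies(1.2, 0.24, 2)  # [1.2, 0.72]
```

Because the ratio's constant term grows (0.5, 0.6, 0.7) as supplies are added, each extra supply sits closer to the one above it, which is consistent with the saturation of power savings noted above.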
An algorithm to find the optimum VDD for each node is described in [10]. The optimum is
estimated by finding

\max \sum_{v} (V_H^2 - V_L^2)\, C(v)\, E(v)\, k_v    (4.2)

where k_v = 1 if V_{DD}(v) \le V_L, and k_v = 0 if V_{DD}(v) > V_L.
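The objective of Equation (4.2) can be evaluated for a given assignment as sketched below; the per-node capacitance, activity, and voltage values are hypothetical:

```python
def savings_objective(vh, vl, nodes):
    """Eq. (4.2): sum over nodes v of (VH^2 - VL^2) * C(v) * E(v) * k_v,
    with k_v = 1 when the node's assigned VDD is at most VL, else 0.

    nodes : iterable of (capacitance C(v), activity E(v), assigned VDD)
    """
    total = 0.0
    for cap, activity, vdd in nodes:
        k = 1 if vdd <= vl else 0
        total += (vh ** 2 - vl ** 2) * cap * activity * k
    return total

# Assumed example: two nodes of unit capacitance and activity, one at
# VL = 0.6 V (counted) and one at VH = 1.2 V (not counted).
s = savings_objective(1.2, 0.6, [(1.0, 1.0, 0.6), (1.0, 1.0, 1.2)])
```

Only the low-voltage node contributes, so the objective is (1.44 - 0.36) * 1 * 1 = 1.08; maximizing this sum over candidate assignments corresponds to maximizing the estimated energy savings.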
The works described in [10, 50, 84] claim that the optimal value of VL for minimizing
total power is 50% of VH. Several linear programming algorithms have been proposed in [8,
32, 33, 41, 79] to select the optimum value of the lower voltage. The authors in [101] describe
an algorithm that finds the lowest feasible supply voltages, chosen from a given set of voltages,
according to the gate slacks. Another slack-based algorithm has been proposed in [41], where
the optimum low-voltage value is found from an estimate of the minimum-energy state of the circuit.

In our work, we propose an algorithm to find the lower supply voltage using gate
slack. The slack of a gate is defined as the difference between the critical path delay of the circuit
and the delay of the longest path through the gate. Thus, each gate has its own slack, and
gates with the same slack fall on the same path unless there are two paths with equal delays.
Before describing our algorithm for assigning the low voltage, we propose three theorems
that categorize the gates of any given circuit based on their slacks. The slack of each gate
in a given circuit is plotted against the difference between its low-voltage delay and its
high-voltage delay, as shown in Figure 4.3.

Theorem 4.1 The gates that fall above the 45° line in the "delay increment versus slack"
plot cannot be assigned the lower supply voltage without violating the positive slack constraint.
These gates fall in group 1.
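Theorem 4.1 amounts to an O(n) screening test, comparing each gate's delay increment against its slack. A minimal sketch, with hypothetical gate names, slacks, and delays:

```python
def group1_gates(slack, delay_low, delay_high):
    """Gates whose delay increment (d_l - d_h) exceeds their slack lie
    above the 45-degree line of the delay-increment-versus-slack plot:
    lowering their supply would make some slack negative, so they must
    stay at VH (group 1).

    slack, delay_low, delay_high : dicts keyed by gate name
    """
    return {g for g in slack
            if delay_low[g] - delay_high[g] > slack[g]}

# Hypothetical data: gate 'a' has increment 3 > slack 2 (group 1),
# gate 'b' has increment 3 < slack 5 (can tolerate the slowdown).
g1 = group1_gates({'a': 2.0, 'b': 5.0},
                  {'a': 5.0, 'b': 6.0},
                  {'a': 2.0, 'b': 3.0})
```

The comparison works because, as the proof that follows derives, putting gate i at low voltage reduces its slack by exactly the increment d_li - d_hi.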
Figure 4.3: Delay increment versus slack for gates of the c880 circuit, VH = 1.2 V, VL = 0.58 V.
Proof
The slacks used in the "delay increment versus slack" plot are computed with all gates
at the higher supply voltage; that is, the gate delays are the high-voltage delays d_{hi},
where i is the gate number. When a gate is put at low voltage, its delay becomes the
low-voltage delay d_{li}. The slack of a gate is the difference between the critical path
delay and the sum of the gate delays along the longest path through the gate. This can be
written as
slack_h = T_c - \sum_{i=1}^{n} d_{hi}    (4.3)
where the sum is taken over the n gates on the longest path through the gate. When this gate is put at low voltage, its slack
becomes

slack_l = T_c - \left( \sum_{i=1}^{n} d_{hi} - d_{hi} + d_{li} \right)    (4.4)

\Rightarrow slack_l = T_c - \sum_{i=1}^{n} d_{hi} - (d_{li} - d_{hi})    (4.5)

\Rightarrow slack_l = slack_h - (d_{li} - d_{hi})    (4.6)
The above equations show that the slack is reduced by the amount d_{li} - d_{hi} when a gate i is
put at low voltage. Now suppose,
slackh