Energy Efficiency and Process Variation Tolerance of 45 nm Bulk and
High-k CMOS Devices
by
Muralidharan Venkatasubramanian
A thesis submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Master of Science
Auburn, Alabama
May 9, 2011
Keywords: Low-power circuits, subthreshold voltage operation, high-k CMOS
technology, process variation
Copyright 2011 by Muralidharan Venkatasubramanian
Approved by
Vishwani D. Agrawal, Chair, James J. Danaher Professor of Electrical and
Computer Engineering
Adit D. Singh, James B. Davis Professor of Electrical and Computer Engineering
Charles E. Stroud, Professor of Electrical and Computer Engineering
Abstract
With transistor sizes being reduced to the sub-45 nm range, we have seen
improvements in the speed, performance, and integration density of digital circuits.
However, there has been a corresponding increase in power consumption, along with
greater energy dissipation. The reason is increased leakage current in the
channel. A proposed solution is a shift towards high-k materials and metal gates from
the poly-silicon gate of yesteryear. Reduced feature sizes also suffer from greater
parametric process variations during lithography, which cause identical circuits to behave
differently.
With high-k technology overshadowing bulk technology ever since transistor sizes
hit 45 nm, a greater understanding is needed of how the properties of high-k technology
affect digital devices, especially their speed, power consumption, and energy dissipated
upon voltage scaling. Also, a better estimation of the effects of parametric
variations on circuits designed in high-k technology can provide valuable information
which can be used to improve current designs.
Acknowledgments
First of all, I would like to thank my advisor, Dr. Vishwani Agrawal, for the
tremendous help and support he has provided during the pursuit of my thesis. His
knowledge, guidance, and patience were immensely beneficial; without them this
research would not have been completed successfully. I would also like to thank Dr.
Adit D. Singh and Dr. Charles E. Stroud for being on my thesis committee. Their
courses provided me the theoretical knowledge which was used to pursue this research,
and their corrections and input were invaluable in the writing of this thesis.
I would also like to thank Manish Kulkarni, Kyungseok Kim, and Sarthak Kakkar
for being readily available for help when I had queries about simulation tools like
HSPICE and MATLAB. Last but not least, I would like to thank all my friends
and family, who never gave up on me and whose moral support helped me pursue
my research successfully.
Table of Contents
Abstract
Acknowledgments
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Technology Shift
2.2 Voltage Scaling
2.3 Process Variation
3 Tools and Techniques
3.1 Test Circuit
3.2 IC Design and Simulation Tools
3.2.1 Leonardo Spectrum
3.2.2 Design Architect
3.2.3 HSPICE
3.3 Circuit Design and Simulation Techniques
3.3.1 VHDL
3.3.2 Monte Carlo Analysis
3.4 Predictive Technology Model
4 Methodology
4.1 Test Circuit Modeling
4.2 Minimum Energy Point Estimation
4.3 Process Variation
5 Results
5.1 Inverter Simulation
5.2 Minimum Energy Point Estimation
5.3 Process Variation
6 Conclusion
Bibliography
List of Figures
3.1 Schematic of a 32-bit ripple carry adder.
3.2 Comparison of PTM and industry's technology model for Vdd and Vth
scaling vs. effective length (Leff) for a range of technology nodes [64].
3.3 Comparison of PTM and industry's technology model for channel doping
concentration Nch vs. effective length (Leff) for a range of technology
nodes [64].
5.1 Energy per cycle vs. Vdd for 32-bit ripple carry adder simulated in 45 nm
bulk and high-k CMOS.
5.2 Effect of process variation on critical path delay for adder operating at
0.9 V designed in 45 nm high-k technology.
5.3 Effect of process variation on critical path delay for adder operating at
0.3 V designed in 45 nm high-k technology.
5.4 Effect of process variation on critical path delay for adder operating at
0.3 V designed in 45 nm bulk technology.
5.5 Comparison of energy/cycle for different adder circuit operations when
threshold parameter (vth0) undergoes process variation.
5.6 Comparison of energy/cycle for different adder circuit operations when
oxide thickness (tox) undergoes process variation.
5.7 Comparison of energy/cycle for different adder circuit operations when
both vth0 and tox undergo process variation.
6.1 Comparison of gate oxide and gate design between a bulk MOSFET (top),
and high-k MOSFET (bottom) [13].
List of Tables
5.1 Comparison of various currents and clock period of a CMOS inverter
operating at 0.4 V for 45 nm bulk and high-k technologies.
5.2 Simulated performance of 32-bit ripple carry adder designed in 45 nm bulk
technology.
5.3 Simulated performance of 32-bit ripple carry adder designed in 45 nm
high-k technology.
5.4 Comparison of mean and standard deviation of critical path delays for 30
and 1000 random samples.
5.5 Yield of circuit designed in 45 nm bulk and high-k technologies when
affected by process variations.
5.6 Comparison of average energy/cycle and clock period with and without
process variations for a 32-bit ripple carry adder.
Chapter 1
Introduction
Gordon Moore, co-founder of Intel, famously stated in 1965, "The amount of
transistors which can be inexpensively placed on an Integrated Circuit doubles every
18 months." This statement has been dubbed Moore's law, and scaling down of
transistors has been the trend of the industry ever since [45]. We have come a long
way since 1971, when the semiconductor manufacturing process was 10 µm; now we
are adopting 32 nm technology, and research is being done to implement 22 nm
technology and beyond. Evolving nanometer CMOS technologies provide better
functionality, higher performance, and greater levels of integration but suffer from increased
subthreshold leakage and excessive process variation. With the industry and market
emphasizing "performance per watt" and "performance per joule", there is a
growing need for new power and energy saving techniques to counter the increased power
and energy dissipation caused by the scaling down of transistors.
This thesis work examines the 45 nm bulk and high-k metal gate technologies.
Aggressive voltage scaling techniques described in previous research [18, 19, 41, 54, 60]
were used to evaluate how the power and energy consumption of a chosen circuit
(a 32-bit ripple carry adder) vary with a change in supply voltage (Vdd). After obtaining
the optimum Vdd at which the minimum energy per cycle occurs, the results were
compared for both processes. The performance of the 32-bit ripple carry adder
circuit was evaluated for the entire range of supply voltages over which it
functions correctly. Lowering the voltage increases delay, reducing the maximum clock
frequency. We use the maximum permissible clock rate and the energy per cycle at
that clock rate as two performance criteria.
The same 32-bit ripple carry adder circuit was designed in both 45 nm bulk and
high-k technologies in order to compare which technology is better suited for a low
power and more energy efficient design. The minimum energy per cycle operation
occurs at a subthreshold voltage for both designs. At minimum energy, the bulk
technology has a very low performance (approximately 7 MHz). However, high-k technology
works at a much higher 250 MHz clock. The faster clock rate reduces the leakage energy,
making high-k almost twice as energy efficient as bulk.
This thesis also examines the relationship between energy per cycle and supply
voltage, and whether the minimum energy point provides a stable operating point
against speed and energy deviations due to process related parametric variations
for different technologies. These deviations can be expected to be lower for circuits
designed in high-k technology than for those designed in bulk technology. They are
also lower at the minimum energy point than at the higher supply voltages that
are commonly in use. Monte Carlo simulations for various parameters like threshold
parameter (vth0), oxide thickness (tox), and mobility (u0) of the technology model
files [5] were conducted, and the results were compared with the ideal scenario (no
process variations) to see how total power and energy deviated from their ideal values.
We conclude that there is a significant improvement in performance when the
process is changed from bulk to high-k technology. The circuit modeled in high-k
showed an operating frequency of 250 MHz, which is a significant jump from bulk
CMOS technology, while retaining the advantage of low energy consumption.
Furthermore, from the nature of the energy versus Vdd graph, we hypothesize that the
operation at subthreshold Vdd is more resilient to process variation than that at the
normal Vdd for both high-k and bulk technologies.
This thesis is divided further into five more chapters. Chapter two is the
background chapter, and it gives a brief summary of all the important work done in the
area of this thesis, and work that has been an inspiration to pursue this research.
This chapter has a section which explains technology scaling, and why there
has been a shift from bulk to high-k technology. The next section gives
background on the different voltage scaling techniques used in this thesis research. The
last section talks about various kinds of process variation, and how each one can affect
the threshold voltage and current of a digital circuit.
Chapter three talks about the various tools and techniques used to conduct the
experiments of this thesis. The working of various design tools like Leonardo Spectrum
[4], Design Architect [1], HSPICE [3] etc. is explained, and how voltage scaling and
process variation techniques are applied in this particular experiment is elaborated.
Chapter four elaborates the methods by which the various tools and techniques
discussed in the previous chapter are used to conduct the simulation of the circuit. It
explains how the experiment was conducted, and gives a step by step procedure so
as to provide the reader an easy guide to repeat the experiment if necessary.
Chapter five discusses the results of the experiment conducted using the methods
mentioned in the previous chapter. Using the literature review as reference, this
chapter validates the obtained results and explains the meaning of the data obtained
from the experiment. Finally, Chapter six concludes the thesis by summarizing all
the previous chapters, discussing the practical applications of this thesis, and giving an
overview of the future direction this research could lead to.
Chapter 2
Background
In the 1980s, there was a switch to CMOS logic from other forms like TTL,
NMOS logic etc. CMOS had a lot of advantages like high noise immunity, ability to
integrate higher logic functions on a chip, and low power consumption [65]. The total
power (Ptotal) dissipated in a CMOS logic gate consists of static power (Pstatic) and
dynamic power (Pdynamic). In a typical CMOS circuit, most of the power dissipated is
dynamic power while static power makes up a small part of the total power dissipated.
Scaling down of transistors every two years [45] showed a reduction in total power
dissipation because of a reduction in dynamic power as the transistors switched faster.
In 1971, Meindl and Swanson concluded that CMOS circuits offered an advantage
of 10 to 1000 times in power-speed product when compared to a bipolar junction
transistor (BJT) [43]. They expanded on the work done by Keyes [30] and derived
that "fundamental limits on power-speed performance are imposed by the uncertainty
energy, the thermal energy, and the minimum high speed switching power." They
identified the advantages of CMOS over BJT circuits, such as zero standby power drain,
reduced load capacitance, and lower supply voltage of a CMOS digital circuit. All this
was achieved without permitting degradation of fan-in and fan-out, while retaining
the noise immunity of a logic gate. They also showed the relation between the delay per
logical state and various circuit design parameters as shown below:
$$T_d \approx 10\,\frac{L}{W}\,\frac{t_{ox}}{\varepsilon_{ox}}\,\frac{1}{\mu_n}\,\frac{C_L}{V_{dd}} \tag{2.1}$$
where
Td = Delay per logical state
L = Channel length
W = Channel width
tox = Oxide thickness
εox = Oxide permittivity
μn = Electron surface mobility
CL = Load capacitance
Vdd = Supply voltage
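As a rough illustration of equation 2.1, the sketch below evaluates the delay expression in Python and shows its inverse dependence on Vdd. The device values (dimensions, mobility, load capacitance) are hypothetical placeholders chosen only to make the scaling visible; they are not taken from this thesis or from [43].

```python
def delay_per_stage(L, W, t_ox, eps_ox, mu_n, C_L, V_dd):
    """Delay per logical state from equation 2.1 (all quantities in SI units)."""
    return 10.0 * (L / W) * (t_ox / eps_ox) * (1.0 / mu_n) * (C_L / V_dd)

# Hypothetical device values, for illustration only.
EPS_OX = 3.9 * 8.854e-12  # permittivity of SiO2 in F/m
params = dict(L=45e-9, W=90e-9, t_ox=1.2e-9, eps_ox=EPS_OX, mu_n=0.03, C_L=1e-15)

t_high = delay_per_stage(V_dd=0.9, **params)
t_low = delay_per_stage(V_dd=0.3, **params)

print(f"delay at 0.9 V: {t_high:.3e} s")
print(f"delay at 0.3 V: {t_low:.3e} s")
print(f"delay ratio (0.3 V / 0.9 V): {t_low / t_high:.2f}")
```

Because Vdd appears only in the denominator, cutting the supply voltage to one third triples the delay in this expression. The expression describes strong inversion operation, so it should not be expected to predict the much steeper delay increase seen at subthreshold voltages.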
However, when transistor sizes shrank to 90 nm and below, two new trends
began to emerge. The first one was that the industry literally "ran out of atoms"
to insulate the transistor gate [13]. Because of the continuous scaling down of
transistors following Moore's law [45], the SiO2 layer insulating the gate had become
only a few atoms thick, and any further scaling would have caused a breakdown of the
transistor because of the heat due to high power dissipation. The scientists at Intel
came up with an innovative solution to counter this problem. They built the transistor
gates from metal and from metal oxides with a high dielectric constant (high-k)
[2]. Other researchers were also investigating high-k transistor designs to
achieve greater power and energy savings [11, 36, 44, 48]. Kim et al. [35] highlighted
two components of leakage current. One is the sub-threshold leakage current (Isub),
which is a weak inversion current in the device, and the other is the gate leakage current
(Iox), which is a tunneling current through the gate oxide insulation.
2.1 Technology Shift
Chandrakasan et al. [22] derived equations on how the leakage current
components (Isub and Iox) depend on various parameters like threshold voltage, supply
voltage, and oxide thickness. Sub-threshold current (Isub) is defined as:
$$I_{sub} = K_1 W e^{-V_{th}/(n V_\theta)} \left(1 - e^{-V_{dd}/V_\theta}\right) \tag{2.2}$$
where
W = Gate width
Vθ = Thermal voltage
Vth = Threshold voltage
Vdd = Supply voltage
K1 and n are experimentally derived parameters
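The sensitivity of equation 2.2 to the threshold voltage can be checked numerically. The following is a minimal sketch, assuming K1 = 1, n = 1.5, and T = 300 K; these are illustrative values, not fitted parameters from [22].

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # electron charge in C

def thermal_voltage(temp_k=300.0):
    """Thermal voltage kT/q in volts."""
    return K_B * temp_k / Q_E

def i_sub(width, v_th, v_dd, k1=1.0, n=1.5, temp_k=300.0):
    """Subthreshold current of equation 2.2; k1 and n are assumed values."""
    v_theta = thermal_voltage(temp_k)
    return (k1 * width * math.exp(-v_th / (n * v_theta))
            * (1.0 - math.exp(-v_dd / v_theta)))

low_vth = i_sub(width=1.0, v_th=0.3, v_dd=0.9)
high_vth = i_sub(width=1.0, v_th=0.4, v_dd=0.9)
print(f"leakage reduction for a 100 mV higher Vth: {low_vth / high_vth:.1f}x")
```

With these assumed values, a 100 mV increase in Vth lowers the current by the factor exp(0.1 / (n·Vθ)), which is more than an order of magnitude. The Vdd-dependent factor, by contrast, is close to 1 whenever Vdd is several times Vθ, so in this form of the equation the supply voltage has a visible effect only at very low Vdd.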
Roy et al. [50] said that the subthreshold conduction is dominated by the diffusion
current caused due to weak inversion. This weak inversion current defines the off
state leakage because of low Vth. The authors in [50] defined a characteristic called
subthreshold slope which indicates how effectively a transistor can be turned off when
Vdd is below Vth and is defined as:
$$S_t = 2.3\,\frac{kT}{q}\left(1 + \frac{C_{dm}}{C_{ox}}\right) \tag{2.3}$$
where
Cdm = Depletion layer capacitance
Cox = Gate oxide capacitance
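To give a feel for the magnitudes in equation 2.3, the sketch below evaluates the slope at room temperature for a few capacitance ratios. The ratios of Cdm to Cox are assumed values for illustration and are not measurements of any particular process.

```python
K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # electron charge in C

def subthreshold_slope(cdm_over_cox, temp_k=300.0):
    """Subthreshold slope of equation 2.3 in volts per decade of current."""
    return 2.3 * (K_B * temp_k / Q_E) * (1.0 + cdm_over_cox)

ideal = subthreshold_slope(0.0)
for ratio in (0.0, 0.25, 0.5):  # assumed Cdm/Cox ratios
    slope_mv = subthreshold_slope(ratio) * 1e3
    print(f"Cdm/Cox = {ratio:.2f}: St = {slope_mv:.1f} mV/decade")
```

The limit Cdm/Cox → 0 gives the familiar room temperature floor of roughly 60 mV per decade. Any increase in Cox relative to Cdm moves the slope towards that floor.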
Ideally, the value of the slope in equation 2.3 should be as low as possible. With a
shift towards high-k technology, the gate oxide capacitance increases because
of the use of high-k dielectric materials. A larger gate oxide capacitance lowers the
ratio of depletion layer capacitance to gate oxide capacitance and hence the slope, so
transistors operated in the subthreshold region switch faster and Ioff decreases at a
faster rate.
Another component of subthreshold current is the Drain Induced Barrier Lowering
(DIBL) current. In a short channel device, the threshold voltage and subthreshold
current vary with the drain bias. DIBL occurs when the energy barrier at the surface
between the source and drain, which prevents electrons from flowing to the drain, is reduced,
causing an increase in subthreshold current due to the lowering of the threshold
voltage. Roy et al. showed that DIBL does not change the subthreshold slope but only
affects the threshold voltage [50].
From equation 2.2, we can see there are two ways to reduce subthreshold current.
The first one is to reduce the supply voltage, thereby reducing the Vdd-dependent
factor in the equation and hence the current. The other technique is to increase
the threshold voltage (Vth); because it appears in a negative exponent, even a
small increase can cause a dramatic reduction in current. However, the
frequency of the circuit depends on the operating voltage and threshold voltage:
$$f \propto \frac{(V - V_{th})^{\alpha}}{V} \tag{2.4}$$
where
f = Frequency
α = Experimentally derived, technology-dependent exponent
An increase in Vth would cause a decrease in performance of the circuit which is
undesirable.
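The performance penalty of a higher threshold voltage can be illustrated with equation 2.4. The sketch below treats α as an exponent, as it appears in the equation, and assumes α = 1.3 together with example voltages; none of these numbers come from the simulations in this thesis.

```python
def relative_frequency(v, v_th, alpha=1.3):
    """Right-hand side of equation 2.4, up to a constant of proportionality."""
    if v <= v_th:
        raise ValueError("equation 2.4 only applies for V above Vth")
    return (v - v_th) ** alpha / v

f_low_vth = relative_frequency(0.9, 0.3)   # assumed nominal Vth
f_high_vth = relative_frequency(0.9, 0.4)  # Vth raised by 100 mV
loss = 1.0 - f_high_vth / f_low_vth
print(f"frequency loss for a 100 mV higher Vth: {loss:.1%}")
```

Under these assumptions, raising Vth by 100 mV at a 0.9 V supply costs on the order of a fifth of the achievable frequency. The same change reduces the subthreshold current of equation 2.2 exponentially, which is the trade-off described above.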
The second equation derived by Chandrakasan et al. [22], which illustrates the
factors affecting gate-oxide leakage current, is:
$$I_{ox} = K_2 W \left(\frac{V}{T_{ox}}\right)^{2} e^{-\alpha T_{ox}/V} \tag{2.5}$$
where
W = Channel Width
Tox = Oxide Thickness
K2 and α are experimentally determined.
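The strong dependence of equation 2.5 on oxide thickness can be seen with a short numerical sketch. The constants K2 and α below are hypothetical placeholders (α is given in V/m so that the exponent is dimensionless); they are not the experimentally determined values referred to in [22].

```python
import math

def i_ox(width, v, t_ox, k2=1.0, alpha=1e10):
    """Gate oxide leakage of equation 2.5; k2 and alpha are assumed values."""
    return k2 * width * (v / t_ox) ** 2 * math.exp(-alpha * t_ox / v)

thin = i_ox(width=1.0, v=0.9, t_ox=1.2e-9)   # assumed thin oxide
thick = i_ox(width=1.0, v=0.9, t_ox=2.4e-9)  # oxide twice as thick
print(f"leakage ratio thick/thin: {thick / thin:.2e}")
```

Doubling the thickness reduces the quadratic prefactor by a factor of four, but the exponential term dominates and reduces the leakage by several orders of magnitude for these assumed constants. This is why a physically thicker gate dielectric is attractive.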
It is seen clearly that a reduction in the oxide thickness will cause an increase in
the field across the gate oxide. The high electric field, along with a low oxide thickness,
causes electrons to tunnel through the oxide layer, resulting in gate oxide leakage
current. There are two mechanisms of tunneling through the gate oxide:
Fowler-Nordheim (FN) tunneling and direct tunneling [50]. The authors showed that the
tunneling probabilities are different in these two cases, leading to two different types
of gate leakage.
In FN tunneling, electrons tunnel into the conduction band of the oxide layer.
In [50], Roy et al. derive that the FN current represents the tunneling through the
triangular potential barrier and is only valid for Vox > φox, where Vox is the voltage
drop across the oxide, and φox is the barrier height for electrons in the conduction
band. The authors noted that the measured value of FN tunneling is very small, and
can be easily neglected when the device is in normal operating mode.
In direct tunneling, electrons tunnel directly to the gate through the forbidden
energy gap of the oxide layer. This phenomenon occurs in very thin oxide layers, namely
on the order of 3-4 nm. Direct tunneling occurs when Vox < φox, as electrons tunnel
through the trapezoidal potential barrier instead of the triangular barrier [50]. Direct
tunneling has three mechanisms: electron tunneling from conduction band (ECB),
electron tunneling from valence band (EVB), and hole tunneling from valence band
(HVB) [20]. In NMOS devices, ECB controls the gate to channel tunneling current,
while EVB controls gate to body tunneling in depletion-inversion, and ECB controls
it in accumulation. In PMOS devices, HVB controls the gate to channel tunneling,
while gate to body leakage is controlled by EVB in depletion-inversion, and ECB in
accumulation [20, 50]. The authors in [50] showed that tunneling associated with HVB
is much less than tunneling associated with ECB, leading to lower leakage current in
PMOS compared to NMOS.
As high-k gate insulators can be thicker than those of "bulk" (SiO2) gates, gate
leakage was reduced, causing the devices to run cooler. These new transistors
worked so well that Intel started incorporating them in its new microprocessor
designs starting from the Penryn chip lineup [13].
However, the shift in transistor design did not completely solve the primary
problem faced by chip makers, which was increased power and energy dissipation due
to leakage at sub-90 nm technologies. With a reduction in transistor size, it was seen
that although scaling caused a reduction in dynamic energy per cycle due to reduced
capacitances in the circuit, there was an increase in leakage current of the circuit due
to scaling down of the threshold voltage, causing a significant increase in the static
power dissipation [15]. Hence, there is a high interest in developing design techniques
for power and energy efficient circuits using high leakage nanometer technologies.
2.2 Voltage Scaling
The speed of digital circuits is currently limited by the energy density. Shrinking
feature sizes will continue to have the advantage of higher degree of integration,
resulting in lower cost, provided energy density can be kept in control. Another
characteristic that will assume increasing significance is tolerance to larger process
variation of smaller features. The supply voltage has the strongest influence on all
components of power and energy of a digital CMOS circuit.
Meindl and Swanson mathematically showed that to obtain the greatest power
saving and the least power-speed product, the circuit must be operated at the lowest
supply voltage practically possible in the design technology [57]. Their calculations
showed that CMOS transistors did not abruptly turn off below the threshold voltage
but acted as weak inversion devices. They determined that the smallest theoretical
supply voltage at which circuits could function is approximately 8kT/q ≈ 0.2 V at
T = 300 Kelvin, where k is the Boltzmann constant, T is absolute temperature, and
q is the electron charge. They also experimentally noticed that reduced operating
temperatures permitted lower supply voltages, as theorized by [30]. One technique
highlighted in their paper was ion implantation of boron for adjusting the turn-on voltages
of both p and n transistors, achieving operation close to their derived theoretical
limit. However, because of the very low performance of the technologies in use at that time,
such low voltage operation was not adopted in practical systems.
Another approach has been to examine the energy minimization for circuits
operating in the sub-threshold region. Studies have shown subthreshold operations have
a number of advantages, namely, improved gain, noise margin, and greater energy
efficiency at lower frequencies than the standard CMOS [54]. The authors simulated
a chain of inverter gates forming a ring oscillator and noticed the following:
• The power consumption is linearly dependent on the operating frequency at
higher frequencies due to the dominance of the dynamic power component.
• The power consumption becomes independent of operating frequency at lower
frequencies as static power is more dominant.
• Subthreshold circuits consume less power than strong inversion circuits at the
same operating frequency.
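The first two observations follow from a simple first-order power model, sketched below as the sum of a dynamic term C·Vdd²·f and a static term Ileak·Vdd. This model and its values (switched capacitance, leakage current) are assumptions made for illustration; they are not the simulation setup of [54].

```python
def total_power(freq_hz, v_dd, c_switched=1e-12, i_leak=1e-6):
    """First-order CMOS power: dynamic C*Vdd^2*f plus static Ileak*Vdd (assumed values)."""
    dynamic = c_switched * v_dd ** 2 * freq_hz
    static = i_leak * v_dd
    return dynamic + static

# Doubling the frequency in the dynamic-dominated and static-dominated regimes.
high_ratio = total_power(2e9, 0.9) / total_power(1e9, 0.9)
low_ratio = total_power(2e3, 0.9) / total_power(1e3, 0.9)
print(f"power ratio when doubling f near 1 GHz: {high_ratio:.3f}")
print(f"power ratio when doubling f near 1 kHz: {low_ratio:.3f}")
```

Near 1 GHz the dynamic term dominates and power doubles with frequency, while near 1 kHz the static term dominates and power is almost independent of frequency.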
The authors in [54] also simulated subthreshold pseudo-NMOS circuits and
compared the results with their CMOS counterparts. They found that pseudo-NMOS is
comparable to CMOS in power dissipation and robustness, but has less area and
capacitance and an improved performance. However, very careful sizing of the PMOS
to NMOS ratio is needed in order to ensure the proper functioning of the circuit.
Calhoun and Chandrakasan further examine solutions for optimum supply
voltage (Vdd) and threshold voltage (Vth) to minimize energy in subthreshold operations
of digital circuits [19]. The authors identified that there is a maximum achievable
frequency for a given circuit operating in the subthreshold region. They observed
that prior work on strong inversion optimization did not account for gate leakage
even though it is a significant contributing factor in deep submicron technologies.
Their calculations showed that parameters like gate current, gate-induced drain
leakage (GIDL), and pn junction leakage are negligible when compared to sub-threshold
current because they roll off much faster with Vdd. Their paper highlights the
dependence of the minimum energy point on technology, design characteristics of the circuit,
and operating conditions like temperature, duty cycle, workload etc. They showed
that in the sub-threshold region, the optimum Vdd changes by several hundred millivolts
when the above parameters are changed, leading us to infer that circuits are very
sensitive to process variations in subthreshold voltage operations. They also conclude
that current standard cell libraries show reduced energy per operation for a
minimum sized device.
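The existence of a minimum energy point can be illustrated with a simplified subthreshold energy model: a dynamic term proportional to Vdd² plus a leakage term that also scales with Vdd² but decays as exp(−Vdd / (n·Vθ)), because the gate delay over which leakage is integrated shrinks exponentially as Vdd rises. The sketch below is an assumed, normalized model in the spirit of the analyses in [19]. The leakage weighting factor and the slope factor n are hypothetical values, and the expression is only meaningful in the subthreshold range.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant in J/K
Q_E = 1.602176634e-19  # electron charge in C

def energy_per_cycle(v_dd, c_eff=1.0, leak_factor=1000.0, n=1.5, temp_k=300.0):
    """Normalized energy per cycle: dynamic plus leakage (assumed subthreshold model)."""
    v_theta = K_B * temp_k / Q_E
    dynamic = c_eff * v_dd ** 2
    leakage = c_eff * leak_factor * v_dd ** 2 * math.exp(-v_dd / (n * v_theta))
    return dynamic + leakage

def minimum_energy_point(v_min=0.10, v_max=0.60, step=0.005):
    """Sweep Vdd on a uniform grid and return the voltage with the lowest energy."""
    n_steps = int(round((v_max - v_min) / step))
    grid = [v_min + i * step for i in range(n_steps + 1)]
    return min(grid, key=energy_per_cycle)

v_opt = minimum_energy_point()
print(f"optimum Vdd: {v_opt:.3f} V, energy: {energy_per_cycle(v_opt):.4f} (normalized)")
```

For these assumed values the sweep finds an optimum near 0.31 V, well above the lowest voltage on the grid. Below the optimum the leakage term grows faster than the dynamic term shrinks, and above it the quadratic dynamic term dominates. Changing the leakage weighting, which stands in for technology, logic depth, and activity, moves the optimum, in line with the sensitivity noted above.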
In a follow up paper, Calhoun and Chandrakasan successfully showed test chips
fabricated in 90 nm technology operating at 330 mV supply voltage while obtaining
energy savings on the order of 9X compared to other reduced performance scenarios
[18]. They proposed a technique called "Ultra-dynamic voltage scaling" where the
circuit works at the normal operating voltage when speed or performance
is the primary criterion and at sub-threshold voltage when energy conservation is the
main motive. This technique made sense because, for a majority of circuits, sub-threshold
operation was only needed when a major section of the chip was in "OFF" mode
and needed to "wake up", or when the entire circuit was in the sub-threshold region (e.g.
microsensor mode). This gave the users flexibility to operate the circuits either in an
energy efficient mode or a performance mode depending upon their need.
Kwong and Chandrakasan highlight two major challenges that are faced by sub-threshold
voltage designs and can potentially impact circuit functionality [41]. The first one was
that the drive current (Ion) is lower in the sub-threshold region when compared to strong
inversion. Hence, the ratio of active to idle leakage current (Ion/Ioff) is lower. This
means that idle leakage may counter the active current and the output of the device
may not pull completely to Vdd or ground. Another problem faced by sub-threshold
voltage operations highlighted by [41] was process variations. Global variations can
affect the entire circuit and its operations throughout the voltage scale. In the
sub-threshold region, their effect is seen at skewed P/N corners with either strong PMOS/weak
NMOS or vice versa. However, local fluctuations, mainly random dopant fluctuations
(RDF), cause random shifts in threshold voltage (Vth). These shifts can cause the
shifting of the minimum energy operating point and hence should be accounted for in
circuit modeling as well. The authors also concluded that the optimum Vdd need not
occur at the lowest voltage at which the circuit functions correctly. This result was
quite significant as it disproved the conclusion drawn by Meindl and Swanson [43].
The reason was the increased leakage of the sub-micron devices.
Zhai et al. highlighted the challenges of subthreshold voltage operation in SRAM
designs [63]. They highlighted three key challenges. The first was a reduced Ion/Ioff
current ratio, which led to a difficulty in distinguishing between the read current of an
accessed cell and the leakage current in the unaccessed cells. Another key problem
highlighted by [63] was the change in gate sizing requirements in low voltage
operations. The read and write stability of any conventional SRAM are heavily dependent
upon the pull-up, pull-down, and pass transistors, whose strengths can be drastically
affected due to skewed PMOS to NMOS Vth ratios. The most important challenge
to low voltage SRAM designs is the increased sensitivity to process variations. Even
small variations are known to cause mismatches, hence causing functional failure
[63]. The authors presented a novel 6 transistor SRAM design in 0.13 µm capable
of overcoming these challenges and successfully operating at subthreshold voltages.
Their results showed that the proposed design works successfully from 1.2 V down to
193 mV while providing a 36% improvement in energy over other proposed SRAM
designs with less area overhead. Hanson et al. have also designed a processor
capable of working in the subthreshold voltage region of transistors [25]. Their design
was used for sensor applications and showed correct operation at a 350 mV operating
voltage while consuming only 3.5 pJ of energy per cycle.
Dual voltage design in the subthreshold voltage range has recently been studied
and shown to have energy and speed advantages [33, 34]. In [34], Kim and Agrawal
obtained a point which they call the "true minimum" by using dual sub-threshold
voltage supplies. Using these dual supplies, the authors were able to lower the energy
per cycle to a point below the known minimum energy point. They avoided
the use of level converters, which are usually needed in any dual voltage design, by
using mixed integer linear programs (MILP), thereby avoiding the
disadvantages of level converters such as added delay and power consumption. The authors
were able to achieve a saving of 23% for a 16-bit ripple carry adder and
5% for a 4 × 4 multiplier, which was the worst case scenario in their study. In their
follow up paper [33], they achieved an energy saving of 25% for various ISCAS'85
benchmark circuits.
Subthreshold voltage operation may also have an advantage in extending the
battery lifetime in portable and mobile electronics [40]. In this paper, Kulkarni and
Agrawal examined the energy consumption of a circuit and the impact
of battery efficiency. They observed the need for controlling the power
consumption in order to control the size of the battery. They demonstrated that for
most circuits, the efficiency of the battery reduces for higher currents, and operating
the circuit at sub-threshold voltages (0.3 V in their case) vastly improved the battery
lifetime, which is critical for today's portable electronic devices.
Abouzeid et al. developed a 45 nm CMOS cell library which was optimized for
ultra-low power applications. They developed a decoder circuit, which operated at a
speed of 457 kHz when operated at 0.35 V [6]. That point was the minimum energy
point, and they achieved a total energy consumption of 3.9 fJ per cycle [6]. Tran
and Baas designed a 32-bit fast adder which functioned successfully in the subthreshold
voltage region. They showed that their design performed successfully while being
most energy efficient at 0.37 V with a frequency of 100 MHz [58]. However, their
circuit was designed in PTM 45 nm bulk technology. Given the shift towards high-k,
a study is needed to see how this shift would affect circuit performance
in terms of speed and energy efficiency.
2.3 Process Variation
Till now, we have seen many mentions of the term "process variation". It is the
natural variation occurring in the parameters of transistors (like threshold parameter,
oxide thickness, channel width and length, mobility etc.) during the fabrication of
integrated circuits. William Shockley first discovered random variation in
semiconductor devices during his analysis of random fluctuations in junction breakdown [53].
He theorized that the effects of spatial fluctuations of donor and acceptor ions are
randomly distributed according to a Poisson distribution. Keyes expanded on
Shockley's work by studying the effect of randomness of impurity atoms on the electrical
characteristics of a MOSFET [31]. From his models, he concluded that threshold
voltages are normally distributed in a square transistor.
In 1974, Schemmert and Zimmer built on the conclusion of Bauer et al. [12]
that threshold voltage (Vth) depends on, among other parameters, the depth of
penetration of ions during ion implantation, and introduced a procedure for
minimizing the threshold-voltage sensitivity of ion-implanted MOSFETs to
different process parameters [51]. Their results showed a maximum deviation
of ±10% for tox. A Monte Carlo analysis of a small MOSFET conducted by
Alvarez and Akers in 1981 showed that controlling the process variation
parameters to ±10% yielded a threshold voltage variation of ±15% [8]. They
also noticed that the distribution was normal, and that almost 95% of the
variation lay within about 100 mV of the mean threshold voltage.
Agrawal and Nassif classify process variation into two sub-categories:
random variations and systematic variations [7]. They further divide
systematic variations into across-field and layout-dependent variations.
Across-field variations cause identical devices at different locations of the
reticle to behave differently. The sources of these errors are
photolithographic and etching effects (dose, focus, and exposure variations,
etc.), lens aberrations, mask errors, and variations in etch loading
[16, 17, 27, 62]. The authors characterize layout-dependent variations as
those that cause different layouts of the same device to have different
characteristics even when they are close to each other. They note that these
variations are predictable and can be modeled from deterministic factors such
as layout structure and the topological environment surrounding the device
layout.
Agrawal and Nassif [7] characterize random variation as unpredictable
uncertainties in the fabrication process, such as fluctuations in the number
and location of dopant atoms and poly-silicon gate line-edge roughness.
According to the authors of [9, 23, 24, 32], line-edge roughness and
line-width roughness can cause an increase in subthreshold current and a
degradation in the threshold voltage.
Random variations can cause mismatch between identical and adjacent devices,
and the deviation of threshold voltage caused by these variations is
represented by an equation derived by Stolk et al. [55]:

\[ \sigma_{V_t} = \left( \frac{\sqrt[4]{4 q^{3} \varepsilon_{Si} \phi_B}}{2} \right) \cdot \frac{T_{ox}}{\varepsilon_{ox}} \cdot \frac{\sqrt[4]{N}}{\sqrt{W_{eff} L_{eff}}} \tag{2.6} \]

where
T_ox = Gate oxide thickness
N = Channel dopant concentration
W_eff and L_eff = Effective channel width and length
ε_Si and ε_ox = Permittivity of silicon and oxide
φ_B = 2 k_B T ln(N/n_i) (with k_B Boltzmann's constant, T the absolute
temperature, and n_i the intrinsic carrier concentration)
The above equation illustrates that mismatch reduces with a decrease in doping
(N) and gate oxide thickness (Tox), and increases when the effective length
and width decrease.
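As a rough numerical illustration of equation 2.6, the following Python sketch evaluates the mismatch for a set of purely illustrative device values. The oxide thickness, doping, dimensions, and relative permittivities below are assumptions, not values from this thesis. The sketch also assumes that φ_B is expressed in volts, i.e. 2(k_B T/q) ln(N/n_i), so that the result comes out in volts.

```python
import math

Q = 1.602176634e-19        # elementary charge (C)
K_B = 1.380649e-23         # Boltzmann constant (J/K)
EPS_0 = 8.8541878128e-12   # vacuum permittivity (F/m)
EPS_SI = 11.7 * EPS_0      # permittivity of silicon (assumed relative value 11.7)
EPS_OX = 3.9 * EPS_0       # permittivity of SiO2 (assumed relative value 3.9)


def sigma_vt(t_ox, n_dop, w_eff, l_eff, temp=300.0, n_i=1.0e16):
    """Threshold-voltage mismatch (V) following the form of equation 2.6.

    All quantities are in SI units: lengths in m, concentrations in m^-3.
    phi_b is taken in volts (an assumption), hence the division by Q.
    """
    phi_b = 2.0 * K_B * temp / Q * math.log(n_dop / n_i)
    charge_term = (4.0 * Q**3 * EPS_SI * phi_b) ** 0.25 / 2.0
    return charge_term * (t_ox / EPS_OX) * n_dop**0.25 / math.sqrt(w_eff * l_eff)


# Hypothetical device: 1.1 nm oxide, 3e18 cm^-3 doping, W = 90 nm, L = 45 nm.
base = sigma_vt(t_ox=1.1e-9, n_dop=3.0e24, w_eff=90e-9, l_eff=45e-9)
thinner_oxide = sigma_vt(t_ox=0.9e-9, n_dop=3.0e24, w_eff=90e-9, l_eff=45e-9)
smaller_area = sigma_vt(t_ox=1.1e-9, n_dop=3.0e24, w_eff=45e-9, l_eff=45e-9)

print(f"base:          {base * 1e3:.1f} mV")
print(f"thinner oxide: {thinner_oxide * 1e3:.1f} mV")
print(f"smaller area:  {smaller_area * 1e3:.1f} mV")
```

Under these assumptions, thinning the oxide lowers the mismatch, while halving the width raises it by a factor of √2, which matches the trends stated for the equation.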
Kuhn et al. from the Technology and Manufacturing Group at Intel noted that
high-k metal gates are also subject to variations in oxide thickness, fixed
charge, and interface traps [39]. They note that these physical changes result
in parametric variations in drive current, gate tunneling current, or
threshold voltage. Studies show that intrinsic threshold voltage fluctuations
induced by local oxide thickness variations become comparable to the voltage
fluctuations introduced by Random Dopant Fluctuations (RDF) in deep submicron
MOSFETs [10]. By evaluating gate-tunneling leakage current theoretically and
experimentally for 1.2 - 2.8 nm SiO2 gate oxides in MOSFETs, Koh et al. showed
that when the gate oxide tunnel resistance becomes comparable to the gate
poly-Si resistance, the statistical distribution of gate-tunnel leakage
current causes large fluctuations in Vth [37]. Kaushik et al. studied the
effects of fixed charge in the high-k layer and concluded that mobility and
the uniformity of threshold voltages are affected by variations in the fixed
charge [29].
Another concern highlighted by [39] is mobility degradation and Vth
instability due to fast transient charging (FTC) in electron traps. An
investigation of the effects of FTC, which studied the impact of metal gate
electrodes on mobility degradation, suggests that the increase in FTC can be
attributed to the higher densities of oxygen vacancies in the dielectric
caused by dielectric-induced scavenging processes [61]. Various optimization
techniques have been shown to reduce the charge trapping process [38, 49].
Management of process variation is playing an increasingly important role in
technology scaling, and the CMOS literature has consistently treated process
variation as a critical element in semiconductor fabrication. Until better
fabrication and post-lithography techniques are developed to minimize process
variations, they must be considered in all circuit and design simulations in
order to predict accurately how a fabricated circuit would actually function.
Chapter 3
Tools and Techniques
This chapter introduces the tools and techniques used to conduct the
experiments of this thesis. Different tools are used for circuit modeling,
netlist generation, simulation, process variation, and result analysis.
Different techniques are used to estimate the minimum energy operating point
and to simulate the circuit while varying different process parameters.
3.1 Test Circuit
The first step in performing any experiment is to choose a test circuit. Once
a specific test circuit is chosen, the selected tools are used to apply the
appropriate technique for conducting the experiment. Usually, a simple
replicable circuit or a benchmark circuit whose performance and operation can
be easily monitored is chosen. For this thesis, a 32-bit ripple carry adder
was chosen because its design is simple yet it has sufficient logic depth for
the proper utilization of the design technique.
Figure 3.1: Schematic of a 32-bit ripple carry adder.
Figure 3.1 shows the basic schematic of a 32-bit ripple carry adder. A[1:32]
and B[1:32] are the two 32-bit inputs to the adder, Ci is the carry in to the
first adder, Co is the carry out from the last adder, and S[1:32] are the sum
outputs of the full adder cells.
A ripple carry adder consists of a chain of full adders in which the carry
output of each adder, starting with the least significant bit (LSB) adder,
goes into the next adder. This way, the carry signal "ripples" through the
chain of adders, hence the term ripple carry adder. In a 32-bit ripple carry
adder, the carry signal must propagate through 32 one-bit full adders before
the next set of input vectors can be applied. Thus, the critical path delay of
the adder is defined as the total path delay between the carry in (Ci) signal
given to the first adder and the carry out (Co) of the last adder.
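The carry chain described above can be sketched behaviorally in Python. This is a bit-level model only (it carries no timing information), and the function names are illustrative rather than taken from the thesis; bit lists are ordered LSB first.

```python
def full_adder(a, b, carry_in):
    """One-bit full adder: returns (sum, carry_out)."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def ripple_carry_add(a_bits, b_bits, carry_in):
    """Add two equal-length bit lists (LSB first); returns (sum_bits, carry_out)."""
    sums = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)   # carry ripples into the next stage
        sums.append(s)
    return sums, carry


ones = [1] * 32
zeros = [0] * 32

# A = all ones, B = all zeros: every stage passes its carry input along.
print(ripple_carry_add(ones, zeros, 0))   # no carry injected
print(ripple_carry_add(ones, zeros, 1))   # carry ripples through all 32 stages
```

With A set to all ones and B to all zeros, injecting a carry at Ci flips every sum bit to 0 and raises Co, so the carry has to traverse the whole chain. This is the longest path through the adder.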
3.2 IC Design and Simulation Tools
This section discusses the various tools used for designing and simulating the
test circuit.
3.2.1 Leonardo Spectrum
Leonardo Spectrum [4] is a logic synthesis tool from Mentor Graphics Corp.
Logic synthesis is the process of translating a Hardware Description Language
(HDL) description into a technology-specific gate-level description. Leonardo
Spectrum offers design capture, VHDL and Verilog entry, register transfer
level debugging for logic synthesis, constraint-based optimization, timing
analysis, encapsulated place-and-route, and schematic viewing for Complex
Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs),
and Application Specific Integrated Circuits (ASICs).
3.2.2 Design Architect
Design Architect [1] is a scalable design definition environment provided by
Mentor Graphics Corp. Because it interfaces easily with Leonardo Spectrum,
this tool can import the netlist generated by Leonardo Spectrum and display
the register-level or transistor-level design of the desired circuit. It can
model digital, analog, or mixed-signal blocks, and can quickly simulate the
entire hierarchical design.
3.2.3 HSPICE
Simulation Program with Integrated Circuit Emphasis (SPICE) is a
general-purpose electronic circuit simulator used to check the integrity of
circuit designs and predict circuit behavior [47]. HSPICE is a circuit
simulator derived from SPICE and provided by Synopsys Inc.; designers use it
to predict the timing, functionality, power consumption, and yield of their
designs. HSPICE takes a text netlist describing the circuit elements
(transistors, resistors, capacitors, etc.) and their connections, translates
this description into solvable equations, and produces the final result.
SPICE simulators are commonly used to run Monte Carlo simulations that
document the effect of process variations on a circuit, providing an
approximation of the yield of the circuit when fabricated.
3.3 Circuit Design and Simulation Techniques
This section explains how the circuit was modeled using an HDL before being
optimized by the tools described in the previous section. It also explains how
process variation in various design parameters was modeled using Monte Carlo
simulations in SPICE.
3.3.1 VHDL
VHDL is a popular high-level description language for system and circuit
design. The language has various levels of abstraction and supports
behavioral, structural, and dataflow descriptions. Although behavioral
statements are executed sequentially, the structural and dataflow descriptions
in VHDL are concurrent, i.e., all statements written in that form are executed
concurrently. Hence, the order of those statements is not important.
3.3.2 Monte Carlo Analysis
Monte Carlo methods are a class of computational algorithms that compute
results by repeated random sampling. They are most often used when it is
impractical or impossible to compute an exact result analytically, for example
when the inputs are themselves random. Monte Carlo simulations are
particularly useful in studying process variations, more specifically, how
variations in the process parameters of transistors (such as vth0, mobility,
and oxide thickness) affect the functional parameters of the circuit (such as
delay, drive current, threshold voltage, power dissipation, and energy).
Designers use this method to estimate 3-sigma corners and optimize their
circuits for the best yield.
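The idea can be sketched in a few lines of Python. This is a toy illustration, not the HSPICE flow used in this thesis: `toy_delay` is a hypothetical exponential delay model, the nominal threshold of 0.4 V is made up, and treating the 16% variation as a 3-sigma spread of a normal distribution is an assumption.

```python
import math
import random
import statistics


def toy_delay(vth, vdd=0.3, slope=0.04, scale=1.0e-9):
    """Hypothetical subthreshold delay model: delay grows exponentially with vth."""
    return scale * math.exp((vth - vdd) / slope)


def monte_carlo_delay(vth_nominal, three_sigma_fraction, samples, seed=0):
    """Sample vth from a normal distribution and summarize the resulting delays."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    sigma = vth_nominal * three_sigma_fraction / 3.0
    delays = [toy_delay(rng.gauss(vth_nominal, sigma)) for _ in range(samples)]
    mean = statistics.mean(delays)
    std = statistics.stdev(delays)
    return mean, std, mean + 3.0 * std             # last value: 3-sigma corner


mean, std, corner = monte_carlo_delay(
    vth_nominal=0.4, three_sigma_fraction=0.16, samples=1000
)
print(f"mean = {mean:.3e} s, sigma = {std:.3e} s, mean + 3 sigma = {corner:.3e} s")
```

In a real flow the call to `toy_delay` would be replaced by a circuit simulation of each sampled parameter set; only the sampling and the summary statistics carry over.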
3.4 Predictive Technology Model
Predictive Technology Models (PTM) are customizable and predictive model files
for transistor and interconnect technologies. They are compatible with SPICE,
easily scalable over a wide range of process variations, and provide accurate
models from 180 nm to sub-45 nm technologies [5]. With today's fast-paced
scaling of MOSFET technology, research and circuit design must begin before a
future generation of MOSFET technology is fully implemented [21]. Challenges
such as process variations, leakage current, and reliability must be properly
addressed for each technology before it is fully embraced [64]. Hence, it is
important for researchers to work with fully customizable and accurate
transistor models for each technology. Almost all semiconductor companies
guard their models closely and do not disclose model data, in order to prevent
industrial espionage. Hence, it is critical for researchers to use models that
are not only openly available but also provide accurate results when compared
with benchmark circuits designed using industrial models.
Figure 3.2: Comparison of PTM and industry's technology model for Vdd and Vth
scaling vs. effective length (Leff) for a range of technology nodes [64].
Figure 3.3: Comparison of PTM and industry's technology model for channel doping
concentration Nch vs. effective length (Leff) for a range of technology nodes [64].
The authors of [64] developed technology models for the range 130nm to
sub-45nm. Figures 3.2 and 3.3, which are drawn from their paper, show that
results obtained from their model closely match data obtained from industry.
PTM has shown especially good predictions for the 45 nm technology node, along
with better scalability over a wide range of process and design conditions.
Hence, PTM models are well suited to the modeling and simulation of circuits
when industrial models are not available.
Chapter 4
Methodology
4.1 Test Circuit Modeling
The 32-bit ripple carry adder circuit was first designed using VHDL. The VHDL
model was then imported into the Leonardo Spectrum tool [4], which creates a
simulatable netlist from the VHDL model. A circuit netlist can be created for
any technology. For this thesis, the circuit was modeled in TSMC 0.18 micron
technology. Leonardo Spectrum generated a Verilog file containing the
synthesized netlist. This synthesized Verilog file was then imported into the
Design Architect tool [1], which produced the schematic of the 32-bit ripple
carry adder using the standard TSMC cell libraries.
The Design Architect tool has an internal SPICE simulator that can generate a
SPICE netlist. This SPICE netlist was further modified by scaling the channel
length of all transistors from 0.18 µm to 45 nm while preserving the width
over length (W/L) ratio. Instead of the TSMC libraries used by Design
Architect, we used the Predictive Technology Model (PTM) for both 45 nm bulk
and high-k technologies [5]. This was done because Design Architect did not
provide 45 nm libraries, and the research required us to simulate circuits in
the latest transistor technologies.
4.2 Minimum Energy Point Estimation
To calculate the voltage at which the circuit operates at minimum energy, we
use a technique called "Dynamic Voltage Scaling" used by Calhoun and
Chandrakasan [19]. This technique consists of changing the operating voltage
step by step, measuring the critical path delay and the power dissipated by
the circuit at each voltage step, and calculating the energy dissipated by
multiplying the power and the delay.

\[ E_{avg} = P_{avg} \times t \tag{4.1} \]
\[ P_{avg} = V_{dd} \times I_{avg} \tag{4.2} \]

where
E_avg = Average dissipated energy
t = Critical path delay
P_avg = Average dissipated power
V_dd = Operating voltage
I_avg = Average current drawn by the circuit
At each voltage step, both the path delay and the drawn current change. In
other words, when the voltage is decreased, the drawn current decreases but
the critical path delay increases. Hence, to find the minimum energy
dissipated by the circuit, the delay and current need to be measured at each
voltage step.
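The sweep can be expressed as a short Python sketch. The function names are illustrative, and the input data are the current and delay columns of Table 5.3 (45nm high-k), so the energies are recomputed from equations 4.1 and 4.2 rather than copied from the table.

```python
def energy_per_cycle(vdd, i_avg, delay):
    """Equations 4.1 and 4.2: E_avg = (V_dd * I_avg) * t."""
    p_avg = vdd * i_avg
    return p_avg * delay


def minimum_energy_point(sweep):
    """Return (vdd, energy) for the sweep entry with the lowest energy/cycle."""
    energies = [(vdd, energy_per_cycle(vdd, i_avg, delay))
                for vdd, i_avg, delay in sweep]
    return min(energies, key=lambda pair: pair[1])


# (Vdd in V, average current in A, critical path delay in s), from Table 5.3.
HIGH_K_SWEEP = [
    (1.0, 34.9e-5, 0.45e-9),
    (0.9, 25.7e-5, 0.47e-9),
    (0.8, 20.0e-5, 0.51e-9),
    (0.7, 15.5e-5, 0.57e-9),
    (0.6, 10.5e-5, 0.67e-9),
    (0.5, 6.38e-5, 0.87e-9),
    (0.4, 3.20e-5, 1.42e-9),
    (0.35, 1.84e-5, 2.12e-9),
    (0.3, 1.09e-5, 3.71e-9),
    (0.2, 0.382e-5, 18.7e-9),
]

best_vdd, best_energy = minimum_energy_point(HIGH_K_SWEEP)
print(f"minimum energy point: {best_vdd} V, {best_energy:.3e} J/cycle")
```

With these inputs the minimum falls at 0.3 V, in line with the highlighted row of Table 5.3. The recomputed energies differ slightly from the tabulated ones because the table entries are rounded.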
To measure the delay at each voltage, the critical path needs to be activated.
Therefore, the following vectors were applied. First, all the inputs (A, B,
and Ci) were initialized to 0. This sets all the sum outputs and the carry out
to 0. In the second vector, all A inputs (A[1:32]) were set to 1, while all B
inputs (B[1:32]) were kept at 0. All sum outputs thus became 1, but the carry
signals did not change and no bits rippled through the carry chain. A third
vector then set Ci to 1 to activate the critical path. As the carry propagated
through all 32 full adders, two critical paths were simultaneously activated:
while the carry bits of all 32 full adders changed to 1, the sum outputs were
brought back to 0. The time delay between the application of the third test
vector and the change of the output signals of the final adder was measured.

\[ t_1 = t_{Co} - t_{Ci} \tag{4.3} \]
\[ t_2 = t_{S32} - t_{Ci} \tag{4.4} \]

where
t_1 and t_2 = Path delays
t_Ci = Time when Ci switched from 0 to 1
t_Co = Time when Co switched from 0 to 1
t_S32 = Time when S32 switched from 1 to 0
The larger of t_1 and t_2 is taken as the critical path delay. The critical
path delay determines the frequency of test vector application. This frequency
changes with the voltage and needs to be measured each time there is a voltage
step change.
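A minimal Python sketch of this step follows, assuming the three switching times have already been extracted from a simulation waveform. The function names and the sample times are hypothetical; the times are chosen so that the result matches the 137 ns delay reported later for bulk technology at 0.3 V.

```python
def critical_path_delay(t_ci, t_co, t_s32):
    """Equations 4.3 and 4.4: the larger of the two measured path delays."""
    t1 = t_co - t_ci     # carry-in to carry-out
    t2 = t_s32 - t_ci    # carry-in to last sum output
    return max(t1, t2)


def max_vector_frequency(delay):
    """Highest rate at which new vectors can be applied for a given delay."""
    return 1.0 / delay


# Hypothetical switching times in seconds.
delay = critical_path_delay(t_ci=10.0e-9, t_co=147.0e-9, t_s32=140.0e-9)
frequency = max_vector_frequency(delay)
print(f"critical path delay = {delay:.3e} s, frequency = {frequency / 1e6:.2f} MHz")
```

A delay of 137 ns corresponds to a vector rate of about 7.3 MHz, which is why the delay has to be re-measured whenever the supply voltage is stepped.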
After finding the frequency, 100 random vectors were applied to the inputs of
the 32-bit ripple carry adder at the maximum operating frequency for that
voltage point. From the SPICE simulations run in HSPICE [3], the average
current drawn by the circuit was measured. It was then multiplied by the
voltage to give the average power dissipated by the test circuit, as given in
equation 4.2. To determine the average energy per cycle, the average power was
multiplied by the delay of the circuit, as shown in equation 4.1. The average
energy per cycle for each voltage step was calculated, tabulated, and graphed.
4.3 Process Variation
The results obtained using the technique described above are applicable only
to an ideal circuit. In practice, process variations cause changes in various
transistor parameters such as threshold parameter, mobility, and oxide
thickness. Hence, it is important to investigate how a circuit's
characteristics, such as threshold current and delay, change with process
variation.
We use Monte Carlo analysis to model process variations in the circuit. For
this circuit, we perform three types of variation. In the first, we vary the
threshold parameter (vth0) by 16%. In the second, the oxide thickness (tox) is
varied by 20%. In the third, both vth0 and tox are varied together. The mean
and sigma values are then calculated for all three cases. We compare two cases
to test the effect of process variations on the circuit.
First, we compare the operation of the circuit at 0.3 V in bulk and high-k
technologies. More specifically, the critical path delays are measured under
process variation, and the mean value is then used to run the adder circuit to
measure the average current drawn. The average power dissipated and the energy
per cycle are calculated using equations 4.1 and 4.2, and the means are
compared with the ideal scenario. The second case compares the operation of
the circuit designed in high-k technology at 0.9 V and 0.3 V. We calculate how
process variations affect the critical path delays and energy per cycle at
both voltage points, and compare the means.
Chapter 5
Results
5.1 Inverter Simulation
Current flowing in a circuit has two components: static current and dynamic
current. Static (or leakage) current in turn has two major components:
subthreshold leakage and gate oxide leakage. Due to the reduced feature size
of the gate oxide in bulk MOSFET designs, there is an increase in gate oxide
leakage, caused by electron tunneling through the oxide layer, which can
affect the delay of a circuit. This issue was addressed by a switch to high-k
designs. However, with high-k, the larger dielectric constant increases the
oxide capacitance, leading to a larger dynamic current flowing through the
circuit. Secondly, high-k designs have a thicker oxide layer than bulk
designs, which leads to a greater subthreshold current flowing through the
transistor.
Hence, high-k designs are expected to have more dynamic and leakage current
flowing through the circuit. However, because of the greater gate oxide
leakage in bulk designs, the delays of circuits designed in bulk technology
will be significantly larger than those of high-k designs. Therefore, we
expect the energy per cycle of high-k designs to be lower than that of bulk
designs, in spite of the higher current flow in high-k, because of the large
gain in speed.
Before performing SPICE simulations on the 32-bit ripple carry adder, we
simulated a single inverter designed in both 45nm bulk and high-k technologies
to understand how current, delay, and energy vary with a switch in technology.
We operated the inverter for 10 clock cycles at 0.4 V. Within those 10 cycles,
there were two transitions: a 0→1 transition and a 1→0 transition. The other 8
cycles were idle cycles, i.e., no transitions occurred. During idle periods,
the only current flowing through the circuit is leakage current, which leads
to static power dissipation. During the transition cycles, both leakage and
drive current flow, hence the power consumed in that period is the sum of the
static and dynamic power. To calculate the dynamic current, we subtract the
static current flowing during idle periods from the current flowing during the
transition cycles.
Table 5.1 shows the values of dynamic current, static current, average current over
10 clock cycles, and clock period (gate delay) for both technologies.
Table 5.1: Comparison of various currents and clock period of a CMOS inverter
operating at 0.4 V for 45nm bulk and high-k technologies.

Technology     Static       Dynamic      Average      Clock         Energy
               current      current      current      period        per cycle
               (×10^-7 A)   (×10^-5 A)   (×10^-6 A)   (×10^-12 s)   (×10^-18 J)
45nm bulk      0.11         0.82         0.83         25.9          8.59
45nm high-k    3.02         5.48         5.72         3.63          8.30
From Table 5.1, the energy per cycle of the high-k design is lower than that
of the bulk design even though a greater current flows through the inverter
designed in high-k. For circuits with greater critical path delay, we expect
the gap in energy per cycle to increase further, as increased gate oxide
leakage would cause larger circuits to run much slower.
5.2 Minimum Energy Point Estimation
From Tables 5.2 and 5.3, it is evident that a decrease in operating voltage is
accompanied by a decrease in average drive current and an increase in critical
path delay. Down to a certain voltage, the decrease in current outweighs the
increase in delay, so the energy per cycle falls gradually with every voltage
step. At a particular voltage (0.3 V), the energy dissipated per cycle reaches
its minimum for the circuit, and for voltages below that point the energy
starts to increase. The reason is that, as the voltage decreases further below
that point, the savings in current can no longer compensate for the large
increase in delay, which causes the energy per cycle to increase.
Also, the circuit works faster when designed in high-k technology than in bulk
technology. From Tables 5.2 and 5.3, the frequency of operation at the
minimum energy/cycle point is roughly 250 MHz (critical path delay of about 4
ns) for high-k technology, while for bulk technology the corresponding
frequency is just above 7 MHz (critical path delay of 137 ns). The reason is
that more drive current flows through the circuit, causing the transistors to
switch faster, and the critical path delay is reduced by the lower gate
leakage current in high-k.
Notably, circuits modeled in high-k technology have the advantage of greater
energy efficiency, as seen in Figure 5.1. In high-k technology, the minimum
energy obtained is lower, at the same voltage, than that for the bulk
technology. Comparing the minimum energy operation of the two technologies, we
find that the high-k energy per cycle is more than 40% lower than that of the
bulk technology (1.22 × 10^-14 J against 2.19 × 10^-14 J). The minimum energy
point occurs at 0.3 V for both high-k and bulk technologies. Again, the reason
is that, although the drive current is higher in the circuit designed in
high-k technology, the improvement in delay more than offsets the increase in
drive current, resulting in energy savings.
Table 5.2: Simulated performance of 32-bit ripple carry adder designed in 45nm
bulk technology.

Operating     Average      Average      Critical path   Average
voltage (V)   current      power        delay           energy/cycle
              (×10^-5 A)   (×10^-6 W)   (×10^-9 s)      (×10^-14 J)
1             18.6         186          0.939           17.5
0.9           12.7         114          1.11            12.7
0.8           8.97         71.7         1.38            9.89
0.7           5.63         39.4         1.88            7.41
0.6           2.96         17.8         3.01            5.36
0.5           1.15         5.74         6.52            3.74
0.4           0.276        1.1          23.4            2.58
0.35          0.119        0.416        54.3            2.26
0.3 *         0.053        0.16         137             2.19
0.2           0.017        0.035        923             3.19

Table 5.3: Simulated performance of 32-bit ripple carry adder designed in 45nm
high-k technology.

Operating     Average      Average      Critical path   Average
voltage (V)   current      power        delay           energy/cycle
              (×10^-5 A)   (×10^-6 W)   (×10^-9 s)      (×10^-14 J)
1             34.9         349          0.45            15.6
0.9           25.7         231          0.47            10.9
0.8           20           152          0.51            8.10
0.7           15.5         109          0.57            6.16
0.6           10.5         62.9         0.67            4.19
0.5           6.38         31.9         0.87            2.78
0.4           3.20         12.8         1.42            1.82
0.35          1.84         6.42         2.12            1.36
0.3 *         1.09         3.28         3.71            1.22
0.2           0.382        0.764        18.7            1.43

* Highlighted row indicates minimum energy voltage point
[Plots: energy/cycle (J) vs. voltage (V) and critical path delay (s) vs. voltage (V), log scale, 0.2 V to 1 V; curves for 45 nm bulk and 45 nm high-k.]
Figure 5.1: Energy per cycle vs. Vdd for 32-bit ripple carry adder simulated in 45nm
bulk and high-k CMOS.
5.3 Process Variation
All circuits suffer from process variation in the real world. Hence, it is
important to understand how process variation affects voltage scaling or, more
specifically, how the minimum energy point compares with the nominal operating
point. From the graphs in Figure 5.1, we infer that circuits designed in 45nm
high-k technology should be more resilient to process variations, because the
energy-delay curve is lower than that of circuits designed in 45nm bulk
technology and minor changes should not have any drastic effect on efficiency
or performance. Two parameters, the threshold parameter (vth0) and the oxide
thickness (tox), are varied separately and then together. vth0 is varied by
16% because that is the deviation cited by the ITRS roadmap [26]. Oxide
thickness is varied by 20%, as calculated by the authors of [46].
The delay was measured by performing a Monte Carlo analysis of 1000 samples of
the circuit at 0.9 V and 0.3 V in high-k technology, and at 0.3 V in bulk
technology. The delays obtained from the 1000 samples were compared with the
delays obtained from 30 random samples. The critical path delay of each sample
was measured through HSPICE [3] simulation using a vector pair that activated
the critical path.
The means (tm) and standard deviations (σ) of the critical path delay for
circuits operating at 0.3 V in 45nm bulk and high-k technologies, and at 0.9 V
in high-k technology, are tabulated in Table 5.4. The means and standard
deviations for the 30 and 1000 random samples are comparable, most closely at
0.9 V, which we take as support for using 30 random samples to model process
variations.
Table 5.4: Comparison of mean and standard deviation of critical path delays
for 30 and 1000 random samples.

                              30 samples                  1000 samples
Operating      Process        Mean (tm)    Std. dev. (σ)  Mean (tm)    Std. dev. (σ)
voltage        variations     (×10^-9 s)   (×10^-9 s)     (×10^-9 s)   (×10^-9 s)
0.9 V high-k   vth0           0.488        0.045          0.475        0.048
               tox            0.465        0.023          0.474        0.032
               Both           0.477        0.055          0.48         0.062
0.3 V high-k   vth0           6.36         4.45           6.29         8.57
               tox            4.61         1.52           4.23         1.65
               Both           6.15         4.95           6.85         9.25
0.3 V bulk     vth0           274.4        210.32         225.7        207.32
               tox            279.6        171.7          204.6        237.4
               Both           192.3        202.03         241.1        238.65
Figures 5.2 - 5.4 compare the histograms of the delays for the 30 and 1000
random samples. The two histograms overlap closely, indicating that
simulations using 30 random samples are a reasonable substitute for
simulations using 1000 random samples. This comparison was made because all
the following results were obtained using 30 random samples. Experiments using
1000 random samples were infeasible because calculating the energy can take
almost 3 days for one voltage point and because the computer did not have
enough memory to store the output of the SPICE file.
[Histograms of critical path delay (s) for 1000 samples and 30 samples: (a) variation of vth0 by 16%; (b) variation of tox by 20%; (c) variation of both vth0 and tox.]
Figure 5.2: Effect of process variation on critical path delay for adder operating at
0.9 V designed in 45nm high-k technology.
[Histograms of critical path delay (s) for 1000 samples and 30 samples: (a) variation of vth0 by 16%; (b) variation of tox by 20%; (c) variation of both vth0 and tox.]
Figure 5.3: Effect of process variation on critical path delay for adder operating at
0.3 V designed in 45nm high-k technology.
[Histograms of critical path delay (s) for 1000 samples and 30 samples: (a) variation of vth0 by 16%; (b) variation of tox by 20%; (c) variation of both vth0 and tox.]
Figure 5.4: Effect of process variation on critical path delay for adder operating at
0.3 V designed in 45nm bulk technology.
Table 5.5: Yield of circuit designed in 45nm bulk and high-k technologies when
affected by process variations.

Operating      Process       Yield
voltage        variations    (%)
0.9 V high-k   vth0          100
               tox           98.9
               Both          99
0.3 V high-k   vth0          99.6
               tox           98.9
               Both          99.7
0.3 V bulk     vth0          90.7
               tox           97.5
               Both          79.8
Table 5.5 shows the percentage of the 1000 samples that function correctly
under process variation, i.e., the yield of the circuit. Circuits designed in
high-k technology are more resilient to process variations and have a very low
failure rate. Bulk technology, on the other hand, has a lower yield, and when
both parameters undergo process variation at the same time, the yield drops
drastically to less than 80%, unlike high-k, which still maintains an almost
100% yield. Hence, it could be hypothesized that as more parameters undergo
process variation, the yield will be affected further.
The corresponding sum of the mean and 3σ gives the worst-case delay for a
circuit operating at 0.3 V in each technology. This worst-case delay was used
as the clock period to feed 100 random vectors to 30 random Monte Carlo
samples of the 32-bit adder circuit, and the current drawn from Vdd by each
sample was measured. The average current of a circuit sample was multiplied by
the operating voltage to obtain the power, which when multiplied by the clock
period gave us the energy/cycle for each random sample.
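The worst-case clock calculation can be checked with a short Python sketch. The function names are illustrative. The inputs are the 1000-sample vth0 statistics of Table 5.4; that the clock periods of Table 5.6 were derived from the 1000-sample columns is an inference from the numbers, not something the text states. The current passed to `sample_energy` is a made-up value.

```python
def worst_case_clock(mean_delay, sigma_delay):
    """Worst-case clock period taken as mean + 3 sigma of the critical path delay."""
    return mean_delay + 3.0 * sigma_delay


def sample_energy(i_avg, vdd, clock_period):
    """Energy/cycle of one Monte Carlo sample: (I_avg * V_dd) * clock period."""
    return i_avg * vdd * clock_period


# (mean, sigma) of critical path delay in seconds: Table 5.4, 1000 samples, vth0.
VTH0_STATS = {
    "0.9 V high-k": (0.475e-9, 0.048e-9),
    "0.3 V high-k": (6.29e-9, 8.57e-9),
    "0.3 V bulk": (225.7e-9, 207.32e-9),
}

clocks = {name: worst_case_clock(mean, sigma)
          for name, (mean, sigma) in VTH0_STATS.items()}
for name, period in clocks.items():
    print(f"{name}: clock period = {period * 1e9:.3f} ns")

# Hypothetical average current of 1 uA for one 0.3 V high-k sample.
example_energy = sample_energy(1.0e-6, 0.3, clocks["0.3 V high-k"])
print(f"example energy/cycle = {example_energy:.3e} J")
```

These three clock periods (0.619 ns, 32 ns, and 847.66 ns) agree with the vth0 rows of Table 5.6.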
Table 5.6 compares the average values of energy/cycle and the clock period with
and without process variations for various technologies and operating voltages.
Although the clock period almost doubles due to process variations for subthreshold
voltages, it is seen that the circuit's energy consumption is not that far from
the nominal energy/cycle. Since we assumed all samples to have a clock period
corresponding to the worst-case (3σ) delay, it is possible that some circuits may be
able to run faster and, for those cases, their individual energy/cycle may come closer
to the nominal values or even fall below them. The graphs in Figures 5.5 - 5.7
highlight the variations in energy/cycle for the circuit operating at 0.3 V and 0.9
V designed in both bulk and high-k technologies. From the table and graphs, it is
evident that a combinational circuit designed in high-k technology is more resilient
to process variation, has a smaller critical path delay and a lower energy/cycle.

Table 5.6: Comparison of average energy/cycle and clock period with and without
process variations for a 32-bit ripple carry adder.

Operating        Process          Clock period    Energy/cycle
voltage          variation        (× 10^-9 s)     (× 10^-14 J)
0.9 V high-k     No variation     0.47            10.9
                 vth0 (16%)       0.619           12.4
                 tox (20%)        0.57            120
                 Both             0.666           87
0.3 V high-k     No variation     3.71            1.22
                 vth0 (16%)       32              8.15
                 tox (20%)        9.18            24
                 Both             36.4            43.2
0.3 V bulk       No variation     137             2.19
                 vth0 (16%)       847.66          19.6
                 tox (20%)        916.8           50.4
                 Both             957.05          62.7
[Figure 5.5 plot: energy/cycle (J, logarithmic axis) versus number of samples (0 to
30) for 0.3 V bulk, 0.3 V high-k, and 0.9 V high-k.]
Figure 5.5: Comparison of energy/cycle for different adder circuit operations when
threshold parameter (vth0) undergoes process variation.
[Figure 5.6 plot: energy/cycle (J, logarithmic axis) versus number of samples (0 to
30) for 0.3 V bulk, 0.3 V high-k, and 0.9 V high-k.]
Figure 5.6: Comparison of energy/cycle for different adder circuit operations when
oxide thickness (tox) undergoes process variation.
[Figure 5.7 plot: energy/cycle (J, logarithmic axis) versus number of samples (0 to
30) for 0.3 V bulk, 0.3 V high-k, and 0.9 V high-k.]
Figure 5.7: Comparison of energy/cycle for different adder circuit operations when
both vth0 and tox undergo process variation.
A deviation in the process parameters causes a change in the drive current and
critical path delay. This change usually causes the energy/cycle to increase, as
current and delay are not exactly inversely proportional to each other. However, there
are rare instances (in high-k) where their relationship has caused the energy/cycle to
decrease from the nominal value, resulting in a circuit that runs faster. The
graphs in Figures 5.5 - 5.7 show that, even with process variations, circuits
operating at 0.3 V are considerably more energy efficient than circuits operating at
0.9 V.
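The same comparison can be made numerically from the average values reported in
Table 5.6. The short Python sketch below uses only the high-k rows of that table
(bulk is listed at 0.3 V only); the dictionary and function names are chosen for
this illustration and are not part of the original simulation flow.

```python
# Average energy/cycle in units of 1e-14 J, copied from Table 5.6 (high-k rows).
ENERGY_0V9_HIGHK = {"none": 10.9, "vth0": 12.4, "tox": 120.0, "both": 87.0}
ENERGY_0V3_HIGHK = {"none": 1.22, "vth0": 8.15, "tox": 24.0, "both": 43.2}

def energy_ratio(case):
    """Ratio of 0.3 V to 0.9 V average energy/cycle for one variation case."""
    return ENERGY_0V3_HIGHK[case] / ENERGY_0V9_HIGHK[case]

ratios = {case: energy_ratio(case) for case in ENERGY_0V9_HIGHK}
for case, ratio in ratios.items():
    print(f"{case:>5}: E(0.3 V) / E(0.9 V) = {ratio:.2f}")
```

A ratio below 1 means the 0.3 V circuit uses less energy per cycle on average than
the 0.9 V circuit for that variation case.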
Chapter 6
Conclusion
The results presented in this thesis are believed to be accurate and to portray
how a device will behave when fabricated in these technologies, as the PTM
models have been shown to closely follow actual fabrication trends. They
have also shown better physical scalability over a wide range of process and design
conditions [64].
Results indicate that the average power dissipated by the circuit decreases steadily
with voltage scaling. This is true for both bulk and high-k designs. Simultaneously,
the critical path delay increases or, in other words, the speed of the circuit
decreases. However, because power drops faster than speed, the average energy per
cycle of the circuit for both designs also decreases steadily. The circuit has a
minimum energy at an operating point of 0.3 V, below which it starts to dissipate
more energy per cycle than at higher voltages. The reason is that the drop in power
dissipated is no longer enough to compensate for the increase in delay of the
circuit, so the circuit takes more energy per cycle to run successfully.
Similar work was done by Tran and Baas, whose results showed their fast
adder circuit functioning properly at 0.37 V while consuming 34 fJ per cycle [58].
Their design was based on a 45 nm bulk PTM model, and since our 45 nm bulk model
design gave similar results, [58] supports our results and affirms the conclusions
drawn in this thesis.
Results also show that high-k technology runs faster and more energy efficiently
than bulk technology. Although the minimum energy point occurs at the same voltage
for both bulk and high-k, the average energy per cycle is 40% lower for high-k than
for bulk. Also, the high-k design operated at 250 MHz while the bulk design operated
at just above 7 MHz, showing that high-k designs are faster at sub-threshold voltages
as well.
Figure 6.1 explains why high-k technology is better than bulk technology. First,
high-k technology has a thicker gate oxide than bulk, leading to lower current
leakage through the gate oxide via tunneling. Second, the presence of a metal gate
instead of a polysilicon gate allows a better flow of charge in the channel, leading
to a larger drive current and hence causing circuits designed in high-k to run faster.
Figure 6.1: Comparison of gate oxide and gate design between a bulk MOSFET (top),
and high-k MOSFET (bottom) [13].
Recent research has shown that process variation can greatly affect the functionality
of logic gates [56]. It can also bring uncertainties into the circuit logic. Shifts
in the threshold voltage Vth can drastically affect Ion and Ioff in the sub-threshold
region, causing an exponential shift in the minimum energy point [41]. By analyzing
the data from our results, we theorize that high-k technology designs at the minimum
energy point will be more resilient to process variations than bulk technology
designs, because high-k technologies provide a higher drive current in the
sub-threshold region along with a reduction in gate oxide leakage for the same drive
current when compared to the bulk technology [13, 52].
It is seen that process variation has a larger effect on the yield of the circuit
designed in bulk technology than on the one designed in high-k technology. Changes in
threshold parameter (vth0) and oxide thickness (tox) caused the yield of the circuit
designed in 45 nm bulk technology to drop to less than 80%. However, high-k designs
showed more resilience, and the yield was almost 100% for both normal operating
voltages and sub-threshold voltages.
Parametric variations also have an effect on the speed and average energy dissipation
of the circuit. On performing 1000 Monte Carlo simulations and comparing the
resulting histogram with that of 30 samples, it is seen that the mean delays are very
close to each other. Hence, the conclusions drawn from 30 samples would be the same
as those drawn from 1000 samples, although 1000 samples may provide more accurate
figures and comparisons.
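One way to see why the two sample sizes give similar means is the standard error of
a sample mean, which shrinks only with the square root of the number of samples. The
Python sketch below illustrates this textbook relation under the assumption of
independent samples; the delay spread used is hypothetical and is not a value
measured in this thesis.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for n independent samples."""
    return sigma / math.sqrt(n)

# Hypothetical delay spread, chosen only for illustration (not thesis data).
sigma_delay_s = 1.0e-9

se_30 = standard_error(sigma_delay_s, 30)
se_1000 = standard_error(sigma_delay_s, 1000)
print(f"standard error with   30 samples: {se_30:.3e} s")
print(f"standard error with 1000 samples: {se_1000:.3e} s")
print(f"improvement factor: {se_30 / se_1000:.2f}")
```

Under these assumptions, going from 30 to 1000 samples tightens the estimate of the
mean by a factor of sqrt(1000/30), independent of the spread chosen.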
SPICE simulations have shown that even with process variations, circuits operating
at 0.3 V (sub-threshold voltage) remain more energy efficient than at 0.9 V
(normal operating voltage). Hence, it is more energy efficient to operate the circuits
at sub-threshold voltages rather than at normal supply voltages. Also, it is seen that
high-k designs are not only more resilient than bulk designs in terms of yield, but
also faster and more energy efficient.
Studies have shown that the voltage at which the minimum energy point occurs
decreases with technology scaling, reaches a minimum at 90 nm, and then starts
increasing with every technology advance [14]. Although we expect the clock rate
to further improve and energy per cycle to reduce for 32 nm and finer technologies,
some projections in [14] indicate that energy per cycle could increase with a move
towards finer technologies. Hence, for finer technologies, the voltage at which the
minimum energy point occurs should increase. However, as these studies have been
done only for bulk technologies, it is hard to predict how high-k models will behave.
Simulations need to be done to check how the minimum energy point moves from 45
nm high-k technology to finer high-k technologies.
Hence, future research could look into the movement of the minimum
energy point when transistors designed in high-k technology are scaled down. It
is still unknown how sequential circuits will behave when affected by parametric
variations for finer high-k technologies. Research could be done to understand the
effect of process variations on timing, energy dissipation and yield for sub-threshold
operation of sequential circuits.
The ultimate minimum energy any circuit can achieve is bounded by the Landauer
limit, which is given by kT ln 2, where k is the Boltzmann constant and T is the
absolute temperature in kelvin. Current studies have shown that the lower bound
on the energy to process one bit is about 36,000 times higher than the absolute
Landauer limit [28, 42]. A shift towards high-k technology is only a small step towards
achieving energy values close to that limit. However, more research and supporting
experiments need to be done to find the limits of high-k technology so that it can lead
to actual implementations of digital systems like microprocessors, graphics processors,
and digital signal processors.
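To give a sense of scale, the Landauer limit can be evaluated numerically. The
Python sketch below assumes a temperature of 300 K, which is an assumption made for
illustration since the text does not fix T; the function and variable names are
likewise chosen for this example.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant

def landauer_limit(temperature_k):
    """Return kT ln 2 in joules for an absolute temperature in kelvin."""
    return BOLTZMANN_J_PER_K * temperature_k * math.log(2)

# Assumption: room temperature of 300 K (the text does not specify T).
limit_j = landauer_limit(300.0)
practical_bound_j = 36_000 * limit_j  # the 36,000x factor quoted in the text
print(f"Landauer limit at 300 K: {limit_j:.3e} J")
print(f"36,000 x Landauer limit: {practical_bound_j:.3e} J")
```

At the assumed 300 K, kT ln 2 is on the order of 10^-21 J per bit, so a bound 36,000
times higher is on the order of 10^-16 J per bit.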
Bibliography
[1] "Design Architect." Mentor Graphics. http://www.mentor.com/products/ic
nanometer design/custom-ic-design/design architect ic/.
[2] "High-k and Metal Gate Research." Intel Website. http://www.intel.com/technology/
silicon/high-k.htm.
[3] "HSPICE." Synopsys Inc. http://www.synopsys.com/Tools/Verification/
AMSVerification/CircuitSimulation/HSPICE/Pages/default.aspx.
[4] "Leonardo Spectrum." Mentor Graphics. http://www.mentor.com/products/fpga/
synthesis/leonardo spectrum/.
[5] "PTM Website." Arizona State University. http://ptm.asu.edu/.
[6] F. Abouzeid, S. Clerc, F. Firmin, M. Renaudin, and G. Sicard, "A 45nm CMOS 0.35
V optimized standard cell library for ultra-low power applications," in Proceedings of
the 14th ACM/IEEE International Symposium on Low Power Electronics and Design,
2009, pp. 225-230.
[7] K. Agarwal and S. Nassif, "Characterizing process variation in nanometer CMOS," in
Proceedings of the 44th ACM Design Automation Conference, 2007, pp. 396-399.
[8] A. R. Alvarez and L. A. Akers, "Monte Carlo analysis of sensitivity of threshold voltage
in small geometry MOSFETs," Electronics Letters, vol. 18, no. 1, pp. 42-43, 1982.
[9] A. Asenov, S. Kaya, and A. R. Brown, "Intrinsic parameter fluctuations in
decananometer MOSFETs introduced by gate line edge roughness," IEEE Transactions
on Electron Devices, vol. 50, no. 5, pp. 1254-1260, 2003.
[10] A. Asenov, S. Kaya, and J. H. Davies, "Intrinsic threshold voltage fluctuations in
decanano MOSFETs due to local oxide thickness variations," IEEE Transactions on
Electron Devices, vol. 49, no. 1, pp. 112-119, 2002.
[11] C. Auth, A. Cappellani, J. S. Chun, A. Dalis, A. Davis, T. Ghani, G. Glass,
T. Glassman, M. Harper, M. Hattendorf, et al., "45nm High-k + metal gate
strain-enhanced transistors," in Proc. IEEE Symposium on VLSI Technology, 2008,
pp. 128-129.
[12] L. O. Bauer, M. R. MacPherson, A. T. Robinson, and H. G. Dill, "Properties of silicon
implanted with boron ions through thermal silicon dioxide," Solid-State Electronics,
vol. 16, no. 3, pp. 289-300, 1973.
[13] M. T. Bohr, R. S. Chau, T. Ghani, and K. Mistry, "The high-k solution," IEEE
Spectrum, vol. 44, no. 10, pp. 29-35, 2007.
[14] D. Bol, D. Kamel, D. Flandre, and J. D. Legat, "Nanometer MOSFET effects on
the minimum-energy point of 45nm subthreshold logic," in Proceedings of the 14th
ACM/IEEE International Symposium on Low Power Electronics and Design, 2009,
pp. 3-8.
[15] S. Borkar, "Design Challenges of Technology Scaling," IEEE Micro, vol. 19, no. 4, pp.
23-29, 1999.
[16] Y. A. Borodovsky, "Impact of local partial coherence variations on exposure tool
performance," in Proceedings of SPIE, volume 2440, 1995, p. 750.
[17] T. A. Brunner, "Impact of lens aberrations on optical lithography," IBM Journal of
Research and Development, vol. 41, no. 1.2, pp. 57-67, 1997.
[18] B. H. Calhoun and A. P. Chandrakasan, "Ultra-dynamic voltage scaling using
sub-threshold operation and local voltage dithering in 90nm CMOS," in Proc. IEEE
International Conference on Solid-State Circuits, 2005, pp. 300-599.
[19] B. H. Calhoun, A. Wang, and A. P. Chandrakasan, "Modeling and sizing for minimum
energy operation in subthreshold circuits," IEEE Journal of Solid-State Circuits,
vol. 40, no. 9, pp. 1778-1786, 2005.
[20] K. M. Cao, W. C. Lee, W. Liu, X. Jin, P. Su, S. K. H. Fung, J. X. An, B. Yu, and C. Hu,
"BSIM4 gate leakage model including source-drain partition," in Proc. International
IEEE Electron Devices Meeting, 2000, pp. 815-818.
[21] Y. Cao, T. Sato, M. Orshansky, D. Sylvester, and C. Hu, "New paradigm of predictive
MOSFET and interconnect modeling for early circuit simulation," in Proceedings of
the IEEE Custom Integrated Circuits Conference, 2000, pp. 201-204.
[22] A. P. Chandrakasan, W. J. Bowhill, and F. Fox, Design of high-performance
microprocessor circuits. Wiley-IEEE Press, 2000.
[23] C. H. Diaz, H. J. Tao, Y. C. Ku, A. Yen, and K. Young, "An experimentally validated
analytical model for gate line-edge roughness (LER) effects on technology scaling,"
IEEE Electron Device Letters, vol. 22, no. 6, pp. 287-289, 2001.
[24] H. Fukutome, Y. Momiyama, T. Kubo, Y. Tagawa, T. Aoyama, and H. Arimoto,
"Direct evaluation of gate line edge roughness impact on extension profiles in
sub-50-nm n-MOSFETs," IEEE Transactions on Electron Devices, vol. 53, no. 11, pp.
2755-2763, 2006.
[25] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson,
L. Nazhandali, T. Austin, et al., "Performance and variability optimization strategies
in a sub-200mV, 3.5 pJ/inst, 11nW subthreshold processor," in IEEE Symposium on
VLSI Circuits, 2007, pp. 152-153.
[26] R. Heald and P. Wang, "Variability in sub-100nm SRAM designs," in Proceedings of
the IEEE/ACM International Conference on Computer-aided Design, IEEE Computer
Society, 2004, pp. 347-352.
[27] C. Hedlund, H. O. Blom, and S. Berg, "Microloading effect in reactive ion etching,"
Journal of Vacuum Science & Technology A: Vacuum, Surfaces, and Films, vol. 12,
no. 4, pp. 1962-1965, 1994.
[28] J. Izydorczyk and M. Izydorczyk, "Microprocessor Scaling: What Limits Will Hold?,"
Computer, vol. 43, no. 8, pp. 20-26, 2010.
[29] V. Kaushik, B. O'Sullivan, G. Pourtois, N. Van Hoornick, A. Delabie, S. Van Elshocht,
W. Deweerd, T. Schram, L. Pantisano, E. Rohr, et al., "Estimation of fixed charge
densities in hafnium-silicate gate dielectrics," IEEE Transactions on Electron Devices,
vol. 53, no. 10, pp. 2627-2633, 2006.
[30] R. W. Keyes, "Physical problems and limits in computer logic," IEEE Spectrum, vol. 6,
no. 5, pp. 36-45, 1969.
[31] R. W. Keyes, "The effect of randomness in the distribution of impurity atoms on FET
thresholds," Applied Physics A: Materials Science and Processing, vol. 8, no. 3, pp.
251-259, 1975.
[32] H. W. Kim, J. Y. Lee, J. Shin, S. G. Woo, H. K. Cho, and J. T. Moon, "Experimental
investigation of the impact of LWR on sub-100-nm device performance," IEEE
Transactions on Electron Devices, vol. 51, no. 12, pp. 1984-1988, 2004.
[33] K. Kim and V. D. Agrawal, "Minimum Energy CMOS Design with Dual Subthreshold
Supply and Multiple Logic-Level Gates," in Proc. 12th International Symposium on
Quality Electronic Design, 2011.
[34] K. Kim and V. D. Agrawal, "True Minimum Energy Design Using Dual
Below-Threshold Supply Voltages," in Proc. 24th Annual Conference on VLSI Design,
2011, pp. 292-297.
[35] N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin,
M. Kandemir, and V. Narayanan, "Leakage current: Moore's law meets static power,"
Computer, vol. 36, no. 12, pp. 68-75, 2003.
[36] Y. Kim, G. Gebara, M. Freiler, J. Barnett, D. Riley, J. Chen, K. Torres, J. E. Lim,
B. Foran, F. Shaapur, et al., "Conventional n-channel MOSFET devices using single
layer HfO2 and ZrO2 as high-k gate dielectrics with polysilicon gate electrode," in
IEEE International Electron Devices Meeting, 2001, pp. 20-2.
[37] M. Koh, W. Mizubayashi, K. Iwamoto, H. Murakami, T. Ono, M. Tsuno, T. Mihara,
K. Shibahara, S. Miyazaki, and M. Hirose, "Limit of gate oxide thickness scaling in
MOSFETs due to apparent threshold voltage fluctuation induced by tunnel leakage
current," IEEE Transactions on Electron Devices, vol. 48, no. 2, pp. 259-264, 2001.
[38] M. Koike, T. Ino, Y. Kamimuta, M. Koyama, Y. Kamata, M. Suzuki, Y. Mitani,
A. Nishiyama, and Y. Tsunashima, "Effect of Hf-N bond on properties of thermally
stable amorphous HfSiON and applicability of this material to sub-50nm technology
node LSIs," in IEEE International Electron Devices Meeting, 2003, pp. 4-7.
[39] K. Kuhn, C. Kenyon, A. Kornfeld, M. Liu, A. Maheshwari, W. Shih, S. Sivakumar,
G. Taylor, P. VanDerVoorn, and K. Zawadzki, "Managing process variation in Intel's
45nm CMOS technology," Intel Technology Journal, vol. 12, no. 2, pp. 93-109, 2008.
[40] M. Kulkarni and V. D. Agrawal, "Energy Source Lifetime Optimization for a Digital
System through Power Management," in Proceedings of the 43rd Southeastern
Symposium on System Theory, 2011.
[41] J. Kwong and A. P. Chandrakasan, "Advances in Ultra-Low-Voltage Design," IEEE
Solid-State Circuits Newsletter, vol. 13, no. 4, pp. 20-27, 2008.
[42] R. Landauer, "Irreversibility and heat generation in the computing process," IBM
Journal of Research and Development, vol. 5, no. 3, pp. 183-191, 1961.
[43] J. D. Meindl and R. N. Swanson, "Potential improvements in power-speed performance
of digital circuits," Proceedings of the IEEE, vol. 59, no. 5, pp. 815-816, 1971.
[44] K. Mistry, C. Allen, C. Auth, B. Beattie, D. Bergstrom, M. Bost, M. Brazier,
M. Buehler, A. Cappellani, R. Chau, et al., "A 45nm logic technology with high-k+
metal gate transistors, strained silicon, 9 Cu interconnect layers, 193nm dry
patterning, and 100% Pb-free packaging," in IEEE International Electron Devices
Meeting, 2007, pp. 247-250.
[45] G. E. Moore et al., "Cramming more components onto integrated circuits," Proceedings
of the IEEE, vol. 86, no. 1, pp. 82-85, 1998.
[46] S. Mukhopadhyay, A. Raychowdhury, and K. Roy, "Accurate estimation of total
leakage in nanometer-scale bulk CMOS circuits based on device geometry and doping
profile," IEEE Transactions on Computer-Aided Design of Integrated Circuits and
Systems, vol. 24, no. 3, pp. 363-381, 2005.
[47] L. W. Nagel and D. O. Pederson, "Spice (simulation program with integrated
circuit emphasis)," Technical Report UCB/ERL M382, EECS Department, University of
California, Berkeley, Apr 1973.
[48] S. Natarajan, M. Armstrong, M. Bost, R. Brain, M. Brazier, C. H. Chang,
V. Chikarmane, M. Childs, H. Deshpande, K. Dev, et al., "A 32nm logic technology
featuring 2nd-generation high-k+ metal-gate transistors, enhanced channel strain and
0.171 µm² SRAM cell size in a 291Mb array," in IEEE International Electron Devices
Meeting, 2008, pp. 1-3.
[49] M. A. Quevedo-Lopez, S. A. Krishnan, D. Kirsch, C. H. J. Li, J. H. Sim, C. Huffman,
J. J. Peterson, B. H. Lee, G. Pant, B. E. Gnade, et al., "High performance gate first
HfSiON dielectric satisfying 45nm node requirements," in IEEE International Electron
Devices Meeting, 2005, pp. 4-6.
[50] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms
and leakage reduction techniques in deep-submicrometer CMOS circuits," Proceedings
of the IEEE, vol. 91, no. 2, pp. 305-327, 2003.
[51] W. Schemmert and G. Zimmer, "Threshold-voltage sensitivity of ion-implanted MOS
transistors due to process variations," Electronics Letters, vol. 10, no. 9, pp. 151-152,
1974.
[52] G. Sery, S. Borkar, and V. De, "Life is CMOS: why chase the life after?," in Proceedings
of the 39th Annual Design Automation Conference, 2002, pp. 78-83.
[53] W. Shockley, "Problems related to pn junctions in silicon," Solid-State Electronics,
vol. 2, no. 1, pp. 35-60, 1961.
[54] H. Soeleman and K. Roy, "Ultra-low power digital subthreshold logic circuits," in
Proceedings of the 1999 International Symposium on Low Power Electronics and Design,
1999, pp. 94-96.
[55] P. A. Stolk, F. P. Widdershoven, and D. B. M. Klaassen, "Modeling statistical dopant
fluctuations in MOS transistors," IEEE Transactions on Electron Devices, vol. 45,
no. 9, pp. 1960-1971, 1998.
[56] T. Sugii, "High-performance bulk CMOS technology for 65/45 nm nodes," Solid-State
Electronics, vol. 50, no. 1, pp. 2-9, 2006.
[57] R. M. Swanson and J. D. Meindl, "Ion-implanted complementary MOS transistors in
low-voltage circuits," IEEE Journal of Solid-State Circuits, vol. 7, no. 2, pp. 146-153,
1972.
[58] A. T. Tran and B. M. Baas, "Design of an energy-efficient 32-bit adder operating
at subthreshold voltages in 45-nm CMOS," in Proc. 3rd International Conference on
Communications and Electronics, 2010, pp. 87-91.
[59] M. Venkatasubramanian and V. D. Agrawal, "Subthreshold Voltage High-k CMOS
Devices Have Lowest Energy and High Process Tolerance," in Proceedings of the 43rd
Southeastern Symposium on System Theory, 2011.
[60] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-threshold design for ultra
low-power systems. Springer Verlag, 2006.
[61] H. C. Wen, H. R. Harris, C. D. Young, H. Luan, H. N. Alshareef, K. Choi, D. L. Kwong,
P. Majhi, G. Bersuker, and B. H. Lee, "On Oxygen Deficiency and Fast Transient
Charge-Trapping Effects in High-k Dielectrics," IEEE Electron Device Letters, vol. 27,
no. 12, pp. 984-987, 2006.
[62] A. K. Wong, R. A. Ferguson, and S. M. Mansfield, "The mask error factor in optical
lithography," IEEE Transactions on Semiconductor Manufacturing, vol. 13, no. 2, pp.
235-242, 2000.
[63] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "A variation-tolerant sub-200 mV
6-T subthreshold SRAM," IEEE Journal of Solid-State Circuits, vol. 43, no. 10, pp.
2338-2348, 2008.
[64] W. Zhao and Y. Cao, "Predictive technology model for nano-CMOS design
exploration," ACM Journal on Emerging Technologies in Computing Systems (JETC),
vol. 3, no. 1, 2007.
[65] R. Zimmermann and W. Fichtner, "Low-power logic styles: CMOS versus
pass-transistor logic," IEEE Journal of Solid-State Circuits, vol. 32, no. 7, pp.
1079-1090, 1997.