Energy Efficiency and Process Variation Tolerance of 45 nm Bulk and High-k CMOS Devices

by Muralidharan Venkatasubramanian

A thesis submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Master of Science

Auburn, Alabama
May 9, 2011

Keywords: Low-power circuits, subthreshold voltage operation, high-k CMOS technology, process variation

Copyright 2011 by Muralidharan Venkatasubramanian

Approved by
Vishwani D. Agrawal, Chair, James J. Danaher Professor of Electrical and Computer Engineering
Adit D. Singh, James B. Davis Professor of Electrical and Computer Engineering
Charles E. Stroud, Professor of Electrical and Computer Engineering

Abstract

With transistor sizes being reduced to sub 45 nm ranges, we have seen an improvement in speed, better performance, and deeper integration of digital circuits. However, there has been a corresponding increase in power consumption, along with greater energy dissipation. The reason is increased leakage current in the channel. A proposed solution is a shift from the poly-silicon gate of yesteryear towards high-k materials and a metal gate. Reduced feature sizes also suffer from greater parametric process variations during lithography, which cause identical circuits to behave differently. With high-k technology overshadowing bulk technology ever since transistor sizes hit 45 nm, a greater understanding is needed of how the properties of high-k technology will affect digital devices, especially their speed, power consumption, and energy dissipated upon voltage scaling. Also, a better estimation of the effects of parametric variations on circuits designed in high-k technology can provide valuable information which can be used to improve current designs.

Acknowledgments

First of all, I would like to thank my advisor, Dr. Vishwani Agrawal, for the tremendous help and support he has provided during the pursuit of my thesis.
His knowledge, guidance, and patience were immensely beneficial; without them this research would not have been completed successfully. I would also like to thank Dr. Adit D. Singh and Dr. Charles E. Stroud for being on my thesis committee. Their courses provided me with the theoretical knowledge used to pursue this research, and their corrections and input were invaluable in the writing of this thesis. I would also like to thank Manish Kulkarni, Kyungseok Kim, and Sarthak Kakkar for being readily available to help when I had questions about simulation tools like HSPICE and MATLAB. Last but not least, I would like to thank all my friends and family, who never gave up on me and whose moral support helped me pursue my research successfully.

Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables
1 Introduction
2 Background
2.1 Technology Shift
2.2 Voltage Scaling
2.3 Process Variation
3 Tools and Techniques
3.1 Test Circuit
3.2 IC Design and Simulation Tools
3.2.1 Leonardo Spectrum
3.2.2 Design Architect
3.2.3 HSPICE
3.3 Circuit Design and Simulation Techniques
3.3.1 VHDL
3.3.2 Monte Carlo Analysis
3.4 Predictive Technology Model
4 Methodology
4.1 Test Circuit Modeling
4.2 Minimum Energy Point Estimation
4.3 Process Variation
5 Results
5.1 Inverter Simulation
5.2 Minimum Energy Point Estimation
5.3 Process Variation
6 Conclusion
Bibliography

List of Figures

3.1 Schematic of a 32-bit ripple carry adder.
3.2 Comparison of PTM and industry's technology model for Vdd and Vth scaling vs. effective length (Leff) for a range of technology nodes [64].
3.3 Comparison of PTM and industry's technology model for channel doping concentration Nch vs. effective length (Leff) for a range of technology nodes [64].
5.1 Energy per cycle vs. Vdd for 32-bit ripple carry adder simulated in 45 nm bulk and high-k CMOS.
5.2 Effect of process variation on critical path delay for adder operating at 0.9 V designed in 45 nm high-k technology.
5.3 Effect of process variation on critical path delay for adder operating at 0.3 V designed in 45 nm high-k technology.
5.4 Effect of process variation on critical path delay for adder operating at 0.3 V designed in 45 nm bulk technology.
5.5 Comparison of energy/cycle for different adder circuit operations when threshold parameter (vth0) undergoes process variation.
5.6 Comparison of energy/cycle for different adder circuit operations when oxide thickness (tox) undergoes process variation.
5.7 Comparison of energy/cycle for different adder circuit operations when both vth0 and tox undergo process variation.
6.1 Comparison of gate oxide and gate design between a bulk MOSFET (top) and a high-k MOSFET (bottom) [13].

List of Tables

5.1 Comparison of various currents and clock period of a CMOS inverter operating at 0.4 V for 45 nm bulk and high-k technologies.
5.2 Simulated performance of 32-bit ripple carry adder designed in 45 nm bulk technology.
5.3 Simulated performance of 32-bit ripple carry adder designed in 45 nm high-k technology.
5.4 Comparison of mean and standard deviation of critical path delays for 30 and 1000 random samples.
5.5 Yield of circuit designed in 45 nm bulk and high-k technologies when affected by process variations.
5.6 Comparison of average energy/cycle and clock period with and without process variations for a 32-bit ripple carry adder.
Chapter 1
Introduction

Gordon Moore, co-founder of Intel, famously stated in 1965, "The amount of transistors which can be inexpensively placed on an Integrated Circuit doubles every 18 months." This statement has been dubbed Moore's law, and scaling down of transistors has been the trend of the industry ever since [45]. We have come a long way since 1971, when the semiconductor manufacturing process was at 10 µm; we are now adopting 32 nm technology, and research is under way to implement 22 nm technology and beyond. Evolving nanometer CMOS technologies provide better functionality, higher performance, and greater levels of integration, but suffer from increased subthreshold leakage and excessive process variation. With the industry and market emphasizing "performance per watt" and "performance per joule", there is a growing need for new power and energy saving techniques to counter the increased power and energy dissipation caused by scaling down of transistors.

This thesis work examines the 45 nm bulk and high-k metal gate technologies. Aggressive voltage scaling techniques described in previous research [18, 19, 41, 54, 60] were used to evaluate how the power and energy consumption of a chosen circuit (a 32-bit ripple carry adder) varies with a change in supply voltage (Vdd). After obtaining the optimum Vdd at which the minimum energy per cycle occurs, the results were compared for both processes. The performance of the 32-bit ripple carry adder circuit was evaluated over the entire range of supply voltages for which it functions correctly. Lowering the voltage increases delay, reducing the maximum clock frequency. We use the maximum permissible clock rate and the energy per cycle at that clock rate as two performance criteria.

The same 32-bit ripple carry adder circuit was designed in both 45 nm bulk and high-k technologies in order to determine which technology is better suited for a low power, more energy efficient design.
The minimum energy per cycle operation occurs at a subthreshold voltage for both designs. At minimum energy, the bulk technology has very low performance (about 7 MHz), whereas the high-k technology works at a much higher 250 MHz clock. The faster clock rate reduces the leakage energy, making high-k almost twice as energy efficient as bulk.

This thesis also examines the relationship between energy per cycle and supply voltage, and whether the minimum energy point provides a stable equilibrium against speed and energy deviations caused by process-related parametric variations in the different technologies. These deviations can be expected to be lower for circuits designed in high-k technology than for circuits designed in the commonly used bulk technology. They are also lower than the deviations at the higher supply voltages that are commonly in use. Monte Carlo simulations were conducted for various parameters of the technology model files [5], such as the threshold parameter (vth0), oxide thickness (tox), and mobility (u0), and the results were compared with the ideal scenario (no process variations) to see how total power and energy deviated from the ideal values.

We conclude that there is a significant improvement in performance when the process is changed from bulk to high-k technology. The circuit modeled in high-k showed an operating frequency of 250 MHz, which is a significant jump from bulk CMOS technology, while retaining the advantage of low energy consumption. Furthermore, from the nature of the energy versus Vdd graph, we hypothesize that operation at subthreshold Vdd is more resilient to process variation than operation at the normal Vdd for both high-k and bulk technologies.

The rest of this thesis is divided into five more chapters. Chapter two is the background chapter; it gives a brief summary of the important work done in the area of this thesis, and of work that has been an inspiration to pursue this research.
This chapter has a section that explains technology scaling and why there has been a shift from bulk to high-k technology. The next section gives background on the different voltage scaling techniques used in this thesis research. The last section discusses the various kinds of process variation and how each one can affect the threshold voltage and current of a digital circuit.

Chapter three describes the tools and techniques used to conduct the experiments of this thesis. The working of design tools like Leonardo Spectrum [4], Design Architect [1], and HSPICE [3] is explained, along with how voltage scaling and process variation techniques are applied in this particular experiment. Chapter four elaborates on the methods by which the tools and techniques discussed in the previous chapter are used to simulate the circuit. It explains how the experiment was conducted and gives a step-by-step procedure so that the reader can easily repeat the experiment if necessary. Chapter five discusses the results of the experiment conducted using the methods described in the previous chapter. Using the literature review as reference, it validates the obtained results and explains the meaning of the data obtained from the experiment. Finally, Chapter six concludes the thesis by summarizing the previous chapters, discussing the practical applications of this thesis, and giving an overview of the future direction this research could take.

Chapter 2
Background

In the 1980s, there was a switch to CMOS logic from other forms like TTL and NMOS logic. CMOS had many advantages, such as high noise immunity, the ability to integrate higher logic functions on a chip, and low power consumption [65]. The total power (Ptotal) dissipated in a CMOS logic gate consists of static power (Pstatic) and dynamic power (Pdynamic).
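In symbols, this decomposition can be written as follows. The expressions for the two components are the standard first-order forms and are added here only as an illustrative assumption; the symbols $a_{sw}$ (switching activity), $C_L$ (switched load capacitance), $f$ (clock frequency), and $I_{leak}$ (total leakage current) are not defined at this point in the thesis.

```latex
\[
P_{total} = P_{static} + P_{dynamic}, \qquad
P_{dynamic} \approx a_{sw}\, C_L\, V_{dd}^{2}\, f, \qquad
P_{static} \approx I_{leak}\, V_{dd}
\]
```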
In a typical CMOS circuit, most of the power dissipated is dynamic power, while static power makes up a small part of the total power dissipated. Scaling down of transistors every two years [45] showed a reduction in total power dissipation because of a reduction in dynamic power as the transistors switched faster.

In 1971, Meindl and Swanson concluded that CMOS circuits offered an advantage of 10 to 1000 times in power-speed product when compared to a bipolar junction transistor (BJT) [43]. They expanded on the work done by Keyes [30] and derived that "fundamental limits on power-speed performance are imposed by the uncertainty energy, the thermal energy, and the minimum high speed switching power." They identified the advantages of CMOS over BJT circuits, such as zero standby power drain, reduced load capacitance, and the lower supply voltage of a CMOS digital circuit. All this was achieved without permitting degradation of fan-in and fan-out, and while introducing noise immunity to a logic gate. They also showed the relation between the delay in a logical state and various circuit design parameters as shown below:

\[ T_d \approx 10\,\frac{L}{W}\,\frac{t_{ox}}{\varepsilon_{ox}}\,\frac{1}{\mu_n}\,\frac{C_L}{V_{dd}} \tag{2.1} \]

where
Td = Delay per logical state
L = Channel length
W = Channel width
tox = Oxide thickness
εox = Oxide permittivity
µn = Electron surface mobility
CL = Load capacitance
Vdd = Supply voltage

However, when transistor sizes shrank to 90 nm and below, two new trends began to emerge. The first one was that the industry literally "ran out of atoms" to insulate the transistor gate [13]. Because of the continuous scaling down of transistors following Moore's law [45], the SiO2 layer insulating the gate had become only a few atoms thick, and any further scaling would have caused a breakdown of the transistor because of the heat due to high power dissipation. The scientists at Intel came up with an innovative solution to counter this problem.
They used materials with high dielectric constants (high-k), like metal and metal oxides, to build the transistor gates [2]. Other researchers were also investigating high-k transistor designs to achieve greater power and energy savings [11, 36, 44, 48]. Kim et al. [35] highlighted two components of leakage current. One is the subthreshold leakage current (Isub), which is a weak inversion current in the device, and the other is the gate leakage current (Iox), which is a tunneling current through the gate oxide insulation.

2.1 Technology Shift

Chandrakasan et al. [22] derived equations showing how the leakage current components (Isub and Iox) depend on various parameters like threshold voltage, supply voltage, and oxide thickness. The subthreshold current (Isub) is defined as:

\[ I_{sub} = K_1 W e^{-V_{th}/(n V_\theta)} \left(1 - e^{-V_{dd}/V_\theta}\right) \tag{2.2} \]

where
W = Gate width
Vθ = Thermal voltage
Vth = Threshold voltage
Vdd = Supply voltage
K1 and n are experimentally derived parameters

Roy et al. [50] stated that subthreshold conduction is dominated by the diffusion current caused by weak inversion. This weak inversion current defines the off-state leakage because of low Vth. The authors in [50] defined a characteristic called the subthreshold slope, which indicates how effectively a transistor can be turned off when Vdd is below Vth, and is defined as:

\[ S_t = 2.3\,\frac{kT}{q}\left(1 + \frac{C_{dm}}{C_{ox}}\right) \tag{2.3} \]

where
Cdm = Depletion layer capacitance
Cox = Gate oxide capacitance

Ideally, the value of the slope in Equation 2.3 should be as low as possible. With a shift towards high-k technology, the gate oxide capacitance increases because of the use of high-k dielectric materials. As a result, transistors operated in the subthreshold region switch faster, with a faster rate of decrease of Ioff.

Another component of subthreshold current is the Drain Induced Barrier Lowering (DIBL) current. In a short channel device, the threshold voltage and subthreshold current vary with the drain bias.
DIBL occurs when the energy barrier at the surface between the source and drain, which prevents electrons from flowing to the drain, is reduced, causing an increase in subthreshold current due to the lowering of the threshold voltage. Roy et al. showed that DIBL does not change the subthreshold slope but only affects the threshold voltage [50].

From Equation 2.2, we can see there are two ways to reduce subthreshold current. The first one is to reduce the supply voltage, thereby reducing the exponential term in the equation and hence the current. The other technique is to increase the threshold voltage (Vth); because it appears in a negative exponent, even a small change in Vth can cause a dramatic change in current. However, the frequency of the circuit depends on the operating voltage and threshold voltage as:

\[ f \propto \frac{(V - V_{th})^{\alpha}}{V} \tag{2.4} \]

where
f = Frequency
α = Activity factor

Hence, an increase in Vth would cause a decrease in performance of the circuit, which is undesirable.

The second equation derived by Chandrakasan et al. [22], which illustrates the factors affecting gate-oxide leakage current, is:

\[ I_{ox} = K_2 W \left(\frac{V}{T_{ox}}\right)^{2} e^{-\alpha T_{ox}/V} \tag{2.5} \]

where
W = Channel width
Tox = Oxide thickness
K2 and α are experimentally determined.

It is clearly seen that a reduction in the oxide thickness will cause an increase in the field across the gate oxide. The high electric field, along with a low oxide thickness, causes electrons to tunnel through the oxide layer, resulting in gate oxide leakage current. There are two mechanisms of tunneling through the gate oxide: Fowler-Nordheim (FN) tunneling and direct tunneling [50]. The authors showed that the tunneling probabilities are different in these two cases, leading to two different types of gate leakage. In FN tunneling, electrons tunnel into the conduction band of the oxide layer. In [50], Roy et al.
derive that the FN current represents tunneling through the triangular potential barrier and is only valid for Vox > φox, where Vox is the voltage drop across the oxide and φox is the barrier height for electrons in the conduction band. The authors noted that the measured value of FN tunneling is very small and can be easily neglected when the device is in normal operating mode.

In direct tunneling, electrons tunnel directly to the gate through the forbidden energy gap of the oxide layer. This phenomenon occurs in very thin oxide layers, namely on the order of 3-4 nm. Direct tunneling occurs when Vox < φox, as electrons tunnel through the trapezoidal potential barrier instead of the triangular barrier [50]. Direct tunneling has three mechanisms: electron tunneling from the conduction band (ECB), electron tunneling from the valence band (EVB), and hole tunneling from the valence band (HVB) [20]. In NMOS devices, ECB controls the gate to channel tunneling current, while EVB controls gate to body tunneling in depletion-inversion, and ECB controls it in accumulation. In PMOS devices, HVB controls the gate to channel tunneling, while gate to body leakage is controlled by EVB in depletion-inversion and by ECB in accumulation [20, 50]. The authors in [50] showed that tunneling associated with HVB is much less than tunneling associated with ECB, leading to lower leakage current in PMOS compared to NMOS.

As high-k gates can be thicker than "bulk" or SiO2 gates, gate leakage was reduced, causing the devices to run cooler. These new transistors worked so well that Intel has started incorporating them in its new microprocessor designs, starting from the Penryn chip lineup [13]. However, the shift in transistor design did not completely solve the primary problem faced by chip makers, which was increased power and energy dissipation due to leakage at sub 90 nm technologies.
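The trends implied by Equations 2.2, 2.3, and 2.5 can be explored numerically. The following is a minimal sketch, not the simulation flow used in this thesis: the fit constants `k1`, `k2`, `n`, and `alpha` are experimentally derived in the original equations, so the default values below are placeholders chosen only for illustration, and only ratios between results are meaningful.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # electron charge, C


def thermal_voltage(temp_k=300.0):
    """Thermal voltage kT/q in volts."""
    return K_B * temp_k / Q_E


def subthreshold_current(w, vth, vdd, k1=1.0, n=1.5, temp_k=300.0):
    """Equation 2.2. k1 and n are placeholder fit constants (assumed)."""
    v_theta = thermal_voltage(temp_k)
    return k1 * w * math.exp(-vth / (n * v_theta)) * (1.0 - math.exp(-vdd / v_theta))


def gate_oxide_current(w, v, tox, k2=1.0, alpha=1.0):
    """Equation 2.5. k2 and alpha are placeholder fit constants (assumed)."""
    return k2 * w * (v / tox) ** 2 * math.exp(-alpha * tox / v)


def subthreshold_slope(cdm_over_cox, temp_k=300.0):
    """Equation 2.3, in volts per decade of current."""
    return 2.3 * thermal_voltage(temp_k) * (1.0 + cdm_over_cox)


# Raising Vth by 100 mV at a fixed supply voltage (normalized width w = 1).
low_vth = subthreshold_current(w=1.0, vth=0.3, vdd=0.9)
high_vth = subthreshold_current(w=1.0, vth=0.4, vdd=0.9)
print(f"Isub ratio for a 100 mV increase in Vth: {low_vth / high_vth:.1f}x lower")

# Thinner versus thicker oxide at the same voltage (tox in arbitrary units).
thin = gate_oxide_current(w=1.0, v=1.0, tox=1.2)
thick = gate_oxide_current(w=1.0, v=1.0, tox=2.0)
print(f"Iox ratio, thin vs. thick oxide: {thin / thick:.1f}x higher")

# A larger Cox (smaller Cdm/Cox) gives a steeper, lower subthreshold slope.
for ratio in (0.5, 0.2, 0.0):
    print(f"Cdm/Cox = {ratio:.1f} -> St = {subthreshold_slope(ratio) * 1e3:.1f} mV/decade")
```

The sketch shows the qualitative behavior discussed above: subthreshold current falls steeply as Vth rises, gate oxide leakage rises as the oxide thins, and a larger gate oxide capacitance lowers the subthreshold slope.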
With a reduction in transistor size, it was seen that although scaling reduced the dynamic energy per cycle due to reduced capacitances in the circuit, the leakage current of the circuit increased due to scaling down of the threshold voltage, causing a significant increase in static power dissipation [15]. Hence, there is high interest in developing design techniques for power and energy efficient circuits using high leakage nanometer technologies.

2.2 Voltage Scaling

The speed of digital circuits is currently limited by the energy density. Shrinking feature sizes will continue to have the advantage of a higher degree of integration, resulting in lower cost, provided energy density can be kept under control. Another characteristic that will assume increasing significance is tolerance to the larger process variation of smaller features.

The supply voltage has the strongest influence on all components of power and energy of a digital CMOS circuit. Meindl and Swanson mathematically showed that, to obtain the greatest power saving and the least power-speed product, the circuit must be operated at the lowest supply voltage practically possible for the design technology [57]. Their calculations showed that CMOS transistors did not abruptly turn off below the threshold voltage but acted as weak inversion devices. They determined that the smallest theoretical supply voltage at which circuits could function is approximately 8kT/q ≈ 0.2 V at T = 300 K, where k is the Boltzmann constant, T is the absolute temperature, and q is the electron charge. They also noticed experimentally that reduced operating temperatures permitted lower supply voltages, as theorized by Keyes [30]. One technique highlighted in their paper was ion implantation of boron for adjusting the turn-on voltages of both p and n transistors, achieving operation close to their derived theoretical limit.
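The 8kT/q limit quoted above is easy to check numerically. The short sketch below evaluates it at a few temperatures; the function name and the temperature values other than 300 K are chosen here for illustration only.

```python
K_B = 1.380649e-23     # Boltzmann constant, J/K
Q_E = 1.602176634e-19  # electron charge, C


def min_supply_voltage(temp_k, factor=8.0):
    """Theoretical minimum supply voltage, factor * kT/q, in volts."""
    return factor * K_B * temp_k / Q_E


# Lower temperatures permit a lower minimum supply voltage.
for temp_k in (200.0, 300.0, 400.0):
    v_min = min_supply_voltage(temp_k)
    print(f"T = {temp_k:.0f} K -> 8kT/q = {v_min * 1e3:.1f} mV")
```

At 300 K the result is close to the 0.2 V figure given in the text, and the linear dependence on T is consistent with the observation that reduced operating temperatures permit lower supply voltages.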
However, because of very low performance for technologies in use at that time, such low voltage operation was not adopted in practical systems.

Another approach has been to examine energy minimization for circuits operating in the subthreshold region. Studies have shown that subthreshold operation has a number of advantages, namely improved gain and noise margin, and greater energy efficiency at lower frequencies than standard CMOS [54]. The authors simulated a chain of inverter gates forming a ring oscillator and noticed the following:

- The power consumption depends linearly on the operating frequency at higher frequencies, due to the dominance of the dynamic power component.
- The power consumption becomes independent of operating frequency at lower frequencies, as static power is more dominant.
- Subthreshold circuits consume less power than strong inversion circuits at the same operating frequency.

The authors in [54] also simulated subthreshold pseudo-NMOS circuits and compared the results with their CMOS counterparts. They found that pseudo-NMOS is comparable to CMOS in power dissipation and robustness, but has less area and capacitance, and improved performance. However, very careful sizing of the PMOS to NMOS ratio is needed in order to ensure the proper functioning of the circuit.

Calhoun and Chandrakasan further examined solutions for the optimum supply voltage (Vdd) and threshold voltage (Vth) to minimize energy in subthreshold operation of digital circuits [19]. The authors identified that there is a maximum achievable frequency for a given circuit operating in the subthreshold region. They observed that prior work on strong inversion optimization did not account for gate leakage, even though it is a significant contributing factor in deep submicron technologies.
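A first-order version of such a minimum energy analysis can be sketched as follows. This is a simplified analytical model of the kind used in the subthreshold literature, not the HSPICE-based procedure of this thesis. It keeps only switching energy and subthreshold leakage energy, assumes the cycle time grows exponentially as Vdd drops, and is valid only in the subthreshold region. The parameters `c_switch`, `leak_weight`, `n`, and `v_thermal` are hypothetical, normalized values.

```python
import math


def energy_per_cycle(vdd, c_switch=1.0, leak_weight=500.0, n=1.5, v_thermal=0.02585):
    """Normalized energy per cycle in the subthreshold region (assumed model).

    Switching energy scales as c_switch * vdd^2. Leakage energy is the leakage
    current times vdd times the cycle time; with an exponentially growing
    cycle time this reduces to leak_weight * vdd^2 * exp(-vdd / (n * v_thermal)).
    """
    dynamic = c_switch * vdd ** 2
    leakage = leak_weight * vdd ** 2 * math.exp(-vdd / (n * v_thermal))
    return dynamic + leakage


def minimum_energy_point(v_lo=0.15, v_hi=0.60, steps=451):
    """Sweep Vdd on a uniform grid and return the voltage with the lowest energy."""
    grid = [v_lo + i * (v_hi - v_lo) / (steps - 1) for i in range(steps)]
    return min(grid, key=energy_per_cycle)


v_opt = minimum_energy_point()
print(f"Minimum energy point: Vdd = {v_opt:.3f} V, "
      f"normalized energy = {energy_per_cycle(v_opt):.4f}")
```

Below the optimum, the growing cycle time lets leakage energy dominate; above it, the quadratic switching energy dominates. The optimum therefore lies inside the sweep range rather than at the lowest functional voltage.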
Their calculations showed that parameters like gate current, gate-induced drain leakage (GIDL), and pn junction leakage are negligible when compared to subthreshold current, because they roll off much faster with Vdd. Their paper highlights the dependence of the minimum energy point on technology, design characteristics of the circuit, and operating conditions like temperature, duty cycle, and workload. They showed that in the subthreshold region, the optimum Vdd changes by several hundred millivolts when the above parameters are changed, leading us to infer that circuits are very sensitive to process variations in subthreshold voltage operation. They also concluded that current standard cell libraries show reduced energy per operation for a minimum sized device.

In a follow-up paper, Calhoun and Chandrakasan successfully showed test chips fabricated in 90 nm technology operating at a 330 mV supply voltage while obtaining energy savings on the order of 9X compared to other reduced performance scenarios [18]. They proposed a technique called "ultra-dynamic voltage scaling", where the circuit works at the normal operating voltage when speed or performance is the primary criterion, and at subthreshold voltage when energy conservation is the main motive. This technique made sense because, for a majority of circuits, subthreshold operation was only needed when a major section of the chip was in "OFF" mode and needed to "wake up", or when the entire circuit was in the subthreshold region (e.g., microsensor mode). This gave users the flexibility to operate the circuits in either an energy efficient mode or a performance mode, depending upon their need.

Kwong and Chandrakasan highlighted two major challenges that are faced by subthreshold voltage designs and can potentially impact circuit functionality [41]. The first one was that the drive current (Ion) is lower in the subthreshold region when compared to strong inversion.
Hence, the ratio of active to idle leakage current (Ion/Ioff) is lower. This means that idle leakage may counter the active current, and the output of the device may not pull completely to Vdd or ground. Another problem faced by subthreshold voltage operation highlighted by [41] was process variations. Global variations can affect the entire circuit and its operation throughout the voltage scale. In the subthreshold region, their effect is seen at skewed P/N corners, with either strong PMOS and weak NMOS or vice versa. However, local fluctuations, mainly random dopant fluctuations (RDF), cause random shifts in threshold voltage (Vth). These shifts can move the minimum energy operating point and hence should be accounted for in circuit modeling as well. The authors also concluded that the optimum Vdd need not occur at the lowest voltage at which the circuit functions correctly. This result was quite significant, as it disproved the conclusion drawn by Meindl and Swanson [43]. The reason was the increased leakage of the sub-micron devices.

Zhai et al. highlighted the challenges of subthreshold voltage operation in SRAM designs [63]. They highlighted three key challenges. The first was a reduced Ion/Ioff current ratio, which led to difficulty in distinguishing between the read current of an accessed cell and the leakage current of an unaccessed cell. Another key problem highlighted by [63] was the change in gate sizing requirements in low voltage operation. The read and write stability of any conventional SRAM are heavily dependent upon the pull-up, pull-down, and pass transistors, whose strengths can be drastically affected by skewed PMOS to NMOS Vth ratios. The most important challenge to low voltage SRAM designs is the increased sensitivity to process variations. Even small variations have been known to cause mismatches, hence causing functional failure [63].
The authors presented a novel 6 transistor SRAM design in 0.13 µm technology capable of overcoming these challenges and successfully operating at subthreshold voltages. Their results showed that the proposed design works successfully from 1.2 V down to 193 mV while providing a 36% improvement in energy over other proposed SRAM designs with less area overhead. Hanson et al. have also designed a processor capable of working in the subthreshold voltage region of transistors [25]. Their design was used for sensor applications and showed correct operation at a 350 mV operating voltage while consuming only 3.5 pJ of energy per cycle.

Dual voltage design in the subthreshold voltage range has recently been studied and shown to have energy and speed advantages [33, 34]. In [34], Kim and Agrawal obtained a point which they call the "true minimum" by using dual subthreshold voltage supplies. Using these dual supplies, the authors were able to lower the energy per cycle to a point below the known minimum energy point. They avoided the use of level converters, which are usually needed in any dual level voltage design, by implementing mixed integer linear programs (MILP), thereby negating the disadvantages of level converters such as delay insertion and power consumption. The authors were able to achieve a saving of 23% for a 16-bit ripple carry adder and 5% for a 4 × 4 multiplier, which was the worst case scenario in their study. In their follow-up paper [33], they achieved an energy saving of 25% for various ISCAS'85 benchmark circuits.

Subthreshold voltage operation may also have an advantage in extending the battery lifetime of portable and mobile electronics [40]. In this paper, Kulkarni and Agrawal examined the energy consumption of a circuit and observed the impact on the efficiency of the battery. They observed the need for controlling the power consumption in order to control the size of the battery.
They demonstrated that, for most circuits, the efficiency of the battery reduces for higher currents, and that operating the circuit at subthreshold voltages (0.3 V in their case) vastly improved the battery lifetime, which is critical for today's portable electronic devices.

Abouzeid et al. developed a 45 nm CMOS cell library optimized for ultra-low power applications. They developed a decoder circuit that operated at a speed of 457 kHz at 0.35 V [6]. That point was the minimum energy point, and they achieved a total energy consumption of 3.9 fJ per cycle [6]. Tran and Baas designed a 32-bit fast adder which functioned successfully in the subthreshold voltage region. They showed that their design performed successfully while being most energy efficient at 0.37 V with a frequency of 100 MHz [58]. However, their circuit was designed in PTM 45 nm bulk technology. Given the shift towards high-k, a study is needed to see how this shift would affect circuit performance in terms of speed and energy efficiency.

2.3 Process Variation

So far, the term "process variation" has been mentioned several times. It is the natural variation occurring in the parameters of transistors (like threshold parameter, oxide thickness, channel width and length, and mobility) during the fabrication of integrated circuits. William Shockley first discovered random variation in semiconductor devices during his analysis of random fluctuations in junction breakdown [53]. He theorized that the spatial fluctuations of donor and acceptor ions are randomly distributed according to a Poisson distribution. Keyes expanded on Shockley's work by studying the effect of the randomness of impurity atoms on the electrical characteristics of a MOSFET [31]. From his models, he concluded that threshold voltages are normally distributed in a square transistor.

In 1974, Schemmert and Zimmer used the conclusion drawn by Bauer et al.
[12] that threshold voltage (Vth) depends upon the depth of penetration of ions during ion implantation, among other parameters, and introduced a procedure for minimizing the threshold-voltage sensitivity of ion-implanted MOSFETs to different process parameters [51]. Their results showed a maximum deviation of 10% for tox. A Monte Carlo analysis on a small MOSFET conducted by Alvarez and Akers in 1981 showed that controlling the process variation parameters to 10% yielded a threshold voltage variation of 15% [8]. They also noticed that the distribution was normal, and almost 95% of the variance was around 100 mV about the mean threshold voltage.

Agrawal and Nassif further classify process variation into two sub-categories: random variations and systematic variations [7]. They further classify systematic variations into across-field and layout-dependent variations. The authors in [7] explain that across-field variations can cause identical devices at different locations of the reticle to behave differently. They attribute these variations to photolithographic and etching sources (dose, focus, and exposure variations, etc.), lens aberrations, mask errors, and variations in etch loading [16, 17, 27, 62]. The authors characterize layout-dependent variations as those causing different layouts of the same device to have different characteristics even when they are close to each other. They note that these variations are predictable and can be modeled according to deterministic factors such as layout structure and the topological environment surrounding the device layout.

Agrawal and Nassif [7] characterize random variation as unpredictable random uncertainties in the fabrication process, like fluctuations in the number and location of dopant atoms and poly-silicon gate line-edge roughness.
According to the authors in [9, 23, 24, 32], line-edge roughness and line-width roughness can cause an increase in sub-threshold current and a degradation in the threshold voltage. Random variations can cause mismatch between identical, adjacent devices, and the deviation of threshold voltage caused by these variations is represented by an equation derived by Stolk et al. [55]:

$$\sigma_{V_t} = \left(\frac{\sqrt[4]{4 q^3 \epsilon_{Si} \phi_B}}{2}\right) \cdot \frac{T_{ox}}{\epsilon_{ox}} \cdot \frac{\sqrt[4]{N}}{\sqrt{W_{eff} L_{eff}}} \tag{2.6}$$

where
$T_{ox}$ = Gate oxide thickness
$N$ = Channel dopant concentration
$W_{eff}$ and $L_{eff}$ = Effective channel width and length
$\epsilon_{Si}$ and $\epsilon_{ox}$ = Permittivity of silicon and oxide
$\phi_B = 2 k_B T \ln(N/n_i)$ (with $k_B$ Boltzmann's constant, $T$ the absolute temperature, and $n_i$ the intrinsic carrier concentration)

The above equation illustrates that mismatch reduces with a decrease in doping ($N$) and gate oxide thickness ($T_{ox}$) and increases when the effective length and width decrease.

Kuhn et al. from the Technology and Manufacturing Group at Intel noted that high-k metal gates are also subject to variations in oxide thickness, fixed charge, and interface traps [39]. They note that these physical changes result in parametric variations in drive current, gate tunneling current, or threshold voltage. Studies show that intrinsic threshold voltage fluctuations induced by local oxide thickness variations become comparable to voltage fluctuations introduced by Random Dopant Fluctuations (RDF) in deep submicron MOSFETs [10]. By evaluating gate-tunneling leakage current theoretically and experimentally for 1.2 - 2.8 nm SiO2 gate oxides in MOSFETs, Koh et al. showed that when the gate oxide tunnel resistance becomes comparable to the gate poly-Si resistance, the statistical distribution of gate-tunnel leakage current causes large fluctuations in Vth [37]. Kaushik et al. studied the effects of fixed charge in the high-k layer and concluded that mobility and uniformity of threshold voltages were affected by variations in the fixed charge [29].
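As a rough numerical illustration of Equation 2.6, the sketch below evaluates the threshold-voltage mismatch for one set of device parameters and shows how it scales. Everything numeric here is an assumption made for illustration and is not taken from [55] or from the simulations in this thesis: the device dimensions, the doping level, the SiO2 relative permittivity of 3.9, the approximate intrinsic carrier concentration, and the choice to express $\phi_B$ in volts as $2 (k_B T / q) \ln(N/n_i)$ so that the result comes out in volts.

```python
import math

# Physical constants (SI units)
Q = 1.602176634e-19        # elementary charge (C)
K_B = 1.380649e-23         # Boltzmann constant (J/K)
EPS_0 = 8.8541878128e-12   # vacuum permittivity (F/m)
EPS_SI = 11.7 * EPS_0      # permittivity of silicon
EPS_OX = 3.9 * EPS_0       # permittivity of SiO2 (assumed gate dielectric)
N_I = 1.0e16               # approximate intrinsic carrier concentration of Si near 300 K (m^-3)


def sigma_vt(t_ox, n_dopant, w_eff, l_eff, temperature=300.0):
    """Threshold-voltage standard deviation (V) following the form of Eq. 2.6.

    Assumption: phi_B is expressed in volts, i.e. 2 (k_B T / q) ln(N / n_i).
    All lengths are in meters and the dopant concentration is in m^-3.
    """
    phi_b = 2.0 * (K_B * temperature / Q) * math.log(n_dopant / N_I)
    charge_term = (4.0 * Q**3 * EPS_SI * phi_b) ** 0.25 / 2.0
    return charge_term * (t_ox / EPS_OX) * n_dopant**0.25 / math.sqrt(w_eff * l_eff)


# Hypothetical device: 1.2 nm oxide, 3e18 cm^-3 doping, W = 90 nm, L = 45 nm
base = sigma_vt(t_ox=1.2e-9, n_dopant=3e24, w_eff=90e-9, l_eff=45e-9)
thinner_oxide = sigma_vt(t_ox=0.6e-9, n_dopant=3e24, w_eff=90e-9, l_eff=45e-9)
wider_device = sigma_vt(t_ox=1.2e-9, n_dopant=3e24, w_eff=360e-9, l_eff=45e-9)

print(f"sigma_Vt (base)          = {base * 1e3:.2f} mV")
print(f"sigma_Vt (half T_ox)     = {thinner_oxide * 1e3:.2f} mV")
print(f"sigma_Vt (4x wider gate) = {wider_device * 1e3:.2f} mV")
```

Halving the oxide thickness or quadrupling the gate area each halves the mismatch in this sketch, which matches the trend stated for Equation 2.6.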
Another concern highlighted by [39] is mobility degradation and Vth instability due to fast transient charging (FTC) in electron traps. An investigation of the effects of FTC, which studied the impact of metal gate electrodes on mobility degradation, suggests that the increase in FTC can be attributed to the higher density of oxygen vacancies in the dielectric caused by dielectric-induced scavenging processes [61]. Various optimization techniques have been shown to reduce the charge trapping process [38, 49].

Management of process variation is playing an increasingly important role in technology scaling, and CMOS literature has always shown process variation as a critical element in semiconductor fabrication. Until better fabrication and post-lithography techniques are designed to minimize process variations, they must be considered in all circuit and design simulations in order to accurately predict how a real-world circuit would actually function.

Chapter 3
Tools and Techniques

This chapter gives an introduction to the various tools and techniques used to conduct the experiments of this thesis. There are different tools for circuit modeling, netlist generation, simulation, process variation, and result analysis. There are also different techniques for estimating the minimum energy operating point and for simulating the circuit while varying different process parameters.

3.1 Test Circuit

The first step in performing any experiment is to choose a test circuit. After a specific test circuit is chosen, the selected tools are used to apply the appropriate technique for conducting the experiment. Usually, a simple replicable circuit or a benchmark circuit whose performance and operation can be easily monitored is chosen. For this thesis, a 32-bit ripple carry adder was chosen because it has a simple design yet sufficient logic depth for the proper utilization of the design technique.

Figure 3.1: Schematic of a 32-bit ripple carry adder.
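To make the structure of the test circuit concrete, the following is a minimal behavioral sketch of a ripple carry adder in Python. It is only an illustration of the logic, not the VHDL model or the synthesized netlist used in this thesis; the function names and the least-significant-bit-first list representation are choices made for this example.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout


def ripple_carry_add(a_bits, b_bits, cin=0):
    """Add two equal-length bit lists (least significant bit first).

    The carry out of each full adder feeds the carry in of the next one,
    so the carry "ripples" through the chain. Returns (sum_bits, carry_out).
    """
    if len(a_bits) != len(b_bits):
        raise ValueError("operands must have the same width")
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry


def to_bits(value, width=32):
    """Convert a non-negative integer to a list of bits, LSB first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Convert a list of bits (LSB first) back to an integer."""
    return sum(bit << i for i, bit in enumerate(bits))


# All-ones plus zero with carry-in 1: the carry passes through every stage.
sum_bits, carry_out = ripple_carry_add(to_bits(0xFFFFFFFF), to_bits(0), cin=1)
print(from_bits(sum_bits), carry_out)
```

With all A inputs at 1, all B inputs at 0, and the carry in set to 1, every stage has to wait for the carry of the previous stage, which is why this input pattern exercises the longest path through the adder.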
Figure 3.1 shows the basic schematic of a 32-bit ripple carry adder. A[1:32] and B[1:32] are the two 32-bit inputs to the adder, Ci is the carry in to the first adder, Co is the carry out from the last adder, and S[1:32] are the sum outputs of the full adder cells. A ripple carry adder consists of a chain of full adders where the carry output of each adder, starting with the least significant bit (LSB) adder, goes into the next adder. This way, the carry signal "ripples" through the chain of adders, hence the term ripple carry adder. In a 32-bit ripple carry adder, the carry signal must propagate through 32 stages of 1-bit full adders before the next set of input vectors can be applied. Thus, the critical path delay of the adder is defined as the total path delay between the carry in (Ci) signal given to the first adder and the carry out (Co) of the last adder.

3.2 IC Design and Simulation Tools

This section discusses the various tools used for designing and simulating the test circuit.

3.2.1 Leonardo Spectrum

Leonardo Spectrum [4] is a logic synthesis tool from Mentor Graphics Corp. Logic synthesis is the process of translating a Hardware Description Language (HDL) description into a technology-specific gate-level description. Leonardo Spectrum [4] offers design capture, VHDL and Verilog entry, register transfer level debugging for logic synthesis, constraint-based optimization, timing analysis, encapsulated place-and-route, and schematic viewing for Complex Programmable Logic Devices (CPLDs), Field Programmable Gate Arrays (FPGAs), and Application Specific Integrated Circuits (ASICs).

3.2.2 Design Architect

Design Architect [1] is a scalable design definition environment provided by Mentor Graphics Corp. Since it can interface easily with Leonardo Spectrum, this tool can import the netlist generated by Leonardo Spectrum and display the register-level or transistor-level design of the desired circuit.
It can model digital, analog, or mixed-signal blocks, and can quickly simulate the entire hierarchical design.

3.2.3 HSPICE

Simulation Program with Integrated Circuit Emphasis (SPICE) is a general-purpose electronic circuit simulator used to check the integrity of circuit designs and predict circuit behavior [47]. HSPICE is a circuit simulator derived from SPICE and offered by Synopsys Inc., which designers use to predict the timing, functionality, power consumption, and yield of their designs. HSPICE takes a text netlist describing the circuit elements (transistors, resistors, capacitors, etc.) and their connections, translates this description into solvable equations, and produces the final result. It is common to use SPICE simulators to run Monte Carlo simulations to document the effect of process variations on a circuit, thereby providing an approximation of the yield of the circuit when fabricated.

3.3 Circuit Design and Simulation Techniques

This section explains how the circuit was modeled using an HDL before being optimized by the tools explained in the previous section. It also explains how process variation for various design parameters was modeled using Monte Carlo simulations in SPICE.

3.3.1 VHDL

A popular high-level description language for system and circuit design is VHDL. The language has various levels of abstraction and supports behavioral, structural, and dataflow descriptions. Although behavioral statements are executed sequentially, the structural and dataflow descriptions in VHDL display concurrent behavior, i.e., all statements written in that format are executed concurrently. Hence, the order of the statements is not important.

3.3.2 Monte Carlo Analysis

Monte Carlo experiments can be defined as a collection of computational algorithms that compute results by repeated random sampling. This method relies on random numbers and is most often used when it is impractical or impossible to compute an exact result.
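The idea of repeated random sampling can be sketched in a few lines of Python. This is not the HSPICE Monte Carlo setup used in this thesis. The function `toy_delay` is a hypothetical stand-in for a circuit simulation (an alpha-power-law style expression chosen only so that delay grows as the threshold voltage approaches the supply), and the nominal threshold voltage, the 5% relative sigma, and the normal distribution are all assumptions made for illustration.

```python
import random
import statistics


def toy_delay(vdd, vth, alpha=1.3, k=1.0):
    """Hypothetical stand-in for one circuit simulation (arbitrary delay units)."""
    return k * vdd / (vdd - vth) ** alpha


def monte_carlo_delay(n_samples, vdd=0.9, vth_nominal=0.4, rel_sigma=0.05, seed=0):
    """Draw vth from a normal distribution and collect the resulting delays."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    delays = []
    for _ in range(n_samples):
        vth = rng.gauss(vth_nominal, rel_sigma * vth_nominal)
        delays.append(toy_delay(vdd, vth))
    return delays


delays = monte_carlo_delay(1000)
mean = statistics.mean(delays)
sigma = statistics.stdev(delays)
worst_case = mean + 3 * sigma  # estimate of the slow 3-sigma corner

print(f"mean = {mean:.3f}, sigma = {sigma:.3f}, mean + 3 sigma = {worst_case:.3f}")
```

In a real flow, each call to `toy_delay` would be replaced by a full circuit simulation with the sampled parameter values, which is why the number of samples has a direct impact on run time.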
Monte Carlo simulations are particularly useful in studying process variations, more specifically, how variations in the process parameters of transistors (like vth0, mobility, oxide thickness, etc.) can affect the various functional parameters of the circuit (like delay, drive current, threshold voltage, power dissipation, energy, etc.). Designers use this method to estimate 3σ corners and optimize their circuits to get the best yields.

3.4 Predictive Technology Model

Predictive Technology Models (PTM) are customizable and predictive model files for transistor and interconnect technologies. They are compatible with SPICE, easily scalable for a wide range of process variations, and provide accurate models from 180 nm to sub-45 nm technologies [5]. In today's fast-paced scaling of MOSFET technology, research and circuit design must begin before a future generation of MOSFET technology is fully implemented [21]. Challenges like process variations, leakage current, and reliability must be properly addressed for each technology before it is fully embraced [64]. Hence, it is important for researchers to work with fully customizable and accurate transistor models for each technology. Almost all semiconductor companies guard their models closely and do not disclose their model data in order to prevent industrial espionage. Hence, it is critical for researchers to use models which are not only openly available but also provide accurate results when compared to benchmark circuits designed using industrial models.

Figure 3.2: Comparison of PTM and industry's technology model for Vdd and Vth scaling vs. effective length (Leff) for a range of technology nodes [64].

Figure 3.3: Comparison of PTM and industry's technology model for channel doping concentration Nch vs. effective length (Leff) for a range of technology nodes [64].

The authors in [64] have successfully developed technology models for a range of 130 nm to sub-45 nm.
By analyzing Figures 3.2 and 3.3, which were drawn from their paper, we conclude that the results obtained from their model match closely with data obtained from the industry. PTM has shown especially good predictions for the 45 nm technology node, along with better scalability over a wide range of process and design conditions. Hence, it is highly preferable to use PTM models for the modeling and simulation of circuits when industrial models are not available.

Chapter 4
Methodology

4.1 Test Circuit Modeling

The 32-bit ripple carry adder circuit was first designed using VHDL. The VHDL model was then imported into the Leonardo Spectrum tool [4], which can create a simulatable netlist for the VHDL model. A circuit netlist can be created for any technology. For this thesis, the circuit was modeled in TSMC 0.18 micron technology. Leonardo Spectrum generated a Verilog file which contained the synthesized netlist. This synthesized Verilog file was then imported into the Design Architect tool [1], which gave the schematic of the 32-bit ripple carry adder using the standard TSMC cell libraries. The Design Architect tool has an internal SPICE simulator which can generate a SPICE netlist. This SPICE netlist was further modified by changing the width of all transistors from 0.18 µm to 45 nm while preserving the width over length (W/L) ratio. Instead of using the TSMC libraries used by Design Architect, we used the Predictive Technology Model (PTM) for both 45 nm bulk and high-k technologies [5]. This was done because Design Architect did not provide 45 nm libraries, and the research required us to simulate circuits in the latest transistor technologies.

4.2 Minimum Energy Point Estimation

To calculate the voltage at which the circuit operates at minimum energy, we use a technique called "Dynamic Voltage Scaling" used by Calhoun and Chandrakasan
[19]. This technique consists of changing the operating voltage step by step, measuring the critical path delay and the power dissipated by the circuit at each voltage step, and calculating the energy dissipated by multiplying the power and the delay.

$$E_{avg} = P_{avg} \times t \tag{4.1}$$

$$P_{avg} = V_{dd} \times I_{avg} \tag{4.2}$$

where
$E_{avg}$ = Average dissipated energy
$t$ = Critical path delay
$P_{avg}$ = Average dissipated power
$V_{dd}$ = Operating voltage
$I_{avg}$ = Average current drawn by the circuit

At each voltage step, there is a change in path delay as well as in drawn current. In other words, when the voltage is decreased, there is a decrease in drawn current but an increase in critical path delay. Hence, to find the minimum energy dissipated by the circuit, the delay and current at each voltage step need to be measured.

To calculate the delay at each voltage, the critical path needs to be activated. Therefore, the following vectors were applied. First, all the inputs (A, B, and Ci) were initialized to 0. This sets all the sum outputs and the carry out to 0. In the second vector, all A inputs (A[1:32]) were set to 1, while all B inputs (B[1:32]) were kept at 0. All sum outputs thus became 1, but there was no change in the carry signal and no rippling of bits through the carry chain. A third vector then set Ci to 1 to activate the critical path. As a carry was propagated through all 32 full adders, two critical paths were simultaneously activated. While the carry bits in all 32 full adders changed to 1, the sum outputs were simultaneously brought back to 0. The time delay between the application of the third test vector and the change of the output signals of the final adder was measured.

$$t_1 = t_{Co} - t_{Ci} \tag{4.3}$$

$$t_2 = t_{S32} - t_{Ci} \tag{4.4}$$

where
$t_1$ and $t_2$ = Path delays
$t_{Ci}$ = Time when Ci switched from 0 to 1
$t_{Co}$ = Time when Co switched from 0 to 1
$t_{S32}$ = Time when S32 switched from 1 to 0

The larger of $t_1$ and $t_2$ is deemed the critical path delay.
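The voltage sweep described above can be summarized in a short Python sketch. The helper names are hypothetical, and the sweep data are the average current and critical path delay values reported later in Table 5.3 for the 45 nm high-k adder; the sketch only post-processes those numbers using Equations 4.1 to 4.4 and does not reproduce the HSPICE measurements themselves.

```python
# (Vdd in V, average current in A, critical path delay in s), from Table 5.3
HIGH_K_SWEEP = [
    (1.0, 34.9e-5, 0.45e-9),
    (0.9, 25.7e-5, 0.47e-9),
    (0.8, 20.0e-5, 0.51e-9),
    (0.7, 15.5e-5, 0.57e-9),
    (0.6, 10.5e-5, 0.67e-9),
    (0.5, 6.38e-5, 0.87e-9),
    (0.4, 3.20e-5, 1.42e-9),
    (0.35, 1.84e-5, 2.12e-9),
    (0.3, 1.09e-5, 3.71e-9),
    (0.2, 0.382e-5, 18.7e-9),
]


def critical_path_delay(t_ci, t_co, t_s32):
    """Larger of the two path delays t1 and t2 (Eqs. 4.3 and 4.4)."""
    t1 = t_co - t_ci
    t2 = t_s32 - t_ci
    return max(t1, t2)


def energy_per_cycle(vdd, i_avg, delay):
    """Average energy per cycle from Eqs. 4.1 and 4.2."""
    p_avg = vdd * i_avg   # Eq. 4.2
    return p_avg * delay  # Eq. 4.1


def minimum_energy_point(sweep):
    """Return the (vdd, i_avg, delay) entry with the lowest energy per cycle."""
    return min(sweep, key=lambda point: energy_per_cycle(*point))


best_vdd, best_current, best_delay = minimum_energy_point(HIGH_K_SWEEP)
best_energy = energy_per_cycle(best_vdd, best_current, best_delay)
print(f"minimum energy point: Vdd = {best_vdd} V, E = {best_energy:.3e} J/cycle")
```

Because the energy is a product of a falling quantity (power) and a rising quantity (delay), the minimum has to be located by evaluating every voltage step rather than by extrapolating from the high-voltage points.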
The critical path determines the frequency of test vector application. This frequency changes for each voltage point and needs to be measured each time there is a voltage step change. After finding the frequency, 100 random vectors were applied to the inputs of the 32-bit ripple carry adder at the maximum operating frequency for that voltage point. The average current consumed by the circuit was measured from the SPICE simulations run with HSPICE [3]. It was then multiplied by the voltage to give the average power dissipated by the test circuit, as given in equation 4.2. To determine the average energy per cycle, the average power was multiplied by the delay of the circuit, as shown in equation 4.1. The average energy per cycle for each voltage step was calculated, tabulated, and graphed.

4.3 Process Variation

The results obtained using the technique described above are only applicable to an ideal circuit. However, in real life, process variations can cause changes in various transistor parameters like threshold parameter, mobility, oxide thickness, etc. Hence, it is important to investigate how a circuit's characteristics, like threshold current, delay, etc., change with process variation.

We use Monte Carlo analysis to model process variations in the circuit. For this circuit, we perform three types of variations. In the first one, we change the threshold parameter (vth0) by 16%. In the second one, the oxide thickness (tox) is varied by a factor of 20%. In the last one, both vth0 and tox are varied together. We then calculate the mean and sigma values for all three cases.

We compare two cases to test the effect of process variations on the circuit. First, we compare the operation of the circuit at 0.3 V for both bulk and high-k technologies. More specifically, the critical path delays are measured under the effect of process variation, and then the mean value is used to run the adder circuit to measure the average current drawn.
The average power dissipated and the energy per cycle are calculated using equations 4.1 and 4.2, and the means are compared with the ideal scenario. The second case consists of comparing the operation of the circuit designed in high-k technology at 0.9 V and 0.3 V. We calculate how process variations affect the critical path delays and energy per cycle for both voltage points, and compare the means.

Chapter 5
Results

5.1 Inverter Simulation

Current flowing in a circuit has two components: static current and dynamic current. Furthermore, static (or leakage) current has two major components: sub-threshold leakage and gate oxide leakage. Due to the reduced feature size of the gate oxide in bulk MOSFET designs, there is an increase in gate oxide leakage, which can affect the delay of a circuit because of electron tunneling through the oxide layer. This issue was addressed by a switch to high-k designs. However, with high-k, due to the presence of a larger dielectric, the oxide capacitance increases, leading to a larger dynamic current flowing through the circuit. Secondly, high-k designs have a thicker oxide layer compared to bulk designs, which leads to a greater sub-threshold current flowing through the transistor. Hence, it is evident that high-k designs will have more dynamic and leakage current flowing through the circuit. However, because of greater gate oxide leakage in bulk designs, the delays of circuits designed in bulk technology will be significantly larger compared to high-k designs. Therefore, we expect the energy per cycle for high-k designs to be lower compared to bulk designs, in spite of high-k having a higher current flow, because of the tremendous gain in speed.

Before we performed SPICE simulations on the 32-bit ripple carry adder, we simulated a single inverter designed in both 45 nm bulk and high-k technologies to understand how current, delay, and energy vary with a switch in technology. We operated the inverter for 10 clock cycles at 0.4 V.
Within those 10 cycles, there were 2 transitions: a 0→1 transition and a 1→0 transition. The other 8 cycles were idle cycles, i.e., no transitions occurred. During idle periods, the only current flowing through the circuit is leakage current, which leads to static power dissipation. During the transition cycles, both leakage and drive current flow, hence the power consumed in that period is the sum of both static and dynamic power. To calculate the dynamic current, we subtract the static current flowing during idle periods from the current flowing during the transition cycles. Table 5.1 shows the values of dynamic current, static current, average current over 10 clock cycles, and clock period (gate delay) for both technologies.

Table 5.1: Comparison of various currents and clock period of a CMOS inverter operating at 0.4 V for 45nm bulk and high-k technologies.

| Technology | Static Current (×10^-7 A) | Dynamic Current (×10^-5 A) | Average Current (×10^-6 A) | Clock Period (×10^-12 s) | Energy per cycle (×10^-18 J) |
|---|---|---|---|---|---|
| 45nm bulk | 0.11 | 0.82 | 0.83 | 25.9 | 8.59 |
| 45nm high-k | 3.02 | 5.48 | 5.72 | 3.63 | 8.30 |

From Table 5.1, it is clearly seen that the energy per cycle for the high-k design is lower compared to the bulk design even though there is a greater current flowing through the inverter designed in high-k. For circuits with greater critical path delay, we expect the gap between the energy per cycle values to increase further, as increased gate oxide leakage would cause larger circuits to run much slower.

5.2 Minimum Energy Point Estimation

From Tables 5.2 and 5.3, it is evident that when there is a decrease in operating voltage, there is a simultaneous decrease in average drive current and an increase in critical path delay. However, it is seen that with a drop in voltage, the decrease in current is greater than the increase in delay. Hence, there is a gradual reduction in energy per cycle with every voltage drop.
We also see that, at a particular voltage (0.3 V), the energy dissipated per cycle is minimum for the circuit, and for voltages below that point, the energy starts to increase. The reason is that, as the voltage decreases further below that point, the savings in current cannot compensate for the huge increase in delay, which causes the energy per cycle to increase.

Also, the circuit works faster when designed in high-k technology rather than in bulk technology. From Tables 5.2 and 5.3, we find that the frequency of operation at the optimum energy (minimum energy/cycle) point is approximately 250 MHz (critical path delay of roughly 4 ns) for high-k technology, while for bulk technology the corresponding frequency for minimum energy/cycle operation is just above 7 MHz (critical path delay of 137 ns). The reason is that there is more drive current flowing through the circuit, causing the transistors to switch faster, and the critical path delay is reduced due to reduced gate current leakage in high-k.

Notably, circuits modeled in high-k technology have the advantage of greater energy efficiency, as seen in Figure 5.1. In high-k technology, the minimum energy obtained is lower than that for the bulk technology at the same voltage. Comparing the minimum energy operations for the two technologies, we find that the energy per cycle for high-k is about 40% lower than that for the bulk technology. The minimum energy point occurs at 0.3 V for both high-k and bulk technologies. Again, the reason is that although there is a higher drive current in the circuit designed in high-k technology, the improvement in delay is more than enough to offset the increase in the drive current, resulting in energy savings.

Table 5.2: Simulated performance of 32-bit ripple carry adder designed in 45nm bulk technology.
| Operating Voltage (V) | Average Current (×10^-5 A) | Average Power (×10^-6 W) | Critical path delay (×10^-9 s) | Average energy/cycle (×10^-14 J) |
|---|---|---|---|---|
| 1 | 18.6 | 186 | 0.939 | 17.5 |
| 0.9 | 12.7 | 114 | 1.11 | 12.7 |
| 0.8 | 8.97 | 71.7 | 1.38 | 9.89 |
| 0.7 | 5.63 | 39.4 | 1.88 | 7.41 |
| 0.6 | 2.96 | 17.8 | 3.01 | 5.36 |
| 0.5 | 1.15 | 5.74 | 6.52 | 3.74 |
| 0.4 | 0.276 | 1.1 | 23.4 | 2.58 |
| 0.35 | 0.119 | 0.416 | 54.3 | 2.26 |
| *0.3 | 0.053 | 0.16 | 137 | 2.19 |
| 0.2 | 0.017 | 0.035 | 923 | 3.19 |

Table 5.3: Simulated performance of 32-bit ripple carry adder designed in 45nm high-k technology.

| Operating Voltage (V) | Average Current (×10^-5 A) | Average Power (×10^-6 W) | Critical path delay (×10^-9 s) | Average energy/cycle (×10^-14 J) |
|---|---|---|---|---|
| 1 | 34.9 | 349 | 0.45 | 15.6 |
| 0.9 | 25.7 | 231 | 0.47 | 10.9 |
| 0.8 | 20 | 152 | 0.51 | 8.10 |
| 0.7 | 15.5 | 109 | 0.57 | 6.16 |
| 0.6 | 10.5 | 62.9 | 0.67 | 4.19 |
| 0.5 | 6.38 | 31.9 | 0.87 | 2.78 |
| 0.4 | 3.20 | 12.8 | 1.42 | 1.82 |
| 0.35 | 1.84 | 6.42 | 2.12 | 1.36 |
| *0.3 | 1.09 | 3.28 | 3.71 | 1.22 |
| 0.2 | 0.382 | 0.764 | 18.7 | 1.43 |

* Highlighted row indicates minimum energy voltage point

[Figure 5.1 plots energy/cycle (J) and critical path delay (s) against voltage (V) for 45 nm bulk and 45 nm high-k.]

Figure 5.1: Energy per cycle vs. Vdd for 32-bit ripple carry adder simulated in 45nm bulk and high-k CMOS.

5.3 Process Variation

All circuits suffer from process variation in the real world. Hence, it is important to understand how process variation will affect voltage scaling or, more specifically, the minimum energy point as well as the nominal operating point. On analyzing the graphs in Figure 5.1, we infer that circuits designed in 45 nm high-k technology should be more resilient to process variations because the energy-delay curve is lower when compared to circuits designed in 45 nm bulk technology, so that minor changes would not cause any drastic effect on efficiency or performance.

Two parameters, the threshold parameter (vth0) and the oxide thickness (tox), are varied separately and then together. vth0 is varied by a factor of 16% because that is the deviation cited by the ITRS roadmap [26].
Oxide thickness is varied by a factor of 20%, as calculated by the authors in [46].

The delay was measured after performing a Monte Carlo analysis of 1000 samples of the circuit for the voltage points of 0.9 V and 0.3 V in high-k technology, and for the point of 0.3 V in bulk technology. The delays obtained from the analysis of the 1000 samples were compared with the delays obtained from the analysis of 30 random samples. Critical path delay was measured for each sample through HSPICE [3] simulation using a vector pair that activated the critical path.

The means (tm) and standard deviations (σ) of the critical path delay for circuits operating at 0.3 V designed in 45 nm bulk and high-k technologies, and at 0.9 V in high-k technology, are tabulated in Table 5.4. It is seen that the means and standard deviations are closely comparable for the 30 and 1000 random samples, establishing that 30 random samples can be used to model process variations.

Table 5.4: Comparison of mean and standard deviation of critical path delays for 30 and 1000 random samples.

| Operating Voltage | Process Variations | 30 samples: Mean tm (×10^-9 s) | 30 samples: Standard Deviation σ (×10^-9 s) | 1000 samples: Mean tm (×10^-9 s) | 1000 samples: Standard Deviation σ (×10^-9 s) |
|---|---|---|---|---|---|
| 0.9 V high-k | vth0 | 0.488 | 0.045 | 0.475 | 0.048 |
| 0.9 V high-k | tox | 0.465 | 0.023 | 0.474 | 0.032 |
| 0.9 V high-k | Both | 0.477 | 0.055 | 0.48 | 0.062 |
| 0.3 V high-k | vth0 | 6.36 | 4.45 | 6.29 | 8.57 |
| 0.3 V high-k | tox | 4.61 | 1.52 | 4.23 | 1.65 |
| 0.3 V high-k | Both | 6.15 | 4.95 | 6.85 | 9.25 |
| 0.3 V bulk | vth0 | 274.4 | 210.32 | 225.7 | 207.32 |
| 0.3 V bulk | tox | 279.6 | 171.7 | 204.6 | 237.4 |
| 0.3 V bulk | Both | 192.3 | 202.03 | 241.1 | 238.65 |

The following figures (Figures 5.2 - 5.4) compare the histograms of the delays for the 30 and 1000 random samples. It can be seen that the two histograms overlap closely, meaning that simulations done using 30 random samples are equivalent to simulations done using 1000 random samples. This experiment was done to establish this fact, since all the following results were obtained using 30 random samples.
Experiments using 1000 random samples were infeasible because calculating the energy can take almost 3 days for one voltage point and, secondly, there was not enough memory in the computer to store the output from the SPICE file.

Figure 5.2: Effect of process variation on critical path delay for adder operating at 0.9 V designed in 45nm high-k technology. Panels: (a) variation of vth0 by 16%, (b) variation of tox by 20%, (c) variation of both vth0 and tox. Each panel shows histograms of critical path delay (s) for 1000 samples and 30 samples.

Figure 5.3: Effect of process variation on critical path delay for adder operating at 0.3 V designed in 45nm high-k technology. Panels: (a) variation of vth0 by 16%, (b) variation of tox by 20%, (c) variation of both vth0 and tox. Each panel shows histograms of critical path delay (s) for 1000 samples and 30 samples.

Figure 5.4: Effect of process variation on critical path delay for adder operating at 0.3 V designed in 45nm bulk technology. Panels: (a) variation of vth0 by 16%, (b) variation of tox by 20%, (c) variation of both vth0 and tox. Each panel shows histograms of critical path delay (s) for 1000 samples and 30 samples.

Table 5.5: Yield of circuit designed in 45nm bulk and high-k technologies when affected by process variations.
| Operating Voltage (V) | Process Variations | Yield (%) |
|---|---|---|
| 0.9 V high-k | vth0 | 100% |
| 0.9 V high-k | tox | 98.9% |
| 0.9 V high-k | Both | 99% |
| 0.3 V high-k | vth0 | 99.6% |
| 0.3 V high-k | tox | 98.9% |
| 0.3 V high-k | Both | 99.7% |
| 0.3 V bulk | vth0 | 90.7% |
| 0.3 V bulk | tox | 97.5% |
| 0.3 V bulk | Both | 79.8% |

Table 5.5 tells us how many samples out of 1000 function correctly after being affected by process variation, i.e., the yield of the circuit. It is seen that circuits designed in high-k technology are more resilient to process variations and have a very low failure rate. Bulk technology, on the other hand, is seen to have a lower yield, and when both parameters undergo process variation at the same time, the yield drops drastically to less than 80%, unlike high-k, which still maintains an almost 100% yield. Hence, it could be hypothesized that as more process parameters undergo variation, the yield will be affected as well.

The corresponding sum of the mean and 3σ gives the worst case delay for a circuit operating at 0.3 V for each technology. This worst case delay was used as the clock period to feed 100 random vectors to 30 random Monte Carlo samples of the 32-bit adder circuit, and the current drawn from Vdd for each sample was measured. The average current of a circuit sample was multiplied by the operating voltage to obtain the power, which when multiplied by the clock period gave us the energy/cycle for each random sample.

Table 5.6: Comparison of average energy/cycle and clock period with and without process variations for a 32-bit ripple carry adder.

| Operating Voltage (V) | Process Variations | Clock Period (×10^-9 s) | Energy/cycle (×10^-14 J) |
|---|---|---|---|
| 0.9 V high-k | No Variation | 0.47 | 10.9 |
| 0.9 V high-k | vth0 (16%) | 0.619 | 12.4 |
| 0.9 V high-k | tox (20%) | 0.57 | 120 |
| 0.9 V high-k | Both | 0.666 | 87 |
| 0.3 V high-k | No Variation | 3.71 | 1.22 |
| 0.3 V high-k | vth0 (16%) | 32 | 8.15 |
| 0.3 V high-k | tox (20%) | 9.18 | 24 |
| 0.3 V high-k | Both | 36.4 | 43.2 |
| 0.3 V bulk | No Variation | 137 | 2.19 |
| 0.3 V bulk | vth0 (16%) | 847.66 | 19.6 |
| 0.3 V bulk | tox (20%) | 916.8 | 50.4 |
| 0.3 V bulk | Both | 957.05 | 62.7 |

Table 5.6 compares the average values of energy/cycle and the clock period with and without process variations for various technologies and operating voltages. Although the clock period increases considerably due to process variations for subthreshold voltages, it is seen that the circuit's energy consumption is not that far from the nominal energy/cycle. Since we assumed all samples to have a clock period corresponding to the worst (3σ) delay, it is possible that some circuits may be able to run faster and, for those cases, their individual energy/cycle may come closer to the nominal values or even perform better than that.

The graphs in Figures 5.5 - 5.7 highlight the variations in energy/cycle for the circuit operating at 0.3 V and 0.9 V designed in both bulk and high-k technologies. From the table and graphs, it is evident that a combinational circuit designed in high-k technology is more resilient to process variation, and has a smaller critical path delay and a lower energy/cycle.

Figure 5.5: Comparison of energy/cycle for different adder circuit operations when threshold parameter (vth0) undergoes process variation. The plot shows energy/cycle (J) for each of the 30 samples for 0.3 V bulk, 0.3 V high-k, and 0.9 V high-k.

Figure 5.6: Comparison of energy/cycle for different adder circuit operations when oxide thickness (tox) undergoes process variation. The plot shows energy/cycle (J) for each of the 30 samples for 0.3 V bulk, 0.3 V high-k, and 0.9 V high-k.

Figure 5.7: Comparison of energy/cycle for different adder circuit operations when both vth0 and tox undergo process variation. The plot shows energy/cycle (J) for each of the 30 samples for 0.3 V bulk, 0.3 V high-k, and 0.9 V high-k.

A deviation in the process parameters causes a change in the drive current and critical path delay.
This change in drive current and delay usually causes the energy/cycle to increase, as current and delay are not exactly inversely proportional to each other. However, there are rare instances (in high-k) where their relationship has caused the energy/cycle to decrease from the nominal value, resulting in a circuit that runs faster. By analyzing the graphs in Figures 5.5 - 5.7, it is clearly seen that even with process variations, circuits operating at 0.3 V are considerably more energy efficient than circuits operating at 0.9 V.

Chapter 6
Conclusion

The results presented in this thesis are believed to be accurate and to portray how a device will behave when fabricated in these technologies, as the PTM models have been shown to closely follow actual fabrication trends. They have also shown better physical scalability over a wide range of process and design conditions [64].

Results indicate that the average power dissipated by the circuit decreases steadily with voltage scaling. This is true for both bulk and high-k designs. Simultaneously, the critical path delay increases; in other words, the speed of the circuit decreases. However, because power drops more than speed, the average energy per cycle of the circuit for both designs also decreases steadily. The circuit has a minimum energy at an operating point of 0.3 V, below which it starts to dissipate more energy than at higher voltages. The reason is that the drop in power dissipated is not enough to compensate for the increase in delay, so the circuit takes more energy per cycle to run successfully. Similar work was done by Tran and Baas, and their results showed their fast adder circuit functioning properly at 0.37 V while consuming 34 fJ per cycle [58].
Their design was based on a 45 nm bulk PTM model, and since our 45 nm bulk design produced similar results, [58] validates our results and affirms the conclusions drawn in this thesis.

Results also show that high-k technology runs faster and more energy efficiently than bulk technology. Although the minimum energy point occurs at the same voltage for both bulk and high-k, the average energy per cycle is 40% lower for high-k than for bulk. Also, the high-k design operated at 250 MHz while the bulk design operated at just above 7 MHz, showing that high-k designs are faster at sub-threshold voltages as well.

Figure 6.1 explains why high-k technology is better than bulk technology. High-k technology has a thicker gate oxide than bulk, leading to lower leakage current through the gate oxide via tunneling. Secondly, the presence of a metal gate instead of a polysilicon gate allows a better flow of charge in the channel, leading to a larger drive current and hence causing circuits designed in high-k to run faster.

Figure 6.1: Comparison of gate oxide and gate design between a bulk MOSFET (top) and a high-k MOSFET (bottom) [13].

Recent research has shown that process variation can greatly affect the functionality of logic gates [56]. It can also introduce uncertainties in the circuit logic. Shifts in the threshold voltage Vth can drastically affect Ion and Ioff in the sub-threshold region, causing an exponential shift in the minimum energy point [41]. By analyzing the data from our results, we theorize that high-k designs at the minimum energy point will be more resilient to process variations than bulk designs, because high-k technologies provide a higher drive current in the sub-threshold region along with a reduction in gate oxide leakage for the same drive current when compared to the bulk technology [13, 52].
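As a rough cross-check of the speed and energy comparison quoted earlier in this chapter, the nominal ("No Variation") 0.3 V rows of Table 5.6 can be converted into clock frequencies and a relative energy saving. The sketch below is illustrative only; the dictionary and function names are hypothetical, and the figures quoted in the text may have been obtained from a different measurement than the table rows used here.

```python
# Nominal ("No Variation") rows at 0.3 V, copied from Table 5.6.
NOMINAL_0V3 = {
    "high-k": {"clock_period_s": 3.71e-9, "energy_per_cycle_j": 1.22e-14},
    "bulk": {"clock_period_s": 137e-9, "energy_per_cycle_j": 2.19e-14},
}


def clock_frequency_mhz(clock_period_s):
    """Clock frequency in MHz for a given clock period in seconds."""
    return 1.0 / clock_period_s / 1e6


def energy_saving_percent(energy_new_j, energy_ref_j):
    """Relative energy reduction of energy_new_j with respect to energy_ref_j."""
    return 100.0 * (1.0 - energy_new_j / energy_ref_j)
```

With these table values, the high-k adder comes out at roughly 270 MHz and the bulk adder at roughly 7.3 MHz, and the high-k energy/cycle is about 44% lower, which is in line with the approximate figures of 250 MHz, just above 7 MHz, and 40% given in the text.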
It is seen that process variation has a larger effect on the yield of the circuit designed in bulk technology than on the one designed in high-k technology. Changes in threshold parameter (vth0) and oxide thickness (tox) caused the yield of the circuit designed in 45 nm bulk technology to drop to less than 80%. However, high-k designs showed more resilience, and the yield was almost 100% for both normal operating voltages and sub-threshold voltages.

Parametric variations also have an effect on the speed and average energy dissipated by the circuit. On performing 1000 Monte Carlo simulations and comparing the resulting histogram with that of 30 samples, it is seen that the mean delays are very close to each other. Hence, the conclusions drawn from 30 samples would be the same as those drawn from analyzing 1000 samples, although 1000 samples may provide more accurate figures and comparisons.

SPICE simulations have shown that even with process variations, circuits operating at 0.3 V (sub-threshold voltage) remain more energy efficient than at 0.9 V (normal operating voltage). Hence, it is more energy efficient to operate the circuits at sub-threshold voltages rather than at normal supply voltages. Also, high-k designs are not only more resilient than bulk designs in terms of yield, but are also faster and more energy efficient.

Studies have shown that the voltage at which the minimum energy point occurs decreases with technology scaling, reaches a minimum at 90 nm, and then starts increasing with every technology advance [14]. Although we expect the clock rate to further improve and energy per cycle to reduce for 32 nm and finer technologies, some projections in [14] indicate that energy per cycle could increase with a move towards finer technologies. Hence, for finer technologies, the voltage at which the minimum energy point occurs should increase.
However, as these studies have been done only for bulk technologies, it is hard to predict how high-k models will behave. Simulations need to be done to check how the minimum energy point moves from 45 nm high-k technology to finer high-k technologies. Hence, future research could look into the movement of the minimum energy point when transistors designed in high-k technology are scaled down.

It is still unknown how sequential circuits will behave when affected by parametric variations in finer high-k technologies. Research could be done to understand the effect of process variations on timing, energy dissipation, and yield for sub-threshold operation of sequential circuits.

The ultimate minimum energy any circuit can achieve is bounded by the Landauer limit, which is given by kT ln 2, where k is the Boltzmann constant and T is the absolute temperature in kelvin. Current studies have shown that the lower bound on the energy to process one bit is about 36,000 times higher than the absolute Landauer limit [28, 42]. A shift towards high-k technology is only a small step towards achieving energy values close to that limit. However, more research and supporting experiments need to be done to find the limits of high-k technology so that it can lead to actual implementations of digital systems like microprocessors, graphics processors, and digital signal processors.

Bibliography

[1] "Design Architect." Mentor Graphics. http://www.mentor.com/products/ic_nanometer_design/custom-ic-design/design_architect_ic/.
[2] "High-k and Metal Gate Research." Intel Website. http://www.intel.com/technology/silicon/high-k.htm.
[3] "HSPICE." Synopsys Inc. http://www.synopsys.com/Tools/Verification/AMSVerification/CircuitSimulation/HSPICE/Pages/default.aspx.
[4] "Leonardo Spectrum." Mentor Graphics. http://www.mentor.com/products/fpga/synthesis/leonardo_spectrum/.
[5] "PTM Website." Arizona State University. http://ptm.asu.edu/.
[6] F. Abouzeid, S. Clerc, F. Firmin, M. Renaudin, and G. Sicard, "A 45nm CMOS 0.35 V optimized standard cell library for ultra-low power applications," in Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design, 2009, pp. 225-230.
[7] K. Agarwal and S. Nassif, "Characterizing process variation in nanometer CMOS," in Proceedings of the 44th ACM Design Automation Conference, 2007, pp. 396-399.
[8] A. R. Alvarez and L. A. Akers, "Monte Carlo analysis of sensitivity of threshold voltage in small geometry MOSFETs," Electronics Letters, vol. 18, no. 1, pp. 42-43, 1982.
[9] A. Asenov, S. Kaya, and A. R. Brown, "Intrinsic parameter fluctuations in decananometer MOSFETs introduced by gate line edge roughness," IEEE Transactions on Electron Devices, vol. 50, no. 5, pp. 1254-1260, 2003.
[10] A. Asenov, S. Kaya, and J. H. Davies, "Intrinsic threshold voltage fluctuations in decanano MOSFETs due to local oxide thickness variations," IEEE Transactions on Electron Devices, vol. 49, no. 1, pp. 112-119, 2002.
[11] C. Auth, A. Cappellani, J. S. Chun, A. Dalis, A. Davis, T. Ghani, G. Glass, T. Glassman, M. Harper, M. Hattendorf, et al., "45nm High-k + metal gate strain-enhanced transistors," in Proc. IEEE Symposium on VLSI Technology, 2008, pp. 128-129.
[12] L. O. Bauer, M. R. MacPherson, A. T. Robinson, and H. G. Dill, "Properties of silicon implanted with boron ions through thermal silicon dioxide," Solid-State Electronics, vol. 16, no. 3, pp. 289-300, 1973.
[13] M. T. Bohr, R. S. Chau, T. Ghani, and K. Mistry, "The high-k solution," IEEE Spectrum, vol. 44, no. 10, pp. 29-35, 2007.
[14] D. Bol, D. Kamel, D. Flandre, and J. D. Legat, "Nanometer MOSFET effects on the minimum-energy point of 45nm subthreshold logic," in Proceedings of the 14th ACM/IEEE International Symposium on Low Power Electronics and Design, 2009, pp. 3-8.
[15] S. Borkar, "Design Challenges of Technology Scaling," IEEE Micro, vol. 19, no. 4, pp. 23-29, 1999.
[16] Y. A. Borodovsky, "Impact of local partial coherence variations on exposure tool performance," in Proceedings of SPIE, volume 2440, 1995, p. 750.
[17] T. A. Brunner, "Impact of lens aberrations on optical lithography," IBM Journal of Research and Development, vol. 41, no. 1.2, pp. 57-67, 1997.
[18] B. H. Calhoun and A. P. Chandrakasan, "Ultra-dynamic voltage scaling using sub-threshold operation and local voltage dithering in 90nm CMOS," in Proc. IEEE International Conference on Solid-State Circuits, 2005, pp. 300-599.
[19] B. H. Calhoun, A. Wang, and A. P. Chandrakasan, "Modeling and sizing for minimum energy operation in subthreshold circuits," IEEE Journal of Solid-State Circuits, vol. 40, no. 9, pp. 1778-1786, 2005.
[20] K. M. Cao, W. C. Lee, W. Liu, X. Jin, P. Su, S. K. H. Fung, J. X. An, B. Yu, and C. Hu, "BSIM4 gate leakage model including source-drain partition," in Proc. International IEEE Electron Devices Meeting, 2000, pp. 815-818.
[21] Y. Cao, T. Sato, M. Orshansky, D. Sylvester, and C. Hu, "New paradigm of predictive MOSFET and interconnect modeling for early circuit simulation," in Proceedings of the IEEE Custom Integrated Circuits Conference, 2000, pp. 201-204.
[22] A. P. Chandrakasan, W. J. Bowhill, and F. Fox, Design of high-performance microprocessor circuits. Wiley-IEEE Press, 2000.
[23] C. H. Diaz, H. J. Tao, Y. C. Ku, A. Yen, and K. Young, "An experimentally validated analytical model for gate line-edge roughness (LER) effects on technology scaling," IEEE Electron Device Letters, vol. 22, no. 6, pp. 287-289, 2001.
[24] H. Fukutome, Y. Momiyama, T. Kubo, Y. Tagawa, T. Aoyama, and H. Arimoto, "Direct evaluation of gate line edge roughness impact on extension profiles in sub-50-nm n-MOSFETs," IEEE Transactions on Electron Devices, vol. 53, no. 11, pp. 2755-2763, 2006.
[25] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, et al., "Performance and variability optimization strategies in a sub-200mV, 3.5 pJ/inst, 11nW subthreshold processor," in IEEE Symposium on VLSI Circuits, 2007, pp. 152-153.
[26] R. Heald and P. Wang, "Variability in sub-100nm SRAM designs," in Proceedings of the IEEE/ACM International Conference on Computer-aided Design, IEEE Computer Society, 2004, pp. 347-352.
[27] C. Hedlund, H. O. Blom, and S. Berg, "Microloading effect in reactive ion etching," Journal of Vacuum Science & Technology A: Vacuum, Surfaces, and Films, vol. 12, no. 4, pp. 1962-1965, 1994.
[28] J. Izydorczyk and M. Izydorczyk, "Microprocessor Scaling: What Limits Will Hold?," Computer, vol. 43, no. 8, pp. 20-26, 2010.
[29] V. Kaushik, B. O'Sullivan, G. Pourtois, N. Van Hoornick, A. Delabie, S. Van Elshocht, W. Deweerd, T. Schram, L. Pantisano, E. Rohr, et al., "Estimation of fixed charge densities in hafnium-silicate gate dielectrics," IEEE Transactions on Electron Devices, vol. 53, no. 10, pp. 2627-2633, 2006.
[30] R. W. Keyes, "Physical problems and limits in computer logic," IEEE Spectrum, vol. 6, no. 5, pp. 36-45, 1969.
[31] R. W. Keyes, "The effect of randomness in the distribution of impurity atoms on FET thresholds," Applied Physics A: Materials Science and Processing, vol. 8, no. 3, pp. 251-259, 1975.
[32] H. W. Kim, J. Y. Lee, J. Shin, S. G. Woo, H. K. Cho, and J. T. Moon, "Experimental investigation of the impact of LWR on sub-100-nm device performance," IEEE Transactions on Electron Devices, vol. 51, no. 12, pp. 1984-1988, 2004.
[33] K. Kim and V. D. Agrawal, "Minimum Energy CMOS Design with Dual Subthreshold Supply and Multiple Logic-Level Gates," in Proc. 12th International Symposium on Quality Electronic Design, 2011.
[34] K. Kim and V. D. Agrawal, "True Minimum Energy Design Using Dual Below-Threshold Supply Voltages," in Proc. 24th Annual Conference on VLSI Design, 2011, pp. 292-297.
[35] N. S. Kim, T. Austin, D. Blaauw, T. Mudge, K. Flautner, J. S. Hu, M. J. Irwin, M. Kandemir, and V. Narayanan, "Leakage current: Moore's law meets static power," Computer, vol. 36, no. 12, pp. 68-75, 2003.
[36] Y. Kim, G. Gebara, M. Freiler, J. Barnett, D. Riley, J. Chen, K. Torres, J. E. Lim, B. Foran, F. Shaapur, et al., "Conventional n-channel MOSFET devices using single layer HfO2 and ZrO2 as high-k gate dielectrics with polysilicon gate electrode," in IEEE International Electron Devices Meeting, 2001, pp. 20-2.
[37] M. Koh, W. Mizubayashi, K. Iwamoto, H. Murakami, T. Ono, M. Tsuno, T. Mihara, K. Shibahara, S. Miyazaki, and M. Hirose, "Limit of gate oxide thickness scaling in MOSFETs due to apparent threshold voltage fluctuation induced by tunnel leakage current," IEEE Transactions on Electron Devices, vol. 48, no. 2, pp. 259-264, 2001.
[38] M. Koike, T. Ino, Y. Kamimuta, M. Koyama, Y. Kamata, M. Suzuki, Y. Mitani, A. Nishiyama, and Y. Tsunashima, "Effect of Hf-N bond on properties of thermally stable amorphous HfSiON and applicability of this material to sub-50nm technology node LSIs," in IEEE International Electron Devices Meeting, 2003, pp. 4-7.
[39] K. Kuhn, C. Kenyon, A. Kornfeld, M. Liu, A. Maheshwari, W. Shih, S. Sivakumar, G. Taylor, P. VanDerVoorn, and K. Zawadzki, "Managing process variation in Intel's 45nm CMOS technology," Intel Technology Journal, vol. 12, no. 2, pp. 93-109, 2008.
[40] M. Kulkarni and V. D. Agrawal, "Energy Source Lifetime Optimization for a Digital System through Power Management," in Proceedings of the 43rd Southeastern Symposium on System Theory, 2011.
[41] J. Kwong and A. P. Chandrakasan, "Advances in Ultra-Low-Voltage Design," IEEE Solid-State Circuits Newsletter, vol. 13, no. 4, pp. 20-27, 2008.
[42] R. Landauer, "Irreversibility and heat generation in the computing process," IBM Journal of Research and Development, vol. 5, no. 3, pp. 183-191, 1961.
[43] J. D. Meindl and R. N. Swanson, "Potential improvements in power-speed performance of digital circuits," Proceedings of the IEEE, vol. 59, no. 5, pp. 815-816, 1971.
[44] K. Mistry, C. Allen, C. Auth, B. Beattie, D. Bergstrom, M. Bost, M. Brazier, M. Buehler, A. Cappellani, R. Chau, et al., "A 45nm logic technology with high-k+ metal gate transistors, strained silicon, 9 Cu interconnect layers, 193nm dry patterning, and 100% Pb-free packaging," in IEEE International Electron Devices Meeting, 2007, pp. 247-250.
[45] G. E. Moore et al., "Cramming more components onto integrated circuits," Proceedings of the IEEE, vol. 86, no. 1, pp. 82-85, 1998.
[46] S. Mukhopadhyay, A. Raychowdhury, and K. Roy, "Accurate estimation of total leakage in nanometer-scale bulk CMOS circuits based on device geometry and doping profile," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 24, no. 3, pp. 363-381, 2005.
[47] L. W. Nagel and D. O. Pederson, "Spice (simulation program with integrated circuit emphasis)," Technical Report UCB/ERL M382, EECS Department, University of California, Berkeley, Apr 1973.
[48] S. Natarajan, M. Armstrong, M. Bost, R. Brain, M. Brazier, C. H. Chang, V. Chikarmane, M. Childs, H. Deshpande, K. Dev, et al., "A 32nm logic technology featuring 2nd-generation high-k+ metal-gate transistors, enhanced channel strain and 0.171 µm² SRAM cell size in a 291Mb array," in IEEE International Electron Devices Meeting, 2008, pp. 1-3.
[49] M. A. Quevedo-Lopez, S. A. Krishnan, D. Kirsch, C. H. J. Li, J. H. Sim, C. Huffman, J. J. Peterson, B. H. Lee, G. Pant, B. E. Gnade, et al., "High performance gate first HfSiON dielectric satisfying 45nm node requirements," in IEEE International Electron Devices Meeting, 2005, pp. 4-6.
[50] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proceedings of the IEEE, vol. 91, no. 2, pp. 305-327, 2003.
[51] W. Schemmert and G. Zimmer, "Threshold-voltage sensitivity of ion-implanted MOS transistors due to process variations," Electronics Letters, vol. 10, no. 9, pp. 151-152, 1974.
[52] G. Sery, S. Borkar, and V. De, "Life is CMOS: why chase the life after?," in Proceedings of the 39th Annual Design Automation Conference, 2002, pp. 78-83.
[53] W. Shockley, "Problems related to pn junctions in silicon," Solid-State Electronics, vol. 2, no. 1, pp. 35-60, 1961.
[54] H. Soeleman and K. Roy, "Ultra-low power digital subthreshold logic circuits," in Proceedings of the 1999 International Symposium on Low Power Electronics and Design, 1999, pp. 94-96.
[55] P. A. Stolk, F. P. Widdershoven, and D. B. M. Klaassen, "Modeling statistical dopant fluctuations in MOS transistors," IEEE Transactions on Electron Devices, vol. 45, no. 9, pp. 1960-1971, 1998.
[56] T. Sugii, "High-performance bulk CMOS technology for 65/45 nm nodes," Solid-State Electronics, vol. 50, no. 1, pp. 2-9, 2006.
[57] R. M. Swanson and J. D. Meindl, "Ion-implanted complementary MOS transistors in low-voltage circuits," IEEE Journal of Solid-State Circuits, vol. 7, no. 2, pp. 146-153, 1972.
[58] A. T. Tran and B. M. Baas, "Design of an energy-efficient 32-bit adder operating at subthreshold voltages in 45-nm CMOS," in Proc. 3rd International Conference on Communications and Electronics, 2010, pp. 87-91.
[59] M. Venkatasubramanian and V. D. Agrawal, "Subthreshold Voltage High-k CMOS Devices Have Lowest Energy and High Process Tolerance," in Proceedings of the 43rd Southeastern Symposium on System Theory, 2011.
[60] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, Sub-threshold design for ultra low-power systems. Springer Verlag, 2006.
[61] H. C. Wen, H. R. Harris, C. D. Young, H. Luan, H. N. Alshareef, K. Choi, D. L. Kwong, P. Majhi, G. Bersuker, and B. H. Lee, "On Oxygen Deficiency and Fast Transient Charge-Trapping Effects in High-k Dielectrics," IEEE Electron Device Letters, vol. 27, no. 12, pp. 984-987, 2006.
[62] A. K. Wong, R. A. Ferguson, and S. M. Mansfield, "The mask error factor in optical lithography," IEEE Transactions on Semiconductor Manufacturing, vol. 13, no. 2, pp. 235-242, 2000.
[63] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "A variation-tolerant sub-200 mV 6-T subthreshold SRAM," IEEE Journal of Solid-State Circuits, vol. 43, no. 10, pp. 2338-2348, 2008.
[64] W. Zhao and Y. Cao, "Predictive technology model for nano-CMOS design exploration," ACM Journal on Emerging Technologies in Computing Systems (JETC), vol. 3, no. 1, 2007.
[65] R. Zimmermann and W. Fichtner, "Low-power logic styles: CMOS versus pass-transistor logic," IEEE Journal of Solid-State Circuits, vol. 32, no. 7, pp. 1079-1090, 1997.