## Dual-Threshold Voltage Design of Sub-Threshold Circuits

by

Jia Yao

A dissertation submitted to the Graduate Faculty of
Auburn University
in partial fulfillment of the
requirements for the Degree of
Doctor of Philosophy

Auburn, Alabama August 2, 2014

Keywords: Sub-threshold circuits, Dual-threshold voltage, Minimum energy operation

Copyright 2014 by Jia Yao

#### Approved by

Vishwani D. Agrawal, Chair, James J. Danaher Professor of Electrical and Computer Engineering

Victor P. Nelson, Professor of Electrical and Computer Engineering

Bogdan M. Wilamowski, Professor of Electrical and Computer Engineering

#### Abstract

The threshold voltage of a MOSFET is the gate-source voltage at which the drain current starts to increase significantly because the conduction layer just begins to form. However, a MOSFET can also function correctly with a supply voltage below its threshold voltage $(V_{th})$, which is referred to as sub-threshold operation or weak inversion. Circuits that operate with a supply voltage in the sub-threshold range are called sub-threshold circuits.

With the growing number of energy-constrained electronic devices, suppressing energy consumption to extend battery life has become more important, and design methods that lower energy consumption are in demand. Sub-threshold circuits offer a potential solution because scaling the supply voltage $(V_{dd})$ below the threshold voltage $(V_{th})$ reduces energy per cycle significantly. In addition, sub-threshold circuits are expected to receive increasing attention in the coming years because the minimum-energy operating point of CMOS circuits typically occurs when the supply voltage is scaled down into the sub-threshold range.

The dual-threshold voltage method benefits from the characteristics of both low and high threshold voltages. A higher threshold voltage results in less leakage current and therefore less leakage power, at the cost of longer delay. Conversely, a lower threshold voltage gives faster switching but more leakage power. Dual-threshold voltage design is a common method for reducing leakage power consumption in above-threshold circuits. In this thesis, dual-threshold voltage design is shown to be effective for reducing the energy consumption per cycle (EPC) of sub-threshold circuits. It is also demonstrated that the energy per cycle is independent of threshold voltage in single-$V_{th}$ designs.

A dual-threshold voltage framework written in Perl is developed to generate the optimal dual-threshold voltage design with minimum energy consumption. The proposed framework is built on a gate-slack-based dual-threshold voltage algorithm that determines the optimal high threshold voltage and the optimal supply voltage $(V_{ddopt})$, and accurately estimates the energy consumption of the generated dual-$V_{th}$ designs. The framework also conducts static timing analysis (STA) to ensure that the dual-threshold voltage design can run at the fastest possible operating frequency at $V_{ddopt}$. Experimental results on a 32-bit ripple carry adder (RCA), a 4-by-4 multiplier, and ISCAS85 benchmark circuits show that the minimum EPC is lowered by 10% to 29% by the dual-$V_{th}$ design compared with its single-$V_{th}$ version.

The impact of process variations is also discussed. Applying random process variations to the threshold voltage, modeled as Gaussian variables, produces variations in both the energy consumption and the performance of sub-threshold circuits.

#### Acknowledgments

First of all, I would like to give my deepest thanks to my advisor, Dr. Vishwani D. Agrawal. I am sincerely grateful for his patience and guidance during my study. He always encourages his students to explore more possibilities, not only in research but also in personal life. Without his encouragement, I would not have completed my study. I would also like to thank my committee members, Dr. Victor Nelson, Dr. Bogdan Wilamowski and Dr. Xiao Qin, for their great help and efforts. Special thanks go to Dr. Prathima Agrawal and the Auburn University Wireless Engineering Research and Education Center for their generous support and help. I would like to thank my colleagues, Suraj, Mridula, Farhana, Kim, Karthik, Vijay, Praveen, Wei and Yu, for their valuable help with my research work.

I would also like to thank my parents and my husband for their love and support. I also thank the friends I met in Auburn: Julia Evans, Elizabeth Wills, Elizabeth Williams, Jena Robison, Martha Dees, Man Zhang, Yueqin Lin, Binying Tan and Fei Tong. Their friendship and encouragement give me strength and comfort.

I thank myself for not giving up, and I am glad that I held on to the end. Life is too short to waste. I will keep on moving.

## Table of Contents

- Abstract
- Acknowledgments
- List of Figures
- List of Tables
- 1 Introduction
  - 1.1 Motivation
  - 1.2 Problem Statement
  - 1.3 Contribution of The Dissertation
  - 1.4 Organization of The Dissertation
- 2 Background
  - 2.1 The History of Sub-threshold Circuits
  - 2.2 Logic Operation in Sub-threshold Region
    - 2.2.1 Sub-threshold Current
    - 2.2.2 Inverter
  - 2.3 Dual-Threshold Voltage Techniques
    - 2.3.1 The History of The Dual-Threshold Voltage Technique
    - 2.3.2 Slack-Based Dual-Threshold Voltage Technique
  - 2.4 The History of Minimum Energy Operation
  - 2.5 Alpha-Power Law MOSFET Model
- 3 Single-$V_{th}$ Design of Sub-threshold Circuits
  - 3.1 Theoretical Analysis
  - 3.2 Simulation
- 4 Dual-$V_{th}$ Design of Sub-threshold Circuits
  - 4.1 The Proposed Dual-$V_{th}$ Design Framework
    - 4.1.1 The Method of Obtaining Low $V_{th}$ and High $V_{th}$ Gates
    - 4.1.2 Library Characterization
    - 4.1.3 Gate Slack Based Dual-$V_{th}$ Algorithm
  - 4.2 Experimental Results
    - 4.2.1 32-Bit Ripple Carry Adder
    - 4.2.2 4-by-4 Multiplier
    - 4.2.3 ISCAS85 Benchmark Circuits
- 5 Analysis of Results
  - 5.1 Overview
  - 5.2 Theoretical Analysis
    - 5.2.1 Single-$V_{th}$ Design
    - 5.2.2 Dual-$V_{th}$ Design
- 6 Ultra-Low Voltage Circuit Under Process Variations
  - 6.1 Previous Work
  - 6.2 Impact of Variation
    - 6.2.1 Gate Delay
    - 6.2.2 Circuit Delay
    - 6.2.3 Energy
- 7 Conclusion and Future Work
  - 7.1 Summary
  - 7.2 Future Work
    - 7.2.1 Challenge with Scaled Technology
    - 7.2.2 Variation-Aware Design
- Bibliography

## List of Figures

| Figure | Caption |
|--------|---------|
| 2.1 | Measurement of $I_d(V_{GS})$ of a P-channel MOS transistor (cleaned up plot from E. Vittoz notebook, CEH, 1967) |
| 2.2 | DIBL effects on drain current of NMOS transistor in PTM 32nm Bulk CMOS technology with $W_n = 5L$ |
| 2.3 | Voltage Transfer Curve of Inverter at $V_{dd} = 0.2\,\mathrm{V}$ with varying transistor sizing ratio ($\beta = W_p/W_n$) in PTM 32nm Bulk CMOS technology |
| 2.4 | One bit full adder |
| 2.5 | 8-bit RCA energy calculated by HSPICE in PTM 32nm Bulk CMOS technology with $W_n = 5L$ and $W_p = 12L$ |
| 3.1 | 32-bit RCA energy calculated by HSPICE using PTM 32nm Bulk CMOS technology with $W_n = 5L$ and $W_p = 12L$ |
| 4.1 | Two-input low $V_{th}$ NAND gate |
| 4.2 | Two-input high $V_{th}$ NAND gate implemented with reverse bias voltage = 0.1 V |
| 4.3 | One-bit full adder |
| 4.4 | HSPICE simulation for energy per cycle (EPC) of 32-bit RCA single $V_{th}$ designs in PTM 32nm Bulk CMOS with $W_n = 5L$ and $W_p = 12L$ |
| 4.5 | Gate slacks of single low $V_{th}$ design of a 288-gate 32-bit RCA at $V_{dd} = 0.25\,\mathrm{V}$ |
| 4.6 | Gate slacks of dual $V_{th}$ design of 288-gate 32-bit RCA with bias voltage = 0.3 V at $V_{dd} = 0.25\,\mathrm{V}$ |
| 4.7 | Energy savings vs. percentage of high $V_{th}$ gates from HSPICE simulation results of a 124-gate 32-bit RCA dual $V_{th}$ designs under varying $V_{dd}$ values |
| 4.8 | Random-vector HSPICE simulation results vs. estimation results for energy per cycle (EPC) for 124-gate 32-bit RCA dual-$V_{th}$ design with reverse body bias voltage = 0.3 V |
| 4.9 | Gate slacks of single low $V_{th}$ design of 4-by-4 multiplier at $V_{dd} = 0.21\,\mathrm{V}$ |
| 4.10 | Gate slacks of dual $V_{th}$ design of 4-by-4 multiplier with bias voltage = 0.2 V at $V_{dd} = 0.21\,\mathrm{V}$ |
| 5.1 | HSPICE simulation results of dynamic energy vs. leakage energy for 32-bit RCA single low $V_{th}$ and dual-$V_{th}$ design |
| 5.2 | Root mean squared error (RMSE) analysis of polynomial fit for leakage energy with low $V_{th}$; a third-degree polynomial is selected |
| 5.3 | Root mean squared error (RMSE) analysis of polynomial fit for leakage energy with high $V_{th}$; a third-degree polynomial is selected |
| 5.4 | Regression coefficient (R-squared) analysis of polynomial fit for leakage energy with low $V_{th}$; a third-degree polynomial is selected |
| 5.5 | Regression coefficient (R-squared) analysis of polynomial fit for leakage energy with high $V_{th}$; a third-degree polynomial is selected |
| 5.6 | HSPICE simulation results vs. theoretical analysis of energy ratio of 32-bit RCA dual-$V_{th}$ designs with bias voltage = 0.3 V and single-$V_{th}$ design |
| 6.1 | Monte Carlo HSPICE simulations of NAND02 gate delay with three inverters as load at $V_{dd} = 0.25\,\mathrm{V}$ and source-bulk bias voltage = 0.3 V in PTM 32nm Bulk CMOS with $W_n = 5L$ and $W_p = 12L$ |
| 6.2 | Monte Carlo HSPICE simulations of circuit delay of 32-bit RCA single low $V_{th}$ design under random $V_{th}$ Gaussian variations at $V_{dd} = 0.25\,\mathrm{V}$ in PTM 32nm Bulk CMOS with $W_n = 5L$ and $W_p = 12L$ |
| 6.3 | Monte Carlo HSPICE simulations of circuit delay of 32-bit RCA dual $V_{th}$ design with bias voltage = 0.3 V under random $V_{th}$ Gaussian variations at $V_{dd} = 0.25\,\mathrm{V}$ in PTM 32nm Bulk CMOS with $W_n = 5L$ and $W_p = 12L$ |
| 6.4 | Monte Carlo HSPICE simulations of EPC of 32-bit RCA single low $V_{th}$ design under random $V_{th}$ Gaussian variations at $V_{dd} = 0.25\,\mathrm{V}$ in PTM 32nm Bulk CMOS with $W_n = 5L$ and $W_p = 12L$ |
| 6.5 | Monte Carlo HSPICE simulations of EPC of 32-bit RCA dual $V_{th}$ design with bias voltage = 0.3 V under random $V_{th}$ Gaussian variations at $V_{dd} = 0.25\,\mathrm{V}$ in PTM 32nm Bulk CMOS with $W_n = 5L$ and $W_p = 12L$ |

## List of Tables

| Table | Caption |
|-------|---------|
| 3.1 | PTM 32nm Bulk CMOS $V_{th}$ calculated in HSPICE |
| 4.1 | Threshold voltages of NMOS and PMOS in PTM 32nm bulk technology calculated by HSPICE with $W_n = 5L$ and $W_p = 12L$ |
| 4.2 | $V_{th}$ calculated by HSPICE in PTM 32nm bulk CMOS technology with varying source-bulk voltages with $W_n = 5L$ and $W_p = 12L$ |
| 4.3 | Low $V_{th}$ NAND02 gate delay calculated by HSPICE in PTM 32nm bulk CMOS technology with single fan-out to an inverter load under varying input vectors at $V_{dd} = 0.2\,\mathrm{V}$ with $W_n = 5L$ and $W_p = 12L$ |
| 4.4 | Low $V_{th}$ NAND02 gate delay calculated by HSPICE in PTM 32nm bulk CMOS technology with fan-out varying from one inverter to ten inverters at $V_{dd} = 0.2\,\mathrm{V}$ and $V_{dd} = 0.4\,\mathrm{V}$ with $W_n = 5L$ and $W_p = 12L$ |
| 4.5 | NAND02 gate delay calculated by HSPICE in PTM 32nm bulk CMOS technology with one inverter as load at $V_{dd} = 0.2\,\mathrm{V}$ with $W_n = 5L$ and $W_p = 12L$ under varying reverse source-bulk bias voltages |
| 4.6 | Low $V_{th}$ NAND02 gate leakage power calculated by HSPICE in PTM 32nm bulk CMOS technology with fan-out as one inverter under varying input vectors at $V_{dd} = 0.2\,\mathrm{V}$ with $W_n = 5L$ and $W_p = 12L$ |
| 4.7 | NAND02 gate leakage power calculated by HSPICE in PTM 32nm bulk CMOS technology with varying reverse source-bulk bias voltages with fan-out as one inverter at $V_{dd} = 0.2\,\mathrm{V}$ with $W_n = 5L$ and $W_p = 12L$ |
| 4.8 | Low $V_{th}$ NAND02 gate nodal capacitance calculated by HSPICE in PTM 32nm bulk CMOS technology with fan-out from one inverter to ten inverters at $V_{dd} = 0.2\,\mathrm{V}$ and $V_{dd} = 0.4\,\mathrm{V}$ with $W_n = 5L$ and $W_p = 12L$ |
| 4.9 | NAND02 gate nodal capacitance calculated by HSPICE in PTM 32nm bulk CMOS technology with varying $V_{th}$ with fan-out as one inverter at $V_{dd} = 0.2\,\mathrm{V}$ with $W_n = 5L$ and $W_p = 12L$ |
| 4.10 | Parameters used in the proposed dual-$V_{th}$ algorithm |
| 4.11 | 4-by-4 multiplier single-$V_{th}$ design vs. dual-$V_{th}$ minimum EPC saving |
| 4.12 | ISCAS85 benchmark circuit HSPICE simulation results on energy saving |
| 4.13 | ISCAS85 benchmark circuit HSPICE simulation results on $V_{ddopt}$ and optimal bias voltage |
| 6.1 | PTM 32nm $V_{th}$ variation characteristics |
| 6.2 | Low $V_{th}$ gate vs. high $V_{th}$ gate delay under Gaussian $V_{th}$ variations |

#### Chapter 1

#### Introduction

A sub-threshold circuit, also called an ultra-low voltage circuit, is a circuit that operates at a supply voltage $(V_{dd})$ below the transistor threshold voltage $(V_{th})$. Conventionally, a transistor is considered ON when $V_{dd}$ is greater than $V_{th}$ and OFF when $V_{dd}$ is less than $V_{th}$. As a result, sub-threshold operation of transistors was neglected for a long time. However, since the 1960s, much work has been done in this field, and researchers found that a transistor in the sub-threshold region is not completely OFF because "leaking" currents still flow through it. Moreover, these "leaking" currents are sufficient to complete logic-level transitions; in other words, the transistors function correctly, but at the cost of large transition delays.

The most obvious advantage of sub-threshold operation is its low energy consumption. The energy consumption per cycle (EPC) can be reduced by an order of magnitude compared with conventional operation as $V_{dd}$ scales down into the sub-threshold region. Sub-threshold circuits are therefore attractive for applications that require ultra-low energy consumption, such as wristwatches, implantable medical sensors, and wireless sensor networks. Moreover, with the growing number of battery-powered portable electronic devices, long battery life has become a major concern for circuit designers, and sub-threshold circuits are a potential candidate because of their low energy consumption.

However, sub-threshold circuits are not the ultimate solution for all low-power or low-energy applications because of their large circuit delay. For example, they are not suitable for processors and other applications that demand high performance.

#### 1.1 Motivation

With the growing number of energy-constrained electronic devices, it is important to suppress energy consumption to achieve a longer battery life, which creates a demand for design methods that lower energy consumption. Among previously proposed approaches, minimum energy operation has drawn researchers' attention [15, 19, 28, 95, 125, 198, 199, 216]. As $V_{dd}$ scales from the above-threshold region into the sub-threshold region, the energy per cycle drops significantly and reaches a minimum at the optimal supply voltage $(V_{ddopt})$. When $V_{dd}$ decreases further, the energy per cycle begins to increase. The minimum EPC point of a circuit is typically located in the sub-threshold region [194]. This is one of the reasons why sub-threshold circuits have been gaining increasing attention recently.
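The shape of the EPC curve described above can be illustrated with a small numerical sketch. The model below is a common first-order approximation (switching energy proportional to $V_{dd}^2$, exponential sub-threshold current, leakage energy equal to leakage power times cycle time); it is not the estimation method developed in this dissertation, and every parameter value and function name is an assumption chosen only for illustration.

```python
import math

# All parameter values below are illustrative assumptions, not data from this work.
V_T = 0.02585      # thermal voltage kT/q near 300 K, in volts
N_SLOPE = 1.5      # assumed sub-threshold slope factor n
ALPHA = 0.1        # assumed average switching activity
DEPTH_K = 30.0     # assumed logic depth times a delay fitting constant
C_GATE = 1e-15     # assumed switched capacitance per gate, in farads
N_GATES = 288      # assumed gate count
I_0 = 1e-7         # assumed characteristic current, in amperes


def energy_per_cycle(vdd, vth):
    """Return (dynamic, leakage) energy per cycle in joules for a first-order model."""
    i_on = I_0 * math.exp((vdd - vth) / (N_SLOPE * V_T))
    i_off = I_0 * math.exp(-vth / (N_SLOPE * V_T))
    cycle_time = DEPTH_K * C_GATE * vdd / i_on       # critical-path delay
    e_dyn = ALPHA * N_GATES * C_GATE * vdd ** 2      # switching energy
    e_leak = N_GATES * i_off * vdd * cycle_time      # leakage power times delay
    return e_dyn, e_leak


def find_vddopt(vth, lo=0.10, hi=0.60, step=0.001):
    """Sweep Vdd on a uniform grid and return (vddopt, minimum EPC)."""
    best_vdd, best_epc = None, float("inf")
    n_steps = int(round((hi - lo) / step))
    for k in range(n_steps + 1):
        vdd = lo + k * step
        epc = sum(energy_per_cycle(vdd, vth))
        if epc < best_epc:
            best_vdd, best_epc = vdd, epc
    return best_vdd, best_epc


if __name__ == "__main__":
    vddopt, e_min = find_vddopt(vth=0.35)
    print(f"Vddopt ~ {vddopt:.3f} V, minimum EPC ~ {e_min:.3e} J")
```

With these assumed values the sweep places the minimum at roughly 0.25 V, below the assumed threshold voltage of 0.35 V: dynamic energy dominates above that point, and leakage energy, inflated by the exponentially growing delay, dominates below it.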

For above-threshold circuits, performance is still the key concern, so dual-$V_{th}$ algorithms aim to reduce power consumption while maintaining the high performance of the circuit. However, for circuits with a sub-threshold supply voltage, power consumption is no longer the right criterion for judging a design. Because the delay of a sub-threshold circuit is very large, the EPC can still be high even if the power consumption is small, since energy is the product of power and circuit delay. Therefore, for sub-threshold circuits, it is more important to identify the minimum EPC point and to develop algorithms that reduce the minimum EPC further. The dual-$V_{th}$ approach has been widely applied to reduce power consumption in above-threshold circuits, but its effectiveness in reducing the energy per cycle of sub-threshold circuits has not been investigated. This work presents a thorough study of this issue.

#### 1.2 Problem Statement

The aims of this dissertation are to:

1. Investigate the effectiveness of the dual- $V_{th}$  approach in reducing energy per cycle for CMOS sub-threshold circuits.

2. Develop a framework to generate the optimal dual- $V_{th}$  design for sub-threshold circuits. The proposed dual- $V_{th}$  algorithm is able to estimate the optimal high  $V_{th}$ , automatically generate the optimal dual- $V_{th}$  assignments with maximum energy saving and accurately estimate the optimal supply voltage.

#### 1.3 Contribution of The Dissertation

The single-$V_{th}$ design of sub-threshold circuits is reviewed first. Theoretical and simulation-based evidence is presented to demonstrate that the energy per cycle of single-$V_{th}$ sub-threshold circuits is independent of the threshold voltage $V_{th}$. Increasing $V_{th}$ does not influence dynamic energy, which depends only on the switching capacitance of the circuit and the supply voltage. Leakage energy also remains unchanged because increasing $V_{th}$ reduces the leakage power but raises the circuit delay by a compensating amount.
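One way to see this cancellation is the following first-order sketch. It assumes an ideal exponential sub-threshold current and ignores second-order effects such as DIBL, so it is an illustration under stated assumptions and not the analysis of Chapter 3. The symbols $I_0$ (characteristic current), $n$ (sub-threshold slope factor), $V_t = kT/q$ (thermal voltage), $K$ (delay fitting constant), and $C_g$ (gate load capacitance) are introduced here for illustration only. The off-current and the gate delay scale as

$$I_{off} = I_0 \exp\left(\frac{-V_{th}}{n V_t}\right), \qquad T_d = \frac{K C_g V_{dd}}{I_{on}} = \frac{K C_g V_{dd}}{I_0} \exp\left(\frac{V_{th} - V_{dd}}{n V_t}\right)$$

so the leakage energy over one delay period is

$$E_{leak} = I_{off} V_{dd} T_d = K C_g V_{dd}^2 \exp\left(\frac{-V_{dd}}{n V_t}\right)$$

The factor $\exp(-V_{th}/(n V_t))$ in the leakage current is cancelled by the factor $\exp(V_{th}/(n V_t))$ in the delay, leaving an expression that depends on $V_{dd}$ but not on $V_{th}$ within this simplified model.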

However, in a dual-$V_{th}$ design, the energy per cycle depends on both the threshold voltage and the supply voltage. A framework written in Perl is proposed to generate the optimal dual-$V_{th}$ design for sub-threshold circuits with maximum energy saving. Given the gate-level netlist of the circuit and the low $V_{th}$ level, the framework analyzes the single low $V_{th}$ design and finds its minimum EPC point ($E_{min,singleVth}$). The framework then conducts static timing analysis and energy estimation to determine the optimal high $V_{th}$ level and the optimal supply voltage, automatically generates the optimal dual-$V_{th}$ assignment, and estimates the energy consumption per cycle. The minimum EPC of the dual-$V_{th}$ design is effectively lower than that of its single-$V_{th}$ version.

#### 1.4 Organization of The Dissertation

This dissertation is organized as follows. Chapter 2 provides detailed background knowledge on sub-threshold circuits, the dual- $V_{th}$  approach, and minimum energy operation design.

Chapter 3 discusses single-$V_{th}$ sub-threshold circuits. The energy per cycle of single-$V_{th}$ designs is found to be independent of threshold voltage, and theoretical and simulation-based evidence is presented to support this statement.

Chapter 4 explains the proposed dual-$V_{th}$ framework in detail and presents the gate-slack-based dual-$V_{th}$ algorithm, followed by experimental results on a 32-bit ripple carry adder, a 4-by-4 multiplier, and ISCAS85 benchmark circuits.

Chapter 5 analyzes these results and includes a section of theoretical analysis.

Chapter 6 discusses the impact of process variations on sub-threshold circuits, along with a short introduction of variation-aware sub-threshold circuits.

Finally, Chapter 7 summarizes the dissertation and outlines future work.

#### Chapter 2

#### Background

This chapter provides background knowledge on sub-threshold circuits, including the history of sub-threshold circuit research and logic gate operation in the sub-threshold region. It also covers common low-power techniques, especially the dual-$V_{th}$ technique. Previous research on minimum energy operation of sub-threshold circuits is summarized at the end.

#### 2.1 The History of Sub-threshold Circuits

The dominant current component in a transistor operating with a supply voltage in the sub-threshold range is called the sub-threshold current. Current can flow in a transistor by two mechanisms: diffusion and drift. Diffusion is the natural flow of carriers from higher to lower concentration, while drift is the flow of carriers under an externally applied potential. Sub-threshold current is the result of diffusion, which can be explained as follows. When a small positive gate-source voltage is applied to an n-type transistor, holes are repelled from the surface and electrons are the only mobile charge available there. The electron density depends on the voltage across each of the two pn junction diodes (bulk-source and bulk-drain), so the electron densities at the drain and at the source differ. As a result, a diffusion current flows between drain and source.

Because of this weak "leakage" current, sub-threshold operation is often referred to as weak inversion operation. The study of this weak current began with the discovery of the parabolic region in 1955 [58], where the authors showed that the density of electrons is proportional to $\exp\left(\frac{q \psi_s}{kT}\right)$, with $\psi_s$ being the surface potential of a MOS transistor. Later, research on sub-threshold current and sub-threshold operation appeared in the 1960s and early 1970s [12, 61, 91, 92, 108, 129, 174, 177]. In [12], an analytical expression relating sub-threshold drain current to gate voltage was given. In [61], the authors demonstrated that the main contributions to drain leakage current in the sub-threshold region are the reverse-bias drain junction leakage current and the surface channel current in weak inversion mode. In [91], the author presented measured sub-threshold circuit delays on test chips of 2-stage and 6-stage CMOS inverters. In [92], a mathematical analysis of MOS transistor behavior in weak inversion was presented, along with an analysis of electron surface mobility and inversion layer charge. In [108], a CMOS binary counter was realized for low-power, low-voltage applications. In [174], the authors specified the exponential relation between sub-threshold current and transistor gate voltage for the first time. With this sub-threshold current revealed, researchers and designers came to know that the transistor is not completely OFF when the gate voltage is below the threshold voltage. This also brought a new understanding of threshold voltage, which had been regarded as the voltage that turns the transistor from completely OFF to ON. In addition, the authors of [174] pointed out that the supply voltage must be at least 3 to 4 times $kT/q$ for a digital circuit to function properly, which was the first discussion of the minimum operating voltage of digital CMOS circuits. Later, in [177], an analytical model of sub-threshold current for insulated-gate field-effect transistors (IGFETs) was developed for both long-channel and short-channel devices.

The first measurement of drain current characteristics of MOS transistor in both weak and strong inversion regions was conducted by Eric Vittoz in 1967 [188, 194] as shown in Figure 2.1.

Figure 2.1: Measurement of $I_d(V_{GS})$ of a P-channel MOS transistor (cleaned up plot from E. Vittoz notebook, CEH, 1967).

Since the mid-1970s, it has been well recognized that a sub-threshold region exists in addition to the linear and saturation regions. At first, the behavior of MOS transistors in weak inversion drew the attention only of analog circuit designers. The first discussion of a small-signal model for weak inversion operation was published in 1976 [11]. The author proposed the model and pointed out the possibility of applying it to amplifiers. In the same year, Eric Vittoz, a pioneer of sub-threshold circuit research, published his first paper on possible applications of analog circuits using weak-inversion operation [190]. He mentioned several possible analog circuit applications, including a voltage reference [180, 193], an amplitude detector, a low-power quartz oscillator [185, 186], and a bandpass amplifier. According to Vittoz [194], he received serious comments from the audience that circuits operating on the sub-threshold current of transistors cannot be reliable. Soon afterwards, a thorough characterization and modeling of MOS transistor behavior in the weak inversion region was presented by Vittoz and Fellrath [191]. The well-known sub-threshold drain current expression was introduced in that paper, as shown in Equation (2.1):

$$I_D = I_{Do} \exp\left(\frac{V_G}{n V_t}\right) \left[\exp\left(\frac{-V_S}{V_t}\right) - \exp\left(\frac{-V_D}{V_t}\right)\right] \tag{2.1}$$

where $I_{Do}$ is a characteristic current, $n$ is the sub-threshold slope factor, and $V_t$ is the thermal voltage $kT/q$. More importantly, the authors built sample circuits of current references, an amplitude detector, and a low-frequency bandpass amplifier. The experimental results obtained from these sample circuits demonstrated the validity of weak inversion operation for CMOS technology.
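To make the behavior of Equation (2.1) concrete, the following minimal sketch evaluates it numerically. The function name and the default values of `i_do` and `n` are placeholders assumed for illustration, not values from [191] or from this dissertation; the voltages are taken relative to a common reference, as the equation is written.

```python
import math

V_T = 0.02585  # thermal voltage kT/q near 300 K, in volts


def subthreshold_drain_current(v_g, v_s, v_d, i_do=1e-9, n=1.5):
    """Evaluate Equation (2.1); the i_do and n defaults are assumed placeholders."""
    gate_term = math.exp(v_g / (n * V_T))
    channel_term = math.exp(-v_s / V_T) - math.exp(-v_d / V_T)
    return i_do * gate_term * channel_term


if __name__ == "__main__":
    # The drain current levels off once V_D exceeds V_S by a few V_t.
    for v_d in (0.025, 0.05, 0.1, 0.2):
        print(v_d, subthreshold_drain_current(0.2, 0.0, v_d))
    # Gate voltage change that multiplies the current by ten: n * V_t * ln(10).
    print(1.5 * V_T * math.log(10))
```

Two properties follow directly from the expression: the current depends exponentially on the gate voltage, changing by a factor of ten for every $n V_t \ln 10$ of gate voltage, and it becomes nearly independent of the drain voltage once $V_D - V_S$ exceeds a few $V_t$, because the second exponential in the bracket becomes negligible.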

During the following decade, the application of weak inversion was still limited to analog circuits, mainly oscillators and amplifiers [49, 50, 51, 93, 94, 180, 185, 186, 189, 192]. The most successful product was an electronic wristwatch, which first appeared on the market in 1975 [179]. The next important milestone in the history of weak inversion was the introduction of the EKV model in 1995 [54]. The authors established continuous expressions for MOS transistor parameters such as drain current, small-signal and large-signal transconductance, intrinsic capacitance, trans-admittance, and thermal noise for all regions of operation, from weak inversion to saturation. Around the 1990s, researchers shifted their focus to digital sub-threshold circuit design because of the increasing need for portable, low-power circuits. The authors of [38] proposed an architecture-based voltage scaling strategy for parallel and pipelined architectures. They introduced the concept of optimal supply voltage and demonstrated that the supply voltage can be reduced through proper structural parallelism. In [187], existing low-power techniques were discussed, and it was noted that design methodologies, tools, libraries, and models have to be adapted for low-power, low-voltage designs.

Due to the increasing interest in ultra-low power medical devices that do not need high performance but require extremely low power consumption, sub-threshold digital circuits came to researchers' attention since they offer potential solutions for these specific applications [8, 59, 119, 135, 170]. A research group at Purdue University did some of the earliest work on sub-threshold digital CMOS circuits. In 1999, the authors examined both CMOS and pseudo-NMOS logic operating in the sub-threshold region in terms of power consumption and delay [164, 165]. For a characteristic inverter, sub-threshold pseudo-NMOS logic was shown to consume less power and have smaller delay than sub-threshold CMOS logic. In [204], the authors proposed a design technique which provides different threshold voltages within a logic gate and demonstrated its effectiveness on leakage reduction. The authors extended their work to investigate multiple $V_{dd}$ and multiple $V_{th}$ design techniques to achieve high speed and low power consumption in [146]. In 2000, common low power techniques like multiple $V_{dd}$, multiple $V_{th}$ and some standby leakage control techniques were investigated for their impact on leakage current reduction [206]. In the same year, robust sub-threshold DTMOS logic was introduced in [167] and its stability under temperature and process variations was discussed. In [166], the authors presented simulation results on sub-domino logic and demonstrated that sub-domino logic has lower power consumption, smaller area and higher speed than sub-CMOS logic. In 2001, this group built a test chip of an 8-by-8 carry save array multiplier and analyzed it with sub-threshold supply voltage using TSMC 0.35 $\mu$m process technology [132]. A 96% power-delay-product reduction was reported on the test chip operating at 0.47 V (the threshold voltages for PMOS and NMOS are 0.82 V and 0.67 V, respectively) [132]. In the same year, this team presented a successful implementation of sub-threshold circuits in the area of hearing aid instruments [82]. In this paper, an ultra-low-power delayed least mean square (DLMS) adaptive filter was designed and simulated. Sub-threshold pseudo-NMOS logic was chosen over CMOS logic since it provides a lower power-delay-product. The authors also suggested a robust sub-threshold circuit design scheme using an adaptive body biasing method to resist process and temperature variations. Experiments on their 8-by-8 carry save array multiplier test chip showed stable operation with sub-threshold supply voltage.
In 2004, this group also conducted research on double-gated MOSFET sub-threshold circuits in [85]. The results showed that devices with longer than minimum gate length should be used for robust sub-threshold operation. Later, in [142], the authors explored the feasibility of sub-threshold SRAM design by investigating its stability. The results showed that sub-threshold SRAM design provides significant power reduction in both operating and standby modes. This team also presented research work on device optimization for sub-threshold circuits in terms of finding the optimum doping profile to achieve minimum power consumption [130, 131, 143]. In [130], the authors pointed out that MOSFET libraries optimized for the above-threshold region need extra adjustments to suit sub-threshold circuit designs. A device optimization method was proposed to mitigate the impact on circuit power-delay product [131, 143]. In 2006, the authors demonstrated that minimum oxide thickness may not produce minimum energy for sub-threshold circuits and that oxide thickness should be adjusted to resist process variations [133].

In the following few years, a research group at the University of Michigan presented work on energy-constrained designs. Since sensor networks typically execute low-speed tasks and rely only on small energy supplies, sub-threshold circuit design provides a potential solution. In [125], energy optimization of sub-threshold voltage sensor processors was presented and the authors built a sensor network in 130nm CMOS technology which can operate at 235 mV with low energy consumption of 1.38 pJ per instruction. This group also presented work on statistical analysis of sub-threshold leakage current [141], sub-threshold circuit variations [65, 175, 213], clock network design optimization for sub-threshold circuits [15, 153], etc. In 2007, the group presented research on sub-threshold circuit leakage energy reduction targeted at stand-by mode. The proposed method can reduce energy by 99.2% compared to circuits with no stand-by mode optimization [155]. This work was further modified into a power gating switch (PGS) approach to minimize energy consumption in stand-by mode [156]. Known stand-by power reduction techniques were investigated for ultra-low power processors [104]. In the same year, this team also discussed an energy optimization method which uses gate sizing and supply voltage scaling [64]. In [63], the authors investigated the relation between device scaling and sub-threshold circuit operation and found that the difference between on current and off current of a MOSFET drops rapidly as gate oxide thickness scales down. This team designed and built a test chip for a sub-threshold sensor processor in a $0.13\mu m$ CMOS process with optimum supply voltage at 0.36 V, consuming 2.6 pJ/Inst at 833 kHz [66, 216]. Their work also extended to designing a sub-threshold SRAM [212] fully functional from 193 mV to 1.2 V, the first reported 6T SRAM that can operate in the sub-threshold region. Later, a robust version of the 6T sub-threshold SRAM was presented in [215].
In 2012, this group designed a sub-threshold Fast Fourier Transform (FFT) core in 65nm technology. The reported optimal $V_{dd}$ is 270 mV and energy consumption is 15.8 nJ with a clock frequency of 30 MHz [74]. Besides sub-threshold SRAM design, this team also presented sub-threshold ROM design in [154]. More recently, their research work included super cutoff CMOS (SCCMOS) bias generator circuit design with ultra low standby power [105], ultra low power sensor nodes [102, 107, 109], wireless sensor nodes [106] and energy efficient interconnects [152]. Moreover, in [14], the authors pointed out some possible sub-threshold application areas such as medical implantable devices for monitoring disease, surveillance detection and environmental monitoring. In [101], the authors proposed a maximum power point tracking (MPPT) circuit for ripple voltage sensor systems. In [10], the authors presented a power management unit (PMU) design for sub-threshold wireless sensor nodes. In [73], the authors proposed a VGA full-frame extraction processor design which operates at 400 mV. In [72], the authors presented an ultra-low power digital multiplying DLL (MDLL) design. In [84], the authors built an ultra low power wake-up receiver for wireless sensor nodes. The proposed design consumes only 695 pW in stand-by mode. In [83], the team attempted to build a sub-threshold 10T SRAM which only consumes 1.85 fW per bit operating at 350 mV. In [103], a multi-stage temperature-compensated timer was presented for sub-threshold wireless sensor nodes. An improved timer design for ultra low power sensors was presented in [110]. In [157], the authors implemented an example circuit of a voltage reference design which operates between 0.5 V and 3.0 V in 130nm and 180nm technologies and studied the impact of process variations.

Around the same time, Dr. Anantha Chandrakasan, another pioneer of sub-threshold circuit design, published his first work in this area [33]. This work pointed out the need for ultra low power circuits in portable devices such as portable real-time digital signal processors (DSP) [33], portable multimedia products [32, 160] and low power programmable computation devices [34]. In 1992, the authors pointed out that the optimum voltage for low power operation should be determined by the circuit architecture, logic style and technology optimization. This optimum value should be achieved by a balance between area overhead and power consumption [38].

Later, Chandrakasan's research group at MIT continued work on ultra low power low voltage circuit designs. In 1996, the authors first proposed a minimum power consumption design method for DSP [30]. In [35], this team discussed some key design issues for ultra low power low voltage circuits, including low-threshold devices, multiple-threshold devices, and bulk CMOS based variable threshold devices. Since 1997, more work has been done in this group on related topics in low power DSP [7, 8, 9, 31, 62, 118, 158, 159]. In 2001, a system partitioning method was proposed to improve the energy efficiency of DSPs [196].

This team also explored low power low voltage wireless applications such as wireless cameras [36, 37] and wireless sensor networks [163]. In [46, 48], a pulse width modulated (PWM) controller for low power low voltage DC-DC conversion circuit design was presented. Their work on low voltage DC-DC conversion circuits was extended in [47, 60]. In [161], the authors proposed an ultra low power video encoder for battery-operated portable applications. In [209, 210], the authors pointed out that supply voltage scaling with associated threshold voltage scaling is an important method for energy efficient design. However, the sub-threshold leakage current increases as the threshold voltage scales down. In order to deal with this issue, the authors proposed a low voltage design using Silicon-On-Insulator-with-Active-Substrate (SOIAS) technology to control energy saving. In 2000, this team explored several dual-threshold voltage methods for stand-by power reduction for combinational logic [75, 76, 78]. The discussions focused on multi-threshold CMOS (MTCMOS) sleep transistor sizing. The authors pointed out that it is possible to design a dual-threshold circuit which has the same performance as a single low threshold circuit and stand-by power dissipation as low as that of a single high threshold circuit.

This group started to build and analyze test chips for sub-threshold circuits in the 2000s. The research was motivated by the increasing interest in wireless microsensor networks, which normally operate from energy scavenged from the environment or as self-powered systems and therefore have constraints on energy consumption. In 2002, this group built a 175 mV multiply-accumulate test chip with tunable supply voltage and body bias values to investigate its optimum supply voltage and threshold voltage operating points [79]. In [77], the authors presented sub-threshold current modeling and pointed out possible improvements for current CAD tools to fit sub-threshold circuit design. Sub-threshold leakage modeling for 32-bit microprocessors in 0.18 $\mu m$ CMOS technology was presented in [123]. In 2003, this group started to focus on low energy low voltage Fast Fourier Transform (FFT) designs [197]. In 2004, a sub-threshold FFT processor which can operate as low as 0.18V was designed and fabricated with $0.18 \mu m$ CMOS process technology [199]. In their following publication [198], the authors demonstrated that the minimum energy operation point is typically located in the sub-threshold region. The fabricated 16-bit FFT processor has its minimum energy point at 0.35V, with energy consumption of 155nJ at a clock frequency of 10kHz. In the same year, the authors proposed a sub-threshold leakage power prediction tool for sub-threshold circuits in [124]. In 2005, the authors described some key issues associated with modeling sub-threshold circuits in [27]. In 2005, research on sub-threshold SRAM was also initiated. The group explored how the static noise margin of sub-threshold SRAM is related to supply voltage, transistor sizing and process variations [20, 24]. A sub-threshold 100 Mbps ultra-wideband (UWB) baseband processor was proposed in [100, 120, 121, 176, 208]. A summary of sub-threshold circuit design challenges was given in [29].
More recently, a 256KB SRAM was successfully designed and fabricated in 65nm CMOS process technology [23, 26, 162, 181, 182]. Challenges and directions for low-voltage SRAM design were presented in [136]. In 2008, this team predicted some promising applications for sub-threshold circuit design such as toxic gas sensors and portable video gadgets [96, 97]. A 65nm sub-threshold micro-controller was presented in 2009 [99]. Minimum energy operation of a DC-DC converter design was presented in 65nm technology in [139, 140]. A sub-threshold analog-to-digital converter (ADC) which can operate between 0.2V and 0.9V was reported in [44, 45]. This team also presented an ultra-low-voltage 32-bit microprocessor which can operate at 0.4 V [69]. More recent work on signal processing for biomedical purposes was presented in [98]. A framework was developed to estimate circuit delay under process variations for sub-threshold circuits [144]. Their research also focused on the implementation and optimization of techniques for minimum energy operation of sub-threshold circuits, including device sizing [21, 28, 95] and dynamic voltage scaling [18, 22, 25, 195].

A research group at Arizona State University presented some applications of sub-threshold circuits [39, 40, 41]. It was suggested in [39] that circuits with high fan-in or fan-out are more prone to logic failure in the sub-threshold region due to process variations. The authors derived a statistical model to evaluate the robustness of a sub-threshold circuit. In [40], the authors designed and fabricated a sub-threshold memory in 130nm process technology. The proposed memory can operate at 190mV for read operation and 216mV for write operation. In [41], the authors designed a radiation-hardened by design (RHBD) 3218-bit register file operating at 320mV with energy dissipation of 10.3fJ per bit. The proposed design was shown to have good immunity to single event upset (SEU).

Our research group at Auburn University has presented work on dual- $V_{dd}$ sub-threshold circuit design [86, 87, 88, 89, 90]. In [89], the authors developed mixed integer linear programs (MILP) to optimally assign dual below-threshold supply voltages to sub-threshold circuits. The optimum dual $V_{dd}$ assignment was designed to eliminate the requirement for level converters. In [87], a gate slack based algorithm was proposed for dual $V_{dd}$ sub-threshold circuit design to achieve maximum energy per cycle saving. Compared to the previous MILP method, the complexity of this new algorithm dropped significantly. Level converters were also eliminated in this work. Besides using a proper design method to avoid level converters between different supply voltages, the authors utilized multiple logic-level gates in dual- $V_{dd}$ sub-threshold circuit design in [88]. In [90], the authors pointed out that level converters are too slow to be used in sub-threshold circuit design. The proposed MILP method to generate an optimum dual- $V_{dd}$ sub-threshold circuit design can result in 23% and 5% energy saving for a 16-bit ripple-carry adder and a 4-by-4 multiplier, respectively. Experimental results on ISCAS85 benchmark circuits showed energy saving up to 22.2%, compared to single $V_{dd}$ design. The proposed gate-slack based dual- $V_{dd}$ assignment algorithms were later adopted for above-threshold circuit design [4, 5, 6]. Simulation results on ISCAS85 benchmark circuits in 90nm bulk CMOS technology showed up to 60% energy savings. The idea of the dual- $V_{dd}$ MILP method was adopted from previous work in our group on glitch minimization [2, 3, 67, 137, 138] and dual- $V_{th}$ assignment for above-threshold circuits under process variations [112, 113, 115, 116, 117].

#### 2.2 Logic Operation in Sub-threshold Region

Around the 1970s, with the discovery of the weak inversion region of MOS transistors, researchers confirmed that CMOS transistors can function correctly under both normal and sub-threshold supply voltages.

#### 2.2.1 Sub-threshold Current

The sub-threshold current  $I_{sub}$  is the main source of current in the sub-threshold region. The charge and discharge of load capacitance rely on sub-threshold current. It can be summarized as follows [194]:

$$I_{sub} = I_{o}exp(\frac{V_{gs} - V_{th}}{nV_{t}})[1 - exp(\frac{-V_{ds}}{V_{t}})]$$
 (2.2)

where

$$I_o = \mu C_{ox} \frac{W}{L} (n-1) V_t^2$$
 (2.3)

where $\mu$ is the effective mobility, which is $0.067\,m^2/(V \cdot s)$ for an n-channel device and $0.025\,m^2/(V \cdot s)$ for a p-channel device, $C_{ox}$ is the oxide capacitance, W is the transistor effective width, L is the transistor effective length, $V_t$ is the thermal voltage, which is 25.8mV at room temperature, $V_{gs}$ is the gate-source voltage, $V_{ds}$ is the drain-source voltage, $V_{th}$ is the threshold voltage and n is the sub-threshold slope factor, which is a technology-determined parameter. n is related to the ratio of depletion capacitance to oxide capacitance by $n = 1 + C_{depletion}/C_{ox}$. It lies between 1 and 1.5 for modern deep sub-micron technology.

For short channel transistors, Drain Induced Barrier Lowering (DIBL) has a significant effect on the threshold voltage. DIBL refers to a reduction of threshold voltage and an increase of drain current under higher drain-source voltage. Therefore, the drain current is controlled not only by the gate voltage, but also by the drain voltage. Note that the DIBL effect is also present in the sub-threshold region. Therefore, if DIBL is taken into account for modeling $I_{sub}$, Equation (2.2) can be modified as,

$$I_{sub} = I_o exp(\frac{V_{gs} - V_{th} + \eta V_{ds}}{nV_t})[1 - exp(\frac{-V_{ds}}{V_t})]$$
 (2.4)

where $\eta$ is the DIBL coefficient, $\eta = \Delta V_{th}/\Delta V_{ds}$. From HSPICE measurements of $\eta$ for a PTM 32nm NMOS transistor with $W_n = 5L$ under sub-threshold supply voltage, the typical value of $\eta$ is around 0.05.
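As a minimal sketch of Equation (2.4), the function below evaluates the sub-threshold current with the DIBL term and checks numerically that raising $V_{ds}$ acts like lowering $V_{th}$ by $\eta V_{ds}$. The characteristic current `i_o` and the slope factor `n = 1.3` are illustrative assumptions, not values from the text; the threshold voltage is the HS NMOS value reported later in Table 3.1.

```python
import math

V_T = 0.0258  # thermal voltage at room temperature (V), as quoted in the text

def i_sub(v_gs, v_ds, v_th, i_o=1e-7, n=1.3, eta=0.05):
    """Sub-threshold drain current per Equation (2.4).

    Setting eta=0 recovers Equation (2.2). i_o (A) and n are assumed
    illustrative values; eta=0.05 is the typical value quoted in the text.
    """
    gate_term = math.exp((v_gs - v_th + eta * v_ds) / (n * V_T))
    drain_term = 1.0 - math.exp(-v_ds / V_T)
    return i_o * gate_term * drain_term

# At a fixed gate voltage, a larger V_ds gives a larger current.
low = i_sub(0.2, 0.1, 0.328)
high = i_sub(0.2, 0.3, 0.328)

# Same bias without DIBL but with V_th lowered by eta * V_ds.
shifted = i_sub(0.2, 0.3, 0.328 - 0.05 * 0.3, eta=0.0)
```

Here `high` and `shifted` agree up to floating-point rounding, which is the horizontal shift of the current curve described in the text.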

Figure 2.2 illustrates the DIBL effect on MOS transistors. For an NMOS transistor with $W_n = 5L$ in PTM 32nm Bulk CMOS technology, the drain current versus $V_{gs}$ is plotted for varying $V_{ds}$ values. As $V_{ds}$ increases, the current curve shifts to the left. This horizontal shift caused by DIBL corresponds to a decrease of threshold voltage as $V_{ds}$ increases.



Figure 2.2: DIBL effects on drain current of NMOS transistor in PTM 32nm Bulk CMOS technology with  $W_n = 5L$ .

#### 2.2.2 Inverter

Figure 2.3 shows the Voltage Transfer Curve (VTC) of an inverter under a sub-threshold supply voltage of 0.2V with different transistor sizing ratio ( $\beta = W_p/W_n$ ).

The inverter under test has characteristic sizing with $W_p = W_n = L$. This inverter was simulated with another inverter of the same sizing as fan-out load. As seen, both logic 0 level and logic 1 level are reachable.

Although CMOS logic can function properly when supply voltage scales down to the sub-threshold region, a lower limit for voltage scaling still exists. It occurs when $V_{dd}$ drops to 3 or 4 times the thermal voltage $V_t$ [150, 174]. As seen in this dissertation, the minimum supply voltage for a 32-bit Ripple Carry Adder (RCA) in PTM 32nm Bulk CMOS technology with $W_p/W_n = 12L/5L$ is 0.12V. Logic operations break down below this voltage.

Figure 2.3: Voltage Transfer Curve of Inverter at $V_{dd} = 0.2$ V with varying transistor sizing ratio ( $\beta = W_p/W_n$ ) in PTM 32nm Bulk CMOS technology.

#### 2.3 Dual-Threshold Voltage Techniques

Threshold voltage of MOSFET technology represents the value of the gate-source voltage when the current in the MOS transistor starts to increase significantly since the conduction layer just begins to appear. The threshold voltage changes with application of different source-bulk bias voltages. The threshold voltage when bias voltage is present can be summarized in the following equation [70],

$$V_{th} = V_{th0} + \gamma\left(\sqrt{|-2\phi_F + V_{SB}|} - \sqrt{|-2\phi_F|}\right)$$

where $V_{th0}$ is the threshold voltage with zero source-bulk bias voltage (V), which means that the bulk terminal is connected to ground in NMOS transistors and to the supply voltage in PMOS transistors, $V_{SB}$ is the source-bulk bias voltage (V), $2\phi_F$ is the surface potential parameter (V) and $\gamma$ is the body effect parameter ( $\sqrt{V}$ ). Applying a positive $V_{SB}$ to an NMOS transistor or a negative $V_{SB}$ to a PMOS transistor increases the threshold voltage, which is called reverse body biasing. Applying a negative $V_{SB}$ to an NMOS transistor or a positive $V_{SB}$ to a PMOS transistor decreases the threshold voltage, which is called forward body biasing.

Higher threshold voltage results in less leakage current, and therefore less leakage power consumption, at the cost of larger delay. On the contrary, lower threshold voltage brings more leakage current, and therefore more leakage power consumption, but faster speed. By utilizing dual or multiple threshold voltages in a circuit, designers can suppress leakage current while still meeting performance requirements.

#### 2.3.1 The History of The Dual-Threshold Voltage Technique

With the reduction of supply voltage in order to lower power consumption in VLSI circuits, threshold voltage scaling is needed to maintain circuit performance. However, low threshold voltage brings higher leakage power consumption. In order to solve this problem, multi-threshold CMOS was recommended to control leakage power [122] in 1995. Since then, it has been common practice to use low threshold voltage devices for critical paths and high threshold devices for non-critical paths [42, 81, 173, 201, 202, 203, 205]. The basic idea is to assign as many gates as possible to a high threshold voltage to reduce leakage power. In [202, 203, 205], the authors pointed out that, due to performance constraints, not all gates on the off-critical paths can be switched to high threshold voltage; only gates which have sufficient slack can be switched. This was the first slack-based formulation of the dual-threshold voltage method. They developed an algorithm to find an optimum set of gates off the critical paths which can be switched to high threshold voltage. In addition, the proposed algorithm can select the optimum high threshold voltage with respect to performance constraints. In [81], the authors presented a transistor-level dual- $V_{th}$ assignment algorithm for standby power minimization under area and timing constraints.

The heuristic algorithms mentioned above can only guarantee a locally optimal solution. Researchers therefore attempted to use linear programming (LP) to ensure a global optimization of dual $V_{th}$ assignments [57, 112, 114, 126]. The linear programming method finds an optimum solution for an objective function subject to a set of constraints. In the linear programming method in [112], the objective is to minimize total power consumption with constraints on gate delays, gate slacks, cell sizes and circuit speed.

Later, more work was done on investigating leakage power reduction using dual-threshold or multi-threshold voltage methods in a combination with other low power design techniques. The authors in [81, 128, 207] explored the combination of dual-threshold voltage assignment along with transistor sizing optimization. Authors in [168] experimented on utilizing simultaneous gate sizing, dual- $V_{dd}$  and dual- $V_{th}$  approach for power minimization.

#### 2.3.2 Slack-Based Dual-Threshold Voltage Technique

Let's take a one-bit full adder as an example to illustrate the idea of the slack-based method. Figure 2.4 illustrates the structure of a one-bit full adder which consists of nine NAND02 gates.

The process starts with assigning low $V_{th}$ to all of the gates in the circuit. Let's assume all of the nine gates have one unit time $(t_o)$ of gate delay, so the critical path delay of this circuit is equal to 6 $t_o$. There are four critical paths: the first one goes through gates 1, 2, 4, 5, 6 and 8; the second one goes through gates 1, 3, 4, 5, 6 and 8; the third one goes through gates 1, 2, 4, 5, 7 and 8; and the last one goes through gates 1, 3, 4, 5, 7 and 8.



Figure 2.4: One bit full adder.

Therefore, gates 1 through 8 are marked as critical-path gates while gate 9 is marked as an off-critical-path gate.

The slack of a gate is defined as the difference between the critical path delay of the circuit and the longest path delay through this gate. Take gate 9 for example. There are two longest paths through it: one goes through gates 1, 2, 4, 5 and 9 and the other goes through gates 1, 3, 4, 5 and 9. The two paths have the same path delay of 5 $t_o$. Therefore, gate 9 has a slack of $t_o$. The rest of the gates have zero slack since they are all critical-path gates.

As a result, gate 9 is a candidate to be shifted from low $V_{th}$ to high $V_{th}$ to save leakage power. However, gate 9 can only switch to high $V_{th}$ if its slack does not become negative after the change, which means the circuit critical path delay remains unchanged. For example, if the delay of gate 9 increases from $t_o$ to 4 $t_o$, new critical paths will be created which go through gates 1, 2, 4, 5, 9 and gates 1, 3, 4, 5, 9, respectively. The critical path delay would then be 8 $t_o$ (four gates at $t_o$ each plus 4 $t_o$ for gate 9). If the critical path delay of the circuit has to remain 6 $t_o$ due to performance constraints, gate 9 cannot be changed to high $V_{th}$.
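The slack calculation described above can be sketched with one forward and one backward pass over the gate graph. The gate connectivity below is an assumption inferred from the paths listed in the text, not copied from Figure 2.4, and `gate_slacks` is a hypothetical helper name.

```python
from collections import defaultdict

# Fan-in gates of each gate; primary inputs are omitted. This connectivity is an
# assumption inferred from the paths listed in the text, not taken from Figure 2.4.
FANIN = {
    1: [], 2: [1], 3: [1], 4: [2, 3], 5: [4],
    6: [4, 5], 7: [5], 8: [6, 7], 9: [1, 5],
}

def gate_slacks(delay):
    """Return (critical path delay, {gate: slack}) for the given per-gate delays."""
    fanout = defaultdict(list)
    for gate, sources in FANIN.items():
        for src in sources:
            fanout[src].append(gate)

    order = sorted(FANIN)  # gate numbers already follow a topological order here
    arrival = {}           # longest delay from any input up to and including the gate
    for g in order:
        arrival[g] = delay[g] + max((arrival[s] for s in FANIN[g]), default=0)
    downstream = {}        # longest delay from the gate output to any circuit output
    for g in reversed(order):
        downstream[g] = max((delay[f] + downstream[f] for f in fanout[g]), default=0)

    t_crit = max(arrival.values())
    slack = {g: t_crit - (arrival[g] + downstream[g]) for g in order}
    return t_crit, slack

unit = {g: 1 for g in FANIN}        # every gate has delay t_o = 1
t_c, slack = gate_slacks(unit)      # critical delay 6, only gate 9 has slack

at_limit = dict(unit, **{})         # copy of the unit delays
at_limit[9] = 2                     # gate 9 slowed by exactly its slack
t_c_limit, slack_limit = gate_slacks(at_limit)

too_slow = dict(unit)
too_slow[9] = 3                     # gate 9 slowed beyond its slack
t_c_slow, _ = gate_slacks(too_slow)
```

Under these assumptions, slowing gate 9 to 2 $t_o$ uses up its slack without changing the critical path delay, while 3 $t_o$ lengthens the critical path to 7 $t_o$.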

#### 2.4 The History of Minimum Energy Operation

When a circuit operates at its minimum energy operation point $(E_{min})$, the circuit consumes less energy than at any other operating point. Less energy consumption means longer battery life, which is essential for typical sub-threshold circuit applications such as implantable medical devices. Sub-threshold circuits have been used for applications which do not require high circuit performance but require ultra-low energy dissipation. Sub-threshold circuits have drawn increasing attention because of their energy efficiency: the minimum energy point typically occurs when the supply voltage scales down to the sub-threshold range [199, 200].

As  $V_{dd}$  scales down, the dynamic energy drops quickly due to its quadratic relation with  $V_{dd}$ . At the same time, the leakage energy increases due to the increase of circuit delay. In the above-threshold region, circuit delay increases according to  $\alpha$ -power law, while in sub-threshold region, circuit delay increases exponentially since the sub-threshold current is exponentially related to  $V_{dd}$ . The minimum energy occurs when the dynamic energy is equal to the leakage energy. Figure 2.5 shows the energy plot for an 8-bit RCA from SPICE simulations.

As shown in Figure 2.5, the dynamic energy represented by the red curve drops quadratically as $V_{dd}$ decreases from the above-threshold range to the sub-threshold range, while the leakage energy represented by the blue curve increases exponentially due to the significant increase of circuit delay. These two curves intersect when $V_{dd}$ is around 0.2V, therefore the optimal supply voltage $V_{ddopt}$ for this 8-bit RCA is 0.2V. Leakage energy dominates when $V_{dd}$ is below 0.2V, while dynamic energy plays a more important role when $V_{dd}$ is greater than 0.2V. As a result, the total energy represented by the black curve reaches its minimum when dynamic and leakage energy are equal.

In [19], authors derived a theoretical equation for the optimal supply voltage  $V_{ddopt}$  to solve for minimum energy operation point in sub-threshold circuits. Similar equations can be found in [194] as well.



Figure 2.5: 8-bit RCA energy calculated by HSPICE in PTM 32nm Bulk CMOS technology with  $W_n = 5L$  and  $W_p = 12L$ .

#### 2.5 Alpha-Power Law MOSFET Model

The alpha-power law is one of the most well-known MOSFET models and was developed by Dr. Sakurai in the 1980s. It describes the behaviour of MOSFET drain current in the saturation region as shown in the following equation [148, 149].

$$I_D = \frac{W}{L} \cdot P_c \cdot (V_{GS} - V_{TH})^{\alpha} \tag{2.5}$$

After it was introduced, some researchers presented work on its physical background [16, 17, 68]. It was verified that the original alpha-power law is only suitable for MOSFETs under high electric field. Therefore, it requires modifications when it comes to the sub-threshold region. In 2004, Dr. Sakurai published a short paper that completed the alpha-power law model by including MOSFET drain current modelling of sub-threshold current [147].

$$I_{on} = I_0 \cdot (S\alpha)^{-\alpha} \cdot (V_{GS} - V_{TH})^{\alpha} \tag{2.6}$$

where $\alpha$ is a technology-determined physical parameter. For modern sub-micron CMOS technology, $\alpha$ is now almost fixed at about 1.3 [147].

$$I_{sub} = I_0 \cdot e^{-\alpha} \cdot e^{\frac{V_{GS} - V_{TH}}{S}} \tag{2.7}$$

where $I_0$ is determined only by physical parameters of the transistor and S is the sub-threshold slope. As seen in Equation (2.7), $\exp(-\alpha)$ only appears as a scaling factor for the sub-threshold current. The drain current is exponentially related to the gate voltage, which agrees with Equation (2.4). The original alpha-power relation between drain current and gate voltage in Equation (2.5) is not suitable for MOSFETs operating in the sub-threshold region.

#### Chapter 3

## Single- $V_{th}$ Design of Sub-threshold Circuits

In this chapter, both theoretical analysis and experimental results are presented to demonstrate that energy per cycle (EPC) of single- $V_{th}$ sub-threshold circuits is independent of threshold voltage. In other words, changing the threshold voltage of a single- $V_{th}$ sub-threshold circuit does not change its EPC [211].

#### 3.1 Theoretical Analysis

As mentioned in the previous chapter, the complete expression of sub-threshold current can be summarized in the following equation,

$$I_{sub} = I_o \ exp(\frac{V_{gs} - V_{th} + \eta V_{ds}}{nV_t}) \ [1 - exp(\frac{-V_{ds}}{V_t})]$$
 (3.1)

Transistors in sub-threshold region are considered as "leaking" all the time. The charging and discharging of capacitance both rely on the sub-threshold current. However, there is still a difference between on-current and off-current. Similar to above-threshold region, on-current refers to the "dynamic" current when the logic gate is switching while off-current refers to the "static" current leaking from voltage supply to ground all the time. In fact, based on sub-threshold current in Equation (3.1), on-current can be expressed as the current when assigning  $V_{gs}$  to  $V_{dd}$ , as shown in Equation (3.2), while off-current is the current when assigning  $V_{gs}$  to 0, as shown in Equation (3.3).

$$I_{on} = I_{o}exp(\frac{V_{dd} - V_{th} + \eta V_{dd}}{nV_{t}})[1 - exp(\frac{-V_{dd}}{V_{t}})]$$
(3.2)

$$I_{off} = I_o exp(\frac{-V_{th} + \eta V_{dd}}{nV_t})[1 - exp(\frac{-V_{dd}}{V_t})]$$
(3.3)

The delay of a logic gate is defined as its output capacitance  $(C_{out})$  times  $V_{dd}$  divided by on-current  $(I_{on})$ . Output capacitance is typically dominated by the gate capacitance of its fan-out gate(s). Therefore, it can be expressed as,

$$D = \frac{C_{out} V_{dd}}{I_{on}} = \frac{C_{out} V_{dd}}{I_{o} \exp\left(\frac{V_{dd} - V_{th} + \eta V_{dd}}{n V_{t}}\right) \left[1 - \exp\left(\frac{-V_{dd}}{V_{t}}\right)\right]} \tag{3.4}$$

Therefore, for a given circuit, the critical path delay of a circuit can be defined as,

$$T_c = l \cdot \frac{C_g V_{dd}}{I_{on}} = \frac{l C_g V_{dd}}{I_o \exp\left(\frac{V_{dd} - V_{th} + \eta V_{dd}}{n V_t}\right)\left[1 - \exp\left(\frac{-V_{dd}}{V_t}\right)\right]} \tag{3.5}$$

where  $C_g$  is the gate capacitance of a characteristic inverter and l is the length of the critical path in terms of a characteristic inverter.

Since the leakage energy per cycle is the product of leakage power consumption and circuit critical path delay, we can summarize its expression based on the above equations. As expressed by Equations (3.6), (3.7) and (3.8), the  $V_{th}$  term is canceled out in the leakage energy expression, which means that  $V_{th}$  has no effect on leakage energy.

$$E_{leak} = I_{off} \cdot V_{dd} \cdot T_c \tag{3.6}$$

$$E_{leak} = I_o \exp(\frac{-V_{th} + \eta V_{dd}}{nV_t}) \left[1 - exp(\frac{-V_{dd}}{V_t})\right] \cdot V_{dd} \cdot \frac{l C_g V_{dd}}{I_o \exp(\frac{V_{dd} - V_{th} + \eta V_{dd}}{nV_t})\left[1 - exp(\frac{-V_{dd}}{V_t})\right]}$$
(3.7)

$$E_{leak} = \frac{l C_g V_{dd}^2}{exp(\frac{V_{dd}}{nV_t})}$$
(3.8)
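
The cancellation of $V_{th}$ can also be checked numerically. The following is a minimal sketch that evaluates Equations (3.1), (3.5) and (3.6) directly and compares the result with the closed form in Equation (3.8). The device parameters `I0`, `N_SLOPE`, `ETA` and `VT`, as well as the defaults `l=64` and `cg=1e-15`, are hypothetical values chosen for illustration and are not taken from the PTM models.

```python
import math

# Hypothetical device parameters for illustration only (not PTM values).
I0, N_SLOPE, ETA, VT = 1e-7, 1.5, 0.1, 0.0259


def i_sub(vgs, vds, vth):
    """Sub-threshold current, Equation (3.1)."""
    return (I0 * math.exp((vgs - vth + ETA * vds) / (N_SLOPE * VT))
            * (1.0 - math.exp(-vds / VT)))


def leakage_energy(vdd, vth, l=64, cg=1e-15):
    """E_leak = I_off * V_dd * T_c, Equations (3.5) and (3.6)."""
    i_on = i_sub(vdd, vdd, vth)    # V_gs = V_dd, Equation (3.2)
    i_off = i_sub(0.0, vdd, vth)   # V_gs = 0, Equation (3.3)
    t_c = l * cg * vdd / i_on      # critical path delay, Equation (3.5)
    return i_off * vdd * t_c


def leakage_energy_closed_form(vdd, l=64, cg=1e-15):
    """Equation (3.8): no V_th term remains."""
    return l * cg * vdd ** 2 / math.exp(vdd / (N_SLOPE * VT))


for vth in (0.328, 0.549):         # NMOS V_th values from Table 3.1
    print(vth, leakage_energy(0.3, vth), leakage_energy_closed_form(0.3))
```

Under this model both threshold voltages print the same leakage energy, and it agrees with the closed form up to floating-point rounding.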

Table 3.1: PTM 32nm Bulk CMOS  $V_{th}$  Calculated in HSPICE.

|          | NMOS    | PMOS     |
|----------|---------|----------|
| HS model | 0.328 V | -0.291 V |
| LP model | 0.549 V | -0.486 V |

On the other hand, dynamic energy is only dependent on the effective switching capacitance of the circuit as well as the supply voltage, as shown in Equation (3.9). The effective switching capacitance is calculated as the product of gate output activity and output capacitance. Gate output activity can be estimated through logic simulations in Modelsim. The total number of  $0 \to 1$  transitions at the gate output divided by the number of applied random vectors is gate output activity.
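
The activity calculation described above can be sketched as follows. This is an illustrative helper, not the ModelSim flow itself: the function names and the sample trace are hypothetical, and the output is assumed to be sampled once per applied vector after it has settled.

```python
def output_activity(samples):
    """Number of 0 -> 1 output transitions divided by the number of vectors.

    samples: settled output value (0 or 1) after each applied vector.
    """
    rising = sum(1 for prev, cur in zip(samples, samples[1:])
                 if prev == 0 and cur == 1)
    return rising / len(samples)


def effective_capacitance(samples, c_out):
    """Effective switching capacitance = output activity * output capacitance."""
    return output_activity(samples) * c_out


trace = [0, 1, 1, 0, 1, 0, 0, 1]   # hypothetical 8-vector output trace
print(output_activity(trace))       # 3 rising edges over 8 vectors
```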

From Equations (3.8) and (3.9), we can see that the total energy per cycle (EPC) for a single- $V_{th}$ design is independent of $V_{th}$. Its complete expression is shown in Equation (3.10).

$$E_{dyn} = C_{eff} \cdot V_{dd}^2 \tag{3.9}$$

$$E = E_{dyn} + E_{leak} = C_{eff} \cdot V_{dd}^2 + \frac{l C_g V_{dd}^2}{exp(\frac{V_{dd}}{nV_t})}$$
(3.10)

## 3.2 Simulation

HSPICE simulations are conducted to verify the theory stated above. We use the 32nm Predictive Technology Model (PTM) developed at Arizona State University. It offers two types of NMOS and PMOS models for different purposes: a high speed (HS) model and a low power (LP) model. The HS model offers low $V_{th}$ and the LP model offers high $V_{th}$. Their threshold voltage values, calculated in HSPICE at nominal $V_{dd} = 0.9V$, are shown in Table 3.1.

Since PTM 32nm Bulk CMOS technology only offers models for NMOS and PMOS transistors, we construct CMOS logic gates with NMOS transistor width  $W_n = 5L$  and PMOS transistor width  $W_p = 12L$  for HSPICE simulations.



Figure 3.1: 32-bit RCA energy calculated by HSPICE using PTM 32nm Bulk CMOS technology with  $W_n = 5L$  and  $W_p = 12L$ .

For the $V_{dd}$ range between 0.12V and 0.6V, we first calculate the critical path delay $(T_c)$ of a 32-bit ripple carry adder (RCA) by applying critical path vectors to its primary inputs. The lower limit is set by functionality: the logic function of the RCA fails when $V_{dd}$ is below 0.12V. Second, five hundred random vectors are applied to the RCA at a time interval of $T_c$ to calculate the total energy consumption per cycle, which consists of dynamic energy and leakage energy.

The EPC of the two single-threshold-voltage circuits as a function of $V_{dd}$, computed in HSPICE simulations with random input vectors, is shown in Figure 3.1. The red curve is for low $V_{th}$ and the blue curve is for high $V_{th}$. We notice that the EPC of the two designs remains practically the same over the sub-threshold supply voltage range $V_{dd} = 0.12$ V to $V_{dd} = 0.4$ V. As $V_{dd}$ scales down, EPC decreases, reaching a minimum at the same $V_{ddopt}$ just above 300mV. When $V_{dd}$ decreases further, EPC increases as leakage energy dominates. Logic operations break down earlier, at about $V_{dd} = 200$ mV, for high $V_{th}$. The low $V_{th}$ design continues to work at lower $V_{dd}$.

### Chapter 4

## Dual- $V_{th}$ Design of Sub-threshold Circuits

In this chapter, we demonstrate that energy per cycle of sub-threshold circuits can be reduced by the dual- $V_{th}$  approach. We propose a gate-slack based dual- $V_{th}$  algorithm for minimum EPC operation. For a given circuit and low threshold voltage level, the proposed algorithm is able to find the optimal high threshold voltage level, optimal sub-threshold supply voltage  $(V_{ddopt})$ , generate an optimal dual- $V_{th}$  design and accurately estimate the EPC. Experimental results on a dual- $V_{th}$  32-bit ripple carry adder show a 29% EPC reduction at minimum EPC point over its single- $V_{th}$  version [211].

## 4.1 The Proposed Dual- $V_{th}$ Design Framework

The dual- $V_{th}$ method utilizes the characteristics of two different $V_{th}$ levels. A low $V_{th}$ gate has smaller gate delay but higher leakage current, while a high $V_{th}$ gate has larger gate delay but less leakage current. In general, gates on critical path(s) use low $V_{th}$ to maintain circuit speed and gates on off-critical paths use high $V_{th}$ to suppress leakage current. Therefore, a dual- $V_{th}$ algorithm generally begins by assigning low $V_{th}$ to all of the gates in the circuit, and then selects as many gates as possible to switch to high $V_{th}$ to suppress leakage power consumption [56]. However, it is not appropriate to assign high $V_{th}$ to every non-critical-path gate, since the circuit performance may be degraded significantly. Since energy per cycle (EPC) is the product of power and circuit delay, energy reduction is a more appropriate criterion than power reduction for a dual- $V_{th}$ algorithm. Therefore, we propose a gate-slack based dual- $V_{th}$ algorithm to maximize EPC reduction.

Table 4.1: Threshold voltages of NMOS and PMOS in PTM 32nm bulk technology calculated by HSPICE with  $W_n = 5L$  and  $W_p = 12L$ 

|          | NMOS    | PMOS     |
|----------|---------|----------|
| HS model | 0.328 V | -0.291 V |
| LP model | 0.549 V | -0.486 V |

For a given circuit and a low  $V_{th}$  level, the proposed dual- $V_{th}$  algorithm generates the optimal dual- $V_{th}$  design, accurately determines an optimal high  $V_{th}$  level, and an optimal supply voltage  $V_{ddopt}$ , as well as estimates the energy saving.

## 4.1.1 The Method of Obtaining Low $V_{th}$ and High $V_{th}$ gates

## (A) PTM 32nm Bulk CMOS Technology Models

PTM 32nm Bulk CMOS technology offers two types of models for MOSFET transistors – a High Speed (HS) model and a Low Power (LP) model. The HS model offers low  $V_{th}$ , and thus results in smaller gate delay and more leakage current. On the contrary, the LP model offers high  $V_{th}$ , and thus results in larger gate delay and less leakage current. Table 4.1 lists the threshold voltage of NMOS and PMOS transistors calculated by HSPICE using the two models at nominal supply voltage  $V_{dd} = 0.9V$ .

## (B) Low $V_{th}$ Gate and High $V_{th}$ Gate

For a MOSFET transistor, it is common to adjust its threshold voltage by applying a source-bulk bias voltage. To be more specific, reverse bias voltage is often used to increase the threshold voltage and forward bias voltage is often used to decrease the threshold voltage.

To construct a low $V_{th}$ gate, we use the original HS model with zero source-bulk voltage. As illustrated in Figure 4.1, the low $V_{th}$ two-input NAND gate is constructed by grounding the bulk terminals of the NMOS transistors and connecting the bulk terminals of the PMOS transistors to $V_{dd}$. In contrast, to construct a high $V_{th}$ gate, we use the HS model with a non-zero reverse source-bulk bias voltage. As illustrated in Figure 4.2, the high $V_{th}$ two-input NAND gate is constructed by connecting the bulk terminals of the NMOS transistors to a voltage supply more negative than ground and connecting the bulk terminals of the PMOS transistors to a voltage supply more positive than $V_{dd}$.

Figure 4.1: Two-input low $V_{th}$ NAND gate.



Figure 4.2: Two-input high  $V_{th}$  NAND gate implemented with reverse bias voltage = 0.1V.

The threshold voltage of a transistor increases as the reverse bias voltage increases. Multiple levels of high $V_{th}$ are provided in the proposed framework. The threshold voltages of NMOS and PMOS transistors calculated in HSPICE at nominal $V_{dd} = 0.9V$ under varying reverse bias voltages are listed in Table 4.2.

Table 4.2: $V_{th}$ calculated by HSPICE in PTM 32nm bulk CMOS technology with varying source-bulk voltages with $W_n = 5L$ and $W_p = 12L$.

|                       | NMOS    | PMOS     |
|-----------------------|---------|----------|
| HS model w/ zero bias | 0.328 V | -0.291 V |
| HS model w/ bias=0.1V | 0.348 V | -0.309 V |
| HS model w/ bias=0.2V | 0.367 V | -0.327 V |
| HS model w/ bias=0.3V | 0.385 V | -0.344 V |
| HS model w/ bias=0.4V | 0.402 V | -0.360 V |
| HS model w/ bias=0.5V | 0.419 V | -0.375 V |
| HS model w/ bias=0.6V | 0.435 V | -0.389 V |
| HS model w/ bias=0.7V | 0.450 V | -0.403 V |
| HS model w/ bias=0.8V | 0.465 V | -0.417 V |

## 4.1.2 Library Characterization

Before explaining the proposed dual- $V_{th}$ algorithm, we need to characterize libraries for delay, leakage power consumption and nodal capacitance under varying conditions for basic types of logic gates with characteristic sizing. In the proposed framework, characterizations are included for the following types of logic gate: inverter, buffer, two-input NAND (NAND02), three-input NAND (NAND03), two-input NOR (NOR02), three-input NOR (NOR03), two-input AND (AND02), three-input AND (AND03), two-input OR (OR02) and three-input OR (OR03). Library characterization is done through HSPICE simulations in PTM 32nm bulk CMOS technology with the characteristic sizing $W_n = 5L$ and $W_p = 12L$.

### (A) Gate Delay

For a single gate, different conditions of $V_{th}$, $V_{dd}$, number of fan-outs and input vectors result in different values of gate delay. The $V_{dd}$ range included in the framework is between 0.12V and 0.6V. Gate delays with different $V_{th}$ values, which are obtained by applying source-bulk bias voltages to the HS model, are calculated in HSPICE simulations as well. In order to obtain different fan-out conditions, a single gate drives different numbers of inverters with characteristic sizing at its output node. As for different input vectors, we calculate gate delay under all possible combinations of input vectors and take the maximum of the calculated values. For example, for $V_{dd}$ from 0.12V to 0.6V, we calculate NAND02 gate delay with fan-out load ranging from one inverter to ten inverters with the following input vectors: $(00 \to 11)$, $(01 \to 11)$, $(10 \to 11)$, $(11 \to 00)$, $(11 \to 01)$ and $(11 \to 10)$.

Table 4.3: Low $V_{th}$ NAND02 gate delay calculated by HSPICE in PTM 32nm bulk CMOS technology with single fan-out to an inverter load under varying input vectors at $V_{dd} = 0.2V$ with $W_n = 5L$ and $W_p = 12L$.

| Input Vectors       | Delay        |
|---------------------|--------------|
| $00 \rightarrow 11$ | 1.7620E-08 s |
| $01 \rightarrow 11$ | 1.4171E-08 s |
| $10 \rightarrow 11$ | 1.5190E-08 s |
| $11 \rightarrow 00$ | 1.6059E-08 s |
| $11 \rightarrow 01$ | 1.3137E-08 s |
| $11 \rightarrow 10$ | 1.5424E-08 s |

In order to illustrate the influence of input vectors on gate delay, Table 4.3 lists the calculated delay of a low  $V_{th}$  NAND02 gate with one inverter as load under varying input vectors at  $V_{dd} = 0.2V$ . As shown, vector pair  $(00 \to 11)$  generates the largest delay which will be used in the gate delay library. Table 4.4 shows the impact of supply voltage  $V_{dd}$  on gate delay. It lists the calculated delay of the NAND02 gate with fan-out condition changing from one inverter to ten inverters at  $V_{dd} = 0.2V$  and  $V_{dd} = 0.4V$ , respectively. Table 4.5 lists the calculated NAND02 gate delays with one inverter as load with varying  $V_{th}$  levels at  $V_{dd} = 0.2V$ . The varying  $V_{th}$  levels are obtained by applying a reverse source-bulk bias voltage to the PTM 32nm HS model.
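
The worst-case selection described above can be sketched in a few lines. The delay values are copied from Table 4.3; the dictionary layout and the function name `library_delay` are assumptions made for this illustration and do not describe the actual library format used by the framework.

```python
# Delays in seconds from Table 4.3
# (low-Vth NAND02, one inverter load, Vdd = 0.2 V).
nand02_delay = {
    ("00", "11"): 1.7620e-08,
    ("01", "11"): 1.4171e-08,
    ("10", "11"): 1.5190e-08,
    ("11", "00"): 1.6059e-08,
    ("11", "01"): 1.3137e-08,
    ("11", "10"): 1.5424e-08,
}


def library_delay(delay_by_vector):
    """Return (vector pair, delay) of the slowest transition."""
    return max(delay_by_vector.items(), key=lambda item: item[1])


worst_pair, worst_delay = library_delay(nand02_delay)
print(worst_pair, worst_delay)
```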

#### (B) Leakage Power Consumption

For a single gate, the following parameters affect its leakage power: $V_{dd}$, $V_{th}$ and input vectors. Similar to the characterization of gate delay, we calculate leakage power of logic gates under different conditions from HSPICE simulations. As for different input vectors, we calculate the leakage power with all possible input vectors and take the average of the calculated values. Take the NAND02 gate for example: for a $V_{dd}$ range from 0.12V to 0.6V, we apply different source-bulk bias voltages to obtain different high threshold voltage levels and calculate leakage power with four different input vectors.

Table 4.4: Low $V_{th}$ NAND02 gate delay calculated by HSPICE in PTM 32nm bulk CMOS technology with fan-out varying from one inverter to ten inverters at $V_{dd} = 0.2V$ and $V_{dd} = 0.4V$ with $W_n = 5L$ and $W_p = 12L$.

|             | $V_{dd} = 0.2 V$ | $V_{dd} = 0.4 V$ |
|-------------|------------------|------------------|
| fo = 1 inv  | 1.7620E-08 s     | 3.0194E-10 s     |
| fo = 2 inv  | 2.2776E-08 s     | 3.4879E-10 s     |
| fo = 3 inv  | 2.6689E-08 s     | 4.0520E-10 s     |
| fo = 4 inv  | 3.2040E-08 s     | 4.5757E-10 s     |
| fo = 5 inv  | 3.8014E-08 s     | 5.1891E-10 s     |
| fo = 6 inv  | 4.5034E-08 s     | 5.8038E-10 s     |
| fo = 7 inv  | 5.2394E-08 s     | 6.7758E-10 s     |
| fo = 8 inv  | 6.0028E-08 s     | 7.3543E-10 s     |
| fo = 9 inv  | 6.7673E-08 s     | 7.8842E-10 s     |
| fo = 10 inv | 7.5213E-08 s     | 8.4028E-10 s     |

Table 4.5: NAND02 gate delay calculated by HSPICE in PTM 32nm bulk CMOS technology with one inverter as load at $V_{dd} = 0.2V$ with $W_n = 5L$ and $W_p = 12L$ under varying reverse source-bulk bias voltages.

| Bias Voltage | Delay        |
|--------------|--------------|
| Zero Bias    | 1.7620E-08 s |
| Bias = 0.1V  | 3.1674E-08 s |
| Bias = 0.2V  | 5.9140E-08 s |
| Bias = 0.3V  | 1.0263E-07 s |
| Bias = 0.4V  | 1.8353E-07 s |
| Bias = 0.5V  | 3.1555E-07 s |
| Bias = 0.6V  | 5.1736E-07 s |
| Bias = 0.7V  | 8.9194E-07 s |
| Bias = 0.8V  | 1.4136E-06 s |

Table 4.6: Low  $V_{th}$  NAND02 gate leakage power calculated by HSPICE in PTM 32nm bulk CMOS technology with fan-out as one inverter under varying input vectors at  $V_{dd} = 0.2V$  with  $W_n = 5L$  and  $W_p = 12L$ .

| Input Vectors | Leakage Power |
|---------------|---------------|
| 00            | 1.5831E-11 W  |
| 01            | 4.7944E-11 W  |
| 10            | 3.6800E-11 W  |
| 11            | 2.6519E-11 W  |

Table 4.7: NAND02 gate leakage power calculated by HSPICE in PTM 32nm bulk CMOS technology with varying reverse source-bulk bias voltages with fan-out as one inverter at  $V_{dd} = 0.2V$  with  $W_n = 5L$  and  $W_p = 12L$ .

| Bias Voltage | Leakage Power |
|--------------|---------------|
| Zero Bias    | 6.3439e-011 W |
| Bias = 0.1V  | 3.2603e-011 W |
| Bias = 0.2V  | 1.7515e-011 W |
| Bias = 0.3V  | 1.0133e-011 W |
| Bias = 0.4V  | 6.6668e-012 W |
| Bias = 0.5V  | 5.2779e-012 W |
| Bias = 0.6V  | 5.0516e-012 W |
| Bias = 0.7V  | 4.8355e-012 W |
| Bias = 0.8V  | 4.7337e-012 W |

Table 4.6 lists the calculated leakage power of the low $V_{th}$ NAND02 gate with fan-out as one inverter with varying input vectors at $V_{dd} = 0.2V$. Table 4.7 lists the calculated NAND02 leakage power with fan-out as one inverter under varying $V_{th}$ levels at $V_{dd} = 0.2V$.
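
The input-vector averaging can be sketched as follows, using the four values of Table 4.6. The dictionary layout and the function name `library_leakage` are hypothetical and only illustrate the averaging step.

```python
# Leakage power in watts from Table 4.6
# (low-Vth NAND02, one inverter load, Vdd = 0.2 V).
nand02_leakage = {
    "00": 1.5831e-11,
    "01": 4.7944e-11,
    "10": 3.6800e-11,
    "11": 2.6519e-11,
}


def library_leakage(leak_by_vector):
    """Average leakage power over all static input vectors."""
    return sum(leak_by_vector.values()) / len(leak_by_vector)


print(library_leakage(nand02_leakage))
```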

### (C) Output Node Capacitance

Nodal capacitance is needed in the proposed framework because it is used to calculate the effective switching capacitance of a gate in the calculation of dynamic energy. We find that the nodal capacitance is mainly influenced by the threshold voltage of the driving gate as well as its fan-out condition. Note that variations in the threshold voltage of the driven gates (the loads) are ignored in this library characterization, since these variations have little effect on the nodal capacitance of the driving gate. In addition, supply voltage $V_{dd}$ has a small but noticeable influence on gate nodal capacitance. In summary, nodal capacitance of a gate is calculated with varying $V_{th}$ of the driving gate and varying fan-out conditions, with $V_{dd}$ ranging from 0.12V to 0.6V.

Table 4.8: Low $V_{th}$ NAND02 gate nodal capacitance calculated by HSPICE in PTM 32nm bulk CMOS technology with fan-out condition from one inverter to 10 inverters at $V_{dd} = 0.2V$ and $V_{dd} = 0.4V$ with $W_n = 5L$ and $W_p = 12L$.

|             | $V_{dd} = 0.2 V$ | $V_{dd} = 0.4 V$ |
|-------------|------------------|------------------|
| fo = 1 inv  | 1.2656E-015 F    | 1.2644E-015 F    |
| fo = 2 inv  | 1.7040E-015 F    | 1.6944E-015 F    |
| fo = 3 inv  | 2.1424E-015 F    | 2.1244E-015 F    |
| fo = 4 inv  | 2.5808E-015 F    | 2.5543E-015 F    |
| fo = 5 inv  | 3.0192E-015 F    | 2.9843E-015 F    |
| fo = 6 inv  | 3.4576E-015 F    | 3.4143E-015 F    |
| fo = 7 inv  | 3.8961E-015 F    | 3.8442E-015 F    |
| fo = 8 inv  | 4.3345E-015 F    | 4.2742E-015 F    |
| fo = 9 inv  | 4.7729E-015 F    | 4.7041E-015 F    |
| fo = 10 inv | 5.2113E-015 F    | 5.1341E-015 F    |

Table 4.8 shows how the nodal capacitance of a NAND02 gate changes as its fan-out condition and $V_{dd}$ change. The NAND02 output node capacitance is calculated in HSPICE with varying fan-out at $V_{dd} = 0.2V$ and $V_{dd} = 0.4V$, respectively. Table 4.9 shows the impact of the threshold voltage of a NAND02 gate on its nodal capacitance. The NAND02 output node capacitance is calculated in HSPICE at $V_{dd} = 0.2V$ with varying reverse source-bulk bias voltages.

### 4.1.3 Gate Slack Based Dual- $V_{th}$ Algorithm

Before proceeding to the detailed explanation of the dual- $V_{th}$  algorithm, let us declare some important parameters used in this algorithm, as shown in Table 4.10.

The proposed framework first reads in the gate-level netlist of a given combinational circuit and analyzes the structure of the circuit. It determines the structural information for each gate i, namely, its gate type, driving gate(s) and fan-out gate(s). Next, it levelizes all gates.

Table 4.9: NAND02 gate nodal capacitance calculated by HSPICE in PTM 32nm bulk CMOS technology with varying  $V_{th}$  with fan-out as one inverter at  $V_{dd}=0.2V$  with  $W_n=5L$  and  $W_p=12L$ .

| Bias Voltage | Output Node Capacitance |
|--------------|-------------------------|
| Zero Bias    | 1.2656E-015 F           |
| Bias = 0.1V  | 1.2520E-015 F           |
| Bias = 0.2V  | 1.2398E-015 F           |
| Bias = 0.3V  | 1.2289E-015 F           |
| Bias = 0.4V  | 1.2190E-015 F           |
| Bias = 0.5V  | 1.2100E-015 F           |
| Bias = 0.6V  | 1.2018E-015 F           |
| Bias = 0.7V  | 1.1943E-015 F           |
| Bias = 0.8V  | 1.1874E-015 F           |

Table 4.10: Parameters used in the proposed dual- $V_{th}$  algorithm.

| Parameter Name | Description                                                                                  |
|----------------|----------------------------------------------------------------------------------------------|
| N              | Total number of gates in the circuit                                                         |
| TPI(i)         | The longest time for an event to arrive from a PI to gate $i$                                |
| TPO(i)         | The longest time for an event to reach a PO from gate $i$                                    |
| DL(i)          | Gate delay of gate $i$ with low $V_{th}$                                                     |
| DH(i)          | Gate delay of gate $i$ with high $V_{th}$                                                    |
| D(i)           | Gate delay of gate $i$; D(i) = DH(i) or DL(i)                                                |
| DP(i)          | The path delay of the longest path through gate $i$; DP(i) = TPI(i) + TPO(i) + D(i)          |
| T              | Critical path delay of the circuit with low $V_{th}$; $T = Max \{DP(i)\}$ where $i = 1 \sim N$ |
| $T_{high}$     | Critical path delay of the circuit with high $V_{th}$                                        |
| k              | $k = T_{high}/T$                                                                             |
| S(i)           | Slack of gate $i$; S(i) = T - DP(i)                                                          |
| Delta(i)       | Delay difference of gate $i$; Delta(i) = DH(i) - DL(i)                                       |
| $S_u$          | Upper bound of gate slack; $S_u = \frac{(k-1)\cdot T}{k}$                                    |
| $S_l$          | Lower bound of gate slack; $S_l = Min \{Delta(i)\}$ where $i = 1 \sim N$                     |



Figure 4.3: One-bit full adder.

Levelizing determines how far gate i is from the primary inputs or primary outputs. For example, if the inputs of a gate are primary inputs of the circuit, this gate is assigned level 0 from the primary inputs. Take a one-bit full adder as an example. As shown in Figure 4.3, gate 1 is marked as level 0 from primary inputs since both of its inputs are primary inputs. As for gate 2, one of its inputs is a primary input and the other input is the output of gate 1. Gate 2 is marked as level 1 since it is one level after a level 0 gate. When a gate has multiple inputs with different levels, its level is determined by the input with the largest level.
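
The levelization rule above can be sketched as follows. The netlist representation (a dictionary from gate name to its input names, where any name that is not a gate is treated as a primary input) and the three-gate example are assumptions for illustration; they are not the netlist format of the framework, and the example is not a complete transcription of Figure 4.3.

```python
def levelize(fanin):
    """Assign each gate its level from the primary inputs.

    fanin: dict mapping gate name -> list of input names. Names that are
    not keys of the dict are primary inputs. The circuit must be acyclic.
    """
    level = {}

    def visit(gate):
        if gate not in level:
            drivers = [d for d in fanin[gate] if d in fanin]
            # Level 0 if all inputs are primary inputs, otherwise one more
            # than the largest level among the driving gates.
            level[gate] = 1 + max(visit(d) for d in drivers) if drivers else 0
        return level[gate]

    for gate in fanin:
        visit(gate)
    return level


# Hypothetical fragment: g1 is fed by primary inputs only, g2 by g1 and a
# primary input, g3 by g1 and g2.
netlist = {"g1": ["a", "b"], "g2": ["g1", "cin"], "g3": ["g1", "g2"]}
print(levelize(netlist))
```

The recursive form is adequate for small combinational circuits; a large netlist would more likely be processed iteratively in topological order.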

After the circuit structural analysis and levelization, the following steps are conducted for a given $V_{dd}$ and low $V_{th}$ level,

## Step (1) Initialize every gate with low $V_{th}$

Assigning low  $V_{th}$  to gate i means that its D(i) will be assigned with DL(i) which is read from the delay library based on its number of fan-outs. The following parameters for gate i are calculated as well: TPI(i), TPO(i), DP(i) and S(i). At the same time, circuit delay with low  $V_{th}$ , T, can be calculated.

For any given high  $V_{th}$  level,  $T_{high}$  can be calculated in a similar way, with every gate assigned with high  $V_{th}$ . As a result, the ratio of  $T_{high}$  and T, k, can be calculated, as well as  $S_u$  and  $S_l$ .
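
A minimal sketch of the Step (1) calculations is given below, following the definitions in Table 4.10. The netlist format (gate name mapped to its driving gates, with primary inputs omitted and gates listed in topological order), the five-gate example circuit and the unit delays are hypothetical; in the framework, DL(i) and DH(i) would come from the characterized delay library.

```python
from collections import defaultdict


def timing(fanin, delay):
    """Return DP(i), circuit delay T and slack S(i) for every gate.

    fanin: dict gate -> list of driving gates, keys in topological order.
    delay: dict gate -> D(i).
    """
    order = list(fanin)
    fanout = defaultdict(list)
    for gate in order:
        for driver in fanin[gate]:
            fanout[driver].append(gate)
    tpi, tpo = {}, {}
    for gate in order:                       # forward pass: TPI(i)
        tpi[gate] = max((tpi[d] + delay[d] for d in fanin[gate]), default=0.0)
    for gate in reversed(order):             # backward pass: TPO(i)
        tpo[gate] = max((delay[s] + tpo[s] for s in fanout[gate]), default=0.0)
    dp = {g: tpi[g] + tpo[g] + delay[g] for g in order}
    t_crit = max(dp.values())
    slack = {g: t_crit - dp[g] for g in order}
    return dp, t_crit, slack


# Hypothetical circuit: chain g1 -> g2 -> g3 -> g5, with g4 also driving g5.
fanin = {"g1": [], "g2": ["g1"], "g3": ["g2"], "g4": [], "g5": ["g3", "g4"]}
DL = {g: 1.0 for g in fanin}                 # low-Vth delays (arbitrary units)
DH = {g: 2.0 for g in fanin}                 # high-Vth delays (arbitrary units)

DP, T, S = timing(fanin, DL)
_, T_high, _ = timing(fanin, DH)
k = T_high / T
S_u = (k - 1) * T / k                        # upper bound of gate slack
S_l = min(DH[g] - DL[g] for g in fanin)      # lower bound of gate slack
print(T, T_high, k, S_u, S_l, S)
```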

## Step (2) First round of dual- $V_{th}$ gate selection

Based on the values of $S_u$ and $S_l$, we conduct an analysis of gate slack S(i). The gates whose gate slack S(i) $< S_l$ have to remain low $V_{th}$, since their slack would become negative if they were switched to high $V_{th}$, as illustrated below in Equation (4.1). Our framework does not allow negative slack, because negative gate slack means that the longest path delay DP(i) exceeds the circuit delay T. In other words, the circuit would face performance degradation if high $V_{th}$ were assigned to gate i. On the contrary, the gates whose gate slack S(i) $\geq S_u$ are automatically assigned high $V_{th}$, since this change still keeps their slack non-negative, as illustrated below in Equations (4.2) and (4.3).

Let us define S'(i) as the updated gate slack and DP'(i) as the updated longest path delay of gate i after it is switched from low  $V_{th}$  to high  $V_{th}$ . For the gates whose gate slack  $S(i) < S_l$ ,

$$S'(i) = T - DP'(i) = T - [DP(i) + Delta(i)] = S(i) - Delta(i)$$
 (4.1)

From the definition of $S_l$, we know that $S(i) < S_l \le Delta(i)$; therefore, $S'(i)$ in Equation (4.1) is negative.

As for the gates whose gate slack  $S(i) \geq S_u$ ,

$$S'(i) = T - DP'(i) = T - k \cdot DP(i) = T - k \cdot [T - S(i)]$$
(4.2)

From the definition of  $S_u$ , we know that  $S(i) \geq S_u = \frac{(k-1)\cdot T}{k}$ ,

$$T - k \cdot [T - S(i)] \ge T - k \cdot T + k \cdot S_u = 0 \tag{4.3}$$

Therefore, Equation (4.2) remains non-negative after the change of  $V_{th}$ . Here we used an approximation that  $k = \frac{T_{high}}{T} \approx \frac{DP'(i)}{DP(i)}$ .
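
The first-round selection can be sketched as a simple three-way classification. The function name, the data layout and the three-gate example values are hypothetical; the lower bound is checked first so that a gate cannot fall into both groups.

```python
def first_round(slack, delta, t_crit, k):
    """Split gates into keep-low, switch-to-high and undecided groups.

    slack: dict gate -> S(i); delta: dict gate -> Delta(i);
    t_crit: circuit delay T with low Vth; k: T_high / T.
    """
    s_u = (k - 1) * t_crit / k
    s_l = min(delta.values())
    keep_low, to_high, undecided = [], [], []
    for gate, s in slack.items():
        if s < s_l:            # Equation (4.1): slack would turn negative
            keep_low.append(gate)
        elif s >= s_u:         # Equations (4.2), (4.3): slack stays >= 0
            to_high.append(gate)
        else:                  # left for the second round
            undecided.append(gate)
    return keep_low, to_high, undecided


slack = {"a": 0.0, "b": 1.5, "c": 2.5}       # hypothetical slacks
delta = {"a": 1.0, "b": 1.0, "c": 1.0}       # hypothetical DH(i) - DL(i)
print(first_round(slack, delta, t_crit=4.0, k=2.0))
```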

## Step (3) Second round of dual- $V_{th}$ gate selection

For the rest of the gates that were not analyzed in Step (2), if Equation (4.4) is satisfied, further verification is required.

$$S(i) > Delta(i) \tag{4.4}$$

We first sort these gates by their gate slack, then switch them one by one from low $V_{th}$ to high $V_{th}$, starting from the gate with the largest slack. The reason to begin with the gates with large slack is that these gates are most likely located on short paths, so threshold voltage changes on these gates are less likely to affect the circuit critical path delay T. In contrast, gates with small slack are generally located on near-critical paths, where a slight increase in gate delay can immediately increase the circuit critical path delay.

The important constraint on high $V_{th}$ selection is that T cannot be exceeded. This is required to ensure that the dual- $V_{th}$ design runs at its fastest possible frequency. To enforce it, after each gate is switched from low $V_{th}$ to high $V_{th}$, our framework runs static timing analysis (STA) to re-calculate the circuit critical path delay $T_{new}$. If $T_{new}$ is greater than T, the gate cannot be switched to high $V_{th}$, since the switch would create a longer critical path and degrade circuit performance.
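
The second-round selection can be sketched as a greedy loop with a timing check after every tentative switch. This is an illustration under stated assumptions rather than the framework's implementation: the netlist format, the function names and the example circuit are hypothetical, the STA is reduced to a longest-path computation, and the initial slack values are used for both sorting and the Equation (4.4) screen.

```python
def critical_delay(fanin, delay):
    """Longest path delay; fanin keys must be in topological order."""
    arrival = {}
    for gate in fanin:
        arrival[gate] = delay[gate] + max(
            (arrival[d] for d in fanin[gate]), default=0.0)
    return max(arrival.values())


def second_round(fanin, delay, dh, slack, candidates, t_crit):
    """Switch candidates to high Vth one by one, largest slack first.

    delay: current D(i) assignment (copied, not modified in place).
    Returns the list of switched gates and the updated delay assignment.
    """
    delay = dict(delay)
    switched = []
    for gate in sorted(candidates, key=lambda g: slack[g], reverse=True):
        low = delay[gate]
        if slack[gate] <= dh[gate] - low:    # Equation (4.4) not satisfied
            continue
        delay[gate] = dh[gate]               # tentative switch
        if critical_delay(fanin, delay) > t_crit:
            delay[gate] = low                # T would be exceeded: revert
        else:
            switched.append(gate)
    return switched, delay


# Hypothetical circuit: critical chain a1 -> a2 -> a3 -> out and a shorter
# side path b1 -> b2 -> out.
fanin = {"a1": [], "a2": ["a1"], "a3": ["a2"],
         "b1": [], "b2": ["b1"], "out": ["a3", "b2"]}
dl = {g: 1.0 for g in fanin}
dh = {g: 1.75 for g in fanin}
T = critical_delay(fanin, dl)
high, new_delay = second_round(fanin, dl, dh, {"b1": 1.0, "b2": 1.0},
                               ["b1", "b2"], T)
print(T, high)
```

In this example both side-path gates pass the Equation (4.4) screen individually, but only the first can be switched; switching the second as well would lengthen the side path beyond T, which is why the timing check after every switch is needed.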

#### Step (4) Estimation of energy per cycle (EPC)

At this step, a dual- $V_{th}$  design is generated in terms of a list of gates that are assigned with high  $V_{th}$ . Our framework then estimates its energy per cycle by Equation (4.5). Energy per cycle of the entire circuit is the sum of that of each gate.

$$E = \sum_{i=1}^{N} E(i) = \sum_{i=1}^{N} \left[ C_{eff}(i) \cdot V_{dd}^{2} + P_{leak}(i) \cdot T \right] = \sum_{i=1}^{N} \left[ \alpha(i) \cdot C(i) \cdot V_{dd}^{2} + P_{leak}(i) \cdot T \right]$$
(4.5)

where $C_{eff}(i) \cdot V_{dd}^2$ represents the dynamic energy of gate i. $C_{eff}(i)$ is the effective switching capacitance of the gate, which is estimated as the product of the gate nodal capacitance C(i) and its activity factor $\alpha(i)$. $P_{leak}(i) \cdot T$ represents the leakage energy of gate i. As mentioned earlier, C(i) and $P_{leak}(i)$ are obtained from HSPICE simulations of basic logic gates. To obtain the activity factor $\alpha(i)$ of each gate, we conduct logic simulation of the example circuit with a single low $V_{th}$ design in Modelsim. Five hundred (500) random vectors are applied at the primary inputs with a vector period of T. Based on the simulated signals, we count how many $0 \to 1$ transitions occur at each gate output during the entire simulation and then estimate $\alpha(i)$ as the average number of transitions per vector period.
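
The per-gate summation of Equation (4.5) can be sketched as follows. The function name and the tuple layout are assumptions for illustration. In the example call, the capacitance and leakage power are the zero-bias NAND02 entries of Tables 4.9 and 4.7, while the activity factor and the cycle time are hypothetical.

```python
def estimate_epc(gates, vdd, t_crit):
    """Energy per cycle according to Equation (4.5).

    gates: iterable of (alpha, c_node, p_leak) tuples, one per gate, with
    activity factor, nodal capacitance (F) and leakage power (W).
    t_crit: critical path delay T (s), used as the cycle time.
    """
    return sum(alpha * c_node * vdd ** 2 + p_leak * t_crit
               for alpha, c_node, p_leak in gates)


# One NAND02 gate at Vdd = 0.2 V; alpha = 0.25 and T = 1 us are hypothetical.
print(estimate_epc([(0.25, 1.2656e-15, 6.3439e-11)], vdd=0.2, t_crit=1e-6))
```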

## 4.2 Experimental Results

The topology of a circuit influences how much energy saving can be achieved by the dual- $V_{th}$  method.

We investigate our proposed algorithm for a 32-bit ripple carry adder (RCA) and a 4-by-4 multiplier, as well as for ISCAS85 benchmark circuits. Among these, we believe that the RCA is an upper bound for energy saving, since it has a large number of short non-critical paths and the path length difference between the critical path and the non-critical paths is considerably large. Therefore, it allows more gates to be switched to high $V_{th}$. On the other hand, the multiplier is close to a lower bound case, since its balanced structure allows only a small portion of non-critical gates to have high $V_{th}$. Application to the 32-bit ripple carry adder and the 4-by-4 multiplier shows that the minimum EPC is reduced by 29% and 10.8%, respectively. Experiments on ISCAS85 benchmark circuits demonstrate energy savings between 10% and 29%.



Figure 4.4: HSPICE simulation for energy per cycle (EPC) of 32-bit RCA single  $V_{th}$  designs in PTM 32nm Bulk CMOS with  $W_n = 5L$  and  $W_p = 12L$ .

### 4.2.1 32-Bit Ripple Carry Adder

Figure 4.4 shows experimental results for the 32-bit RCA in PTM 32nm bulk CMOS technology. It shows how EPC is lowered via optimized dual- $V_{th}$ design. Supply voltage ranges from 0.12V to 0.6V and we apply reverse body bias voltages between 0.1V and 0.8V to the example circuit. For any given $V_{dd}$, EPC decreases as bias voltage increases until it reaches a lower bound. Then it starts to increase slowly, finally reaching the same value as the single- $V_{th}$ design. The lowest minimum energy occurs when the bias voltage equals 0.3V; therefore, the optimal bias voltage for the high $V_{th}$ level is 0.3V. The minimum EPC in Figure 4.4 is $1.610 \times 10^{-14} \text{J}$ at $V_{ddopt} = 0.24 \text{V}$. The corresponding critical path delay is $T = 1.2134 \mu \text{s}$, resulting in a clock frequency of 0.8241MHz.



Figure 4.5: Gate slacks of single low  $V_{th}$  design of a 288-gate 32-bit RCA at  $V_{dd} = 0.25V$ .

Comparing the two single-threshold designs in Figure 4.4, we note that their minimum EPC is $2.268 \times 10^{-14} \text{J}$ at $V_{ddopt} = 0.31 \text{V}$. For the low $V_{th}$ circuit, T = 250.11 ns or clock frequency = 3.998 MHz, and for the high $V_{th}$ circuit, $T = 35.835 \mu \text{s}$ or clock frequency = 27.9 kHz. Thus, EPC for the dual- $V_{th}$ circuit is 29.1% lower than that of either single- $V_{th}$ circuit. The speed of the dual- $V_{th}$ circuit is between the speeds of the two single- $V_{th}$ circuits.

Figure 4.5 and Figure 4.6 show the impact of dual- $V_{th}$ assignments on the gate slack distribution for all of the 288 gates in the 32-bit RCA. Since the optimal supply voltage of the single- $V_{th}$ design is different from that of the dual- $V_{th}$ design, it is not appropriate to compare the slacks of the two designs at their respective optimal points. Therefore, a fixed $V_{dd} = 0.25V$ is chosen and the gate slack data of the single low $V_{th}$ and dual- $V_{th}$ designs are compared. As seen, the number of gates with smaller slack increases after the dual- $V_{th}$ assignments.



Figure 4.6: Gate slacks of dual  $V_{th}$  design of 288-gate 32-bit RCA with bias voltage = 0.3V at  $V_{dd} = 0.25V$ .

More high  $V_{th}$  gates will lead to higher leakage power saving. But it is not appropriate to assume that more high  $V_{th}$  gates will lead to higher energy saving since the reduction of leakage energy only comes from a balance between reduction of leakage power and increase of circuit delay. For varying  $V_{dd}$ , Figure 4.7 presents a relation between x (percentage of high  $V_{th}$  gates) and the EPC of dual- $V_{th}$  designs normalized to the minimum EPC of single low  $V_{th}$  design. For a fixed  $V_{dd}$ , different high  $V_{th}$  levels result in different values of x. On the other hand, for a fixed high  $V_{th}$  level, there is a  $V_{ddopt}$  which consumes minimum EPC.

Figure 4.8 compares estimated EPC of dual- $V_{th}$  designs with HSPICE simulation. The average error between the estimation and simulation is 6.99%. The error may result from simplifications made in the framework. For example, we assume that fan-out gates are always low  $V_{th}$  gates when calculating output capacitance of the driving gate in HSPICE.



Figure 4.7: Energy savings vs. percentage of high $V_{th}$ gates from HSPICE simulation results of 288-gate 32-bit RCA dual- $V_{th}$ designs under varying $V_{dd}$ values.

That is, when a gate drives high  $V_{th}$  gates, the difference in output capacitance is considered negligible.

### 4.2.2 4-by-4 Multiplier

Experimental results on a 4-by-4 multiplier show a 10.8% reduction of minimum EPC by the dual- $V_{th}$ method. The minimum EPC drops from 7.5954E-15 J in the single- $V_{th}$ design at $V_{dd} = 0.26V$ with a frequency of 4.29 MHz to 6.77E-15 J in the dual- $V_{th}$ design at $V_{dd} = 0.21V$ with a frequency of 1.76 MHz. The optimal high $V_{th}$ in this dual- $V_{th}$ design is obtained when bias voltage = 0.2V.



Figure 4.8: Random-vector HSPICE simulation results vs. estimation results for energy per cycle (EPC) for 124-gate 32-bit RCA dual- $V_{th}$  design with reverse body bias voltage = 0.3V.

Table 4.11: 4-by-4 Multiplier single- $V_{th}$  design vs. dual- $V_{th}$  minimum EPC saving.

| Circuit Name      | Single Vth Design Emin | Dual Vth Design Emin | Energy Saving |
|-------------------|------------------------|----------------------|---------------|
| 4-by-4 Multiplier | 7.5954E-15 J           | 6.77E-15 J           | 10.8%         |

Figure 4.9 shows the gate slacks for all of the 124 gates in single low  $V_{th}$  design of 4-by-4 multiplier at  $V_{dd} = 0.21V$ . Figure 4.10 shows the gate slacks for all of the 124 gates in dual  $V_{th}$  design of 4-by-4 multiplier with bias voltage = 0.2V at  $V_{dd} = 0.21V$ .

## 4.2.3 ISCAS85 Benchmark Circuits

Experimental results on ISCAS85 benchmark circuits are listed in the two following tables. Table 4.12 lists the minimum EPC points in single-$V_{th}$ design and dual-$V_{th}$ design of benchmark circuits as well as energy saving. Table 4.13 lists the calculated optimal $V_{ddopt}$ as well as optimal source-bulk voltage.

Figure 4.9: Gate slacks of single low  $V_{th}$  design of 4-by-4 multiplier at  $V_{dd}=0.21V$ .

Table 4.12: ISCAS85 benchmark circuit HSPICE simulation results on energy saving.

| Circuit Name | Single Vth Design Emin | Dual Vth Design Emin | Energy Saving |
|--------------|------------------------|----------------------|---------------|
| C432         | 7.21E-15 J             | 6.32E-15 J           | 12.4%         |
| C499         | 2.12900E-014 J         | 1.8478E-014 J        | 13.2%         |
| C880         | 1.4315E-014 J          | 1.0613E-014 J        | 25.86%        |
| C1355        | 1.9784E-014 J          | 1.7355E-014 J        | 12.28%        |
| C1908        | 3.1425E-014 J          | 2.6862E-014 J        | 14.52%        |
| C2670        | 5.09E-014 J            | 3.71E-14 J           | 27.1%         |
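The energy saving column can be cross-checked from the two minimum EPC columns. The short Python sketch below is only an illustration and not part of the original design flow; the helper name `energy_saving_percent` is hypothetical, and three rows of Table 4.12 are used as sample input.

```python
def energy_saving_percent(e_single, e_dual):
    """Percentage reduction of minimum EPC from single-Vth to dual-Vth design."""
    return 100.0 * (e_single - e_dual) / e_single

# (single-Vth Emin in J, dual-Vth Emin in J, reported saving in %) from Table 4.12
rows = {
    "C432": (7.21e-15, 6.32e-15, 12.4),
    "C880": (1.4315e-14, 1.0613e-14, 25.86),
    "C2670": (5.09e-14, 3.71e-14, 27.1),
}

for name, (e_single, e_dual, reported) in rows.items():
    saving = energy_saving_percent(e_single, e_dual)
    print(f"{name}: computed {saving:.2f}%, reported {reported}%")
```

Small differences between computed and reported values are expected because the tabulated energies are rounded.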



Figure 4.10: Gate slacks of dual  $V_{th}$  design of 4-by-4 multiplier with bias voltage = 0.2V at  $V_{dd} = 0.21V$ .

Table 4.13: ISCAS85 benchmark circuit HSPICE simulation results on $V_{ddopt}$ and optimal bias voltage.

| Circuit Name | Single Vth Design $V_{ddopt}$ | Dual Vth Design $V_{ddopt}$ | Optimal Bias |
|--------------|-------------------------------|-----------------------------|--------------|
| C432         | 0.28 V                        | 0.26 V                      | 0.2 V        |
| C499         | 0.27 V                        | 0.26 V                      | 0.2 V        |
| C880         | 0.25 V                        | 0.22 V                      | 0.3 V        |
| C1355        | 0.26 V                        | 0.24 V                      | 0.2 V        |
| C1908        | 0.27 V                        | 0.25 V                      | 0.3 V        |
| C2670        | 0.22 V                        | 0.19 V                      | 0.2 V        |

### Chapter 5

### Analysis of Results

#### 5.1 Overview

Dual-$V_{th}$ design for above-threshold circuits aims to reduce only leakage power and does not place requirements on circuit performance. Unlike above-threshold circuits, sub-threshold circuits aim at energy per cycle reduction. In general, energy per cycle reaches its minimum in the sub-threshold region.

The dual-$V_{th}$ approach is effective in lowering the minimum EPC point. Minimum EPC only occurs at $V_{ddopt}$ where dynamic and leakage energy are equal. In addition, dynamic energy is only dependent on effective switching capacitance and supply voltage. Therefore, the reduction of minimum EPC only comes from the reduction of $V_{dd}$. Figure 5.1 illustrates how dynamic energy and leakage energy change as $V_{dd}$ scales down and how $V_{ddopt}$ drops from single low $V_{th}$ design to dual-$V_{th}$ design. In order to calculate dynamic and leakage energy, the following steps are performed. First, leakage power of a 32-bit RCA is calculated in HSPICE by simulating with five hundred random vectors. Second, the circuit critical path delay is calculated in HSPICE simulations with a critical path vector. Third, five hundred random vectors are used in HSPICE simulations to calculate total energy per cycle. From the first two steps, we get the leakage energy since it is the product of leakage power and circuit delay. The dynamic energy is then calculated as the difference between total energy per cycle and leakage energy.
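The three-step decomposition above reduces to two arithmetic operations per supply voltage. The following Python sketch illustrates it under stated assumptions: the function name `split_energy` is hypothetical, and the numeric inputs are placeholders rather than HSPICE results from this work.

```python
def split_energy(leak_power_w, delay_s, total_epc_j):
    """Return (leakage energy, dynamic energy) per cycle in joules."""
    e_leak = leak_power_w * delay_s   # leakage power times critical path delay
    e_dyn = total_epc_j - e_leak      # remainder of the total energy per cycle
    return e_leak, e_dyn

# Placeholder values for one supply voltage (assumed, not from the dissertation)
e_leak, e_dyn = split_energy(leak_power_w=2.0e-8, delay_s=5.0e-7, total_epc_j=2.2e-14)
print(f"leakage = {e_leak:.3e} J, dynamic = {e_dyn:.3e} J")
```

Repeating the calculation for each $V_{dd}$ value yields the leakage and dynamic energy curves of the kind plotted in Figure 5.1.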

As shown in Figure 5.1, the black curve represents the total EPC of a single low $V_{th}$ design. Total EPC is the sum of dynamic energy, shown as the red curve, and leakage energy, shown as the blue curve. The green curve represents the leakage energy of the dual-$V_{th}$ design. For single $V_{th}$ design, minimum EPC occurs at $V_{ddopt1}$, which is the intersection of the blue and red curves. For dual $V_{th}$ design, minimum EPC occurs at $V_{ddopt2}$, which is the intersection of the green and red curves. The $V_{ddopt}$ points can only move along the dynamic energy curve.

Figure 5.1: HSPICE simulation results of dynamic energy vs. leakage energy for 32-bit RCA single low  $V_{th}$  and dual- $V_{th}$  design.

### 5.2 Theoretical Analysis

In this section, a three-step theoretical analysis is conducted to understand the observed energy saving for a 32-bit RCA in experimental results. For  $V_{dd}$  ranging from 0.12V to 0.4V, energy and circuit delay of a 32-bit RCA are characterized into an equation to estimate  $V_{ddopt}$  and the energy saving at minimum EPC points.

First, we characterize leakage power and circuit delay of the 32-bit RCA. Five hundred random vectors are used to calculate the leakage power for single-$V_{th}$ design with low $V_{th}$ and optimal high $V_{th}$, respectively. Circuit delay T is calculated from HSPICE simulation for single low $V_{th}$ design by applying a vector pair that activates the critical path. T determines the fastest possible operating frequency of the circuit.

Second, we characterize leakage energy as a third-order polynomial in  $V_{dd}$ . We multiply the leakage power and circuit delay from the previous step to get the leakage energy. The total energy is,

$$\begin{aligned}
E &= E_{dyn} + E_{leak} \\
&= \alpha C_i V_{dd}^2 + I_{off} V_{dd} T_c \\
&= \alpha C_i V_{dd}^2 + I_{off} V_{dd} \frac{C_{g,i} V_{dd}}{I_{on}}
\end{aligned} \tag{5.1}$$

The leakage energy is related to $I_{on}$ and $I_{off}$, which have exponential relationships to $V_{dd}$. According to the Taylor series expansion, an exponential function can be approximated by a polynomial. Therefore, leakage energy can be expressed as a polynomial in $V_{dd}$. We use a third-order polynomial in $V_{dd}$ for accuracy and simplicity. To illustrate the goodness of curve-fitting, Figures 5.2 and 5.4 show the fitting error for leakage energy with low $V_{th}$ as root mean squared error (RMSE) and R-squared (the square of the correlation between original data and fitted data), respectively. When the polynomial degree increases from one to three, the RMSE drops by two orders of magnitude and then begins to drop much more slowly, while the R-squared saturates almost to the best fit value of 1. For leakage energy with optimal high $V_{th}$, we see similar trends in RMSE and R-squared as shown in Figures 5.3 and 5.5. The fitted expressions are,

$$E_{leak,LVth} = p1 \cdot V_{dd}^3 + p2 \cdot V_{dd}^2 + p3 \cdot V_{dd} + p4 \tag{5.2}$$

where $p1 = -2.916 \times 10^{-12}$, $p2 = 3.463 \times 10^{-12}$, $p3 = -1.401 \times 10^{-12}$ and $p4 = 1.953 \times 10^{-13}$.

$$E_{leak,HVth} = h1 \cdot V_{dd}^3 + h2 \cdot V_{dd}^2 + h3 \cdot V_{dd} + h4 \tag{5.3}$$

where $h1 = -3.413 \times 10^{-13}$, $h2 = 4.19 \times 10^{-13}$, $h3 = -1.75 \times 10^{-13}$ and $h4 = 2.54 \times 10^{-14}$.

Figure 5.2: Root mean squared error analysis of polynomial fit for leakage energy with low  $V_{th}$ ; a third-degree polynomial is selected.
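The degree-selection argument for the leakage energy fit can be illustrated with a small least-squares experiment. The Python sketch below is a stand-in under stated assumptions: the data come from an assumed exponential curve rather than HSPICE, the helper names (`polyfit`, `evaluate`, `rmse`, `r_squared`) are hypothetical, and the fit uses plain normal equations instead of whatever curve-fitting tool was used for the results reported here.

```python
import math

def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns coefficients c[0] + c[1]*x + ..."""
    n = degree + 1
    # Normal equations: (sum x^(i+j)) c = (sum y x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def evaluate(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def rmse(ys, fitted):
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(ys, fitted)) / len(ys))

def r_squared(ys, fitted):
    """Square of the Pearson correlation between data and fitted values."""
    my, mf = sum(ys) / len(ys), sum(fitted) / len(fitted)
    cov = sum((y - my) * (f - mf) for y, f in zip(ys, fitted))
    var_y = sum((y - my) ** 2 for y in ys)
    var_f = sum((f - mf) ** 2 for f in fitted)
    return cov * cov / (var_y * var_f)

# Synthetic stand-in for leakage energy vs. Vdd (assumed shape, not HSPICE data)
vdd = [0.12 + 0.01 * i for i in range(29)]   # 0.12 V to 0.40 V
energy = [math.exp(-10.0 * v) for v in vdd]

results = {}
for degree in (1, 2, 3):
    coeffs = polyfit(vdd, energy, degree)
    fitted = [evaluate(coeffs, v) for v in vdd]
    results[degree] = (rmse(energy, fitted), r_squared(energy, fitted))
    print(degree, results[degree])
```

On this synthetic curve the RMSE falls with each added degree while R-squared approaches 1, which is the qualitative behavior described for Figures 5.2 through 5.5.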

Third, we characterize dynamic energy as a second-order polynomial in  $V_{dd}$ . Since the difference between total energy and leakage energy is the dynamic energy, we run HSPICE simulations for 32-bit RCA to calculate total energy using the same set of random vectors. According to Equation (5.1), dynamic energy is the product of effective switching capacitance of the circuit and  $V_{dd}^2$ . Therefore dynamic energy is characterized as a second-order polynomial in  $V_{dd}$  as,

$$E_{dyn} = a \cdot V_{dd}^2 + b \tag{5.4}$$

where  $a = 1.653 \times 10^{-13}$  and  $b = -2.103 \times 10^{-16}$ .



Figure 5.3: Root mean squared error (RMSE) analysis of polynomial fit for leakage energy with high  $V_{th}$ ; a third-degree polynomial is selected.

# 5.2.1 Single- $V_{th}$ Design

The energy of a single low  $V_{th}$  design is,

$$\begin{aligned}
E_{single} &= E_{dyn} + E_{leak,LVth} \\
&= p1 \cdot V_{dd}^{3} + (p2 + a) \cdot V_{dd}^{2} + p3 \cdot V_{dd} + (p4 + b)
\end{aligned} \tag{5.5}$$

Setting the derivative of  $E_{single}$  w.r.t.  $V_{dd}$  to 0, we write

$$\frac{\partial E_{single}}{\partial V_{dd}} = 3p1 \cdot V_{dd}^2 + 2(p2 + a) \cdot V_{dd} + p3 = 0 \tag{5.6}$$



Figure 5.4: Regression coefficient (R-squared) analysis of polynomial fit for leakage energy with low  $V_{th}$ ; a third-degree polynomial is selected.

The minimum EPC occurs at  $V_{ddopt}$ , which satisfies Equation (5.6). For a single low  $V_{th}$  design,  $V_{ddopt} = 0.3058$ V, which is close to 0.31V obtained from HSPICE simulation results shown in Figure 3.1.
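Equation (5.6) is a quadratic in $V_{dd}$, so $V_{ddopt}$ can be obtained in closed form from the fitted coefficients. The Python sketch below is an illustrative check, not the procedure used in this work; the function names `stationary_points` and `vdd_opt` are hypothetical, and the root with a positive second derivative is taken as the energy minimum.

```python
import math

# Fitted coefficients from Equations (5.2) and (5.4)
p1, p2, p3 = -2.916e-12, 3.463e-12, -1.401e-12
a = 1.653e-13

def stationary_points(c3, c2, c1):
    """Real roots of 3*c3*V^2 + 2*c2*V + c1 = 0, the derivative of a cubic."""
    qa, qb, qc = 3.0 * c3, 2.0 * c2, c1
    disc = qb * qb - 4.0 * qa * qc
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)])

def vdd_opt(c3, c2, c1):
    """Stationary point with positive second derivative (the energy minimum)."""
    for v in stationary_points(c3, c2, c1):
        if 6.0 * c3 * v + 2.0 * c2 > 0:
            return v
    return None

v_single = vdd_opt(p1, p2 + a, p3)
print(f"Vddopt (single low Vth) = {v_single:.4f} V")
```

With the rounded coefficients listed above, the selected root evaluates to approximately 0.3058 V.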

## 5.2.2 Dual- $V_{th}$ Design

Let us define a parameter x as the fraction of high $V_{th}$ gates in the circuit. Then, 1-x represents the fraction of low $V_{th}$ gates in the circuit. The total energy for dual-$V_{th}$ design is expressed as,

$$\begin{aligned}
E_{dual} &= E_{dyn} + x \cdot E_{leak,HVth} + (1 - x) \cdot E_{leak,LVth} \\
&= K1 \cdot V_{dd}^{3} + K2 \cdot V_{dd}^{2} + K3 \cdot V_{dd} + K4
\end{aligned} \tag{5.7}$$

where $K1 = x \cdot h1 + (1 - x) \cdot p1$, $K2 = x \cdot h2 + (1 - x) \cdot p2 + a$, $K3 = x \cdot h3 + (1 - x) \cdot p3$ and $K4 = x \cdot h4 + (1 - x) \cdot p4 + b$. Setting the derivative of $E_{dual}$ w.r.t. $V_{dd}$ to 0, we get

$$\frac{\partial E_{dual}}{\partial V_{dd}} = 3K1 \cdot V_{dd}^2 + 2K2 \cdot V_{dd} + K3 = 0 \tag{5.8}$$

Figure 5.5: Regression coefficient (R-squared) analysis of polynomial fit for leakage energy with high  $V_{th}$ ; a third-degree polynomial is selected.

Our framework showed that the minimum EPC for dual-$V_{th}$ design occurs when x is equal to 0.6875, with the optimal high $V_{th}$ obtained by applying a bias voltage of 0.3V to the PTM 32nm HS model. Using this value of x, Equation (5.8) gives $V_{ddopt} = 0.254$V, which is close to 0.25V from HSPICE simulation results shown in Figure 4.4.
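The dependence of $V_{ddopt}$ on x can be reproduced directly from the fitted coefficients. The Python sketch below is illustrative only; the function name `vdd_opt_dual` is hypothetical, and it relies on the rounded coefficients printed in this chapter, so derived energy values may differ from those reported.

```python
import math

# Fitted coefficients from Equations (5.2), (5.3) and (5.4)
p1, p2, p3 = -2.916e-12, 3.463e-12, -1.401e-12   # low Vth leakage energy
h1, h2, h3 = -3.413e-13, 4.19e-13, -1.75e-13     # high Vth leakage energy
a = 1.653e-13                                    # dynamic energy

def vdd_opt_dual(x):
    """Minimum-energy Vdd from Equation (5.8) for a fraction x of high Vth gates."""
    k1 = x * h1 + (1.0 - x) * p1
    k2 = x * h2 + (1.0 - x) * p2 + a
    k3 = x * h3 + (1.0 - x) * p3
    qa, qb, qc = 3.0 * k1, 2.0 * k2, k3
    root = math.sqrt(qb * qb - 4.0 * qa * qc)
    candidates = [(-qb - root) / (2.0 * qa), (-qb + root) / (2.0 * qa)]
    # Keep the stationary point where the second derivative is positive
    minima = [v for v in candidates if 6.0 * k1 * v + 2.0 * k2 > 0]
    return minima[0]

for x in (0.0, 0.6875):
    print(f"x = {x:.4f}: Vddopt = {vdd_opt_dual(x):.4f} V")
```

Setting x = 0 recovers the single low $V_{th}$ case, and x = 0.6875 gives a value of about 0.254 V.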

Based on the $V_{ddopt}$ values for single-$V_{th}$ and dual-$V_{th}$ designs estimated by the above equations, we find that the minimum EPC point drops by 33.4%. This result is reasonably close to the 29% energy saving obtained from HSPICE simulation.



Figure 5.6: HSPICE simulation results vs. theoretical analysis of energy ratio of 32-bit RCA dual- $V_{th}$  designs with bias voltage = 0.3V and single- $V_{th}$  design.

Using different values of x from 0 to 1 in Equation (5.8), we get the corresponding  $V_{ddopt}$  values, for which we calculate the minimum EPC for dual- $V_{th}$  designs from Equation (5.7). This is normalized with respect to the minimum EPC of a single- $V_{th}$  design obtained from Equation (5.5). The normalized energy of dual- $V_{th}$  designs is shown as the blue curve in Figure 5.6.

As seen from Figure 5.6, energy saving reaches its maximum of 67% when x equals 1. However, the realizable maximum energy saving is 29% when x equals 0.6875, as shown by the red star in Figure 5.6. This discrepancy is explained by the first two steps of the analysis in Section 5.2. Leakage energy of single high $V_{th}$ design by Equation (5.3) is the product of leakage power of single high $V_{th}$ design and circuit delay T obtained from single low $V_{th}$ design. However, in practice the circuit delay increases as the $V_{th}$ in single-$V_{th}$ design increases. Therefore, the blue curve in Figure 5.6 only expresses a lower bound on the normalized energy, that is, an upper bound on the energy saving, for a 32-bit RCA. In addition, the red circle in the upper left corner represents the minimum EPC point for single low $V_{th}$ design with PTM HS model. The red square in the upper right corner represents the minimum EPC point for single high $V_{th}$ design with PTM LP model. Both points are normalized against the minimum EPC of a single low $V_{th}$ design.

### Chapter 6

## Ultra-Low Voltage Circuit Under Process Variations

Process variations are unavoidable in modern semiconductor manufacturing processes. One common source of variation is transistor geometric parameters, for example transistor length, width and gate oxide thickness. Random dopant fluctuations (RDF) in the transistor doping profile are another well-known source of variation. In general, process variations are classified into two types: global variations and local variations. Global variations, also called inter-die variations, refer to the systematic variations between different dies. Global variations remain constant among different devices on a single die, while the same devices fabricated on different dies have different characteristics. On the other hand, local variations, also known as intra-die variations, refer to the random variations among different devices on a die. Local variations are often modeled as random variables that obey certain distributions. For example, delay variations are normally considered Gaussian random variables in the above-threshold regime.

The impact of process variation in sub-threshold regime has become an important concern for sub-threshold circuit design [82, 95, 214]. Due to the exponential relation between threshold voltage  $V_{th}$  and sub-threshold current  $I_{sub}$ , sub-threshold circuits tend to be more sensitive to process variations [82].

#### 6.1 Previous Work

Much work has been done on studying the characteristics of process variation in the sub-threshold region. Authors in [214] showed experimental results of sub-threshold circuits with an industrial 130nm technology and demonstrated that random dopant fluctuations are the dominant source of variations.

Researchers in [52] pointed out that RDF primarily brings uncertainties in threshold voltage. Although correlations of process variation are raised as concerns in some works [1, 127, 184], random uncorrelated $V_{th}$ variation resulting from RDF was shown to be the dominant component for sub-threshold circuits.

Authors in [145] provided the first robust sub-threshold digital circuit in 2008. They designed and implemented an 8-bit t-tap Finite Impulse Response (FIR) filter in 130nm IBM technology. The FIR filter was reported to work at 280mV with a frequency of 9.8 kHz. Later [183, 215], these authors proposed a sub-threshold 6T SRAM design in 130nm technology. The SRAM can function between 1.2V and 193mV and has good robustness characteristics in the sub-threshold region. This was the first robust sub-threshold 6T SRAM design reported.

A variation-aware device sizing method for minimum energy operation of sub-threshold circuits was presented in [95]. The authors pointed out the necessity of increasing the device size in order to achieve a better yield. Authors in [169] suggested a leakage power reduction method using dual-$V_{th}$ and sizing in the presence of process variations. Aiming to provide a higher yield for sub-threshold devices, researchers in [71] demonstrated a transistor-level yield optimization technique to suppress variability in sub-threshold devices.

Another device sizing method to achieve higher yield was shown in [43]. Under the constraints on circuit delay and desired yield the proposed method chose proper gate sizes considering both inter- and intra-die variations. The proposed method can achieve 19% saving in area.

In [111], authors offered a robust sub-threshold library as well as a post-silicon threshold voltage tuning framework which utilized adjusting body bias voltage to suppress variations under certain performance constraint. The tuning framework is implemented by a fuzzy logic controller with input as circuit performance and output as the selected optimal body bias voltage.

Table 6.1: PTM 32nm  $V_{th}$  variation characteristics.

|                | Low $V_{th}$ NMOS | Low $V_{th}$ PMOS | High $V_{th}^*$ NMOS | High $V_{th}^*$ PMOS |
|----------------|-------------------|-------------------|----------------------|----------------------|
| W/L            | 160 nm / 32 nm    | 384 nm / 32 nm    | 160 nm / 32 nm       | 384 nm / 32 nm       |
| $\mu_{vth}$    | 0.328 V           | -0.291 V          | 0.385 V              | -0.344 V             |
| $\sigma_{vth}$ | 12 mV             | 7.5 mV            | 16.7 mV              | 10.1 mV              |

*Achieved by applying 0.3 V reverse source-bulk bias voltage on PTM HS model.

In most of the previous work mentioned above on process variations in sub-threshold circuits, random uncorrelated variations of threshold voltage resulting from RDF have been shown to be the major type of variation for sub-threshold circuits.

In the sub-threshold region,  $V_{th}$  variations are generally characterized as random variables with Gaussian distribution, as seen in [43, 95, 214, 215].

The standard deviation of the $V_{th}$ variation distribution ($\sigma_{vth}$) is proportional to $1/\sqrt{W \cdot L}$, which is known as Pelgrom's law [134]. This rule has been used in many previous works and has become well recognized [43, 53, 95, 172, 214, 215]. A detailed $\sigma_{vth}$ expression targeted at the RDF effect in MOS transistors was developed in [171], as shown in Equation (6.1).

$$\sigma_{vth} = \frac{\sqrt[4]{4q^3\varepsilon\phi}}{2} \cdot \frac{T_{ox}}{\varepsilon_{ox}} \cdot \frac{\sqrt[4]{N}}{\sqrt{W \cdot L}} \tag{6.1}$$

where $\phi = 2kT\ln(N/n_i)$, k is Boltzmann's constant, T is room temperature 300 K, $n_i$ is intrinsic carrier concentration, N is channel dopant concentration, $T_{ox}$ is oxide thickness, $\varepsilon$ is silicon permittivity, $\varepsilon_{ox}$ is silicon dioxide permittivity, W is transistor width and L is transistor length.
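The area dependence in Equation (6.1) can be used to rescale a known $\sigma_{vth}$ to a different device size. The Python sketch below is a minimal illustration that assumes the doping concentration and oxide thickness stay unchanged, so only the $1/\sqrt{W \cdot L}$ factor varies; the function name `scale_sigma_vth` and the 4x-width example are hypothetical.

```python
import math

def scale_sigma_vth(sigma_ref, w_ref, l_ref, w_new, l_new):
    """Scale a reference sigma_vth to a new W x L using sigma ~ 1/sqrt(W*L)."""
    return sigma_ref * math.sqrt((w_ref * l_ref) / (w_new * l_new))

# Reference: 12 mV at W = 160 nm, L = 32 nm (low Vth NMOS entry of Table 6.1)
sigma_4x = scale_sigma_vth(12e-3, 160e-9, 32e-9, 640e-9, 32e-9)
print(f"sigma_vth at 4x width: {sigma_4x * 1e3:.1f} mV")
```

Quadrupling the gate area halves the standard deviation, which is why variation-aware sizing methods trade area for yield.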

Table 6.1 lists the mean value and standard deviation of the $V_{th}$ distribution used in this chapter for NMOS and PMOS transistors.



Figure 6.1: Monte Carlo HSPICE simulations of NAND02 gate delay with three Inverters as load at Vdd = 0.25V and source-bulk bias voltage = 0.3V in PTM 32nm Bulk CMOS with  $W_n = 5L$  and  $W_p = 12L$ .

# 6.2 Impact of Variation

# 6.2.1 Gate delay

First, five hundred Monte Carlo HSPICE simulations are conducted with random Gaussian variations on $V_{th}$ to learn how the delay of a NAND02 gate is influenced. According to the definition of the log-normal distribution, if variable X has a normal (Gaussian) distribution, then Y = exp(X) has a log-normal distribution. For a MOSFET transistor, since its sub-threshold current is exponentially related to $V_{th}$, its delay is exponentially related to $V_{th}$ as well. Therefore, as seen in Figure 6.1, gate delay obeys a log-normal distribution when Gaussian $V_{th}$ variations are applied with the characteristics shown in Table 6.1. Table 6.2 shows distribution characteristics of NAND02 gate delay under variations on $V_{th}$.
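The Gaussian-to-log-normal argument can be mimicked with a toy Monte Carlo experiment. The Python sketch below is not an HSPICE substitute: the delay model `gate_delay`, the slope factor `N_SLOPE` and the thermal voltage `V_T` are assumptions chosen for illustration, and only the $V_{th}$ mean and standard deviation are taken from Table 6.1.

```python
import math
import random
import statistics

random.seed(1)

MU_VTH, SIGMA_VTH = 0.328, 0.012   # low Vth NMOS values from Table 6.1 (V)
N_SLOPE, V_T = 1.5, 0.0259         # assumed slope factor and thermal voltage (V)

def gate_delay(vth):
    """Toy model: delay proportional to exp(Vth / (n * vT)), normalized to 1."""
    return math.exp((vth - MU_VTH) / (N_SLOPE * V_T))

delays = [gate_delay(random.gauss(MU_VTH, SIGMA_VTH)) for _ in range(5000)]
log_delays = [math.log(d) for d in delays]

print("mean delay   :", statistics.mean(delays))
print("median delay :", statistics.median(delays))
print("std of ln(d) :", statistics.stdev(log_delays))
```

In this model the logarithm of the delay is Gaussian by construction, and the sample mean exceeding the median reflects the right-skewed shape of a log-normal distribution.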

Table 6.2: Low  $V_{th}$  gate vs. high  $V_{th}$  gate delay under Gaussian  $V_{th}$  variations.

|                    | Mean (s)    | STD (Standard Deviation) (s) | Mean/STD |
|--------------------|-------------|------------------------------|----------|
| Low $V_{th}$ Gate  | 9.3963e-009 | 2.0518e-009                  | 4.5795   |
| High $V_{th}$ Gate | 5.3228e-008 | 1.1895e-008                  | 4.4749   |

# 6.2.2 Circuit Delay

Next, the critical path delay of a 32-bit RCA under variations is investigated by five hundred Monte Carlo HSPICE simulations with random Gaussian variations on $V_{th}$. The circuit critical path delay is the sum of critical-path gate delays, which have log-normal distributions. Variations on each gate are random and uncorrelated. The sum of a large number of independent log-normal random variables can be approximated by a normal variable, as discussed in [13, 55, 151]. Therefore, the circuit delay is expected to approximately obey a normal distribution.
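The tendency of a sum of log-normal gate delays toward a symmetric, normal-like shape can be observed numerically. The Python sketch below is an illustration under assumed parameters: the per-gate spread `sigma_log`, the path lengths and the sample count are arbitrary, and sample skewness is used as a simple indicator of departure from normality.

```python
import random
import statistics

random.seed(2)

def skewness(samples):
    """Sample skewness (third standardized moment)."""
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)
    return sum(((s - mean) / std) ** 3 for s in samples) / len(samples)

def path_delay(num_gates, sigma_log=0.3):
    """Sum of independent log-normal gate delays (normalized units)."""
    return sum(random.lognormvariate(0.0, sigma_log) for _ in range(num_gates))

skew_by_length = {}
for num_gates in (1, 8, 64):
    samples = [path_delay(num_gates) for _ in range(4000)]
    skew_by_length[num_gates] = skewness(samples)
    print(num_gates, round(skew_by_length[num_gates], 3))
```

A single gate delay is clearly right-skewed, while the skewness of a long path shrinks toward zero, consistent with the normal fits used for the circuit delay.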

Figure 6.2 shows the HSPICE simulation results of the delay of single low  $V_{th}$  design of a 32-bit RCA under variations. Similarly, Figure 6.3 shows the HSPICE simulation results of the delay of dual  $V_{th}$  design of a 32-bit RCA under variations. The dual  $V_{th}$  design investigated in Figure 6.3 is the optimal dual- $V_{th}$  generated by the proposed framework. This optimal dual- $V_{th}$  corresponds to the minimum EPC point at  $V_{dd} = 0.25V$  with optimal bias voltage = 0.3V. In both figures, the red curve represents the simulation data and the blue curve represents the fitted normal distribution.

# 6.2.3 Energy

Energy per cycle under process variations is the sum of many random Gaussian variables, therefore it should obey normal distribution as well. In order to compare the EPC under process variations of single low $V_{th}$ design and dual-$V_{th}$ design, both designs are simulated at the same frequency. To ensure correct functionality of both designs, the operating frequency is chosen according to the maximum of $\mu + 3\sigma$ points from the delay of both designs in Figure 6.2 and Figure 6.3. The chosen frequency is $1/(1.1 \times 10^{-6} \text{ second}) = 0.91 \text{ MHz}$.

Figure 6.2: Monte Carlo HSPICE simulations of circuit delay of 32-bit RCA single low  $V_{th}$  design under random  $V_{th}$  Gaussian variations at  $V_{dd} = 0.25$ V in PTM 32nm Bulk CMOS with  $W_n = 5L$  and  $W_p = 12L$ .

Figure 6.3: Monte Carlo HSPICE simulations of circuit delay of 32-bit RCA dual  $V_{th}$  design with bias voltage = 0.3V under random  $V_{th}$  Gaussian variations at  $V_{dd}$  = 0.25V in PTM 32nm Bulk CMOS with  $W_n = 5L$  and  $W_p = 12L$ .
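The common operating frequency follows from the slower of the two $\mu + 3\sigma$ delay points. The Python sketch below illustrates the rule; the function name `max_frequency_hz` is hypothetical, and the mean and standard deviation values are placeholders chosen so that the worst-case period equals the $1.1 \times 10^{-6}$ s quoted above, not values read from Figures 6.2 and 6.3.

```python
def max_frequency_hz(delay_stats):
    """Clock frequency from the largest mu + 3*sigma delay among the designs."""
    period = max(mu + 3.0 * sigma for mu, sigma in delay_stats)
    return 1.0 / period

# Placeholder (mean, std) critical path delays in seconds for the two designs
stats = [(6.0e-7, 1.0e-7), (8.0e-7, 1.0e-7)]
print(f"{max_frequency_hz(stats) / 1e6:.2f} MHz")
```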



Figure 6.4: Monte Carlo HSPICE simulations of EPC of 32-bit RCA single low  $V_{th}$  design under random  $V_{th}$  Gaussian variations at  $V_{dd} = 0.25$  V in PTM 32nm Bulk CMOS with  $W_n = 5L$  and  $W_p = 12L$ .



Figure 6.5: Monte Carlo HSPICE simulations of EPC of 32-bit RCA dual  $V_{th}$  design with bias voltage = 0.3 V under random  $V_{th}$  Gaussian variations at  $V_{dd}$  = 0.25 V in PTM 32nm Bulk CMOS with  $W_n = 5L$  and  $W_p = 12L$ .

Figure 6.4 shows the HSPICE simulation results of EPC of single low $V_{th}$ design of a 32-bit RCA under variations. Similarly, Figure 6.5 shows the HSPICE simulation results of EPC of dual $V_{th}$ design of a 32-bit RCA under variations. The dual $V_{th}$ design investigated in Figure 6.5 is the optimal dual-$V_{th}$ generated by the proposed framework. This optimal dual-$V_{th}$ corresponds to the minimum EPC point at $V_{dd} = 0.25V$ with optimal bias voltage = 0.3 V. In both figures, the red curve represents the simulation data and the blue curve represents the fitted normal distribution.

### Chapter 7

#### Conclusion and Future Work

This chapter provides the summary of this dissertation and some suggestions for the future work.

# 7.1 Summary

As the density of modern SoC integration grows very fast to the range of billions or trillions of transistors per square millimeter, the energy density per operation is going beyond bearable limits. Therefore, energy consumption reduction has become a critical consideration for digital and analog integrated circuits. This dissertation focuses on investigation and optimization of sub-threshold circuits. When the supply voltage scales down to the sub-threshold range, the energy consumption reduces dramatically. Moreover, sub-threshold circuits are proven to be energy-efficient since the minimum energy consumption point typically occurs when $V_{dd}$ is in the sub-threshold range.

In Chapter 2, detailed background knowledge is provided on CMOS logic behavior in the sub-threshold region. Sub-threshold operation, or weak-inversion operation, of MOSFET transistors was ignored for a long time. It was not until the 1970s that researchers started to contribute much work in this field and validated the existence of sub-threshold current. Relying on this "leaking" current, transistors can complete logic level transitions under a supply voltage below their threshold voltage. A literature review on the history of sub-threshold circuit research is also presented. The dual-$V_{th}$ technique is also reviewed in this chapter. Since minimum energy operation is one of the highlights of sub-threshold circuits, it is introduced as well.

In Chapter 3, single- $V_{th}$  sub-threshold circuit design is investigated. It is revealed by theoretical expression that the energy per cycle is not dependent on  $V_{th}$ . The increment of  $V_{th}$  in single- $V_{th}$  design does not reduce energy per cycle since the reduction of leakage power consumption and the increase of circuit delay cancel each other out. In addition, this theory is verified by simulating a 32-bit ripple carry adder (RCA) in HSPICE with PTM 32nm Bulk CMOS technology.

In Chapter 4, detailed interpretations of dual- $V_{th}$  design framework are presented. The framework consists of library characterization and dual- $V_{th}$  assignment procedure which is built on the gate slack based dual- $V_{th}$  algorithm. For any given circuit, the framework reads in a circuit's gate-level netlist, analyzes the single low  $V_{th}$  design to get its minimum EPC first, then finds out the optimal high  $V_{th}$  level, optimal  $V_{dd}$  and optimal dual- $V_{th}$  assignments. The generated dual- $V_{th}$  design reduces minimum EPC by 10% to 30% over its single- $V_{th}$  version. Experimental results on a 32-bit ripple carry adder, 4-by-4 multiplier and ISCAS85 benchmark circuits are shown.

In Chapter 5, a theoretical analysis explains the EPC saving observed in simulation results. In Chapter 6, the impact of process variation on sub-threshold circuits is discussed, together with a brief review of the history of variation-aware sub-threshold circuit design.

# 7.2 Future Work

# 7.2.1 Challenge with Scaled Technology

As a result of technology scaling, the supply voltage and threshold voltage of MOSFET transistors need to be reduced. Since dynamic energy is only related to switching capacitance and supply voltage, it can remain in an acceptable range. However, the reduction of $V_{th}$ brings a significant increase of leakage energy. It is reported that leakage has become a more serious issue as technology goes into smaller scales. However, in [80, 178], the authors suggested that the effectiveness of body bias as a leakage reduction method decreases as technology scales down. Therefore, it is worth exploring the dual-$V_{th}$ technique in sub-32nm technology.

# 7.2.2 Variation-Aware Design

In the proposed dual-$V_{th}$ algorithm, we do not take process variations into account. However, due to the exponential relation between sub-threshold current and threshold voltage, gate delay and leakage current are affected by $V_{th}$ variations exponentially. Therefore, sub-threshold circuits are more sensitive to process variations, compared to above-threshold circuits. For variation-aware dual-$V_{th}$ design, the proposed gate slack based algorithm should be updated with some modifications. For example, during the procedure of library characterization stated in Chapter 4, gate delay should not be a deterministic value. Instead, it should be characterized as a random variable which obeys a log-normal distribution. Besides, the constraints in the gate slack based algorithm should be adjusted accordingly.

### Bibliography

- [1] A. Agrawal, D. Blaauw, and V. Zolotov, "Statistical timing analysis for intra-die process variations with spatial correlations," in *Proceedings of IEEE Intl. Conference on Computer Aided Design*, November 2003, pp. 900–907.
- [2] V. D. Agrawal, "Low-power design by hazard filtering," in *Proceedings of 10th International Conf. on VLSI Design*, January 1997, pp. 193–197.
- [3] V. D. Agrawal, M. L. Bushnell, G. Parthasarathy, and R. Ramadoss, "Digital circuit design for minimum transient energy and a linear programming method," in *Proceedings of 12th International Conf. VLSI Design*, 1999, pp. 434–439.
- [4] M. Allani, "Polynomial-time algorithms for designing dual-voltage energy efficient circuits," Master's thesis, Auburn University, ECE Department, Auburn, Alabama, December 2011.
- [5] M. Allani and V. D. Agrawal, "Energy-efficient dual-voltage design using topological constraints," *Journal of Low Power Electronics*, vol. 9, no. 3, pp. 275–287, October 2011.
- [6] M. Allani and V. D. Agrawal, "An efficient algorithm for dual-voltage design without need for level conversion," in *Proceedings of IEEE Southeastern Symp. on System Theory*, March 2012, pp. 51–56.
- [7] R. Amirtharajah and A. Chandrakasan, "Self-powered low power signal processing," in *Proceedings of Symp. on VLSI Circuits Digest of Technical Papers*, June 1997, pp. 25–26.
- [8] R. Amirtharajah and A. P. Chandrakasan, "Self-powered signal processing using vibration-based power generation," *IEEE Journal of Solid State Circuits*, vol. 33, no. 5, pp. 687–695, May 1998.
- [9] R. Amirtharajah, S. Meninger, J. Mur-Miranda, A. P. Chandrakasan, and J. Lang, "A micropower programmable DSP powered using a MEMS based vibration-to-electric energy converter," in *Proceedings of IEEE Intl. Solid State Circuits Conference Digest of Technical Papers*, February 2000, pp. 362–363.
- [10] S. Bang, Y. Lee, I. Lee, Y. Kim, G. Kim, D. Sylvester, and D. Blaauw, "A fully integrated switched-capacitor based PMU with adaptive energy harvesting technique for ultra-low power sensing applications," in *Proceedings of IEEE Symp. on Circuits and Systems*, May 2013, pp. 709–712.
- [11] R. W. J. Barker, "Small-signal subthreshold model for IGFETs," *Electronics Letters*, vol. 12, no. 10, pp. 260–262, May 1976.
- [12] M. B. Barron, "Low level currents in insulated gate field effect transistors," *Solid State Electronics*, vol. 15, no. 3, pp. 293–302, March 1972.
- [13] N. C. Beaulieu, A. A. Abu-Dayya, and P. J. McLane, "Estimating the distribution of a sum of independent lognormal random variables," *IEEE Trans. on Communications*, vol. 43, no. 12, pp. 2869–2873, December 1995.

- [14] D. Blaauw, D. Sylvester, and Y. Lee, "From digital processors to analog building blocks: enabling new applications through ultra-low voltage design," in *Proceedings of Subthreshold Microelectronics Conference*, October 2012, p. 1.
- [15] D. Blaauw and B. Zhai, "Energy efficient design for subthreshold supply voltage operation," in *Proceedings of IEEE Intl. Symp. on Circuits and Systems*, May 2006, pp. 21–24.
- [16] K. A. Bowman, B. L. Austin, J. C. Eble, X. Tang, and J. D. Meindl, "A physical alpha-power law MOSFET model," in *Proceedings of Intl. Symp. on Low Power Electronics and Design*, August 1999, pp. 218–222.
- [17] K. A. Bowman, B. L. Austin, J. C. Eble, X. Tang, and J. D. Meindl, "A physical alpha-power law MOSFET model," *IEEE Journal of Solid State Circuits*, vol. 34, no. 10, pp. 1410–1414, October 1999.
- [18] B. H. Calhoun and A. P. Chandrakasan, "Stand-by voltage scaling for reduced power," in *Proceedings of IEEE Custom Integrated Circuits Conference*, October 2003, pp. 639–642.
- [19] B. H. Calhoun and A. P. Chandrakasan, "Characterizing and modeling minimum energy operation for subthreshold circuits," in *Proceedings of Intl. Symp. on Low Power Electronics and Design*, August 2004, pp. 90–95.
- [20] B. H. Calhoun and A. P. Chandrakasan, "Analyzing static noise margin for sub-threshold SRAM in 65nm CMOS," in *Proceedings of European Solid State Circuits Conference*, September 2005, pp. 363–366.
- [21] B. H. Calhoun and A. P. Chandrakasan, "Modeling and sizing for minimum energy operation in sub-threshold circuits," *IEEE Journal of Solid-State Circuits*, vol. 40, no. 9, pp. 1778–1786, September 2005.
- [22] B. H. Calhoun and A. P. Chandrakasan, "Ultra-dynamic voltage scaling using sub-threshold operation and local voltage dithering in 90nm CMOS," in *Proceedings of IEEE Intl. Solid State Circuits Conference*, February 2005, pp. 300–301.
- [23] B. H. Calhoun and A. P. Chandrakasan, "A 256Kb sub-threshold SRAM in 65nm CMOS," in *Proceedings of IEEE Intl. Solid State Circuits Conference*, February 2006, pp. 628–629.
- [24] B. H. Calhoun and A. P. Chandrakasan, "Static noise margin variation for sub-threshold SRAM in 65nm CMOS," *IEEE Journal of Solid State Circuits*, vol. 41, no. 7, pp. 1673–1679, July 2006.
- [25] B. H. Calhoun and A. P. Chandrakasan, "Ultra-dynamic voltage scaling (UDVS) using sub-threshold operation and local voltage dithering," *IEEE Journal of Solid State Circuits*, vol. 41, no. 1, pp. 238–245, January 2006.
- [26] B. H. Calhoun and A. P. Chandrakasan, "A 256-kb 65-nm sub-threshold SRAM design for ultra-low-voltage operation," *IEEE Journal of Solid State Circuits*, vol. 42, no. 3, pp. 680–688, March 2007.
- [27] B. H. Calhoun, D. C. Daly, N. Verma, D. F. Finchelstein, D. Wentzloff, A. Wang, S. Cho, and A. P. Chandrakasan, "Design considerations for ultra-low energy wireless microsensor nodes," *IEEE Trans. on Computers*, vol. 54, no. 6, pp. 724–740, June 2005.
- [28] B. H. Calhoun, A. Wang, and A. P. Chandrakasan, "Device sizing for minimum energy operation in subthreshold circuits," in *Proceedings of IEEE Custom Integrated Circuits Conference*, October 2004, pp. 95–98.

- [29] B. H. Calhoun, A. Wang, N. Verma, and A. P. Chandrakasan, "Sub-threshold design: the challenges of minimizing circuit energy," in *Proceedings of IEEE Intl. Symp. on Low Power Electronics and Design*, October 2006, pp. 366–368.
- [30] A. Chandrakasan, "Ultra low power digital signal processing," in *Proceedings of IEEE Intl. Conference on VLSI Design*, January 1996, pp. 352–357.
- [31] A. Chandrakasan, R. Amirtharajah, J. Goodman, and W. Rabiner, "Trends in low power digital signal processing," in *Proceedings of IEEE Intl. Symp. on Circuits and Systems*, May 1998, pp. 604–607.
- [32] A. Chandrakasan, A. Burstein, and R. W. Brodersen, "A low power chipset for portable multimedia applications," in *IEEE Intl. Solid State Circuits Conference Digest Papers*, February 1994, pp. 82–83.
- [33] A. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low power techniques for portable real-time DSP applications," in *Proceedings of IEEE Intl. Conference on VLSI Design*, January 1992, pp. 203–208.
- [34] A. Chandrakasan, M. B. Srivastava, and R. W. Brodersen, "Energy efficient programmable computation," in *Proceedings of IEEE Intl. Conference on VLSI Design*, January 1994, pp. 261–264.
- [35] A. Chandrakasan, I. Yang, C. Viori, and D. Antoniadis, "Design considerations and tools for low-voltage digital system design," in *Proceedings of IEEE Design Automation Conference*, June 1996, pp. 113–118.
- [36] A. P. Chandrakasan, A. P. Dancy, J. Goodman, and T. Simon, "A low-power wireless camera system," in *Proceedings of Intl. Conference on VLSI Design*, January 1999, pp. 32–36.
- [37] A. P. Chandrakasan, J. Goodman, J. Kao, and W. Rabiner, "Design of a low-power wireless camera," in *Proceedings of IEEE Computer Society Workshop on VLSI System Level Design*, April 1998, pp. 24–27.
- [38] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, "Low power CMOS digital design," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 4, pp. 473–483, April 1992.
- [39] J. Chen, L. T. Clark, and Y. Cao, "Robust design of high fan-in/out subthreshold circuits," in *Proceedings of IEEE Intl. Conference on Computer Design: VLSI in Computers and Processors*, October 2005, pp. 405–410.
- [40] J. Chen, L. T. Clark, and T. Chen, "An ultra-low-power memory with a subthreshold power supply voltage," *IEEE Journal of Solid-State Circuits*, vol. 41, no. 10, pp. 2344–2353, October 2006.
- [41] T.-H. Chen, J. Chen, L. T. Clark, J. E. Knudsen, and G. Samson, "Ultra-low power radiation hardened by design memory circuits," *IEEE Trans. on Nuclear Science*, vol. 54, no. 6, pp. 2004–2011, December 2007.
- [42] Z. Chen, C. Diaz, J. D. Plummer, M. Cao, and W. Greene, "0.18um dual Vt MOSFET process and energy-delay measurement," in *Proceedings of Intl. Electron Devices Meeting*, December 1996, pp. 851–854.
- [43] S. H. Choi, B. C. Paul, and K. Roy, "Novel sizing algorithm for yield improvement under process variation in nanometer technology," in *Proceedings of Design Automation Conference*, July 2004, pp. 454–459.

- [44] D. C. Daly and A. P. Chandrakasan, "A 6b 0.2-to-0.9V highly digital flash ADC with comparator redundancy," in *IEEE Intl. Solid State Circuits Conference Digest Papers*, February 2008, pp. 554–635.
- [45] D. C. Daly and A. P. Chandrakasan, "A 6-bit 0.2V to 0.9V highly digital flash ADC with comparator redundancy," *IEEE Journal of Solid State Circuits*, vol. 44, no. 11, pp. 3030–3038, November 2009.
- [46] A. Dancy and A. Chandrakasan, "Ultra low power control circuits for PWM converters," in *Proceedings of IEEE Power Electronics Specialists Conference*, June 1997, pp. 21–27.
- [47] A. P. Dancy, R. Amirtharajah, and A. P. Chandrakasan, "High-efficiency multiple-output DC-DC conversion for low-voltage systems," *IEEE Trans. on Very Large Scale Integration Systems*, vol. 8, no. 3, pp. 252–263, June 2000.
- [48] A. P. Dancy and A. P. Chandrakasan, "A reconfigurable dual output low power digital PWM power converter," in *Proceedings of Intl. Symp. on Low Power Electronics and Design*, August 1998, pp. 191–196.
- [49] M. Degrauwe, J. Rijmenants, E. Vittoz, and H. D. Man, "Adaptive biasing CMOS amplifiers," *IEEE Journal of Solid State Circuits*, vol. 17, no. 13, pp. 522–528, June 1982.
- [50] M. Degrauwe, E. Vittoz, and I. Verbauwhede, "A micropower CMOS instrumentation amplifier," *IEEE Journal of Solid State Circuits*, vol. 20, no. 3, pp. 805–807, June 1985.
- [51] M. G. R. Degrauwe, O. N. Leuthold, E. Vittoz, H. Oguey, and A. Descombes, "CMOS voltage references using lateral bipolar transistors," *IEEE Journal of Solid-State Circuits*, vol. 20, no. 6, pp. 1151–1157, December 1985.
- [52] N. Drego, A. P. Chandrakasan, and D. Boing, "Lack of spatial correlation in MOSFET threshold voltage variation and implications for voltage scaling," *IEEE Trans. on Semiconductor Manufacturing*, vol. 22, no. 2, pp. 245–255, May 2009.
- [53] M. Eisele, J. Berthold, D. Schmitt-Landsiedel, and R. Mahnkopf, "The impact of intra-die device parameter variations on path delays and on the design for yield of low voltage digital circuits," *IEEE Trans. on Very Large Scale Integration Systems*, vol. 5, no. 4, pp. 360–368, December 1997.
- [54] C. Enz, F. Krummenacher, and E. Vittoz, "An analytical MOS transistor model valid in all regions of operation and dedicated to low-voltage and low-current applications," *Journal of Analog Integrated Circuits and Signal Processing*, vol. 8, no. 1, pp. 83–114, July 1995.
- [55] L. Fenton, "The sum of independent lognormal probability distributions in scatter transmission systems," *IEEE Trans. on Communication Systems*, vol. CS-8, pp. 57–67, March 1960.
- [56] D. Flynn, R. Aitken, A. Gibbons, and K. Shi, Low Power Methodology Manual: For System-on-Chip Design. New York: Springer, 2007.
- [57] F. Gao and J. P. Hayes, "Total power reduction in CMOS circuits via gate sizing and multiple threshold voltages," in *Proceedings of Design Automation Conference*, 2005, pp. 31–36.
- [58] C. G. Garrett and W. H. Brattain, "Physical theory of semiconductor surfaces," *Physical Review*, vol. 99, no. 2, pp. 376–387, July 1955.
- [59] L. A. Geddes, "Historical highlights in cardiac pacing," *IEEE Engineering in Medicine and Biology Magazine*, pp. 12–18, June 1990.

- [60] J. Goodman, A. P. Dancy, and A. P. Chandrakasan, "An energy/security scalable encryption process using an embedded variable voltage DC/DC converter," *IEEE Journal of Solid State Circuits*, vol. 33, no. 11, pp. 1799–1809, November 1998.
- [61] W. M. Gosney, "Subthreshold drain leakage currents in MOS field effect transistors," *IEEE Trans. Electron Devices*, vol. ED-19, no. 2, pp. 213–219, February 1969.
- [62] V. Gutnik and A. P. Chandrakasan, "Embedded power supply for low-power DSP," *IEEE Trans. on Very Large Scale Integration Systems*, vol. 5, no. 4, pp. 425–435, December 1997.
- [63] S. Hanson, M. Seok, D. Sylvester, and D. Blaauw, "Nanometer device scaling in subthreshold circuits," in *Proceedings of IEEE Design Automation Conference*, June 2007, pp. 700–705.
- [64] S. Hanson, D. Sylvester, and D. Blaauw, "A new technique for jointly optimizing gate sizing and supply voltage in ultra-low energy circuits," in *Proceedings of IEEE Intl. Symp. on Low Power Electronics and Design*, October 2006, pp. 338–341.
- [65] S. Hanson, B. Zhai, D. Blaauw, D. Sylvester, A. Bryant, and X. Wang, "Energy optimality and variability in subthreshold design," in *Proceedings of IEEE Intl. Symp. on Low Power Electronics and Design*, September 2006, pp. 363–365.
- [66] S. Hanson, B. Zhai, M. Seok, B. Cline, K. Zhou, M. Singhal, M. Minuth, J. Olson, L. Nazhandali, T. Austin, D. Sylvester, and D. Blaauw, "Performance and variability optimization strategies in a sub-200mV, 3.5pJ/Inst, 11nW subthreshold process," in *Proceedings of IEEE Symp. on VLSI Circuits*, June 2007, pp. 152–153.
- [67] F. Hu, Process-Variation-Resistant Dynamic Power Optimization for VLSI Circuits. PhD thesis, Auburn University, ECE Department, Auburn, Alabama, May 2006.
- [68] I. Hyunsik, M. Song, T. Hiramoto, and T. Sakurai, "Physical insight into fractional power dependence of saturation current on gate voltage in advanced short channel MOSFETs (alpha-power law model)," in *Proceedings of Intl. Symp. on Low Power Electronics and Design*, August 2002, pp. 13–18.
- [69] N. Ickes, Y. Sinangil, F. Pappalardo, E. Guidetti, and A. P. Chandrakasan, "A 10 pJ/cycle ultra-low-voltage 32-bit microprocessor system-on-chip," in *Proceedings of European Solid State Circuits Conference*, September 2011, pp. 159–162.
- [70] R. C. Jaeger and T. N. Blalock, Microelectronic Circuit Design Third Edition. New York, NY: McGraw-Hill Science/Engineering/Math, 2008.
- [71] R. Jaramillo-Ramirez, J. Jaffari, and M. Anis, "Variability-aware design of sub-threshold devices," in *Proceedings of European Solid-State Device Research Conference*, September 2012, pp. 58–61.
- [72] D. Jee, D. Sylvester, D. Blaauw, and J. Kim, "A 0.45V 423nW 3.2MHz multiplying DLL with leakage-based oscillator for ultra-low-power sensor platforms," in *Proceedings of IEEE Intl. Solid State Circuits Conference Digest of Technical papers*, February 2013, pp. 188–189.
- [73] D. Jeon, Y. Kim, I. Lee, Z. Zhang, D. Blaauw, and D. Sylvester, "A low-power VGA full-frame feature extraction processor," in *Proceedings of IEEE Intl. Conference on Acoustics, Speech and Signal Processing*, May 2013, pp. 2726–2730.
- [74] D. Jeon, M. Seok, C. Chakrabarti, and D. Blaauw, "A super pipelined energy efficient subthreshold 240 MS/s FFT core in 65nm CMOS," *IEEE Journal of Solid State Circuits*, vol. 47, no. 1, pp. 23–34, January 2012.

- [75] J. Kao, A. Chandrakasan, and D. Antoniadis, "Transistor sizing issues and tool for multi-threshold CMOS technology," in *Proceedings of Design Automation Conference*, June 1997, pp. 409–414.
- [76] J. Kao, S. Narendra, and A. Chandrakasan, "MTCMOS hierarchical sizing based on mutual exclusive discharge patterns," in *Proceedings of Design Automation Conference*, June 1998, pp. 495–500.
- [77] J. Kao, S. Narendra, and A. P. Chandrakasan, "Subthreshold leakage modeling and reduction techniques IC CAD tools," in *Proceedings of IEEE Intl. Conference on Computer Aided Design*, November 2002, pp. 141–148.
- [78] J. T. Kao and A. P. Chandrakasan, "Dual-threshold voltage techniques for low-power digital circuits," *IEEE Journal of Solid State Circuits*, vol. 35, no. 7, pp. 1009–1018, July 2000.
- [79] J. T. Kao, M. Miyazaki, and A. P. Chandrakasan, "A 175-mV multiply-accumulate unit using an adaptive supply voltage and body bias architecture," *IEEE Journal of Solid State Circuits*, vol. 37, no. 11, pp. 1545–1554, November 2002.
- [80] A. Keshavarzi, S. Ma, S. Narendra, B. Bloechel, K. Mistry, T. Ghani, S. Borkar, and V. De, "Effectiveness of reverse body bias for leakage control in scaled dual Vt CMOS ICs," in *Proceedings of Intl. Symp. on Low Power Electronics and Design*, August 2001, pp. 207–212.
- [81] M. Ketkar and S. S. Sapatnekar, "Standby power optimization via transistor sizing and dual threshold voltage assignment," in *Proceedings of Intl. Conference on Computer-Aided Design*, November 2002, pp. 375–378.
- [82] C. H. Kim, H. Soeleman, and K. Roy, "Ultra-low-power DLMS adaptive filter for hearing aid applications," *IEEE Trans. Very Large Scale Integration (VLSI) Systems*, vol. 11, no. 6, pp. 1058–1067, December 2003.
- [83] D. Kim, G. Chen, M. Fojtik, M. Seok, D. Blaauw, and D. Sylvester, "A 1.85fW/bit ultra low leakage 10T SRAM with speed compensation scheme," in *Proceedings of IEEE Intl. Symp. on Circuits and Systems*, May 2011, pp. 69–72.
- [84] G. Kim, Y. Lee, S. Bang, I. Lee, Y. Kim, D. Sylvester, and D. Blaauw, "A 695 pW standby power optical wake-up receiver for wireless sensor nodes," in *Proceedings of IEEE Custom Integrated Circuits Conference*, September 2012, pp. 1–4.
- [85] J. Kim and K. Roy, "Double-gate MOSFET subthreshold circuit for ultralow power applications," *IEEE Trans. on Electron Devices*, vol. BME-31, no. 12, pp. 817–823, December 2004.
- [86] K. Kim, Ultra Low Power CMOS Design. PhD thesis, Auburn University, ECE Department, Auburn, Alabama, May 2011.
- [87] K. Kim and V. D. Agrawal, "Dual voltage design for minimum energy using gate slack," in *Proceedings of IEEE Intl. Conf. on Industrial Technology*, March 2011, pp. 419–424.
- [88] K. Kim and V. D. Agrawal, "Minimum energy CMOS design with dual subthreshold supply and multiple logic-level gates," in *Proceedings of IEEE Intl. Symp. on Quality Electronic Design*, March 2011, pp. 1–6.
- [89] K. Kim and V. D. Agrawal, "True minimum energy design using dual below-threshold supply voltages," in *Proceedings of IEEE Intl. Conf. on VLSI Design*, January 2011, pp. 292–297.

- [90] K. Kim and V. D. Agrawal, "Ultra low energy CMOS logic using below-threshold dual-voltage supply," *Journal of Low Power Electronics*, vol. 7, no. 4, pp. 460–470, December 2011.
- [91] T. Klein, "Technology and performance of integrated complementary MOS circuits," *IEEE Journal of Solid-State Circuits*, vol. 4, no. 3, pp. 122–130, June 1969.
- [92] J. Koomen, "Investigation of the MOST channel conductance in weak inversion," *Solid-State Electronics*, vol. 16, no. 7, pp. 801–810, July 1973.
- [93] F. Krummenacher, "Micropower switched capacitor biquadratic cell," *IEEE Journal of Solid State Circuits*, vol. 17, no. 3, pp. 507–512, June 1982.
- [94] F. Krummenacher, E. Vittoz, and M. Degrauwe, "Class AB CMOS amplifier for micropower SC filters," *Electronics Letters*, vol. 17, no. 13, pp. 433–435, June 1981.
- [95] J. Kwong and A. P. Chandrakasan, "Variation-driven device sizing for minimum energy sub-threshold circuits," in *Proceedings of IEEE Intl. Symp. on Low Power Electronics and Design*, October 2006, pp. 8–13.
- [96] J. Kwong and A. P. Chandrakasan, "Advances in ultra-low-voltage design," *IEEE Solid State Circuits Society Newsletter*, vol. 13, no. 4, pp. 20–27, Fall 2008.
- [97] J. Kwong and A. P. Chandrakasan, "Executive summary: advances in ultra-low-voltage design," *IEEE Solid State Circuits Society Newsletter*, vol. 13, no. 3, p. 59, Summer 2008.
- [98] J. Kwong and A. P. Chandrakasan, "An energy-efficient biomedical signal processing platform," *IEEE Journal of Solid State Circuits*, vol. 46, no. 7, pp. 1742–1753, July 2011.
- [99] J. Kwong, Y. K. Ramadass, N. Verma, and A. P. Chandrakasan, "A 65nm sub-threshold microcontroller with integrated SRAM and switched capacitor DC-DC converter," *IEEE Journal of Solid State Circuits*, vol. 44, no. 1, pp. 115–126, January 2009.
- [100] F. S. Lee and A. P. Chandrakasan, "A 2.5nJ/b 0.65V 3-to-5GHz subbanded UWB receiver in 90nm CMOS," in *IEEE Intl. Solid State Circuits Conference Digest Papers*, February 2007, pp. 116–590.
- [101] I. Lee, S. Bang, D. Yoon, M. Choi, S. Jeong, D. Sylvester, and D. Blaauw, "A ripple voltage sensing MPPT circuit for ultra-low power microsystems," in *Proceedings of IEEE Symp. on VLSI Circuits*, August 2011, pp. 1–4.
- [102] Y. Lee, B. Giridhar, Z. Foo, D. Sylvester, and D. Blaauw, "A sub-nW multi-stage temperature compensated timer for ultra-low-power sensor nodes," *IEEE Journal of Solid State Circuits*, vol. 48, no. 10, pp. 2511–2521, October 2013.
- [103] Y. Lee, B. Giridhar, Z. Foo, D. Sylvester, and D. Blaauw, "A 660pW multi-stage temperature-compensated timer for ultra-low-power wireless sensor node synchronization," in *Proceedings of IEEE Intl. Solid State Circuits Conference Digest of Technical Papers*, February 2011, pp. 46–48.
- [104] Y. Lee, M. Seok, S. Hanson, D. Blaauw, and D. Sylvester, "Standby power reduction techniques for ultra-low power processors," in *Proceedings of IEEE European Solid State Circuits Conference*, September 2008, pp. 186–189.
- [105] Y. Lee, M. Seok, S. Hanson, D. Sylvester, and D. Blaauw, "Achieving ultra-low standby power with an efficient SCCMOS bias generator," *IEEE Trans. on Circuits and Systems II*, vol. 60, no. 12, pp. 842–851, December 2013.

- [106] Y. Lee, D. Sylvester, and D. Blaauw, "Synchronization of ultra-low power wireless sensor nodes," in *Proceedings of IEEE Intl. Midwest Symp. on Circuits and Systems*, August 2011, pp. 1–4.
- [107] Y. Lee, D. Sylvester, and D. Blaauw, "Circuits for ultra-low power millimeter-scale sensor nodes," in *Proceedings of Conference on Signals Systems and Computers*, November 2012, pp. 752–756.
- [108] F. Leuenberger and E. Vittoz, "Complementary-MOS low-power low-voltage integrated binary counter," *Proceedings of The IEEE*, vol. 57, no. 9, pp. 1528–1532, September 1969.
- [109] Y. Lin, S. Hanson, F. Albano, C. Tokunaga, R. Haque, K. Wise, A. Sastry, D. Blaauw, and D. Sylvester, "Low-voltage circuit design for widespread sensing applications," in *Proceedings of IEEE Intl. Symp. on Circuits and Systems*, May 2008, pp. 2558–2561.
- [110] Y. Lin, D. Sylvester, and D. Blaauw, "A 150pW program-and-hold timer for ultra-low-power sensor platforms," in *Proceedings of IEEE Intl. Solid State Circuits Conference Digest of technical papers*, February 2009, pp. 326–327.
- [111] B. Liu, H. R. Pourshaghaghi, S. M. Londono, and J. P. Gyvez, "Process variation reduction for CMOS logic operating at sub-threshold supply voltage," in *Proceedings of Euromicro Conference on Digital System Design*, August 2011, pp. 135–139.
- [112] Y. Lu, Power and Performance Optimization of Static CMOS Circuits with Process Variation. PhD thesis, Auburn University, ECE Department, Auburn, Alabama, August 2007.
- [113] Y. Lu and V. D. Agrawal, "Leakage and dynamic glitch power minimization using integer linear programming for Vth assignment and path balancing," in *Proceedings of Intl. Workshop on Power and Timing Modeling, Optimization and Simulation*, September 2005, pp. 217–226.
- [114] Y. Lu and V. D. Agrawal, "Leakage and dynamic glitch power minimization using integer linear programming for Vth assignment and path balancing," in *Proceedings of Intl. Workshop on Power and Timing Modeling, Optimization and Simulation*, 2005, pp. 217–226.
- [115] Y. Lu and V. D. Agrawal, "Statistical leakage and timing optimization for submicron process variation," in *Proceedings of IEEE Intl. Conference on VLSI Design*, January 2007, pp. 439–444.
- [116] Y. Lu and V. D. Agrawal, "Total power minimization in glitch-free CMOS circuits considering process variation," in *Proceedings of IEEE Intl. Conference on VLSI Design*, January 2008, pp. 527–532.
- [117] Y. Lu and V. D. Agrawal, "CMOS leakage and glitch minimization for power-performance tradeoff," *Journal of Low Power Electronics*, vol. 2, no. 3, pp. 378–387, December 2006.
- [118] J. T. Ludwig, S. H. Nawab, and A. P. Chandrakasan, "Low-power digital filtering using approximate processing," *IEEE Journal of Solid State Circuits*, vol. 31, no. 3, pp. 395–400, March 1996.
- [119] J. D. Meindl and A. J. Ford, "Implantable telemetry in biomedical research," *IEEE Trans. on Biomedical Engineering*, vol. BME-31, no. 12, pp. 817–823, December 1984.
- [120] P. Mercier, M. Bhardwaj, D. C. Daly, and A. P. Chandrakasan, "A low-voltage energy-sampling IR-UWB digital baseband employing quadratic correlation," *IEEE Journal of Solid State Circuits*, vol. 45, no. 6, pp. 1209–1219, June 2010.

- [121] P. Mercier, D. C. Daly, and A. P. Chandrakasan, "An energy-efficient all-digital UWB transmitter employing dual capacitively-coupled pulse shaping drivers," *IEEE Journal of Solid State Circuits*, vol. 44, no. 6, pp. 1679–1688, June 2009.
- [122] S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, and J. Yamada, "1-V power supply high-speed digital circuit technology with multithreshold-voltage CMOS," *IEEE Journal of Solid State Circuits*, vol. 30, no. 8, pp. 847–854, August 1995.
- [123] S. Narendra, V. De, S. Borkar, D. Antoniadis, and A. P. Chandrakasan, "Full-chip subthreshold leakage power prediction model for sub-0.18 um CMOS," in *Proceedings of IEEE Intl. Symp. on Low Power Electronics and Design*, August 2002, pp. 19–23.
- [124] S. Narendra, V. De, S. Borkar, D. Antoniadis, and A. P. Chandrakasan, "Full-chip subthreshold leakage power prediction and reduction techniques for sub-0.18 um CMOS," *IEEE Journal of Solid State Circuits*, vol. 39, no. 3, pp. 501–510, March 2004.
- [125] L. Nazhandali, B. Zhai, J. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, T. Austin, and D. Blaauw, "Energy optimization of subthreshold voltage sensor processors," in *Proceedings of IEEE Intl. Symp. on Computer Architecture*, June 2005, pp. 197–207.
- [126] D. Nguyen, A. Davare, M. Orshansky, D. Chinnery, B. Thompson, and K. Keutzer, "Minimization of dynamic and static power through joint assignment of threshold voltages and sizing optimization," in *Proceedings of Intl. Symposium on Low Power Electronics and Design*, 2003, pp. 158–163.
- [127] M. Orshansky and K. Keutzer, "A general probabilistic framework for worst case timing analysis," in *Proceedings of IEEE Design Automation Conference*, 2002, pp. 556–561.
- [128] P. Pant, K. Roy, and A. Chatterjee, "Dual-threshold voltage assignment with transistor sizing for low power CMOS circuits," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, vol. 9, no. 2, pp. 390–395, April 2001.
- [129] H. C. Pao and C. T. Sah, "Effects of diffusion current on characteristics of metal-oxide (insulator) semiconductor transistors," *Solid State Electronics*, vol. 9, no. 10, pp. 927–937, October 1966.
- [130] B. Paul, A. Raychowdhury, and K. Roy, "Device optimization for ultra-low power digital subthreshold operation," *IEEE Intl. Symp. on Low-Power Electronics and Design*, pp. 96–101, August 2004.
- [131] B. Paul, A. Raychowdhury, and K. Roy, "Device optimization for digital subthreshold logic operation," *IEEE Trans. on Electron Devices*, vol. 52, no. 2, pp. 237–247, February 2005.
- [132] B. Paul, H. Soeleman, and K. Roy, "An 8x8 sub-threshold digital CMOS carry save array multiplier," in *Proceedings of European Solid State Circuits Conference*, September 2001, pp. 377–380.
- [133] B. C. Paul and K. Roy, "Optimizing oxide thickness for digital sub-threshold operation," in *Proceedings of Device Research Conference*, June 2006, pp. 63–64.
- [134] M. J. M. Pelgrom, A. C. J. Duinmaijer, and A. P. G. Welbers, "Matching properties of MOS transistors," *IEEE Journal of Solid-State Circuits*, vol. 24, no. 5, pp. 1433–1439, October 1989.
- [135] A. P. Pentland, M. Petrazzouli, A. Gerega, A. P. Pentland, and T. Starner, "Digital doctor: An experiment in wearable telemedicine," in *Proceedings of Intl. Symp. on Wearable Computers*, October 1997, pp. 173–174.

- [136] M. Qazi, M. E. Sinangil, and A. P. Chandrakasan, "Challenges and directions for low-voltage SRAM," *IEEE Design and Test of Computers*, vol. 28, no. 1, pp. 32–43, February 2011.
- [137] T. Raja, V. D. Agrawal, and M. L. Bushnell, "CMOS circuit design for minimum dynamic power and highest speed," in *Proceedings of 17th International Conf. VLSI Design*, January 2004, pp. 1035–1040.
- [138] T. Raja, V. D. Agrawal, and M. L. Bushnell, "Variable input delay CMOS logic for low power design," *IEEE Trans. on VLSI Systems*, vol. 17, no. 10, pp. 1534–1545, October 2009.
- [139] Y. Ramadass and A. P. Chandrakasan, "Minimum energy tracking loop with embedded DC-DC converter enabling ultra-low-voltage operation down to 250mV in 65nm CMOS," *IEEE Journal of Solid State Circuits*, vol. 43, no. 1, pp. 256–265, January 2008.
- [140] Y. K. Ramadass and A. P. Chandrakasan, "Minimum energy tracking loop with embedded DC-DC converter delivering voltages down to 250mV in 65nm CMOS," in *Proceedings of IEEE Intl. Solid State Circuits Conference*, February 2007, pp. 564–587.
- [141] R. Rao, A. Srivastava, D. Blaauw, and D. Sylvester, "Statistical analysis of subthreshold leakage current for VLSI circuits," *IEEE Trans. on Very Large Scale Integration Systems*, vol. 12, no. 2, pp. 131–139, February 2004.
- [142] A. Raychowdhury, S. Mukhopadhyay, and K. Roy, "A feasibility study of subthreshold SRAM across technology generations," in *Proceedings of IEEE Intl. Conference on Computer Design*, October 2005, pp. 417–422.
- [143] A. Raychowdhury, B. Paul, S. Bhunia, and K. Roy, "Computing with subthreshold leakage: Device/circuit/architecture co-design for ultra-low power subthreshold operation," *IEEE Trans. on VLSI Systems*, vol. 13, no. 11, pp. 1213–1224, November 2005.
- [144] R. Rithe, S. Chou, J. Gu, A. Wang, S. Datla, G. Gammie, D. Buss, and A. P. Chandrakasan, "Cell library characterization at low voltage using non-linear operating point analysis of local variations," in *Proceedings of Intl. Conference on VLSI Design*, January 2011, pp. 112–117.
- [145] K. Roy, J. P. Kulkarni, and M. Hwang, "Process-tolerant ultralow voltage digital subthreshold design," in *Proceedings of IEEE Topical Meeting on Silicon Monolithic Integrated Circuits in RF Systems*, January 2008, pp. 42–45.
- [146] K. Roy, L. Wei, and Z. Chen, "Multiple-Vdd multiple-Vth CMOS (MVCMOS) for low power applications," in *Proceedings of IEEE Intl. Symp. on Circuits and Systems*, May 1999, pp. 366–370.
- [147] T. Sakurai, "A JSSC classic paper: the simple model of CMOS drain current," *IEEE Solid State Circuits Society Newsletter*, vol. 9, no. 4, pp. 4–5, October 2004.
- [148] T. Sakurai, "CMOS inverter delay and other formulas using alpha-power law MOS model," in *Proceedings of Intl. Conference on VLSI Design*, January 2011, pp. 112–117.
- [149] T. Sakurai and A. R. Newton, "Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas," *IEEE Journal of Solid State Circuits*, vol. 25, no. 2, pp. 584–594, April 1990.
- [150] G. Schrom and S. Selberherr, "Ultra-low-power CMOS technology," in *Proceedings of International Semiconductor Conference*, October 1996, pp. 237–246.

- [151] S. C. Schwartz and Y. S. Yeh, "On the distribution function and moments of power sums with lognormal components," *Bell System Technical Journal*, vol. 61, no. 7, pp. 1441–1462, September 1982.
- [152] J. Seo, D. Sylvester, D. Blaauw, H. Kaul, and R. Krishnamurthy, "A robust edge encoding technique for energy efficient multi-cycle interconnect," *IEEE Trans. on Very Large Scale Integration Systems*, vol. 19, no. 2, pp. 264–273, February 2011.
- [153] M. Seok, D. Blaauw, and D. Sylvester, "Clock network design for ultra-low power applications," in *Proceedings of ACM/IEEE Intl. Symp. on Low-Power Electronics and Design*, August 2010, pp. 271–276.
- [154] M. Seok, S. Hanson, J. Seo, D. Sylvester, and D. Blaauw, "Robust ultra-low voltage ROM design," in *Proceedings of IEEE Custom Integrated Circuits Conference*, September 2008, pp. 423–426.
- [155] M. Seok, S. Hanson, D. Sylvester, and D. Blaauw, "Analysis and optimization of sleep modes in subthreshold circuit design," in *Proceedings of IEEE Design Automation Conference*, June 2007, pp. 694–699.
- [156] M. Seok, S. Hanson, D. Sylvester, and D. Blaauw, "Sleep mode analysis and optimization with minimal-sized power gating switch for ultra-low Vdd operation," *IEEE Trans. on Very Large Scale Integration Systems*, vol. 20, no. 4, pp. 605–615, April 2012.
- [157] M. Seok, G. Kim, D. Blaauw, and D. Sylvester, "Variability analysis of a digitally trimmable ultra-low power voltage reference," in *Proceedings of European Solid State Circuits Conference*, September 2010, pp. 110–113.
- [158] C. SeongHwan, T. Xanthopoulos, and A. Chandrakasan, "Design of low power variable length decoder using fine grain non-uniform table partitioning," in *Proceedings of IEEE Intl. Symp. on Circuits and Systems*, June 1997, pp. 2156–2159.
- [159] C. SeongHwan, T. Xanthopoulos, and A. P. Chandrakasan, "An ultra low power variable length decoder for MPEG-2 exploiting codeword distribution," in *Proceedings of IEEE Custom Integrated Circuits Conference*, May 1998, pp. 177–180.
- [160] S. Sheng, A. Chandrakasan, and R. W. Brodersen, "A portable multimedia terminal," *IEEE Communications Magazine*, vol. 30, no. 12, pp. 64–75, December 1992.
- [161] T. Simon and A. P. Chandrakasan, "An ultra low power adaptive wavelet video encoder with integrated memory," *IEEE Journal of Solid State Circuits*, vol. 35, no. 4, pp. 572–582, April 2000.
- [162] M. Sinangil, N. Verma, and A. P. Chandrakasan, "A reconfigurable 8T ultra-dynamic voltage scalable (U-DVS) SRAM in 65nm CMOS," *IEEE Journal of Solid State Circuits*, vol. 44, no. 11, pp. 3163–3173, November 2009.
- [163] A. Sinha and A. P. Chandrakasan, "Dynamic power management in wireless sensor networks," *IEEE Design & Test of Computers*, vol. 18, no. 2, pp. 62–74, April 2001.
- [164] H. Soeleman and K. Roy, "Ultra-low power digital subthreshold logic circuits," *IEEE Journal of Solid State Circuits*, pp. 473–484, April 1999.
- [165] H. Soeleman and K. Roy, "Digital CMOS logic operation in the sub-threshold region," in *Proceedings of IEEE Great Lakes Symp. on VLSI*, March 2000, pp. 107–112.

- [166] H. Soeleman and K. Roy, "Sub-domino logic: Ultra-low power dynamic sub-threshold logic," in *Proceedings of IEEE Intl. Conference on VLSI Design*, 2001, pp. 211–214.
- [167] H. Soeleman, K. Roy, and B. Paul, "Robust ultra-low power sub-threshold DTMOS logic," in *Proceedings of IEEE Intl. Symp. on Low-Power Electronics and Design*, 2000, pp. 25–30.
- [168] A. Srivastava, D. Sylvester, and D. Blaauw, "Power minimization using simultaneous gate sizing, dual-Vdd and dual-Vth assignment," in *Proceedings of 41st Design Automation Conference*, July 2004, pp. 783–787.
- [169] A. Srivastava, D. Sylvester, and D. Blaauw, "Statistical optimization of leakage power considering process variations using dual-Vth and sizing," in *Proceedings of IEEE Design Automation Conference*, July 2004, pp. 773–778.
- [170] T. Starner, "Human-powered wearable computing," *IBM Systems Journal*, vol. 35, pp. 618–629, 1996.
- [171] P. A. Stolk, F. P. Widdershoven, and D. B. M. Klaassen, "Modeling statistical dopant fluctuations in MOS transistors," *IEEE Trans. on Electron Devices*, vol. 45, no. 9, pp. 1960–1971, September 1998.
- [172] N. Sugii, R. Tsuchiya, T. Ishigaki, Y. Morita, H. Yoshimoto, and S. Kimura, "Local Vth variability and scalability in silicon-on-thin-box SOTB CMOS with small random-dopant fluctuation," *IEEE Trans. on Electron Devices*, vol. 57, no. 4, pp. 835–845, April 2010.
- [173] V. Sundararajan and K. K. Parhi, "Low power synthesis of dual threshold voltage CMOS VLSI circuits," in *Proceedings of Intl. Symp. on Low Power Electronics and Design*, August 1999, pp. 139–144.
- [174] R. M. Swanson and J. D. Meindl, "Ion-implanted complementary MOS transistor in low-voltage circuits," *IEEE Journal of Solid-State Circuits*, vol. 7, no. 2, pp. 146–153, April 1972.
- [175] D. Sylvester, S. Hanson, M. Seok, Y. S. Lin, and D. Blaauw, "Designing robust ultra-low power circuits," in *Proceedings of Intl. Electron Devices Meeting*, December 2008, p. 1.
- [176] V. Sze, R. Blazquez, M. Bhardwaj, and A. P. Chandrakasan, "An energy efficient subthreshold baseband processor architecture for pulsed ultra-wideband communications," in *Proceedings of IEEE Intl. Conference on Acoustic, Speech and Signal Processing*, May 2006, p. III.
- [177] R. R. Troutman and S. Chakravarti, "Subthreshold characteristics of insulated-gate field effect transistors," *IEEE Trans. Circuit Theory*, vol. CT-20, no. 6, pp. 659–665, November 1973.
- [178] T. F. Tsai, D. Duarte, M. Vijaykrishnan, and M. J. Irwin, "Implications of technology scaling on leakage reduction techniques," in *Proceedings of IEEE Design Automation Conference*, June 2003, pp. 187–190.
- [179] Y. Tsividis, "Eric Vittoz and the strong impact of weak inversion circuits," *IEEE Solid State Circuits Society Newsletter*, vol. 13, no. 3, pp. 56–58, Summer 2008.
- [180] Y. P. Tsividis and R. W. Ulmer, "A CMOS voltage reference," *IEEE Journal of Solid State Circuits*, vol. 13, no. 6, pp. 774–778, December 1978.

- [181] N. Verma and A. P. Chandrakasan, "A 65nm 8T sub-Vt SRAM employing sense-amplifier redundancy," in *Proceedings of IEEE Intl. Solid State Conference*, February 2007, pp. 328–606.
- [182] N. Verma and A. P. Chandrakasan, "A 256kb 65nm 8T sub-threshold SRAM employing sense-amplifier redundancy," *IEEE Journal of Solid State Circuits*, vol. 43, no. 1, pp. 141–149, January 2008.
- [183] N. Verma, J. Kwong, and A. P. Chandrakasan, "Nanometer MOSFET variation in minimum energy subthreshold circuits," *IEEE Trans. on Electron Devices*, vol. 55, no. 1, pp. 163–174, January 2008.
- [184] C. Visweswariah, K. Ravindran, K. Kalafala, and S. G. Walker, "First-order incremental block-based statistical timing analysis," *IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems*, pp. 2170–2180, October 2006.
- [185] E. Vittoz, "Micropower switched-capacitor oscillator," *IEEE Journal of Solid State Circuits*, vol. 14, no. 3, pp. 622–624, June 1979.
- [186] E. Vittoz, "Quartz oscillators for watches," in *Proceedings 10th International Congress of Chronometry*, 1979, pp. 131–140.
- [187] E. Vittoz, "Low power design: ways to approach the limits," in *IEEE Intl. Solid State Circuits Conference Digest Papers*, February 1994, pp. 14–18.
- [188] E. Vittoz, "The electronic watch and low-power circuits," *IEEE Solid State Circuits Newsletter*, vol. 13, no. 3, pp. 7–23, Summer 2008.
- [189] E. Vittoz, M. Degrauwe, and S. Bitz, "High-performance crystal oscillator circuits: Theory and applications," *IEEE Journal of Solid State Circuits*, vol. 23, no. 3, pp. 774–783, June 1988.
- [190] E. Vittoz and J. Fellrath, "New analog CMOS IC's based on weak inversion operation," in *Proceedings of 2nd European Solid State Circuit Conference*, September 1976, pp. 12–13.
- [191] E. Vittoz and J. Fellrath, "CMOS analog integrated circuits based on weak inversion operations," *IEEE Journal of Solid State Circuits*, vol. 12, no. 3, pp. 224–231, June 1977.
- [192] E. Vittoz and F. Krummenacher, "Micropower SC filters in Si-gate CMOS technology," in *Proceedings ECCTD'80*, 1980, pp. 61–72.
- [193] E. Vittoz and O. Neyroud, "A low-voltage CMOS bandgap reference," *IEEE Journal of Solid-State Circuits*, vol. 14, no. 3, pp. 573–579, June 1979.
- [194] A. Wang, B. H. Calhoun, and A. P. Chandrakasan, *Sub-threshold Design for Ultra Low-Power Systems*. Springer, 2006.
- [195] A. Wang, B. H. Calhoun, N. Verma, J. Kwong, and A. P. Chandrakasan, "Ultra-dynamic voltage scaling for energy starved electronics," *Government Microcircuit Applications and Critical Technology Conference*, pp. 451–454, March 2007.
- [196] A. Wang and A. P. Chandrakasan, "Energy-efficient DSPs for wireless sensor networks," *IEEE Signal Processing Magazine*, vol. 19, no. 4, pp. 68–78, July 2002.
- [197] A. Wang and A. P. Chandrakasan, "Energy-aware architectures for a real-valued FFT implementation," in *Proceedings of IEEE Intl. Symp. on Low Power Electronics and Design*, August 2003, pp. 25–27.

- [198] A. Wang and A. P. Chandrakasan, "A 180-mV FFT processor using subthreshold circuit techniques," in *Proceedings of IEEE Intl. Solid State*, February 2004, pp. 292–293.
- [199] A. Wang and A. P. Chandrakasan, "A 180-mV subthreshold FFT processor using a minimum energy design methodology," *IEEE Journal of Solid State Circuits*, vol. 40, no. 1, pp. 310–319, January 2005.
- [200] A. Wang, A. P. Chandrakasan, and S. V. Kosonocky, "Optimal supply and threshold scaling for subthreshold CMOS circuits," in *Proceedings of IEEE Intl. Computer Society Annual Symp. on VLSI*, April 2002, pp. 5–9.
- [201] Q. Wang and S. B. K. Vrudhula, "Static power optimization of deep submicron CMOS circuits for dual Vt technology," in *Proceedings of Intl. Computer-Aided Design*, November 1998, pp. 490–496.
- [202] L. Wei, Z. Chen, M. Johnson, and K. Roy, "Design and optimization of low voltage high performance dual threshold CMOS circuits," in *Proceedings of Design Automation Conference*, June 1998, pp. 489–494.
- [203] L. Wei, Z. Chen, K. Roy, M. C. Johnson, Y. Ye, and V. De, "Design and optimization of dual-threshold circuits for low-voltage low-power applications," *IEEE Trans. on VLSI Systems*, vol. 7, no. 1, pp. 16–24, March 1999.
- [204] L. Wei, Z. Chen, K. Roy, Y. Ye, and V. De, "Mixed Vth (MVT) CMOS circuit design methodology for low power applications," in *Proceedings of Design Automation Conference*, June 1999, pp. 430–435.
- [205] L. Wei, Z. Chen, Y. Ye, and V. De, "Mixed-Vth MVT CMOS circuit design methodology for low power applications," in *Proceedings of Design Automation Conference*, June 1999, pp. 430–435.
- [206] L. Wei, K. Roy, and V. De, "Low voltage low power CMOS design techniques for deep submicron ICs," in *Proceedings of IEEE Intl. Conference on VLSI Design*, January 2000, pp. 24–29.
- [207] L. Wei, K. Roy, and C. K. Koh, "Power minimization by simultaneous dual-Vth assignment and gate-sizing," in *Proceedings of IEEE Intl. Custom Integrated Circuits Conference*, May 2000, pp. 413–416.
- [208] D. D. Wentzloff, F. S. Lee, D. C. Daly, M. Bhardwaj, P. Mercier, and A. P. Chandrakasan, "Energy efficient pulsed-UWB CMOS circuits and systems," in *IEEE Intl. Conference on Ultra-wideband*, September 2007, pp. 282–287.
- [209] I. Yang, A. Lochtefeld, S. Narendra, and A. Chandrakasan, "Experimental exploration of ultra-low power CMOS design space using SOIAS dynamic Vt control technology," in *Proceedings of IEEE Intl. SOI Conference*, October 1997, pp. 76–77.
- [210] Y. Yang, C. Vieri, A. P. Chandrakasan, and D. A. Antoniadis, "Back-gated CMOS on SOIAS for dynamic threshold voltage control," *IEEE Trans. on Electron Devices*, vol. 44, no. 5, pp. 822–831, May 1997.
- [211] J. Yao and V. D. Agrawal, "Dual-threshold design of sub-threshold circuits," in *Proceedings of IEEE SOI-3D-Subthreshold Microelectronics Technology Conference*, October 2013, pp. 77–78.

- [212] B. Zhai, D. Blaauw, D. Sylvester, and S. Hanson, "A sub-200mV 6T SRAM in 130nm CMOS," in *Proceedings of IEEE Intl. Solid State Circuits Conference*, February 2007, pp. 332–333.
- [213] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *Proceedings of ACM/IEEE Intl. Symp. on Low Power Electronics and Design*, August 2005, pp. 20–25.
- [214] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "Analysis and mitigation of variability in subthreshold design," in *Proceedings of IEEE Intl. Symp. on Low Power Electronics and Design*, August 2005, pp. 20–25.
- [215] B. Zhai, S. Hanson, D. Blaauw, and D. Sylvester, "A variation-tolerant sub-200 mV 6-T subthreshold SRAM," *IEEE Journal of Solid-State Circuits*, pp. 2338–2348, October 2008.
- [216] B. Zhai, L. Nazhandali, J. Olson, A. Reeves, M. Minuth, R. Helfand, S. Pant, D. Blaauw, and T. Austin, "A 2.60pJ/Inst subthreshold sensor processor for optimal energy efficiency," in *Proceedings of IEEE Symp. on VLSI Circuits*, June 2006, pp. 154–155.