Reducing ATE Test Time by Voltage and Frequency Scaling

by
Praveen Venkataramani

A dissertation submitted to the Graduate Faculty of Auburn University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

Auburn, Alabama
May 4, 2014

Keywords: ATE, external testing, power constrained test, test programming, test time reduction, VLSI testing

Copyright 2014 by Praveen Venkataramani

Approved by

Vishwani D. Agrawal, James J. Danaher Professor of Electrical and Computer Engineering
Fa Foster Dai, Professor, Electrical and Computer Engineering, Associate Director, AMSTC
Adit D. Singh, James B. Davis Professor of Electrical and Computer Engineering

Abstract

During wafer sort, fabricated chips are subjected to tests that verify whether they meet the design specification. Test application time plays a critical role when a large volume of dice must be verified in a given period of time. These tests are carried out on automatic test equipment (ATE). The time spent on the ATE directly affects the final cost of the device. Hence it is paramount to reduce test application time so that the device can be verified reliably while keeping the test time to a minimum. Power dissipation must also be considered when reducing test time: it is often traded off against the test frequency and becomes a major limiting factor. One of the major approaches to test time reduction during circuit design is to implement multiple scan chains. This approach reduces test time drastically compared to the same device implemented with a single scan chain. Other approaches involve manipulating test hardware and test patterns to reduce test time, and testing many dice in parallel. The objective of this thesis is to obtain an optimum solution to this trade-off and to examine the feasibility of such approaches, which can lead to new test methods in hardware and software.
The problem is approached in two ways: (i) by scaling the supply voltage, and (ii) by scaling the test frequency. Additionally, the two methods can be combined to reduce test time further. These methods can be used in tandem with existing methods to provide additional gain in test time reduction. The proposed methodologies are verified by simulation and through experiments. The experiments were carried out on the Advantest T2000GS ATE located at Auburn University, Alabama. The simulations were performed using ISCAS'89 benchmark circuits, and the results show up to 50% reduction in test time.

Acknowledgments

First and foremost I would like to thank Prof. Vishwani Agrawal for his invaluable guidance throughout my work. Our scheduled meetings helped me coarse-tune my approaches in research and also extend them to my personal life. His enthusiasm in research will always inspire me. I would also like to thank Prof. Adit Singh and Prof. Foster Dai for being my committee members. Prof. Adit Singh's VLSI Testing and VLSI Design classes and Prof. Foster Dai's class on analog circuits helped me understand the basics of testing methods and transistor behavior, which served as the foundation of my work. I would like to thank Prof. Victor Nelson, whose course on Computer Aided Design helped me learn the tools used for experiments in this work. I would like to thank Prof. Stuart Wentworth for his class on RF and Microwave Devices, and Prof. Stanley Reeves for his class on Digital Signal Processing, both of which gave me an opportunity to gain knowledge in the areas I was least exposed to. I would like to thank Prof. Sanjeev Baskiyar for agreeing to be my external reader. I would like to extend my special thanks to Prof. Prathima Agrawal and Ms. Shelia Collins for managing my travel to several conferences and workshops, and to Ms. Jo Ann Loden for helping me with all the registrations and paperwork when I was delayed in India during my visa extension.
I would like to thank all my friends, especially Gisel, Madhukar, Mahalingam and Senthuran for their emotional and financial support during hard times, and Ravi Tej, Ravi Kanth, Suraj, Swathi and Sindhu for helping me debug issues and accompanying me during off hours in the Broun 310 lab. Finally I would like to thank my parents and my sister for their immeasurable support throughout my studies, for which I am ever grateful, and I dedicate this work and these efforts to them.

Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables
1 Introduction
  1.1 VLSI Testing
    1.1.1 Levels of Testing
    1.1.2 Fault Models
  1.2 Designs for Test
    1.2.1 Scan-Based Tests
    1.2.2 Built-In Self Test
    1.2.3 Compressor-Decompressor
    1.2.4 SerDes
    1.2.5 Analog Bus
  1.3 Prior Work
2 VLSI Test Equipment and Procedure
  2.1 Advantest T2000GS ATE
    2.1.1 Test Programming
    2.1.2 Test Data Analysis
  2.2 Time and Cost Relationship
3 Test Time Theorem and Applications
  3.1 Test Time Theorem
  3.2 Applications of Test Time Theorem
    3.2.1 Periodic Clock Test
    3.2.2 Aperiodic Clock Test
4 Scaling Supply Voltage to Reduce Periodic Clock Test Time
  4.1 Low Voltage Tests
  4.2 Reduced Supply Voltage Test
  4.3 Optimum Supply Voltage
    4.3.1 SPICE Experiment
    4.3.2 Polynomial Equation to Obtain Minimum Supply Voltage
    4.3.3 Solving for VDDopt, fopt and TTopt
  4.4 Results
  4.5 Peak Power and Critical Path Frequency Measurements
    4.5.1 Hardware Setup
    4.5.2 Peak Power and Frequency Measurements
    4.5.3 Minimizing Test Time for Given Peak Power Limit
5 Dynamic Scaling of Test Clock Period
  5.1 Aperiodic Clock Test
    5.1.1 A Circuit Example
    5.1.2 Simulation Results
    5.1.3 Test Programming on ATE at Nominal Voltage
  5.2 Optimum Voltage for Aperiodic Clock Test
    5.2.1 Simulation Results
6 Discussion
  6.1 Adapting to At-Speed Testing
  6.2 Adapting to Process-Voltage-Temperature Variations
7 Conclusion
Bibliography

List of Figures

1.1 Illustration of a sequential circuit with 4 flip-flops.
1.2 Illustration of a sequential circuit with 4 flip-flops connected into a serial shift register.
1.3 Illustration of a compressor-decompressor logic connected to multiple scan chains.
1.4 Illustration of a decompressor logic built using multiplexers [8].
1.5 Illustration of a compressor built using XOR logic [8].
2.1 Advantest T2000 ATE at Auburn University, Alabama.
2.2 Mainframe of Advantest T2000GS at Auburn University.
2.3 Test head of the Advantest T2000GS with an FPGA on the loadboard.
3.1 Minimum test time as a function of supply voltage (VDD) for N-cycle periodic clock test. For a minimum test time TTsync, the supply voltage is Vsync, which is lower than the nominal voltage Vnom.
3.2 Illustration of test power and test energy for every test cycle using a periodic clock. The test clock period is determined by the cycle dissipating the maximum power.
3.3 Illustration of test power and test energy for every test cycle using an aperiodic clock. The test clock period for every cycle is determined by the power dissipated during that cycle.
3.4 Minimum test time as a function of supply voltage (VDD) for N-cycle aperiodic clock test. For a minimum test time TTsync, the supply voltage is Vsync, which is lower than the nominal voltage Vnom.
4.1 Comparison of the test time measured using SPICE simulations with the delay calculated using the power law for the s298 ISCAS'89 benchmark circuit. The test clock period is chosen as the functional period, assuming that the test is not power constrained.
4.2 Simulation and experimental test time plots to find the optimum voltage for the s298 benchmark circuit.
4.3 Simulated and calculated curves using test period and functional period at various voltages. The direct approach using MATLAB (circled) matches the cross point of the curves obtained analytically using the periods calculated from equations (4.3) and (4.4) and the results obtained from SPICE ("plus" data points) in [66].
4.4 Test setup for measuring peak power per cycle and maximum test frequency for an Altera DE2 FPGA board (with all its peripherals) using the NI ELVIS II+ bench-top prototyping board.
4.5 Measured values of maximum power consumed per cycle (in blue) and maximum test frequency (in green) plotted as a function of the supply voltage for the Altera DE2 FPGA board tested using the NI ELVIS II+ bench-top prototyping board. Switching power is dominated by the CMOS circuitry contained on the board. The FPGA itself is programmed with the function of the s298 benchmark with scan.
5.1 Periodic and aperiodic clock simulation of a 450-cycle scan test of ISCAS'89 benchmark circuit s298. Periodic test clock frequency is 240 MHz and test time is 1.87 μs. Aperiodic clock test time is 1.31 μs.
5.2 Aperiodic clock for a 540-cycle scan test of s298 for a power budget of 1.23 mW. Horizontal broken lines indicate four test clock periods available from the T2000GS ATE. The period used for a test cycle was the nearest higher ATE clock period.
5.3 Periodic clock: ATE result for a 540-cycle scan test of the s298 benchmark circuit. The waveform shows 33 test cycles (cycles 13 through 46) of a 500 ns clock. Signals shown are scan-out, scan-in, scan enable, three primary outputs and clock. Green triangles under the scan-out waveform are matching strobes.
5.4 Aperiodic clock test: ATE result for a 540-cycle scan test of the s298 benchmark circuit. The waveforms show 58 test cycles (cycles 13 through 71) taking the same time as taken by 46 cycles of the periodic clock test in Figure 5.3. Clock periods used were 200, 300, 410 and 500 ns as shown in Figure 5.2. Signals shown are scan-out, scan-in, scan enable, three primary outputs and clock. Green triangles under the scan-out waveform are matching strobes.
5.5 Aperiodic clock test time as a function of supply voltage showing the minimum test time voltage, Vasync.
5.6 Minimum periodic and aperiodic clock test times for the s298 circuit after selecting suitable supply voltages.

List of Tables

2.1 State of art IMA Tester Cost Analysis Data [62]
4.1 Parameter values for the s298 benchmark synthesized in 180 nm CMOS technology (VDD = 1.8 V, VTH = 0.39 V, critical delay = 0.77 ns).
4.2 Optimum VDD for reduced test time of ISCAS'89 benchmark circuits.
4.3 Analytically obtained VDDopt and fopt for minimum scan test time of ISCAS'89 circuits in 180 nm CMOS (α = 2, VTH = 0.39 V).
5.1 Scan test time for ISCAS'89 circuits in TSMC 180 nm technology.
5.2 Optimum voltage VDDopt for minimum aperiodic clock scan test time of ISCAS'89 circuits in 180 nm CMOS (α = 2, VTH = 0.39 V).
5.3 Test times for various methods normalized with respect to that of the conventional method (nominal 1.8 V supply and periodic clock).

Chapter 1
Introduction

1.1 VLSI Testing

An abstract form of testing is to observe the response of a device to known inputs under desired environmental conditions. For instance, consider a test to see how well a microwave oven works; here the oven is the device under test (DUT). To test the operation of the oven, one might cook a dish in it for a preferred duration of time, where the dish becomes the test input and the time is the duration of the test. If the food is cooked well, then the microwave oven operates as desired and thus passes the test. If the food did not cook well, then either the microwave oven is faulty (if the food or the container is not hot), or it needs more time (if the food is warm but seems undercooked), i.e., the test input was insufficient, or the type of food cannot be cooked under the current conditions (if a bowl of rice is cooked without adding water), i.e., there was an error in the process. Similarly, in very large scale integrated circuit testing (or simply VLSI testing), once the circuit is designed, it is tested for the correctness of operation. If the circuit fails the test it may be due to a fault in the design.
Either the test was wrong to begin with, or, if the design is fabricated, there might have been an error in the process, or the design could have been wrong, or the conditions under which it was tested were wrong. The role of testing is to verify if the design is free of manufacturing defects and the role of diagnosis is to identify the source of the failure [15].

1.1.1 Levels of Testing

The testing and diagnosis of the device can be classified into five types based on the purpose of each test [15]. The order in which they are performed is called the test flow and is normally considered a standard procedure to make sure that the design performs as intended and to capture any anomalies early.

Simulation

This is the first stage of testing, where the design netlist is verified using computer programs called simulators. The netlist is a file that contains the structure of the design written in languages such as SPICE or VHDL. The simulators verify the correctness of the netlist by applying inputs and observing the output.

Characterization Test

The characterization test is done on the initial silicon after fabrication, a stage called "silicon bring up", before the design is sent to production. In this phase the company designing the device obtains a sample of chips from the foundry that fabricates them. During this step the design is verified and debugged for correctness of operation, and the initial sample is checked against all the specifications of the manufacturer. Functional tests are run and elaborate AC and DC measurements are performed. The thoroughness of the tests during this phase may involve probing internal nodes and the use of specialized electron-beam tools to observe the activity within the device. This step helps in identifying the correct operating limits for the device.
These are obtained by performing tests under various voltage and frequency ranges and plotting the results as a Shmoo plot, which provides a graphical display of the sample over the operating range [15].

Production Test

Once the chips pass the characterization test, they are sent to be produced in mass volumes. In the production test phase, the tests are less extensive but they still have to meet the manufacturer's specifications. During this phase the test time, and hence the test cost, plays a paramount role. To minimize the test time, a high coverage of faults is targeted with the minimum number of vectors possible. Since the chip is already designed and fabricated in mass volume, diagnosis of a failing chip is not performed, and only the pass/fail decision is made [15]. Binning of the dice based on the failing specifications is also done in this stage to maximize yield.

Burn-In Test

The next phase in the test flow is the burn-in test. The main idea of this test is to accelerate the aging of the device. Due to various process variations the produced chips may not be identical. Though the process is controlled, some of these variations are unavoidable. It is found that once the devices are produced some fail early while others do not. Burn-in accelerates the life of the chip by subjecting the device under test (DUT) to very high temperatures. During this test, the production tests are performed at very high temperatures and voltages. The test targets two types of failures, namely, infant mortality failures and freak failures. In an infant mortality failure, the DUT fails very early due to weak resistive lines that burn out easily in a slightly accelerated environment. In a freak failure, the DUT works properly as a good chip under normal conditions but fails after a very long time. Such devices are identified by putting the DUT through long hours of burn-in [15].
Incoming Inspection

Once the chips pass the burn-in tests they are sold to various system manufacturers, who integrate several components together on a board into one system. During that process, each component is tested to check correctness of operation, and the thoroughness of the test can vary based on the system designed. The main idea during this phase is to minimize the effort of replacing an individual defective component after it is integrated into the system [15]. For instance, a faulty graphics card integrated into a laptop and shipped to customers ended in a recall, and a significant loss was incurred by the device manufacturer and the component manufacturer [6].

At each step of the tests described above, the design is verified for certain mismatches. These mismatches can be termed defects, errors, or faults. A defect is defined as the unintended difference between the hardware structure and the actual design; an error is a mismatch in the output signal caused by the defect present in the design; and a fault is the abstract form of the defect causing that error [15]. For instance, a device could have a process defect such as a weak interconnect between two logic gates, which may produce an error in the output signal, and the test engineer will conclude that some fault in the design caused this erroneous output.

1.1.2 Fault Models

There are two basic types of testing, called functional and structural testing. In functional testing, the circuit is tested for correct functional operation by applying functional vectors and verifying that the circuit works. In a structural test, the circuit is checked for any structural anomalies due to fabrication process errors. The structural tests are performed at every level of test described earlier, and the test patterns may not have a functionally meaningful output. The structural test is performed based on certain fault models which describe the types of faults targeted.
These fault models help to develop algorithms that enable the structural test. Some fault models generally used are given below.

Gate Level Stuck-at Fault Model

In a gate level stuck-at fault model, an input or an output signal line is considered to be stuck at a value 0 or 1 due to a defect in the design. In these tests the signal line is driven with a value opposite to the value being tested. For instance, to test an input line for a value stuck at 0, an input of 1 is applied on that line and the output is observed. If the line is not stuck at 0 then the output will be the expected value for the input value of 1 [15, 29]. The work in this thesis mainly uses the stuck-at fault model for all experiments since it is the simplest fault model.

Transition Delay Fault Model

Delay fault models check whether the DUT meets the timing specifications. Resistive opens or shorts on interconnects can cause a signal to transition after a delay. In a transition delay fault model, the delay of a transition is measured by forcing a transition on the desired pin and observing that transition at the output. The transition fault model is similar to the stuck-at fault model in the sense that a stuck-at fault can be viewed as a transition that takes an infinite amount of time. The difference between the two fault models is that, unlike a stuck-at fault, a transition delay fault is detected by a vector pair in which the first vector initializes the pins to a value while the second vector causes the transition. In the transition fault model the transition is observed along whichever path the signal takes, irrespective of whether the path is short or long, and the test passes if the transition is observed at the output within the specified timing threshold [15, 29].

Path Delay Fault Model

Path delay fault models are also called lumped delay fault models. In this fault model, the delays of the individual gates along a path, when summed, are taken to cause excess delay in a signal traveling through that path.
In contrast to the transition fault model, the test engineer has the flexibility of choosing the path the signal should take. This can ensure that every path in the design meets the timing constraints [15, 29].

Bridging Fault Model

A bridging fault represents a short between two or more signal lines. The logic value on the shorted lines can be modeled as 1-dominant (OR bridge), where a signal value of '1' on one line forces the signal on the neighboring line to be '1', or as 0-dominant (AND bridge), where a signal value of '0' on one line forces the signal on the other line to be '0' [15, 29].

Transistor Level Stuck-at Fault Model

In a transistor level stuck-at fault model the DUT is checked for stuck-open or stuck-short transistors. A stuck-open transistor fault would cause the gate to behave as a dynamic level-sensitive latch, while a stuck-short fault would produce a direct path between the power supply line VDD and the ground line. These faults are not accurately modeled by the gate level stuck-at fault model because of the complementary structure of NMOS and PMOS transistors in complementary metal oxide semiconductor (CMOS) circuits. To detect stuck-short faults, the steady-state (quiescent) supply current is monitored; this test is termed the IDDQ test. To detect a stuck-open fault, two vectors are applied. The first vector sensitizes the line to the opposite value, while the second vector propagates the value to an observation point [15, 29].

1.2 Designs for Test

Circuits under test can be combinational or sequential. A combinational circuit has no registers, such as D-type flip-flops (DFF), to hold a previous value and hence is a simpler design. Due to the absence of state-sensitive registers, testing a combinational circuit is straightforward: the automatic test pattern generator (ATPG) generates test patterns that sensitize the fault and propagate it to the output.
In contrast, sequential circuits have registers that hold previous values until there is a change in the values and update the output on the next clock pulse, as shown in Figure 1.1, where PI and PO are the primary inputs and primary outputs. The combinational logic consists of logic gates and the DFFs are D-type flip-flops.

Figure 1.1: Illustration of a sequential circuit with 4 flip-flops.

1.2.1 Scan-Based Tests

Since registers are state sensitive, the correct output depends on their current state, which is determined by past values. An ATPG has to create time frames with the preferred past states to generate sequential ATPG vectors, which can be cumbersome. For this reason a sequential circuit is generally converted into a combinational circuit for test purposes by connecting the registers into a serial shift register. The user can then set the registers to the preferred state by shifting in the input values, and observe the response by shifting out the values. This type of test methodology is known as scan-based test; it helps to convert a complex sequential circuit into a more manageable combinational circuit. Figure 1.2 illustrates this method, where the registers are connected serially using multiplexers, each with one input coming from the combinational logic and the other input coming from the scan input (SI) or from the previous register. The test vectors are serially shifted (scanned) in through the scan input (SI) pin and serially shifted (scanned) out through the scan output (SO) pin. The DUT operates in the normal mode when the scan-enable (SE) pin is '0' and is switched to test mode by driving the SE pin high [29]. The work presented in this thesis uses the scan-based methodology to synthesize the register transfer level (RTL) benchmark circuits.
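The shift-in, capture and shift-out sequence described above can be sketched in a few lines of Python. This is a minimal behavioral sketch, not the thesis's tool flow: the function names and the 4-bit `toy_logic` next-state function are hypothetical and chosen only for illustration, and the chain is assumed to shift from the SI end toward the SO end by one position per clock.

```python
from typing import Callable, List, Tuple

Bits = List[int]


def shift(chain: Bits, si_bit: int) -> Tuple[Bits, int]:
    """One clock with SE = 1: SI feeds the first flip-flop, the last one drives SO."""
    so_bit = chain[-1]
    return [si_bit] + chain[:-1], so_bit


def load_state(chain: Bits, state: Bits) -> Bits:
    """Scan in `state`; the bit meant for the last flip-flop is shifted in first."""
    for bit in reversed(state):
        chain, _ = shift(chain, bit)
    return chain


def capture(chain: Bits, next_state: Callable[[Bits], Bits]) -> Bits:
    """One clock with SE = 0: every flip-flop loads from the combinational logic."""
    return next_state(chain)


def unload_state(chain: Bits) -> Bits:
    """Scan out the captured response and return it in flip-flop order."""
    observed = []
    for _ in range(len(chain)):
        chain, so_bit = shift(chain, 0)
        observed.append(so_bit)
    return list(reversed(observed))


def toy_logic(s: Bits) -> Bits:
    """Hypothetical next-state logic for 4 flip-flops (illustration only)."""
    return [s[0] ^ s[1], s[1] & s[2], s[2] | s[3], 1 - s[3]]


chain = load_state([0, 0, 0, 0], [1, 0, 1, 1])
response = unload_state(capture(chain, toy_logic))
```

In this sketch the tester never needs to reason about past states: any state can be loaded in as many shift clocks as there are flip-flops, and the captured response is read back the same way, which is the sense in which scan reduces sequential testing to combinational testing.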
Figure 1.2: Illustration of a sequential circuit with 4 flip-flops connected into a serial shift register.

1.2.2 Built-In Self Test

Though scan-based test is widely used, one of its disadvantages is that, for a given fault coverage, the volume of test patterns generated by the ATPG can be very large for industrial circuits. Since automatic test equipment (ATE) has limited storage, very large volumes of patterns can impose serious overhead. Built-in self test (BIST) helps to mitigate this problem by incorporating a pattern generator along with the device under test (DUT), with little area overhead. It comprises a test pattern generator (TPG) built using a linear feedback shift register (LFSR), and an output response analyzer (ORA) built using a multiple input signature register (MISR) [60]. The ORA compacts the output responses from the DUT to form a "signature" which is compared with the "signature" from a known good circuit. The patterns are generated by providing an initial seed to the LFSR. Though the patterns generated by the LFSR may not be as random as in a scan-based test, they can be changed by varying the seed supplied to the LFSR. Random patterns help to capture hard-to-detect faults by "chance", thereby providing high fault coverage in a short time [29].

1.2.3 Compressor-Decompressor

Figure 1.3: Illustration of a compressor-decompressor logic connected to multiple scan chains.

In the current VLSI trend, designs have hundreds of thousands of sequential elements. When these elements are connected together to form a single shift register, also known as a single scan chain, the number of cycles needed to shift the input values through the scan chain can be large. To minimize this, the single scan chain is normally broken down into multiple smaller scan chains. Though the time to shift the values through the scan chains can be significantly reduced, the number of scan-in pins is increased.
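The trade-off between shift time and pin count can be made concrete with a rough cycle count. The sketch below is an assumption-laden estimate, not a formula taken from this thesis: it assumes balanced chains, one capture cycle per pattern, and that the shift-out of one pattern overlaps the shift-in of the next. The function names are hypothetical.

```python
import math


def scan_cycles(num_ffs: int, num_chains: int, num_patterns: int) -> int:
    """Estimated clock cycles for a scan test under the assumptions stated above."""
    longest_chain = math.ceil(num_ffs / num_chains)
    # (patterns + 1) shift phases of the longest chain, plus one capture per pattern.
    return (num_patterns + 1) * longest_chain + num_patterns


def scan_pins(num_chains: int) -> int:
    """Scan-in plus scan-out pins needed when every chain has its own pin pair."""
    return 2 * num_chains


single = scan_cycles(num_ffs=100_000, num_chains=1, num_patterns=1000)
split = scan_cycles(num_ffs=100_000, num_chains=100, num_patterns=1000)
```

With these illustrative numbers, splitting one chain into 100 chains cuts the cycle count by roughly a factor of 100, while `scan_pins` grows from 2 to 200, which is the pin pressure that motivates sharing tester pins among chains.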
Since an ATE has a limited number of pins available, designs include a hardware circuit pair called a compressor-decompressor, or codec for short, to drive the large number of scan pins on the chip. The job of the decompressor is to expand and broadcast the input values from the ATE or LFSR to multiple scan inputs within the circuit. The job of the compressor is to take the values from multiple scan-out pins and shorten the length of the pattern before sending it to the ATE for storage or analysis. Figure 1.3 illustrates the compressor-decompressor architecture of the Synopsys DFTMAX adaptive scan technology [8]. The decompressor is built using multiplexers that can switch between a 1:1 mode and a broadcast mode, as shown in Figure 1.4, while the compressor is built using an exclusive OR (XOR) tree, as shown in Figure 1.5. As the number of input pins for the decompressor increases, the test circuit has more controllability and hence high test coverage can be obtained with fewer test patterns. As the number of compressor outputs increases, the length of the fault signature increases, providing better resolution and thereby reducing undesirable aliasing, where a faulty signature resembles the good signature [8, 29].

Figure 1.4: Illustration of a decompressor logic built using multiplexers [8].

Figure 1.5: Illustration of a compressor built using XOR logic [8].

1.2.4 SerDes

Though codecs are used with partitioned scan chains to minimize pin count, there may be further pin limitations, such as having only one SI pin per chip. This necessitates a separate functional block that takes in the test pattern serially and drives the scan inputs of the chip in parallel. This functional block is known as serializer-deserializer logic, or SerDes for short. The SerDes functional block contains a serial-to-parallel converter and a parallel-to-serial converter.
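As a rough behavioral illustration of the two converters, the following Python sketch groups a serial bit stream into parallel words and flattens them back. It is only a sketch under assumptions: the function names are hypothetical, the first serial bit is assumed to go to the first scan input, and no clocking or hardware detail is modeled.

```python
from typing import List


def deserialize(serial_bits: List[int], width: int) -> List[List[int]]:
    """Serial-to-parallel: each group of `width` serial bits becomes one parallel word."""
    if width <= 0 or len(serial_bits) % width != 0:
        raise ValueError("stream length must be a positive multiple of width")
    return [serial_bits[i:i + width] for i in range(0, len(serial_bits), width)]


def serialize(words: List[List[int]]) -> List[int]:
    """Parallel-to-serial: emit the bits of each word one after another."""
    return [bit for word in words for bit in word]


stream = [1, 0, 1, 1, 0, 0, 1, 0]
words = deserialize(stream, width=4)   # one word per parallel scan shift
restored = serialize(words)
```

In this model every parallel word consumes `width` serial bits, so a single serial pin needs `width` serial clocks for each parallel shift of the scan chains.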
Besides other applications, the use of SerDes has been suggested for reducing the hardware area and power required for on-chip communication [33, 34]. In test application, patterns are normally shifted at high speed through the deserializer shift registers and then shifted at a slower speed through the scan chain. Likewise, the patterns in the serializer registers are shifted out at high speed [16, 41, 52].

1.2.5 Analog Bus

Similar to SerDes, the possibility exists of using analog signal transmission for test data [61]. It has been suggested that n-bit digital data can be converted into an analog voltage by a digital-to-analog converter (DAC), transmitted over a single wire, and then converted back to n digital bits by an analog-to-digital converter (ADC). Such a scheme, as suggested for on-chip communication, reduces hardware and is shown to reduce power as well, though it must be carefully designed to limit noise-related errors.

1.3 Prior Work

Most digital VLSI circuits today are tested using the scan-based method [15]. This reduces the complexity of testing sequential circuits to that of testing combinational circuits. As mentioned earlier, in the scan method flip-flops are loaded and unloaded through a shift register mechanism for testing faults in the combinational logic. Custom system-on-chip (SoC) designs containing microprocessors, digital signal processors and memories use large numbers of clock cycles during scan-based tests. This directly impacts the final cost of the chip [15]. In the era of low power devices that contain more than a billion gates, long test times have become a critical concern. While the large size of a device is one reason for long test times, the main limiting factor for test speed is the power dissipated during test due to signal transitions in the circuit.
Test power dissipation is known to be 2× the functional power dissipation in central processing units (CPU) [47] and 4× the functional power dissipation in graphics processing units (GPU) [69]. If the power dissipated during test goes beyond the rated power of the device, a good device may fail the test or even be damaged. Several approaches have been investigated and implemented to reduce the total power dissipation of the device under test (DUT); however, these methods generally lengthen the test time [42]. Hence, in the current semiconductor industry, where devices continue to get denser and smaller, both test power and test time must be addressed together.

Earlier approaches to reducing test time used pattern overlapping [20,23] and reusable scan chains [36], which exploit similarities between patterns to eliminate redundant scan shift operations. The reduction in test time depends on the availability of such patterns. Scan chain partitioning also reduces test time significantly but increases the number of scan input pins. Bonhomme et al. [13] and Chalkia et al. [17] proposed methods that overcome this problem while achieving a test time reduction similar to that of multiple scan chains. Test time reduction for multi-core SoC designs requires power-constrained scheduling of tests [21, 22, 37]. Recent proposals by Sheshadri et al. [57-59] optimize SoC test schedules by selecting supply voltage and clock frequency.

Shanmugasundaram and Agrawal [53-56] proposed a technique to reduce test time in power-constrained built-in self-test (BIST) circuits. They implemented an activity monitor that increases the clock frequency when it records low activity in the chain and decreases the frequency otherwise. The method achieves a 20-50% reduction in test time in BIST circuits with little area overhead. Hashempour et al. [32] implemented a system that uses both BIST and ATE in an effort to reduce test time on the ATE.
The methodology identifies all "easy-to-detect" faults using BIST and then uses the ATE to identify the "hard-to-detect" faults. Parallel testing, where multiple dice are tested at the same time, has also reduced test time when testing large volumes of dice. One noted disadvantage of such methods is that the time between two tests, also called indexing time, becomes an overhead when one tester probe has to wait until the other probe completes its test. This can be mitigated by employing an aperiodic probe method: in a dual-probe tester system, when one probe detects a faulty die, it has the flexibility to check other dice for a good die using gross fault tests until the other probe finishes its lengthy tests [27,45].

The work presented in this thesis focuses on reducing the test application time and can work in tandem with the above-mentioned procedures. The test time reduction is achieved by two methods: (1) scaling the supply voltage and (2) scaling the test frequency. The methods are investigated mathematically to obtain dependencies, and then each method is verified through simulations and experimentation on test equipment: the Advantest T2000GS entry-level ATE for the method using scaled frequency, and the National Instruments ELVIS [2] board test equipment for the method using scaled supply voltage. The research as it appears here has been presented as posters [63,64] and discussed at technical forums [10, 11, 65-68].

Chapter 2
VLSI Test Equipment and Procedure

An undergraduate or graduate student majoring in Electrical and Computer Engineering would have several lab sessions dealing with digital and/or analog circuits. During those lab courses the student would use numerous transistors and integrated circuits to understand the practical applications of what they studied. Students implement the circuit on a circuit board by plugging in and interconnecting components.
They then supply input values through a computer or a voltage source connected to the board and observe the output on an oscilloscope or simply on an LED. The output is then verified on the monitor against the output they pre-calculated according to the instructions in their lab manual. If the output matches, they conclude that the circuit works and move on; otherwise the circuit does not work and they analyze it closely to fix the faults. This is the most basic form of testing, where for a given circuit a set of input patterns is applied and the output response of that circuit is verified by comparing it with the known response. If the response matches, the circuit is good; otherwise the circuit is bad, and the test engineer then finds the source of the problem and attempts to fix it using diagnostic tools.

In industry, testing happens from the day the chips are designed using a hardware description language (HDL) until the day the chips are shipped out. In the first half of the chip's design life, software tools play a major role in test and debug of the design. Here the design is constantly verified for different process corners, such as power, voltage, and timing, using a transistor-level model. However, once the design is fabricated, it is more challenging to meet the required process corners for which the chip is being designed. During this second half of the chip's life in the industry, automatic test equipment (ATE), or simply the tester, plays an important part in making sure that the chip has the expected design and is capable of operating with the desired performance. The basic function of the ATE is to drive the inputs with the test patterns and then monitor the output response from the chip.

Figure 2.1: Advantest T2000 ATE at Auburn University, Alabama.

2.1 Advantest T2000GS ATE

Auburn University, Alabama, houses an Advantest Model T2000GS automatic test equipment (ATE), shown in Figure 2.1.
The ATE is an entry-level system manufactured by Advantest and can perform digital, mixed-signal and RF tests. The test equipment consists of three units: the mainframe, a user interface console and a test head.

Mainframe

The mainframe, shown separately in Figure 2.2, supplies the main system power. It also houses the system controller, site controller and the bus matrix. The system controller provides the tools and applications required by the test engineer to verify and debug the device under test (DUT) placed on the test head. It controls the user interface devices, such as the keyboard and mouse connected to the system controller. Any information related to the test plan, such as the test patterns and the test program, is stored in the system controller. The site controllers communicate with the different modules placed in the test head and execute the test program on the DUT or test site.

Figure 2.2: Mainframe of Advantest T2000GS at Auburn University.

Test head

The test head, shown in Figure 2.3, consists of different modules used to test the DUT. The modules in the test head include a 500 mA device power supply module (DPS 500mA), a 250 MHz digital module (250MDM) and a sync generator. The sync generator provides the capability of generating multiple time domains or frequencies for the digital module. It generates the synchronization clock to synchronize the clock with the patterns. The DPS 500mA module has 32 channels and supplies power to the DUT. The 250MDM provides 32 I/O channels of digital logic to drive and observe the signals on the DUT I/O pins.

Figure 2.3: Test head of the Advantest T2000GS with an FPGA on the loadboard.

2.1.1 Test Programming

Once the device under test (DUT) is fabricated, it is tested to sort good and bad chips on the wafer. For the tester to accurately test the DUT, the test engineers have to provide the tester with three main inputs, namely the test program, the test vectors and the analog test waveform [15].
The production test patterns can be generated by test software tools such as Mentor Graphics Tessent FastScan [7] or Synopsys TetraMAX. Most testers in the industry are compatible with the Standard Test Interface Language (STIL) [3, 40] test pattern file format. STIL is the language used to define the test vectors applied to the DUT. STIL files contain the following information required by the ATE:

1. Definition of each signal pin in the signal block,
2. Timing and waveform information in the timing block,
3. DC signal levels which are applied and expected, and
4. Definition of test patterns.

The test program contains a sequence of instructions that describes the test flow, the patterns to be used and the test environment conditions. Once the test program is loaded, the tester uses its test pattern generator (TPG) and the frame processor to generate the test patterns and the clock, respectively.

The Advantest T2000GS ATE uses a native Open Architecture Test System (OPENSTAR) Test Programming Language, or OTPL for short. It is a modular programming solution that lets the user write procedures dealing with various aspects of the test individually; these can later be combined with the test plan to obtain a complete test program. Apart from the T2000's OTPL, the test plan can also be written in C++, though this requires complete knowledge of the test system and the test object model. To test a device on the T2000GS ATE, the user has to describe the DUT and the type of test to be performed. Unlike in a STIL file, in OTPL the timing information and the definition of each signal are specified separately from the pattern file. To take advantage of the modular programming of the T2000, several files are created. These are described next.

Pin Description File

This file defines the signal and power pins available on the DUT and associates each pin with the resource type in the test system.
For instance, any I/O signal pin is specified with the digital pin resource, or "dpin", and the power pins are specified with the device power supply 500mA resource, or "dps500mA". Within each resource definition, the pins can be grouped and labeled for better readability. This file is typically named with a .pin extension.

Socket File

This file specifies the mapping between the DUT pins and the ATE connectors. It does not offer any grouping of the pins and maps each pin individually to an ATE connector. Every signal pin is specified in this file, and the corresponding ATE connectors can be found in the resource folder.

Specification File

The device specifications, such as the supply voltage range, current range, timing and slew rate, are given in this file. Each specification is defined as a variable with a data type that indicates the type of specification (voltage, current or time), which provides readability and consistency across the other files. Each variable can be given a range of values, such as min, max and typical. The range is user defined, and the values for each specification are listed in the same order, separated by commas.

Levels File

The levels file specifies the voltage and current levels at each signal and power pin. The value at each pin can be either fixed or assigned a variable name from the specification file. In this file the levels can be specified for a whole group from the pin description file or for an individual pin.

Timing and Timing Map File

The timing description related to the test clock period and the behavior of the signal (waveform) at each pin or pin group is specified in the timing file. Each timing group can be specified with 4 periods and up to 8 waveforms. The input values specified in the pattern file must correspond to the values specified in the timing waveform, i.e., if input values such as 'X' or 'Z'
indicating a don't-care bit or high impedance, respectively, are specified in the pattern file, the signal behavior for those values must be specified in the timing file.

Test Condition File

The preferred operating condition for a given test is specified in the test condition file. It includes the type of voltage levels and their specifications, and the timing information for the test. Every test can be given a unique test condition, which can either define new specifications or use the ranges from the specification file.

Pattern File

The pattern file holds the test patterns or test vectors to be applied during functional tests. OTPL allows several types of pattern descriptions, such as algorithmic pattern generation (ALPG), SCAN pattern generation, or a simple pattern list that specifies the values at each input and output pin. The tester's pattern generator generates the signals based on the values specified in the pattern file and the timing behavior provided in the timing file.

Test Plan File

This is the main file that organizes the test flow and calls all the test condition and resource files. Every test flow can also be given a binning option, which logs the failed device and the levels at which the failure occurred. The patterns that each test will use are also called from this file.

2.1.2 Test Data Analysis

Analyzing the test data helps to sort the good chips from the bad ones. From the bad chips the test engineer can understand the fabrication process and fine-tune the process for the next design to minimize defects. The analysis also provides information about design weaknesses [15]. The data also provide information on the quality of the test, which indicates how thorough the test has been in sorting good chips. A chip that fails can easily be sorted as faulty; however, a chip that passes may have passed only for the given test model and can still fail in some other scenario.
Process variations play a key role in the discrepancies that occur during fabrication. It is quite possible that different chips on the same wafer have, say, different operating frequencies due to process variations. Failure mode analysis of the failing chips can provide information to improve the fabrication process. Normally chips failing due to process variations have similar failing patterns.

The T2000GS offers several graphical user interface (GUI) tools, of which the logic analyzer and the oscilloscope are used in this work. The logic analyzer provides a digital representation of the signal activity during the test, and the oscilloscope provides an analog representation of the signal. The oscilloscope provides the tools to measure signal characteristics such as rise and fall times and voltage and current values. These tools also indicate the expected and observed values and the time at which each event is observed.

2.2 Time and Cost Relationship

Test time depends on the type of test conducted on the ATE. Two categories of tests are performed: parametric tests and functional tests. Parametric tests are performed at slow speeds, and their test time depends on the number of pins that are tested. Functional tests are performed at higher speeds than parametric tests, and their test time depends on the number of vectors applied and the frequency at which they are applied.

Table 2.1: State-of-the-art IMA tester cost analysis data [62]

ATE purchase price          $985K
Depreciation                20% [27]
Maintenance                 4%
Operating cost              10% [15]
Production weeks/yr         52
Production days/week        7
Production shifts/day       3
Production hours/shift      8
Devices per slot            7000
Good devices test time      5 seconds
Bad devices test time       0.3 seconds
Yield                       98%

Testing cost can be defined as the cost incurred for the amount of time spent on the tester.
This cost can be quantified as [15]

$$\text{Running cost} = \text{Depreciation} + \text{Maintenance} + \text{Operating cost}$$

Consider the test data example given in Table 2.1. Using the data in the table we can calculate the test cost for a single chip.

$$\text{Running cost} = \$985{,}000 \times (0.2 + 0.04 + 0.1) = \$334{,}900$$

$$\text{Tester usage} = \text{weeks/yr} \times \text{days/week} \times \text{shifts/day} \times \text{hours/shift} \times 3600\ \text{s}$$

$$\text{Tester usage} = 52 \times 7 \times 3 \times 8 \times 3600 = 31{,}449{,}600\ \text{seconds}$$

$$\text{Testing cost} = \frac{\text{Running cost}}{\text{Tester usage}} = \frac{\$334{,}900}{31{,}449{,}600\ \text{s}} \approx 1.065\ \text{cents/second}$$

$$\text{Total test time} = \text{Total time for good devices} + \text{Total time for bad devices}$$

$$\text{Total test time} = 7000 \times (0.98 \times 5 + 0.02 \times 0.3) = 34{,}342\ \text{seconds}$$

$$\text{Total cost} = \text{Total test time} \times \text{Testing cost} = 34{,}342\ \text{s} \times 1.065\ \text{cents/s} \approx 36{,}570\ \text{cents}$$

$$\text{Cost per die} = \frac{\text{Total cost}}{\text{Number of good dice}} = \frac{36{,}570}{7000 \times 0.98} \approx 5.3\ \text{cents}$$

Though parallel testing dominates the industry, the cost of testing will still be significant owing to the volume of chips produced and the number of parallel sites available to run these tests. Having many parallel sites also adds to the cost of the tester as a whole and involves maintenance costs. Hence reducing test time is still a major concern in testing.

Chapter 3
Test Time Theorem and Applications

In the previous chapter we saw how test time affects the cost of a single chip. In this chapter we lay the foundation for the proposed methods by stating a theorem for minimum test time.

3.1 Test Time Theorem

Theorem. For power constrained testing where the peak power during any clock cycle must not exceed $P_{PEAKfunc}$, the test time ($TT$) has a lower bound,

$$\frac{E_{TOTAL}}{P_{PEAKfunc}} \le TT = \frac{E_{TOTAL}}{P_{AVG}} \tag{3.1}$$

where $E_{TOTAL}$ is the total energy and $P_{AVG}$ is the average power consumed by the test.
Proof: Consider a test that runs for $N$ clock cycles and, for cycle $i$, define:

$T_i$ as the period of the clock cycle,
$E_{di}$ as the dynamic energy consumed during the cycle,
$P_{li}$ as the leakage power dissipated during the cycle, and
$E_i$ as the total energy consumed during the cycle.

Then, test time and total energy are given by

$$TT = \sum_{i=1}^{N} T_i \tag{3.2}$$

$$E_{TOTAL} = \sum_{i=1}^{N} E_i = \sum_{i=1}^{N} \left(E_{di} + T_i \times P_{li}\right) \tag{3.3}$$

In particular, for a periodic clock test, $T_i = T_{test}$, i.e., all clock cycles have the same period $T_{test}$, and

$$TT = N \times T_{test} \tag{3.4}$$

The equality in equation (3.1) follows from the standard definitions of energy and power. $P_{AVG}$ is the rate of energy usage, averaged over the test duration $TT$. Therefore, the total energy is $E_{TOTAL} = TT \times P_{AVG}$.

To prove the lower bound, we examine the power constraint that each clock cycle must satisfy. The clock cycles are assumed to have different periods, so that a conventional periodic clock is a special case. Thus,

$$\frac{E_{di}}{T_i} + P_{li} \le P_{PEAKfunc}, \quad \forall\ 1 \le i \le N \tag{3.5}$$

or

$$T_i \ge \frac{E_{di} + T_i \times P_{li}}{P_{PEAKfunc}}, \quad \forall\ 1 \le i \le N \tag{3.6}$$

Hence, from equations (3.2) and (3.3),

$$TT \ge \frac{1}{P_{PEAKfunc}} \sum_{i=1}^{N} \left(E_{di} + T_i \times P_{li}\right) = \frac{E_{TOTAL}}{P_{PEAKfunc}} \tag{3.7}$$

This proves the lower bound on test time in equation (3.1).

Leakage power plays an interesting role. Notice that in inequality (3.6), $T_i$ appears on both sides. For a given $P_{PEAKfunc}$, as the clock period $T_i$ is increased to satisfy the power constraint, the right-hand side also increases, though at a slower rate because $P_{li}$ is small. The minimum period for the $i$th clock cycle is

$$T_i = \frac{E_{di}}{P_{PEAKfunc} - P_{li}} \tag{3.8}$$

To determine $T_i$ we must know the dynamic energy $E_{di}$ and the leakage power $P_{li}$, both of which are functions of the input vector applied to the circuit in clock cycle $i$. For now, let us neglect the leakage power, so that equation (3.8) takes a simpler form,

$$T_i = \frac{E_{di}}{P_{PEAKfunc}} \approx \frac{E_i}{P_{PEAKfunc}} \tag{3.9}$$

For a given set of test patterns generated by an automatic test pattern generator (ATPG), the total energy consumed during test remains unchanged irrespective of how the tests are applied.
The total test time then depends only on the average power consumed. In order to reduce the test time, the test must be run with the smallest clock period possible while dissipating less than the rated power. Since the minimum period is also limited by the critical path delay of the DUT, test time depends on both the rated power and the structural delay of the circuit. The two constraints that determine the minimum test clock period can be defined as follows:

1. Power Constraint - A test is power constrained if the minimum test clock period is limited by the maximum rated power for the circuit. We define this period as $T_{power} = E_{MAXtest}/P_{PEAKfunc}$, where $P_{PEAKfunc}$ is the maximum power dissipated during functional operation, or the rated maximum for the DUT, and $E_{MAXtest}$ is the maximum energy dissipated during any test cycle.

2. Structure Constraint - A test is structure constrained if the minimum test clock period is limited by the structural (critical path) delay of the DUT. We define the fastest clock as $f_{structure} = 1/T_{structure}$, where $T_{structure}$ is the structure constrained clock period.

Based on the above definitions, the minimum test clock period has to satisfy both the power and structure constraints, i.e.,

$$T_{test} = \max\{T_{structure},\ T_{power}\} \tag{3.10}$$

In a power constrained test, $T_{power} > T_{structure}$, so the test clock period is

$$T_{test} = T_{power} = \frac{E_{MAXtest}}{P_{PEAKfunc}} \tag{3.11}$$

Substituting equation (3.11) in equation (3.4), we get the total test time for a power constrained test as

$$TT_{min} = N \times \frac{E_{MAXtest}}{P_{PEAKfunc}} \tag{3.12}$$

Equation (3.12) can also be written as

$$TT_{min} = \frac{E_{TOTAL}}{E_{AVG}} \times T_{sync} = \frac{E_{TOTAL}}{P_{AVG}} \tag{3.13}$$

where $E_{AVG} = E_{TOTAL}/N$ is the average energy per cycle and $T_{sync}$ is the period of the periodic clock.

3.2 Applications of Test Time Theorem

In Section 3.1, it was shown that for a given rated power, test time is limited by the total energy dissipated during test. Conventionally, energy can be reduced by modifying the test vectors.
For instance, to increase the probability of identifying a fault with a given pattern set, the automatic test pattern generator (ATPG) uses 0s and 1s randomly to fill the "don't care" bits during pattern generation. However, this causes excessive switching in the scan chain during scan shift and thus increases the shift power. This effect can be avoided by conservatively filling the "don't care" bits with adjacent fill, where each "don't care" bit is given the same value as the bit adjacent to it, or with only 0s or only 1s. Since this procedure is done mainly to reduce power, for a given allowable power the ATPG normally increases the number of test patterns to achieve the desired test coverage. This increases test time and is often the trade-off. Now the question is whether test time can be reduced for a given set of vectors, rated power, and critical period of the device under test (DUT). Based on the theorem stated earlier, we describe two scenarios in this section and examine the feasibility of reducing test time under the given constraints.

3.2.1 Periodic Clock Test

The first scenario considers a test using a fixed clock period for every cycle during test. This is the conventional method of testing, and we call it the periodic clock test, where every cycle has the same period as its neighboring cycle. To minimize the test time of a periodic clock test, let us assume a test with $N$ clock cycles with a period $T_{test}$ and frequency $f_{test} = 1/T_{test}$. As described in Section 3.1, the test clock period is constrained by the rated power and the critical path delay of the circuit. Based on equations (3.10) and (3.11), the test clock period is limited by

$T_{structure}$ - the critical path delay, which limits the minimum period,
$E_{MAXtest}$ - the maximum cycle energy dissipated for a given set of vectors, and
$P_{PEAKfunc}$ - the maximum allowable rated power for the device under test (DUT).
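As a rough illustration of how these three quantities fix the periodic clock period, the following sketch evaluates equations (3.10), (3.11) and (3.4) for a short test. The function name `periodic_test_time` and all numerical values are assumptions chosen only for illustration; they are not measurements from this work, and leakage is neglected.

```python
def periodic_test_time(cycle_energies, p_peak, t_structure):
    """Return (clock period, total test time) for a periodic clock test.

    cycle_energies: energy dissipated in each test cycle (J), leakage neglected
    p_peak:         rated peak power P_PEAKfunc (W)
    t_structure:    critical path delay T_structure (s)
    """
    if p_peak <= 0 or t_structure <= 0:
        raise ValueError("p_peak and t_structure must be positive")
    e_max = max(cycle_energies)           # E_MAXtest
    t_power = e_max / p_peak              # power constrained period, eq. (3.11)
    t_test = max(t_structure, t_power)    # both constraints, eq. (3.10)
    return t_test, len(cycle_energies) * t_test   # TT = N * T_test, eq. (3.4)


# Hypothetical four-cycle test (illustrative values only).
energies = [2.0e-9, 3.0e-9, 1.5e-9, 6.0e-9]

# Low rated power: the worst cycle (6 nJ) sets the period, so the test is
# power constrained.
t_test_pc, tt_pc = periodic_test_time(energies, p_peak=0.1, t_structure=20e-9)

# High rated power: the critical path sets the period, so the test is
# structure constrained.
t_test_sc, tt_sc = periodic_test_time(energies, p_peak=1.0, t_structure=20e-9)

print(f"power constrained:     T_test = {t_test_pc:.3e} s, TT = {tt_pc:.3e} s")
print(f"structure constrained: T_test = {t_test_sc:.3e} s, TT = {tt_sc:.3e} s")
```

In the first call the single 6 nJ cycle dictates the period of every cycle, which is the inefficiency that the aperiodic clock test of Section 3.2.2 targets.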
In a power constrained test, the maximum power that any cycle can dissipate is limited to $P_{PEAKfunc}$, hence $P_{PEAKfunc}$ can be treated as a constant. Then, based on equation (3.11), the only way to minimize $T_{test}$ is to minimize the numerator $E_{MAXtest}$. Since the test vectors for a given test are unchanged, the switched capacitance during the test does not vary, and thus the energy dissipated during any cycle is proportional to the square of the supply voltage applied to the DUT during test. Reducing the supply voltage can therefore significantly reduce the energy during test. Doing so leaves a lot of headroom between the power dissipated during test and the allowable peak power. If we want to maintain the same power dissipation, $P_{PEAKfunc}$, the frequency of the test must be increased. The new test clock period $T_{test}$ can be obtained from equation (3.11) using the energy dissipated at the new supply voltage. In this way the test time can be reduced by using the new power constrained test clock period at the new voltage.

The idea of lowering the supply voltage and increasing the frequency would work very well if not for one caveat: when the voltage is reduced, the gates switch more slowly because charging the load capacitance takes longer. This means that the critical path delay increases and, in the worst case, the critical path itself can change. Assume that there is no change in the critical path, so that reducing the voltage only increases the critical path delay. From equation (3.10), the test clock period is structure constrained if $T_{structure} > T_{power}$; any further reduction in voltage will then increase the delay, and hence the test clock period must increase. Hence the voltage must not be reduced so far that the power constrained test clock period becomes shorter than the structural delay. The optimum supply voltage is the one at which the test clock period is $T_{test} = T_{power} = T_{structure}$.
Thus, for periodic clock testing at the optimum voltage,

$$P_{PEAKfunc} = \frac{E_{MAXtest}}{T_{structure}} \tag{3.14}$$

and the minimum test time for a periodic clock test is given by

$$TT_{sync} = N \times T_{structure} \tag{3.15}$$

Figure 3.1 illustrates the minimum test time as a function of supply voltage. If a test is performed at the nominal supply voltage, e.g., 1.8 V for 180 nm CMOS technology, the test clock period is limited by the maximum power dissipated by the DUT during any clock cycle. If the rated power is lower than the maximum power dissipated during test, the test clock period must be long enough to ensure that the test power does not exceed the rated power. If we reduce the voltage, $E_{MAXtest}$ decreases and $T_{structure}$ increases. Based on equation (3.11), if the power dissipated is held constant at $P_{PEAKfunc}$, the test clock period decreases. Repeating this at successively lower voltage levels, we achieve a reduction in test time as long as the test is still power constrained. At a certain supply voltage $V_{sync} < V_{nom}$, the energy dissipated becomes low enough that the test is no longer power constrained and the structural delay of the circuit starts to dominate the test clock period.

Figure 3.1: Minimum test time as a function of supply voltage ($V_{DD}$) for N-cycle periodic clock test. For a minimum test time $TT_{sync}$ the supply voltage is $V_{sync}$, which is lower than the nominal voltage $V_{nom}$.

Thus, Figure 3.1 can be partitioned into two regions: in the region on the right side the test time is power constrained, and in the region on the left side the test time is structure constrained. The minimum test time occurs at the boundary of the two regions. The voltage at this boundary is the optimum voltage, at which the test is fastest. Any reduction in voltage beyond $V_{sync}$, i.e., into the structure constrained region, will increase the test time significantly.

Figure 3.2: Illustration of test power and test energy for every test cycle using periodic clock.
The test clock period is determined by the cycle dissipating the maximum power.

3.2.2 Aperiodic Clock Test

In Section 3.2.1 we considered the scenario where the clock period was fixed, and thus the power constrained test clock period was determined by the maximum power dissipated during test. According to the theorem in Section 3.1, the periodic clock test then serves as the upper bound on test time. In the second scenario, the goal is to achieve the lower bound on test time given by the theorem. Consider the illustration in Figure 3.2, which shows the energy and power dissipated during a given test of $N$ cycles ($N = 8$ here). The power constrained test clock period in a periodic clock test is determined by the cycle that consumes the most power. Though the maximum power is now kept within the allowed rated power for the DUT, there will be some cycles that dissipate less than the maximum power. Hence, in a power constrained test scenario, equation (3.13) may not give the minimum test time, since the denominator $P_{AVG}$ can be small if there are many cycles consuming lower power. This means that the power constrained test time can be reduced if the denominator can be made larger. The arithmetic mean of a set of values that cannot exceed a given maximum is largest when every value equals that maximum. Thus, in order to maximize $P_{AVG}$, every cycle should dissipate the same power, equal to the rated maximum power $P_{PEAKfunc}$ of the device. This is achieved by using an aperiodic clock test, where the period of each clock cycle can be unique and may differ from the period of the neighboring cycle.

Figure 3.3: Illustration of test power and test energy for every test cycle using aperiodic clock. The test clock period for every cycle is determined by the power dissipated during that cycle.
This is illustrated in Figure 3.3, where every cycle has a unique period that is determined by the amount of power dissipated during that cycle. Though the period of each cycle is determined by the power dissipated during that cycle, the resulting period must not cause any setup or hold time violations. Hence the minimum clock period allowed is limited to the critical delay of the circuit. The period for each cycle in an aperiodic clock test is then given by

$$T_i = \max\left\{T_{structure},\ \frac{E_i}{P_{PEAKfunc}}\right\} \tag{3.16}$$

where $E_i$, $i = 1, 2, 3, \ldots, N$, is the energy consumption during the $i$th clock cycle, and $T_i$ is the test clock period of the $i$th cycle, which must not be shorter than $T_{structure}$. Notice that since the energy is independent of the chosen time period, the device still dissipates the same amount of energy for the given test vectors as in the periodic clock test. Equation (3.16) indicates that each cycle can be structure constrained or power constrained based on the energy dissipated during that cycle, i.e., the cycle is structure constrained if $E_i \le P_{PEAKfunc} \times T_{structure}$ and power constrained if $E_i > P_{PEAKfunc} \times T_{structure}$. For instance, in Figure 3.3, since the energies $E_5$ and $E_7$ are high, cycles $T_5$ and $T_7$ will definitely be power constrained. However, because the energies $E_1$ to $E_3$ are low, the corresponding cycles could be structure constrained. Revisiting equation (3.6), we notice that in an aperiodic clock test the leakage energy during the cycles with a shorter period will be lower. The test time for an aperiodic clock test is bounded by

$$\sum_{i=1}^{N} \max\left\{T_{structure},\ \frac{E_i}{P_{PEAKfunc}}\right\} \le TT_{async} \tag{3.17}$$

$$TT_{async} \le TT_{sync} = N \times \frac{E_{MAXtest}}{P_{PEAKfunc}} \tag{3.18}$$

Equation (3.17) holds when there is a mix of low power and high power test cycles, and the equality in equation (3.18) occurs when all the cycles dissipate the same amount of energy.
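The bounds in equations (3.16) to (3.18) can be checked numerically with a small sketch. The eight cycle energies below are hypothetical values loosely patterned after Figure 3.3 (cycles 5 and 7 high, cycles 1 to 3 low); the function names, the rated power and the structural delay are likewise assumptions for illustration, and leakage is neglected so that each $E_i$ is independent of $T_i$.

```python
def aperiodic_periods(cycle_energies, p_peak, t_structure):
    """Per-cycle clock periods T_i = max(T_structure, E_i / P_PEAKfunc), eq. (3.16)."""
    return [max(t_structure, e / p_peak) for e in cycle_energies]


def periodic_test_time(cycle_energies, p_peak, t_structure):
    """Test time when every cycle uses the period set by the worst-case cycle."""
    t_test = max(t_structure, max(cycle_energies) / p_peak)
    return len(cycle_energies) * t_test


# Hypothetical values, not data from this work.
ENERGIES = [1.0e-9, 1.2e-9, 1.5e-9, 4.0e-9, 6.0e-9, 3.0e-9, 5.5e-9, 2.5e-9]  # J
P_PEAK = 0.1          # rated peak power (W)
T_STRUCTURE = 20e-9   # critical path delay (s)

periods = aperiodic_periods(ENERGIES, P_PEAK, T_STRUCTURE)
tt_async = sum(periods)
tt_sync = periodic_test_time(ENERGIES, P_PEAK, T_STRUCTURE)
lower_bound = sum(ENERGIES) / P_PEAK   # E_TOTAL / P_PEAKfunc, eq. (3.1)

# A cycle is power constrained when its period exceeds the structural delay.
power_constrained = [i + 1 for i, t in enumerate(periods) if t > T_STRUCTURE]

print(f"theorem lower bound = {lower_bound:.3e} s")
print(f"aperiodic test time = {tt_async:.3e} s")
print(f"periodic test time  = {tt_sync:.3e} s")
print("power constrained cycles:", power_constrained)
```

With these numbers the aperiodic test time falls between the theorem's lower bound and the periodic test time. It does not reach the lower bound because the three low-energy cycles are held at the structural period and therefore dissipate less than the rated power.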
From equation (3.18) we can conclude that, at any given voltage and as long as the test is power constrained, the time taken by an aperiodic clock test will be no longer than the time taken by a periodic clock test. So, as described for the periodic clock test, there should be an optimum voltage at which the aperiodic clock test is fastest. The optimum voltage for an aperiodic test can in fact be inferred by back tracing from the optimum voltage of a periodic test.

Figure 3.4: Minimum test time as a function of supply voltage ($V_{DD}$) for N-cycle aperiodic clock test. For a minimum test time $TT_{sync}$ the supply voltage is $V_{sync}$, which is lower than the nominal voltage $V_{nom}$.

Consider the plot in Figure 3.4, which is an extension of the illustration in Figure 3.1. Here the point 'A' indicates the optimum voltage $V_{sync}$ at which the periodic clock test is fastest. If we increase the voltage from point 'A', the test becomes power constrained, and hence, as discussed earlier, a periodic clock test will have a mix of low and high power cycles, and its clock period will be set by the cycle that consumes the most power. If we use an aperiodic clock test beyond point 'A', the low power cycles will run at the structural period, while the cycles with higher power will be stretched aperiodically so that each dissipates the same rated power. In the region between point 'A' and point 'B' there will be a mix of structure constrained and power constrained cycles, and the test is mostly dominated by the structure constrained cycles. The minimum test time for an aperiodic clock test occurs at the supply voltage at which there are more structure constrained cycles than power constrained cycles and the structural delay is at its minimum. This point is shown in the figure as $V_{async}$, which is the optimum aperiodic supply voltage, and $V_{async} > V_{sync}$ always.
From Figure 3.4 we can infer that the periodic clock test at the optimum voltage is a special case of the aperiodic test in which every cycle of the aperiodic clock test is structure constrained. In the following chapters we discuss applications of the theorem with an experimental example on a benchmark circuit, and support the theorem with transistor level simulation results for several benchmark circuits.

Chapter 4
Scaling Supply Voltage to Reduce Periodic Clock Test Time

4.1 Low Voltage Tests

Testing at low voltage has several advantages. Hao and McCluskey [31] have shown that manufacturing defects such as interconnect bridging and gate-oxide shorts become more visible (testable) at reduced voltage. Such defects are the main causes of early life failures and reliability issues in circuits, but they often escape the test at nominal voltage [18,19,31]. When the voltage is reduced, the resistance of the short does not change and the voltage drop across these resistive shorts becomes high. According to Chang and McCluskey [18,19], the voltage at which these defects are detected lies between $2V_{TH}$ and $2.5V_{TH}$. Roehr [49] indicates that, for a reasonable yield, the voltage can be obtained through statistical analysis of min-VDD tests on a large sample of chips.

Reducing the supply voltage has a quadratic effect on the dynamic power dissipation, hence it is an attractive option in testing, especially during scan shift operation [24]. For instance, a test pattern set that causes many signal transitions in the device under test (DUT), due to random fill used to obtain better fault coverage with fewer vectors, can be applied at a lower supply voltage so that the power dissipation does not exceed the rated power of the DUT. A cited disadvantage of reduced voltage testing is the possible change in critical paths [18], which can force an increase in the test clock period. Qian et al.
[46] have suggested novel timing tests as an alternative to conventional logic tests for identifying gate oxide defects at very low supply voltage.

4.2 Reduced Supply Voltage Test

As indicated in Section 4.1, testing at low voltages has its advantages and disadvantages. It was mentioned in Section 3.1 of Chapter 3 that the speed of a test is constrained by the power dissipated by the DUT during test and by the structural delay of the DUT. With regard to power, reducing the supply voltage has a significant advantage over lowering the test power. In fact, even a slight reduction in voltage can significantly reduce dynamic power dissipation, and reduces gate oxide and sub-threshold leakage power dissipation even more [35]. With respect to test time, reducing power could enable us to increase the speed of testing while maintaining the same power dissipation. However, when the supply voltage is reduced the gates switch more slowly, which increases the critical path delay and can sometimes change the critical path. Hence the question arises: how low can the voltage be reduced? In this section we examine the lowest possible voltage that does not change the critical path.

The operational speed of a circuit is characterized by the time taken for a signal to propagate from one register to the next through a combinational path. The accumulated delays of the individual gates in a path through which the signal propagates determine the total delay of that path. The path with the longest delay is the critical path, and any path with a shorter delay is considered a non-critical path. The propagation delay of a gate represents the time to charge and discharge the load capacitor. When the gate switches, it operates in the saturation region and the drain current is directly proportional to the square of the difference between the gate-source voltage and the threshold voltage.
More generally, in the saturation region the drain current can be shown to be directly proportional to $(V_{GS} - V_{TH})^{\alpha}$ [51], where $\alpha$ is the velocity saturation index. The relation between gate delay and supply voltage is shown quantitatively by Sakurai and Newton [51] and by Nose and Sakurai [43]. A simplified proportionality relation between delay and supply voltage is given by Sakurai [50]:

\[ t_d \propto K \frac{V_{DD}}{(V_{DD} - V_{TH})^{\alpha}} \tag{4.1} \]

According to Sakurai and Newton [51], the velocity saturation index $\alpha$ ranges from 1 to 2 depending on the channel length. Several methods [14, 51] can be used to find a value for $\alpha$. For the work presented in this thesis, the value of $\alpha$ is found to be near 2. However, the experiments can be performed for any value between 1 and 2 based on the available technology.

To determine the accuracy of the delay calculated using equation (4.1), the delay obtained by triggering the critical path of a DUT can be compared with the delay calculated from equation (4.1). In this experiment, the s298 ISCAS'89 benchmark circuit is synthesized using 180nm CMOS technology and we assume that the test is not power constrained, i.e., the test is limited only by the structural delay of the circuit. It was observed that the critical path determined by the Leonardo Spectrum [7] static timing analysis (STA) tool was a false path, and hence a path was chosen between the two registers of the critical path. Using an ATPG tool, such as Mentor Graphics Fastscan [5], a path delay vector set was obtained for a path with 6 out of the 7 gates specified in the critical path. The initial path delay, measured using SPICE, was used to calculate the value of the constant $K$ in equation (4.1). Assuming that the critical path does not change as the voltage is reduced, the delay was calculated using equation (4.1) for every voltage from the nominal voltage of 1.8V down to 0.6V in steps of 0.1V.
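The delay extrapolation described above can be sketched as follows. This is a minimal illustration, assuming the 0.77ns path delay, $V_{TH}$ = 0.39V and $\alpha$ = 2 that are quoted later in this chapter for s298, and treating equation (4.1) as an equality with the constant $K$ absorbing the proportionality; the function names are hypothetical.

```python
def fit_k(t_nominal, vdd_nominal, v_th, alpha):
    """Solve eq. (4.1) for K from one delay measured at the nominal voltage."""
    return t_nominal * (vdd_nominal - v_th) ** alpha / vdd_nominal


def alpha_power_delay(vdd, k, v_th, alpha):
    """Delay predicted by eq. (4.1): K * VDD / (VDD - VTH)**alpha."""
    return k * vdd / (vdd - v_th) ** alpha


V_NOM, V_TH, ALPHA = 1.8, 0.39, 2.0
T_NOM = 0.77e-9  # path delay at nominal voltage (s), value quoted for s298

k = fit_k(T_NOM, V_NOM, V_TH, ALPHA)

# Voltages from 1.8 V down to 0.6 V in 0.1 V steps, as in the experiment.
voltages = [round(V_NOM - 0.1 * i, 1) for i in range(13)]
delays = [alpha_power_delay(v, k, V_TH, ALPHA) for v in voltages]

print(f"K = {k:.4e}")
for v, d in zip(voltages, delays):
    print(f"VDD = {v:.1f} V -> predicted delay = {d * 1e9:.3f} ns")
```

Each predicted delay would then serve as the starting clock period for the simulation at that voltage; the model only predicts the delay and does not check whether the critical path itself has changed.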
The new value was used as the new clock period and the SPICE simulation was performed again. If the expected transitions occurred in the chosen path, the path delay was noted for that voltage. If the expected transitions did not occur, the test clock period was increased and the test was repeated.

Figure 4.1 compares the calculated and measured values of the minimum test time at each voltage. The measurement assumes that the test is only structure constrained and hence runs at the functional speed. The results show that the delay calculated using the power law equation (4.1) agrees with the measured values while the supply voltage is reduced by up to half of the nominal supply voltage. Beyond that point the clock period had to be increased to obtain the expected results. The experiment therefore provides evidence that it is safe to assume that the critical path does not change for small reductions in supply voltage and that, for a given value of $K$, the delay can be found using the approximation in equation (4.1). As described in the following sections, this conclusion helps to obtain the optimum voltage for test time reduction in a periodic clock test.

Figure 4.1: Comparison of the test time measured using SPICE simulations with the delay calculated using the power law for the s298 ISCAS'89 benchmark circuit. The test clock period is chosen as the functional period, assuming that the test is not power constrained.

4.3 Optimum Supply Voltage

4.3.1 SPICE Experiment

In Chapter 3 it was stated that in a power constrained test, the test clock period is limited by the maximum allowable power of the circuit.
In general, the test clock period can be related to the peak power as

\[ P_{MAXtest} = \frac{E_{MAXtest}}{T_{power}} \;\Rightarrow\; T_{power} = \frac{E_{MAXtest}}{P_{MAXtest}} = \frac{C_L \times V_{DD}^2}{P_{MAXtest}} \tag{4.2} \]

where $T_{power}$ is the test clock period at a given peak power limit $P_{MAXtest}$, $E_{MAXtest}$ is the maximum energy dissipated in any clock cycle during the entire test, and $C_L$ is the total switched capacitance, due to rising signal transitions, in the clock cycle that consumes the most energy. Since the technique is implemented for stuck-at fault tests, the signal transitions in both scan shift and capture are accounted for when finding the cycle with maximum switching activity.

The maximum allowable power of the device is usually the maximum power dissipated during its functional operation, for which the hardware is designed. Hence in a power constrained test, the maximum allowable power during test must not exceed the maximum power dissipated during functional operation, i.e., $P_{MAXtest} \le P_{PEAKfunc}$. The power constrained test clock period $T_{power}$ is

\[ T_{power} \ge \frac{C_L \times V_{DD}^2}{P_{MAXfunc}} \tag{4.3} \]

The leakage power dissipation depends on the current flow in the circuit when it is in the steady state. Hence the power dissipation due to leakage remains the same during test as during functional operation [29]. Since the strategy is to lower the voltage and shrink the test clock period, the net effect is to reduce the leakage power as well as the leakage energy per cycle during the test. In the following analysis, the dynamic power, which is a function of both signal transitions and short circuit power, is considered to dominate the total power dissipation.

In this section we aim to find the best voltage at which a power constrained test can run with minimum test time without exceeding the peak power or violating the critical path delay constraint of the circuit. As mentioned in the previous sections, the test time can be reduced while limiting the power by reducing the supply voltage.
However, there exists a point where the voltage is not enough to charge the output load capacitance in the required time, so the value at the output will be wrong. At this point the circuit is considered structure constrained and the test time depends on the critical path delay of the circuit. The gate delay of the circuit can be characterized by the power law delay model in equation (4.1) proposed by Sakurai [50,51]. This allows the smallest structure constrained test clock period to be expressed as

\[ T_{critical} \ge K \frac{V_{DD}}{(V_{DD} - V_{TH})^{\alpha}} \tag{4.4} \]

where $K$ is a proportionality constant, which depends upon the critical path structure, timing margin, etc., and $\alpha$ is the velocity saturation index. If the test is only structure constrained, the total test time is

\[ TT_{structure} = N \times T_{critical} \tag{4.5} \]

where $TT_{structure}$ is the total test time using the structure constrained clock period $T_{critical}$ in an N-cycle test. To minimize the test time we find the smallest test clock period, $T_{opt}$, that satisfies the power constraint (4.3) and the critical path constraint (4.4). Thus, at any given voltage the optimum test period is given by

\[ T_{opt} = \max\{ \min T_{power},\ \min T_{critical} \} \tag{4.6} \]

and the minimum test time is

\[ TT_{min} = \max(TT_{power},\ TT_{structure}) \tag{4.7} \]

The optimum voltage, at which a power constrained test runs with the fastest clock and in the least overall test time, is the voltage at which $TT_{power}$ and $TT_{structure}$ are equal. An experiment to identify the optimum voltage was performed on the s298 sequential benchmark circuit. In order to observe both the peak power dissipated by the circuit during test and the critical path delay, scan vectors generated by the ATPG for stuck-at fault tests were combined with the vectors generated to trigger the critical path.
The critical path obtained by the static timing analysis (STA) tool for the DUT was found to be a false path, and the next longest path was found to have a delay of 0.77ns, including the setup time. This path was used in the experiments and the value of $K$ was calculated accordingly. Simulations were performed using the Nanosim SPICE simulator [4], varying the voltage from 1.8V down to 0.6V in steps of 0.1V. The peak energy dissipated and the critical path delay were measured at each voltage point. Using equations (4.2) and (4.5), the test times were calculated, and the maximum of the two values is the total test time as given by equation (4.7).

Figure 4.2 shows the simulated and calculated results used to find the optimum voltage. The point labeled minimum VDD is the cross point at which the power constrained test clock period and the critical path delay of the circuit are approximately the same. Reducing the voltage beyond this point increases the test time, as the critical path delay rises above the power constrained clock period and the test must slow down. The dotted line, labeled "measured", shows the result from simulation. The best voltage is 1.07V, with a total test time of approximately 1 µs and a test clock frequency of 532MHz, which is the same as the functional frequency. The minimum of the "measured" curve coincides with the cross-over point of the two curves calculated from equations (4.2) and (4.5). This validates that the best supply voltage for minimum test time can be obtained from equation (4.7). Thus the SPICE simulation, which is expensive for a large circuit, is not required at many voltages. Once $E_{MAX}$ and $t_{pd}$ have been obtained at one voltage (say, the nominal voltage, 1.8V in our example), both equations (4.2) and (4.5) can be characterized.

Figure 4.2: Simulation and experimental test time plots to find the optimum voltage for the s298 benchmark circuit.
The optimum voltage can also be obtained directly for given values of $K$, $\alpha$, the switched capacitance $C_L$ and the rated power of the device $P_{PEAKfunc}$.

4.3.2 Polynomial Equation to Obtain Minimum Supply Voltage

From equation (4.3) we observe that $T_{power}$ decreases as the voltage is reduced, whereas from equation (4.4) $T_{critical}$ increases as the voltage is reduced. Thus, if we plot equations (4.3) and (4.4) against voltage, the two functions cross each other at a point as the voltage is reduced. The voltage $V_{DDopt}$ at which the test time is minimum must satisfy

\[ T_{opt} = T_{power} = T_{critical} \tag{4.8} \]

This relation was evident in Figure 4.2. To obtain a straightforward solution to the optimum voltage problem we make the following assumptions in our analysis:

1. Variation in the threshold voltage $V_{TH}$ due to changes in supply voltage is not drastic, and $V_{TH}$ is assumed to be constant over the supply voltage interval of interest.

2. The critical path remains unchanged as the supply voltage changes. Thus the value of $K$ is assumed to be independent of the supply voltage.

We equate the right hand sides of equations (4.3) and (4.4) according to (4.8) and substitute $V_{DD} = V_{DDopt}$:

\[ T_{opt} = \frac{C_L \times V_{DDopt}^2}{P_{MAXfunc}} = K \frac{V_{DDopt}}{(V_{DDopt} - V_{TH})^{\alpha}} \tag{4.9} \]

We make two useful observations about the test conducted at the supply voltage $V_{DDopt}$ that satisfies equation (4.9):

For the shortest test time, the test clock period $T_{opt}$ is the minimum allowed by the critical path delay at $V_{DDopt}$.

The maximum power for a test cycle, $C_L V_{DDopt}^2 / T_{opt}$, equals the peak power specification $P_{MAXfunc}$.

These observations help us find the optimum test time parameters experimentally.
To obtain $V_{DDopt}$ analytically we derive a polynomial equation:

\[ V_{DDopt}^{\frac{1}{\alpha}+1} - V_{TH}\, V_{DDopt}^{\frac{1}{\alpha}} - \left( \frac{K \times P_{MAXfunc}}{C_L} \right)^{\frac{1}{\alpha}} = 0 \tag{4.10} \]

or

\[ V_x^{\alpha+1} - V_{TH}\, V_x - \beta = 0 \tag{4.11} \]

where $V_x = V_{DDopt}^{1/\alpha}$ and $\beta = \left( \frac{K \times P_{MAXfunc}}{C_L} \right)^{1/\alpha}$.

Since $\alpha = 1$ when the device is completely velocity saturated and $\alpha = 2$ if the device has no velocity saturation [50,51], equation (4.11) is a polynomial of degree three or lower, which is solvable. Knowing the voltage $V_{DDopt}$ for the shortest test time, the corresponding shortest test clock period can be obtained from (4.9) as

\[ T_{opt} = \frac{C_L \times V_{DDopt}^2}{P_{MAXfunc}} \tag{4.12} \]

The optimum test frequency is then

\[ f_{opt} = \frac{1}{T_{opt}} \]

Here $f_{opt}$ is the highest power constrained test frequency. The polynomial equation (4.11) can be solved using any mathematical solver such as MATLAB. The values of $K$, $\alpha$, $P_{MAXfunc}$ and the effective maximum switched load capacitance $C_L$ during any test cycle can be obtained through simulation at the nominal voltage.

4.3.3 Solving for VDDopt, fopt and TTopt

The optimum voltage $V_{DDopt}$ is the minimum voltage at which the test can run fastest without exceeding the maximum power limit of the device and without becoming structure constrained because the critical path delay has increased with the scaled supply voltage. Let us solve for the optimum voltage $V_{DDopt}$ using equation (4.11) for the s298 ISCAS'89 sequential benchmark circuit synthesized for scan test in TSMC 180nm technology. The DUT is synthesized using the Mentor Graphics Leonardo Spectrum tool [7]. The nominal voltage for this technology is 1.8V and the threshold voltage is 0.39V. The critical path delay obtained through static timing analysis (STA) using Leonardo Spectrum [7] was 1.5ns, or 666MHz. To find $V_{DDopt}$ using equation (4.11) we need values for the proportionality constant $K$, the maximum allowable power limit $P_{MAXfunc}$ and the maximum switched capacitance $C_L$, which together determine $\beta$.
The alpha power law model given in equation (4.4) is an approximate method for finding the critical path delay of any circuit for a given supply voltage and threshold voltage. The value of $\alpha$, the velocity saturation index, in equation (4.4) ranges between 1 and 2 [50,51] and can be found using the methods described in [14] and [51]. It can also be obtained by simple curve fitting to delay values at different voltages for a chain of inverters. In our experiment for 180nm technology, the value of $\alpha$ was found to be 2 using the latter method. We can now rewrite equation (4.4) to find the value of $K$ as follows:

\[ K = T_{critical}\, \frac{(V_{DD} - V_{TH})^{\alpha}}{V_{DD}} \]

To trigger the critical path and observe the delay, we obtained a path delay vector set using Mentor Graphics Fastscan [5] for the path used in Section 4.3.1. The STA delay for this path was 0.77ns including setup time. A post-synthesis timing simulation of the DUT using Mentor Graphics Modelsim with a period of 0.77ns was found to pass the test. The value of the proportionality constant $K$ for this path was calculated to be 0.85. The value of $K$ depends on the critical path of the circuit; hence, based on assumption 2 in Section 4.3.2, the value of $K$ is kept constant.

The maximum allowable power limit for a circuit is normally given as a specification in the datasheet. In a power constrained test the power dissipated during test must be kept under that limit. In the absence of a known power limit for our DUT, we determined the maximum allowable power by simulating 1,000 random patterns in functional mode and measuring the power dissipated per cycle using the Synopsys Nanosim transistor level simulator at the nominal voltage of 1.8V and a frequency of 500MHz. The maximum power over the entire functional operation is assumed to be the upper bound for the power during test. For the DUT in this example the upper bound was measured as 1.2mW.

The next unknown in equation (4.10) is the maximum switched capacitance $C_L$.
It is defined as the effective switched load capacitance of the circuit during the maximum rising signal transitions caused by any test cycle. The energy consumed during that cycle is

\[ E_{MAXtest} = C_L \times V_{DD}^2 \]

where $C_L$ is the maximum switched capacitance of the test pattern that causes the most rising signal transitions. Therefore,

\[ C_L = \frac{E_{MAXtest}}{V_{DD}^2} \]

The value of $E_{MAXtest}$ can be obtained by simulating the test patterns at any arbitrary (slow) frequency $f$ and measuring the maximum power $P_{PEAKfunc}$ for a clock cycle, i.e.,

\[ E_{MAXtest} = \frac{P_{PEAKfunc}}{f} \]

where $f$ is any frequency slower than the maximum allowed by the critical path. Once the value of $E_{MAXtest}$ is obtained, $C_L$ follows from the equation above. For the DUT in this example the value of $C_L$ is 2.04pF. Table 4.1 summarizes the values obtained in this manner.

Table 4.1: Parameter values for the s298 benchmark synthesized in 180nm CMOS technology ($V_{DD}$ = 1.8V, $V_{TH}$ = 0.39V, critical delay = 0.77ns).

Parameter        Value
$P_{MAX(func)}$  0.0012 W
$C_L$            2.04 pF
$K$              0.85
$\alpha$         2

Substituting these values into the expression for $\beta$ following equation (4.11), we get $\beta = 0.7158$ and the equation becomes

\[ V_X^3 - 0.39\, V_X - 0.7158 = 0 \]

We use a numerical solver in MATLAB to find the roots for $V_X$. We obtain three roots, two complex and one real. Since the supply voltage is a real number, we consider only the real root and discard the two complex roots. Solving for $V_{DDopt}$ from $V_X$ we get $V_{DDopt} = 1.0727$V. This is the optimum voltage at which the test can run at maximum speed.

Figure 4.3: Simulated and calculated curves using test period and functional period at various voltages. The direct approach using MATLAB (circled) matches the cross point of the curves obtained analytically using the periods calculated from equations (4.3) and (4.4) and the results obtained from SPICE ("plus" data points) in [66].
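The numerical step can be sketched without MATLAB. The code below is an illustrative substitute rather than the solver used in this work: it finds the real root of equation (4.11) by bisection on the physically meaningful interval $V_{TH} < V_{DD} \le V_{nom}$, using the rounded values of Table 4.1 with $K$ taken as 0.85 × 10⁻⁹ (the scale given in Table 4.3). With these rounded inputs $\beta$ evaluates to about 0.707 rather than the 0.7158 quoted in the text, presumably because the text used unrounded inputs; that explanation is an assumption. The function name `solve_vdd_opt` is hypothetical.

```python
def solve_vdd_opt(k, p_max, c_l, v_th, alpha, v_nom, tol=1e-12):
    """Solve Vx**(alpha+1) - v_th*Vx - beta = 0 (eq. 4.11) by bisection.

    Returns (vdd_opt, beta), where vdd_opt = Vx**alpha.
    """
    beta = (k * p_max / c_l) ** (1.0 / alpha)

    def f(vx):
        return vx ** (alpha + 1) - v_th * vx - beta

    # f is negative at VDD = v_th and increases with Vx on this interval.
    lo, hi = v_th ** (1.0 / alpha), v_nom ** (1.0 / alpha)
    if f(hi) < 0:
        raise ValueError("no cross-over at or below the nominal voltage")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    vx = 0.5 * (lo + hi)
    return vx ** alpha, beta


K, P_MAX, C_L = 0.85e-9, 1.2e-3, 2.04e-12   # rounded Table 4.1 values
V_TH, ALPHA, V_NOM, N_CYCLES = 0.39, 2.0, 1.8, 498

v_opt, beta = solve_vdd_opt(K, P_MAX, C_L, V_TH, ALPHA, V_NOM)
t_opt = C_L * v_opt ** 2 / P_MAX   # eq. (4.12)
f_opt = 1.0 / t_opt
tt_opt = N_CYCLES * t_opt

print(f"beta = {beta:.4f}, VDDopt = {v_opt:.4f} V")
print(f"Topt = {t_opt * 1e9:.3f} ns, fopt = {f_opt / 1e6:.1f} MHz, TTopt = {tt_opt * 1e6:.3f} us")
```

The root obtained this way is close to the 1.0727V reported above, and the resulting period and frequency are close to the values derived in the text that follows.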
Since at this voltage the test is still power constrained, we can calculate $T_{opt}$ from equation (4.3), taking $T_{power} = T_{opt}$, which gives $T_{opt} = 1.95$ns $\Rightarrow f_{opt} \approx 512$MHz. The total test time for the DUT can be calculated as

\[ TT_{opt} = N \times T_{opt} \]

where $N$ is the total number of test cycles. For the DUT in this example $N = 498$, hence the total test time is $TT_{opt} = 0.971$ µs.

Figure 4.3 shows the test time plots calculated using equations (4.3) and (4.4) at various voltages for the s298 benchmark circuit. The circled data point indicates the optimum voltage obtained from the numerical analysis. The values measured from SPICE at various voltages in Section 4.3.1 are shown in the curve "Spice Measurement". It is readily observed from the graph that the numerical analysis used to obtain the optimum voltage agrees with the SPICE measurement.

Table 4.2: Optimum VDD for reduced test time of ISCAS'89 benchmark circuits.

                                  Nominal voltage (1.8V)   Optimum test voltage
Circuit  Total scan   Peak per    Test freq.  Test time    Supply     Test freq.  Test time   Test time
name     test cycles  cycle       (MHz)       (µs)         voltage    (MHz)       (µs)        reduction
                      power (W)                            (V)                                (%)
s298     450          0.0012      216.3       2.08         1.1        562         0.8         61.5
s298     498          0.0012      187         2.66         1.07       512         0.971       63.0
s298     540          0.0012      184         2.92         1.07       529         1.02        65.0
s382     704          0.0029      292         2.41         1.44       457         1.54        36
s713     810          0.0015      137.28      5.9          1.33       249.23      3.25        45.0
s1423    6975         0.0045      135.9       51.3         1.6        148.4       47.0        13.0
s1423    6975         0.0030      90.58       77           1.49       132.1       52.7        31.5
s13207   62237        0.0213      168         369          1.62       208.8       298         19.2
s15850   101708       0.240       67.35       1510         1.30       128.9       789         47.7
s38584   224113       0.350       88.54       2531         1.30       172.3       1300        48.6

4.4 Results

The procedure described in this section was repeated for several ISCAS'89 benchmark circuits and the results are tabulated in Table 4.2, which gives the results based on Nanosim transistor level simulation [4]. The values differ slightly from [66]; the difference lies in the number of cycles and the peak power used.
The vectors and the power values are chosen such that they are consistent throughout this document. Though the results vary, both sets of results are valid for their given peak power and vectors. In the SPICE simulations, the optimum voltage is obtained through transistor level simulations at closely spaced voltages to find the point just before the circuit becomes structure constrained. From Table 4.2 it was observed that, if the chosen $P_{PEAKfunc}$ is close to the power dissipated during test, the test time reduction obtained by reducing the voltage is small. This is because, when the power dissipated by the test is close to the rated power, the test runs at a speed close to the functional speed and any reduction in supply voltage makes the test structure constrained. This is seen in circuits s1423 and s15850. On the other hand, if the power dissipated during test is significantly greater than the rated power, a significant reduction in test time is observed, as in s298 and s382. Most circuits today have a test power of 2× the functional power in CPUs [47] and 4× the functional power in GPUs [69], hence a significant reduction in test time is attainable.

Using the polynomial method explained in Section 4.3.2, the optimum voltage was obtained analytically for the same benchmark circuits used in the SPICE experiments described above. The results, shown in Table 4.3, correlate well with the simulation results in Table 4.2. Thus, knowing the design specifications, namely the velocity saturation index $\alpha$, the proportionality constant $K$, the maximum allowable power $P_{PEAKfunc}$ and the maximum switched capacitance $C_L$ obtained from the maximum energy cycle, the optimum voltage can be obtained in less time using the polynomial equation (4.11). Note that to solve the polynomial equation (4.11) for a given $P_{MAXfunc}$, we need to simulate the circuit only once, at the nominal voltage, to find the constants.
For instance, if the optimum voltage is reached after 10 SPICE simulations, the numerical analysis reduces the time taken to obtain the optimum supply voltage and test time by a factor of about 10. It should be noted that, since the test patterns generated for the periodic test were able to find all faults in the optimum VDD tests, the defect and fault coverage for stuck-at tests should be the same as in the periodic test at nominal voltage.

4.5 Peak Power and Critical Path Frequency Measurements

The scenario described in this chapter was used to perform experiments on an FPGA configured with a benchmark circuit. The motive of this experiment is to observe experimentally the effect of scaling the supply voltage on power and frequency. Because the Advantest T2000GS ATE offers no method to measure power on the fly, bench test equipment by National Instruments [2] is used.

Table 4.3: Analytically obtained VDDopt and fopt for minimum scan test time of ISCAS'89 circuits in 180nm CMOS ($\alpha$ = 2, $V_{TH}$ = 0.39V).

                                                               Nominal voltage (1.8V)   Optimum voltage test
Circuit  Proportionality  Maximum      Total scan   Peak per    Test freq.  Test time    Supply     Test freq.  Test time   Test time
name     constant K       switched     test cycles  cycle power (MHz)       (µs)         VDDopt     fopt        (µs)        reduction
         (×10⁻⁹)          capacitance  N            PMAXfunc                             (V)        (MHz)                   (%)
                          CL (pF)                   (W)
s298     0.85             1.76         450          0.0012      216.3       2.08         1.1        562         0.8         61.5
s298     0.85             2.04         498          0.0012      187.0       2.66         1.073      512         0.971       63.0
s298     0.85             2.06         540          0.0012      184.0       2.92         1.07       529         1.02        65.0
s382     1.75             3.07         704          0.0029      292.0       2.41         1.44       434         1.54        36.0
s713     2.79             3.36         810          0.0015      132.28      5.90         1.34       249.23      3.25        45.0
s1423    6.38             10.22        6975         0.0045      135.9       51.30        1.66       158.5       44          14.2
s1423    6.38             10.22        6975         0.0030      90.58       77.00        1.49       131.7       52.9        31.2
s13207   4.64             38.9         62237        0.0213      168         369          1.62       208.8       298         19.2
s15850   5.12             109          101708       0.0240      67.35       1510         1.31       128.9       789         47.7
s38584   4.03             156.8        224113       0.0350      88.54       2531         1.31       172.4       1300        48.6

4.5.1 Hardware Setup

The National Instruments Educational Laboratory Virtual Instrumentation Suite II+ (NI ELVIS) [2] serves equally well as a bench-top test instrument and a prototyping board. We used NI ELVIS to measure the peak power per cycle and the maximum circuit test frequency for a given supply voltage. The circuit used for the measurements was implemented on the Altera DE2 Field Programmable Gate Array (FPGA) board [12]. The DE2 board houses an Altera Cyclone-II 2C35 FPGA. Benchmark circuit s298 was programmed on this FPGA. Figure 4.4 shows the test setup for the power and maximum test frequency measurements. The DE2 board was powered through the variable power supply available on NI ELVIS. For the s298 circuit, all inputs and outputs, including scan-in and scan enable, were configured as external pins of the DE2 board. These pins were in turn connected to the programmable digital Input/Output (IO) pins available on NI ELVIS. The test program was written in LabVIEW [1] on a PC, and the test patterns were sent to NI ELVIS through a Universal Serial Bus (USB) connection. Stored test patterns were then applied to the circuit under test (in our case the DE2 board) from NI ELVIS, and the response was captured and compared for every test vector.
Figure 4.4: Test setup for measuring peak power per cycle and maximum test frequency for an Altera DE2 FPGA board (with all its peripherals) using the NI ELVIS II+ bench-top prototyping board.

4.5.2 Peak Power and Frequency Measurements

Figure 4.5 shows the peak power per cycle and the maximum test frequency, plotted as a function of the supply voltage. As the DE2 board comprises a number of peripherals, such as the seven-segment display, several LEDs and several different IO drivers, the absolute power numbers measured from the product of supply voltage and current are dominated by these peripheral components rather than by the actual circuitry on the FPGA. We therefore corrected the measured supply power by removing the steady state power component in each cycle. The remaining power component, which is the switching (or dynamic) power, is presumably dominated by the CMOS circuitry on the FPGA. The dynamic power curve is shown in blue with circular markers at the measured voltage points in Figure 4.5. We found that the peak dynamic power per cycle increases as the square of the supply voltage in the range of 1.8V to 5.4V, in good agreement with theory. For supply voltages below 1.8V, even very low test frequencies result in erroneous outputs, which is plausible since the nominal voltage specified for the board is 3.3V, and one or more of the IO drivers may not be operational at voltages below 1.8V.

Figure 4.5: Measured values of maximum power consumed per cycle (in blue) and maximum test frequency (in green) plotted as a function of the supply voltage for the Altera DE2 FPGA board tested using the NI ELVIS II+ bench-top prototyping board. Switching power is dominated by the CMOS circuitry contained on the board. The FPGA itself is programmed with the function of the s298 benchmark with scan.
Even though the commonly used nominal supply voltage for CMOS logic circuits at the 90nm technology node is about 1.2V, we could only control the supply to the DE2 FPGA board in the range of 1.8V to 5.4V. Because these tests are intended for the s298 circuit implemented on the FPGA chip but were applied through edge connectors and other logic on the board, the whole process ran essentially like a board test rather than a chip test. The maximum test frequency is, in practice, limited by the structural critical path delay of the circuit; in the current setup, however, it is limited by the speed of the IO drivers on the FPGA board and the maximum allowable sampling frequency of NI ELVIS. The maximum test frequency at each supply voltage also corresponds to the frequency at which the maximum power per cycle is dissipated. This curve is shown in green with diamond markers at the measured voltage points in Figure 4.5. The maximum operating frequency at each supply voltage step was found by starting at an initial frequency and increasing it until the circuit output no longer matched the ideal output. The highest frequency at which the circuit output matches the ideal output is taken as the peak operating frequency.

4.5.3 Minimizing Test Time for Given Peak Power Limit

For a circuit under test with a given peak power limit, $P_{MAXfunc}$, the experimental data of Figure 4.5 readily gives both the supply voltage $V_{DDopt}$ and the test frequency $f_{opt}$ that minimize the test time of the power constrained test. This is done by using the two observations made following equation (4.9). For example, suppose we have a peak power limit $P_{MAXfunc} = 0.5$mW. At the nominal supply voltage of 3.3V, the test power dissipation is 1.428mW and the maximum structural clock frequency is 16.4kHz. To keep the test power under 0.5mW, the test must be run at $16.4 \times 0.5/1.428 = 5.74$kHz.
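The derating step in this example relies on dynamic power scaling linearly with clock frequency at a fixed supply voltage. A minimal sketch of that arithmetic, with a hypothetical helper name and the numbers quoted above, is:

```python
def derated_frequency(f_nominal, p_test, p_limit):
    """Highest frequency that keeps dynamic power at or below p_limit at fixed VDD.

    Assumes dynamic power is proportional to clock frequency.
    """
    return f_nominal * min(1.0, p_limit / p_test)


f_nominal = 16.4e3   # maximum structural clock frequency at 3.3 V (Hz)
p_test = 1.428e-3    # test power at that frequency (W)
p_limit = 0.5e-3     # peak power limit (W)

f_derated = derated_frequency(f_nominal, p_test, p_limit)
print(f"derated test frequency = {f_derated / 1e3:.2f} kHz")
```

The `min(1.0, ...)` guard is an addition made here so that a test already under the limit is not sped up beyond the structural frequency.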
From Figure 4.5, for $P_{MAXtest} = P_{MAXfunc} = 0.5$mW, we should lower $V_{DD}$ to $V_{DDopt} = 2.5$V, which gives a test frequency of $f_{opt} = 12.5$kHz. Thus the test time is reduced to a fraction $5.74/12.5 = 0.46$ of its value at the nominal voltage, a reduction of about 54%.

Chapter 5
Dynamic Scaling of Test Clock Period

5.1 Aperiodic Clock Test

In Chapter 4 it was proposed that, for a periodic clock test where the test clock period is held constant throughout the test, the test time can be reduced by choosing an optimum supply voltage that is lower than the nominal supply voltage. In a periodic clock test, at any voltage, the dynamic power dissipation for a given test pattern set is not constant; it varies throughout the test with the switching activity caused by the test patterns during each cycle. If every cycle in a periodic clock test can be modified, by choosing unique clock periods, such that the power remains constant and within the allowed limit for the entire test, then it will be possible to reduce the test time significantly. This method of testing is termed the aperiodic clock test, where the period of each cycle can be different from the period of its neighboring cycle. This was briefly described in Section 3.2.2.

In an aperiodic clock test, every clock cycle in a test can be either structure constrained or power constrained. The test clock period of an aperiodic test is given by

\[ T_i = \max\left\{ T_{structure},\ \frac{E_i}{P_{PEAKfunc}} \right\} \tag{5.1} \]

For stuck-at fault tests, the capture cycle uses the same clock period as the scan shift cycles; hence the period of the capture cycle can also be reduced based on the power dissipated during that cycle. For delay testing, whether we use single clock capture (Launch-on Shift) or double clock capture (Launch-on Capture), the test cycle period for the shift cycles can be modified while the capture cycles are left unaltered, since the capture cycles use the functional clock period to identify delay faults.
Using equation (5.1), the total test time for an aperiodic clock test can be given as

$$TT_{asynch} = \sum_{i=1}^{N} \max\left\{T_{structure},\ \frac{E_i}{P_{PEAKfunc}}\right\} \qquad (5.2)$$

where TTasynch is the aperiodic test time for a test with N cycles.

5.1.1 A Circuit Example

We examine the proposed aperiodic clock test using an ISCAS'89 sequential benchmark circuit. For simplicity, let us choose the s298 benchmark circuit, which contains 14 flip-flops, 3 primary inputs and 6 primary outputs. The circuit is synthesized using Mentor Graphics Leonardo Spectrum [7] with TSMC 180nm technology. The Spectrum tool also provides the critical path delay via static timing analysis (STA) of the circuit. More accurate critical path delay information can be obtained after the routing of the circuit with inserted scan chains. Statistical static timing analysis (SSTA) can also be used to consider process variations during delay calculations [9,28]. All flip-flops in the circuit were daisy chained to form a single scan chain, using Mentor Graphics DFT Advisor. Once the scan chain was inserted, a set of deterministic ATPG test patterns for stuck-at faults was generated using Mentor Graphics Tessent Fastscan [5].

A transistor level simulation was performed using Synopsys Nanosim [4] at the nominal voltage of 1.8V. The transistor level description of the netlist was generated using Mentor Graphics Design Architect and the SPICE file was imported into Nanosim. Using Nanosim, the energy dissipated per cycle during the entire test was measured. Based on the report obtained through transistor level simulation, we determined the test period for each cycle. For each cycle the test period is constrained both by structure, as given by STA, and by maximum rated power. The maximum rated power depends on the functional characteristics, physical design, packaging, etc., and is part of the specification of the circuit.
In the absence of available data, for our analysis we measured the maximum power in functional mode through simulation of 1,000 random vectors, which was 1.23 mW. Once the time period for each cycle was obtained, the circuit was simulated again to calculate the power dissipated during each test cycle.

Figure 5.1: Periodic and aperiodic clock simulation of 450-cycle scan test of ISCAS'89 benchmark circuit s298. Periodic test clock frequency is 240MHz and test time is 1.87 µs. Aperiodic clock test time is 1.31 µs.

Figure 5.1 shows the simulation results for the s298 benchmark circuit. The plot compares the test performed using periodic (fixed) and aperiodic (varying) clock periods. The x-axis shows time as the test was run and the y-axis shows the power dissipated during each test cycle. As observed from the figure, when a periodic clock is used the power dissipated does not reach the maximum rated power in most cycles. Hence the test clock periods for cycles dissipating less power can be safely reduced until the cycle power is close to the rated power. This effect is seen in the simulation results using the aperiodic clock. When a particular cycle dissipates low power, the period is reduced such that the power for that cycle increases to a value closer to the rated power. However, if in doing so the period becomes shorter than the critical path delay, then the period is set to the value of the critical path delay. Thus, we ensure that the power constrained period is small without violating any timing constraints. This limitation on the minimum period forces the circuit to dissipate significantly less than the rated power in those cycles, which explains the "dips" in the aperiodic clock plot. In this example, the total test time with the periodic clock was 1.87 µs and the test time with the aperiodic clock was 1.31 µs. This represents a reduction of 30%.
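The figures quoted for this example can be checked with a few lines of Python. The sketch below assumes that the periodic test time is simply the number of cycles divided by the clock frequency; the function names are illustrative.

```python
def periodic_test_time(cycles, frequency_hz):
    """Test time in seconds for a fixed-period clock."""
    return cycles / frequency_hz


def reduction_percent(t_periodic, t_aperiodic):
    """Test time reduction of the aperiodic test relative to the periodic test."""
    return 100.0 * (1.0 - t_aperiodic / t_periodic)


# Values reported for the 450-cycle s298 test in Figure 5.1.
t_periodic = periodic_test_time(cycles=450, frequency_hz=240e6)
reduction = reduction_percent(t_periodic=1.87, t_aperiodic=1.31)  # both in microseconds
print(f"periodic test time: {t_periodic * 1e6:.3f} us")
print(f"reduction: {reduction:.1f} %")
```

The computed periodic test time of 1.875 µs agrees with the reported 1.87 µs, and the reduction rounds to the reported 30%.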
Greater reduction is achievable if the average power of the entire test is significantly lower than the maximum power. Thermal analysis [70] and characterization of test power can be performed to determine a safe operating point for testing, and the test can be modified appropriately so that, if thermal issues are a concern, the method is used at the safe operating point.

5.1.2 Simulation Results

Table 5.1 shows the simulation results for several ISCAS'89 benchmark circuits using the procedure described in Section 5.1.1. All circuits were synthesized using TSMC 180nm technology. The nominal supply voltage for this technology is 1.8V. For s298, three different sets of test patterns were used to observe the effect of test power while reducing test time. This is discussed later in this section.

Column 2 of Table 5.1 shows the number of scan test clock cycles used for each circuit. This is determined by the number of flip-flops in the scan chain and the total number of vectors, along with one cycle per vector for capture. Since vectors were generated for stuck-at faults, only one capture cycle is used for response capture at the end of each test. The maximum rated power (PPEAKfunc) shown in column 3 is normally given in the circuit datasheet. However, for these benchmark circuits we obtained a value by simulating the DUT in functional mode at its fastest frequency for 100 random vectors. In some cases the power value thus obtained might be closer to the power calculated during test, but the benefit of employing an aperiodic clock to reduce test time can still be shown.

Table 5.1: Scan test time for ISCAS'89 circuits in TSMC 180nm technology.

Circuit   Scan test     PPEAKfunc   EMAX(test)   ETOTAL   Periodic  Periodic    Asynchronous      Test time
name      clock cycles  (W)         (pJ)         (nJ)     freq.     test time   clock test time   reduction
                                                          (MHz)     (µs)        (µs)              (%)
s298      450           0.0012      5.71         1.83     216.34    2.08        1.48              28.5
s298      498           0.0012      6.61         1.83     187.2     2.66        1.49              44.0
s298      540           0.0012      6.68         1.89     184.9     2.92        1.54              47.2
s382      704           0.0029      9.96         4.69     100       2.41        1.62              32.7
s713      810           0.0015      10.9         4.22     137.28    5.9         2.83              52.0
s1423     6975          0.0045      33.14        166.68   135.96    51.3        39.8              22.4
s1423     6975          0.0030      33.14        166.68   90.58     77          55.6              28.0
s13207    62237         0.0213      126.3        660      168.66    369         312.2             15.0
s15850    101707        0.0240      213.8        2610     67.35     1510        1088.0            27.8
s38584    224112        0.0350      609.8        9470     88.54     2531        2101.5            17.0

Column 4 shows the maximum energy (EMAX(test)) dissipated due to signal transitions in the clock cycle that consumes the most energy. Column 5 shows the total energy (ETOTAL) consumed by the entire test. These were obtained by simulation as discussed in Section 5.1.1. For s1423, an additional experiment was performed with a peak power value of 0.0030 W (3.0 mW), which is less than the simulated peak value, to observe the effect of power on test time reduction without a change in the energy dissipation.

Columns 6 and 7 give the test frequency and test time for the periodic clock test. The periodic clock period TPOWER is obtained from equation (3.11), using the data from columns 3 and 4. The test frequency in column 6 is 1/TPOWER. The total test time for the periodic clock, in column 7, is calculated using equation (3.15). Column 8 shows the total test time taken when an aperiodic clock is used, and the corresponding test time reduction over that of column 7 for the periodic clock is given in column 9. An interesting observation here is that the aperiodic to periodic test time ratio for power constrained testing is the ratio of average energy to the maximum energy per cycle.
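This observation can be checked against the data of Table 5.1. The sketch below is only a consistency check on the first five rows of the table; the tuple layout and the function name are assumptions made for illustration.

```python
# (circuit, cycles N, EMAX(test) in pJ, ETOTAL in nJ,
#  periodic test time in us, aperiodic test time in us), from Table 5.1.
ROWS = [
    ("s298", 450, 5.71, 1.83, 2.08, 1.48),
    ("s298", 498, 6.61, 1.83, 2.66, 1.49),
    ("s298", 540, 6.68, 1.89, 2.92, 1.54),
    ("s382", 704, 9.96, 4.69, 2.41, 1.62),
    ("s713", 810, 10.9, 4.22, 5.9, 2.83),
]


def ratios(cycles, e_max_pj, e_total_nj, t_periodic_us, t_aperiodic_us):
    """Return (average-to-maximum energy ratio, aperiodic-to-periodic time ratio)."""
    e_avg_pj = e_total_nj * 1000.0 / cycles  # nJ -> pJ, then average per cycle
    return e_avg_pj / e_max_pj, t_aperiodic_us / t_periodic_us


for name, *values in ROWS:
    energy_ratio, time_ratio = ratios(*values)
    print(f"{name} ({values[0]} cycles): EAVG/EMAX = {energy_ratio:.3f}, "
          f"time ratio = {time_ratio:.3f}")
```

For these rows the two ratios agree to within about 0.01, which is consistent with most cycles being power constrained.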
For example, consider s298 in the first row of Table 5.1. Average energy per cycle is EAVG = 1.83 nJ/450 = 4.067 pJ. The ratio EAVG/EMAX(test) = 4.067/5.71 = 0.71 is about the same as the test time ratio 1.48/2.08. In cases where a significant number of clock cycles are structure constrained, the test time ratio may move toward unity.

If most cycles consume significantly less energy than a few cycles that consume very high energy, then it is possible to achieve a large reduction in test time. This is because, based on equation (5.1), all low energy cycles will be limited only by the critical path delay and only those cycles that consume high energy will run at the slowest clock period. On the other hand, if the number of cycles consuming very high energy is significantly larger than the number of cycles consuming low energy, then the reduction in test time will be smaller. This effect was examined for the s298 circuit in Table 5.1. Using alternative sets of vectors with one test pattern having high energy consuming cycles and the rest of the patterns consuming low energy, the test time reduction improved from 28% to 47% for s298. Once again, since the test patterns generated for the periodic test were able to detect all faults in the aperiodic test, the defect and fault coverage for the stuck-at fault test should be the same as in the periodic test.

5.1.3 Test Programming on ATE at Nominal Voltage

Experimental Setup

The aperiodic clock technique was experimentally verified on the Advantest T2000GS ATE at Auburn University. The ATE can be operated at a maximum speed of 250MHz and has 128 bi-directional tester channels. The power supply to the DUT is provided by the ATE through a digital power supply module DPS500mA, which has a supply voltage range of 2 to 8V and an output current range of up to 500mA. The test plan is programmed using the native Open architecture Test system Programming Language (OTPL). Provisions to place a chip on the tester head are available.
For our experiments with benchmark circuits, we used a Xilinx Spartan 3 FPGA XC3S50 soldered on a printed circuit board. The DUT used for our experiment was the s298 benchmark circuit with daisy chained mux-type scan flip-flops configured on the FPGA. The FPGA is configured at run time by the ATE using the bit file generated by the Xilinx ISE tool [39].

The ATE has a frame processor and a pattern generator, which are synchronized with the rate generator. The rate generator generates a fixed rate clock pulse and triggers the pattern at the start of each pulse. Based on the waveform set by the frame processor and the corresponding pattern value, the pattern is applied to the DUT mounted on the tester head.

The test plan for the FPGA consists of three steps. The first two steps account for the configuration of the FPGA using the ATE. In the first step, the FPGA is powered by the ATE with a supply voltage of 2.5V and the configuration memory is cleared. The second step downloads the bit file generated by Xilinx ISE using the slave serial mode. In this mode, the configuration data is provided through the DIN input pin of the FPGA and clocked externally by the ATE. A successful configuration of the FPGA is indicated by a High output value on the DONE pin. The third step performs the functional test on the DUT now configured on the FPGA.

External Test

The clock period required for the scan-based functional test is determined prior to the external testing. A limitation of the tester framework allows only four unique clock periods per test flow, which limits the granularity of the clock period variation. Hence, the periods for each test cycle are obtained through simulations and split into four groups. The latency of the analog measurement modules is included in the selected period. The longest cycle period corresponds to the pulse width determined by the cycle during which we achieve maximum switching.
The shortest period corresponds to the lowest test period with which we achieve significant reduction in test time. Each test cycle is assigned to the available period that is closest to, but not less than, the required period for that cycle.

Based on the periods obtained earlier, synchronization with the rate generator is controlled by specifying the periods in the test program using a timing block. The timing block has information about the rates at which the pattern should be applied at each input and the behavior of the signal at each pin corresponding to the value in the pattern file. Since the patterns are applied at the start of each period, the pulse provided by the rate generator is not used as a clock to the scan circuit of the DUT; instead, it is used to synchronize the pattern generation. The clock pin is considered as an input pin and the duty cycle is set to 50% of the period set by the rate generator. This way we avoid any race conditions caused by the application of the inputs at the start of each period of the rate generator. The pattern for each cycle contains the signal value needed at each input pin and the response to be observed at each output pin. The period for each cycle is specified by mapping the cycle to the waveform information that is uniquely defined to match one period.

ATE Results

The proposed method was applied by the ATE to the s298 benchmark circuit configured on the Xilinx FPGA. We used the 36 deterministic combinational ATPG patterns that were simulated for the s298 circuit in row 3 of Table 5.1. The cycle times required for each period were obtained through a Perl script based on the energy consumption per cycle reported by Nanosim [4]. Though in this work the energy is obtained using Nanosim, power can be calculated per cycle during the actual test on the ATE by implementing a microcontroller on the test head. The minimum clock period that was used with the DUT was 100ns.
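The grouping of cycle periods can be sketched in Python as follows. The first function implements the assignment rule stated above (nearest available period that is not less than the required one). The second is a simple dynamic-programming selection of the period values that minimizes total test time; it is offered only as an illustration and is not the ad-hoc selection used in this experiment. The four ATE periods are the ones reported for the ATE experiment in this section, while the list of required periods is hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def assign_period(required, available):
    """Return the smallest available period that is not less than `required`."""
    index = bisect_left(available, required)  # `available` must be sorted ascending
    if index == len(available):
        raise ValueError("required period exceeds the longest available period")
    return available[index]


def best_periods(required_periods, k):
    """Choose at most k period values that minimize the total test time."""
    values = sorted(set(required_periods))
    counts = [required_periods.count(v) for v in values]
    prefix = [0] + list(accumulate(counts))  # prefix[j]: cycles needing <= values[j-1]
    m, k = len(values), min(k, len(values))
    inf = float("inf")
    # cost[g][j]: least total time for the cycles covered by the j smallest
    # values when g periods are used and the largest of them is values[j-1].
    cost = [[inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    cost[0][0] = 0
    for g in range(1, k + 1):
        for j in range(g, m + 1):
            for i in range(g - 1, j):
                candidate = cost[g - 1][i] + values[j - 1] * (prefix[j] - prefix[i])
                if candidate < cost[g][j]:
                    cost[g][j], back[g][j] = candidate, i
    # Walk the back pointers to recover the chosen period values.
    chosen, j = [], m
    for g in range(k, 0, -1):
        chosen.append(values[j - 1])
        j = back[g][j]
    return sorted(chosen), cost[k][m]


ATE_PERIODS_NS = [200, 300, 410, 500]          # periods used on the ATE
required_ns = [180, 200, 290, 405, 500, 120]   # hypothetical required periods
assigned = [assign_period(r, ATE_PERIODS_NS) for r in required_ns]
print("assigned periods:", assigned)
print("total time (ns):", sum(assigned))
print("best four periods and total time:", best_periods(required_ns, 4))
```

The longest required period is always one of the chosen values, so every cycle can be assigned a period that satisfies its constraint.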
For clarity of our experiment, the clock periods obtained through simulation were multiplied by 100. Four unique clock periods were then obtained such that we achieve significant reduction in test time. Figure 5.2 shows the test clock periods on the y-axis for each corresponding test cycle on the x-axis. The horizontal broken (red) lines show the four unique test cycle periods. A test cycle uses the test clock period of the broken line just above its required period, as shown in Figure 5.2. For the periodic clock test, the maximum period in Figure 5.2 is used as the fixed clock period.

Figure 5.2: Aperiodic clock for 540-cycle scan test of s298 for a power budget of 1.23mW. Horizontal broken lines indicate four test clock periods available from the T2000GS ATE. Period used for a test cycle was the nearest higher ATE clock period.

The four clock periods used in this experiment were determined from a visual inspection of the plot in Figure 5.2 and are not optimal. It is possible to algorithmically find the best clock periods for any given number of periods that an ATE may support [30].

The waveforms for the ATE tests are shown in Figures 5.3 and 5.4, as viewed in the logic analyzer of the Advantest T2000GS system. The two figures have the same time scale. Figure 5.3 shows 33 cycles (13 to 46), which account for two scan sequences of the periodic clock test using a 500ns clock. The cycle number is indicated in the first row, followed by the period for each cycle as indicated above the first waveform. The labels on the left side of each waveform correspond to scan out, scan in, scan enable, three primary inputs and clock pins. The values expected at the scan out signal are indicated by X, L or H at the beginning of each period, and the strobe instants at which the output response is verified are indicated by downward/upward triangles placed at the end of each period. The strobe points are located such that there is enough time for the signal to settle after a clock pulse is applied.
The input waveforms are indicated along with the pattern that is applied at the start of that period. A '1' pattern for the clock during each period indicates that the clock is turned on and, based on the 50% duty cycle for the clock during that period, the corresponding waveform is generated by the frame processor.

Figure 5.3: Periodic clock: ATE result for 540-cycle scan test of s298 benchmark circuit. Waveform shows 33 test cycles (cycles 13 through 46) of 500ns clock. Signals shown are scan-out, scan-in, scan enable, three primary inputs and clock. Green triangles under scan-out waveform are matching strobes.

The test pattern set used in this experiment is a deterministic test set generated by Fastscan ATPG for stuck-at faults, having more low power cycles than high power cycles. For the periodic clock test of Figure 5.3, which used a fixed clock period of 500ns for the entire test, the total time for 540 cycles was 270 µs.

Figure 5.4 shows the ATE waveforms using an aperiodic clock with periods of 500, 410, 300 and 200 ns, as selected for each cycle based on the corresponding activity it produces in the DUT. The test clock period is determined from Figure 5.2. Thus, the peak activity in the DUT is the same for both periodic and aperiodic clock tests. Both Figures 5.3 and 5.4 show the waveforms for a time interval of 16.5 µs. Because the aperiodic clock test runs at varying clock periods, more cycles are run in this time. Hence, in Figure 5.4 we observe 58 cycles (13 to 71) within the same time frame of 16.5 µs as 33 cycles (13 to 46) for the periodic clock test. The total test time for 540 cycles is now 157.7 µs, which corresponds to a reduction of 38% over the periodic clock test time.

Figure 5.4: Aperiodic clock test: ATE result for 540-cycle scan test of s298 benchmark circuit. Waveform shows 58 test cycles (cycles 13 through 71) taking the same time as taken by 33 cycles of periodic clock test in Figure 5.3. Clock periods used were 200, 300, 410 and 500 ns as shown in Figure 5.2. Signals shown are scan-out, scan-in, scan enable, three primary inputs and clock. Green triangles under scan-out waveform are matching strobes.

This test time reduction depends on the relative clock schedules of the periodic and aperiodic clock tests and hence can be compared with the 47.2% reduction reported for the 540-cycle test of s298 in Table 5.1. There are two reasons for the ATE time saving being lower. First, the granularity of clocks, i.e., four ATE clocks versus an individual clock for each vector. Second, the selection of the four ATE frequencies was ad hoc, and we believe a better selection can improve the test time reduction.

5.2 Optimum Voltage for Aperiodic Clock Test

We saw in the previous section that by using an aperiodic clock test, it is possible to reduce test time at nominal voltage. However, is it possible to reduce the test time further? In this section we examine the possibility of an optimum voltage for the aperiodic clock test at which the test can run the fastest.

The reduced voltage approach discussed in Chapter 4 can be extended to further reduce the aperiodic clock test time. From equation (4.2), the period for a power constrained cycle is proportional to the voltage used for test. Hence the width of the power constrained period can be reduced to improve the test time by reducing the voltage for the test. However, there are a few points to consider when using aperiodic clock tests. First, some cycles in the aperiodic clock test may have already been compressed to the minimum period permitted by the structure constraint, and as the supply voltage is reduced for test the critical path delay increases. Thus, as the voltage is reduced, more cycles become structure constrained. With further reduction in the supply voltage, the test starts to lose its aperiodic clock property and becomes fully periodic.
However, at some voltage before the test becomes periodic, most of the cycles will be structurally constrained except for a very few cycles that are power constrained. The point at which the test still retains the aperiodic clock property will be the optimum test time for the aperiodic clock test. Figure 3.4 illustrates this effect and is an extension of Figure 3.1. Based on equations (3.17) and (3.18), the aperiodic clock test time is bounded by the slowest frequency of the periodic clock test, limited by the power constraint, and the fastest frequency, limited only by the structure constraint. Point A represents the optimum voltage for the periodic clock test. From point A, if the supply voltage is increased in steps such that V > Vsynch, then some cycles start becoming power constrained, so those cycles have to be expanded to keep the power dissipation within the rated power. An optimum voltage for the aperiodic clock test, Vasynch, will lie in the region A-B at a point where the structural delay is lower and the test has more structure constrained cycles than power constrained cycles.

Analysis of this method is performed on the s298 benchmark circuit. The circuit was synthesized in 180nm technology using Leonardo Spectrum [7] and the scan chains were inserted using Mentor Graphics DFT Advisor. We used deterministic vectors generated by Fastscan [5] for the stuck-at fault test, which also include one path delay vector to trigger the critical path of the device. The aperiodic clock test was performed for decreasing power supply values and the corresponding test times were noted. Figure 5.6 shows the results, plotted with test time on the y-axis and supply voltage on the x-axis. At each voltage we estimate the critical path delay using the alpha power law approximation [50,51]:

$$T_{structure} = \frac{K\,V_{DD}}{(V_{DD} - V_{TH})^{\alpha}} \qquad (5.3)$$

Figure 5.5: Aperiodic clock test time as a function of supply voltage showing the minimum test time voltage, Vasync.

A few assumptions were made when solving for the critical path delay:

1. Critical path does not change as voltage is reduced; found valid for small voltage changes.
2. Threshold voltage does not vary.

The maximum rated power was found by simulating the circuit at nominal voltage for 1000 random vectors in the functional mode. The resulting maximum power was noted to be 1.23 mW. The value of the velocity saturation index α was found to be 2 by curve fitting the delay of a chain of inverters at different voltages with those obtained through simulation. Once the value of α is known, the value of K for the s298 benchmark is found using the delay obtained through STA in Section 4.3.1 at nominal voltage. The value of K is found to be 0.85 (×10^-9, see Table 5.2). We can now find the delay of the critical path at every voltage step.

Figure 5.6: Minimum periodic and aperiodic clock test times for s298 circuit after selecting suitable supply voltages.

The method to calculate the aperiodic clock period at each voltage was based on the explanation provided in Section 5.1. At the optimum voltage of Vasync = 1.25V, the corresponding minimum aperiodic clock test time using this method is found to be TTasynch = 0.77 µs, which is a 71% reduction in test time compared to the test time (TTNominal) of 2.66 µs using a periodic clock at the nominal voltage of 1.8V, as shown in Figure 5.6.

5.2.1 Simulation Results

Table 5.2 shows the optimum voltage for the aperiodic clock test obtained through simulation. All circuits were synthesized in TSMC 180nm CMOS technology using Mentor Graphics Leonardo Spectrum [7]. The scan chains were inserted and stitched into a single scan chain using DFT Advisor. Patterns used in this experiment were the same set of patterns used in the periodic clock test experiment in Chapter 4 and for the aperiodic experiment in Table 5.1. The patterns were generated for the stuck-at fault model using Mentor Graphics Tessent Fastscan [5].

Table 5.2: Optimum voltage VDDopt for minimum aperiodic clock scan test time of ISCAS'89 circuits in 180nm CMOS (α = 2, VTH = 0.39V).

Circuit   K         CL      N         PMAXfunc    Nominal (1.8V) test     Optimum voltage test   Test time
name      (×10^-9)  (pF)    (cycles)  (W)         Clock freq.  Test time  VDDopt    Test time    reduction
                                                  (MHz)        (µs)       (V)       (µs)         (%)
s298      0.85      1.76    450       0.0012      216.0        2.08       1.20      0.72         65.4
s298      0.85      2.04    498       0.0012      187.0        2.66       1.25      0.77         71.0
s298      0.85      2.06    540       0.0012      184.9        2.92       1.25      0.81         72.2
s382      2.18      3.07    704       0.0029      292          2.41       1.59      1.37         43.5
s713      3.31      3.36    810       0.0015      137.28       5.9        1.62      2.49         57.7
s1423     6.38      10.22   6975      0.0045      135.9        51.3       1.82      39.7         26.7
s1423     6.63      10.23   6975      0.0030      90.6         77         1.55      49.5         35.7
s13207    4.64      38.9    62237     0.0213      168          369        1.7       281          23.8
s15850    5.22      109     101707    0.0240      67.35        1510       1.43      702          53.5
s38584    4.03      156.8   224112    0.0350      88.54        2531       1.41      1290         49.0

The critical path delay for the structure constraint was obtained using the static timing analysis (STA) tool of Leonardo Spectrum. The power law model in equation (5.3) was then used to calculate the path delay value at each supply voltage to set the structure constraint for the aperiodic clock period. The critical path is assumed to be constant and hence the initial value for the proportionality constant was obtained at the nominal voltage with the velocity saturation index α = 2.

Column 1 of Table 5.2 shows the ISCAS'89 benchmark circuits used for this experiment, and column 2 gives the proportionality constant K used to calculate the delay at each voltage using the alpha power law model. Column 3 gives the maximum switched capacitance CL and column 4 the total number of scan test cycles N. The maximum allowable power used in these experiments is shown in column 5. The maximum power chosen here is measured by simulating each circuit in normal mode with 1000 random patterns and measuring the maximum power over the entire test.
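The search for the optimum voltage can be sketched as a simple sweep over VDD. The fragment below uses equation (5.3) with the s298 values K = 0.85 × 10^-9, α = 2 and VTH = 0.39V for the structure constraint. It further assumes that the energy of every cycle scales as VDD² relative to its value at the nominal voltage (the usual CV² dynamic energy model); this scaling, the function names and the per-cycle energies are assumptions made for illustration and are not taken from the simulations reported here.

```python
def structure_delay(vdd, k=0.85e-9, vth=0.39, alpha=2.0):
    """Critical path delay from the alpha power law, equation (5.3)."""
    return k * vdd / (vdd - vth) ** alpha


def aperiodic_test_time(energies_nominal, vdd, p_peak, v_nominal=1.8):
    """Total aperiodic test time at supply voltage vdd, following equation (5.2)."""
    t_structure = structure_delay(vdd)
    scale = (vdd / v_nominal) ** 2  # assumed CV^2 scaling of the cycle energy
    return sum(max(t_structure, e * scale / p_peak) for e in energies_nominal)


def optimum_voltage(energies_nominal, p_peak, v_low=0.8, v_high=1.8, step=0.01):
    """Return the swept voltage that gives the smallest aperiodic test time."""
    steps = int(round((v_high - v_low) / step))
    voltages = [round(v_low + i * step, 4) for i in range(steps + 1)]
    return min(voltages, key=lambda v: aperiodic_test_time(energies_nominal, v, p_peak))


# Hypothetical per-cycle energies at 1.8V, in joules.
energies = [5.7e-12, 2.0e-12, 3.0e-12, 4.0e-12, 1.5e-12]
p_peak = 1.23e-3  # watts
v_opt = optimum_voltage(energies, p_peak)
print(f"critical path delay at 1.8V: {structure_delay(1.8) * 1e9:.3f} ns")
print(f"optimum voltage: {v_opt:.2f} V")
print(f"test time at optimum: {aperiodic_test_time(energies, v_opt, p_peak) * 1e9:.2f} ns")
print(f"test time at nominal: {aperiodic_test_time(energies, 1.8, p_peak) * 1e9:.2f} ns")
```

Below the optimum the growing critical path delay dominates, and above it the power constrained cycles dominate, so the sweep finds a minimum between the two regimes.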
Columns 6 and 7 show the test clock frequency and total test time for a conventional test using a periodic clock period at the nominal voltage of 1.8V. Column 8 shows the optimum test supply voltage for an aperiodic clock test and the corresponding test time at the optimum voltage is given in column 9. Finally, the reduction in test time obtained by using an aperiodic clock under reduced supply voltage, over the periodic clock at nominal voltage, is shown in column 10. The results show significant reduction in test time when compared to the methods discussed earlier. Since the calculated delay is pessimistic, more reduction in test time is possible in a practical setting.

Table 5.3: Test times for various methods normalized with respect to that of the conventional method (nominal 1.8V supply and periodic clock).

Circuit   Nominal voltage 1.8V        Periodic clock              Aperiodic clock
name      Periodic    Aperiodic       Optimum       Reduction     Optimum        Reduction
          clock       clock           voltage       ratio         voltage        ratio
s298      1           0.52 - 0.71     1.07 - 1.1    0.35 - 0.38   1.20 - 1.25    0.27 - 0.34
s382      1           0.67            1.44          0.63          1.59           0.56
s713      1           0.48            1.33          0.55          1.62           0.42
s1423     1           0.72            1.49          0.68          1.55           0.64
s13207    1           0.84            1.62          0.80          1.70           0.76
s15850    1           0.72            1.30          0.52          1.43           0.46
s38584    1           0.83            1.30          0.51          1.41           0.49

Table 5.3 summarizes the test time reduction using the proposed methods, normalized with respect to the conventional method using a periodic clock at the nominal voltage of 1.8V for CMOS 180nm technology. Summarizing the results, it is evident that for most of the circuits the test time is reduced significantly (> 30%) by either reducing the power supply voltage or by scaling the frequency.

Chapter 6
Discussion

6.1 Adapting to At-Speed Testing

Timing related defects are often targeted during at-speed testing. This requires clock pulses generated at functional speeds.
Due to the high cost of automatic test equipment (ATE) and growing clock speeds, implementing at-speed testing using ATEs will not be economical [15]. For this reason, at-speed clocks during capture cycles are generated internally using phase lock loops (PLL) that generate clock pulses with a fixed frequency that can be a multiple of a slow reference clock [38]. On-chip PLLs also serve another purpose: eliminating clock skews [48]. In the case of at-speed testing, the use of the aperiodic clock can be restricted to improving scan shift timing, since the majority of test time is taken while shifting. Besides, during at-speed testing the capture cycle (and the launch cycle for launch on capture) already runs at functional speed. The slow clock during scan shift can be supplied by the ATE or generated internally using a divide by N counter.

In the case where the clock for scan shift is generated within the chip, modifications can be made to the architecture such that the tester chooses the clock to be used. To describe the feasibility of this procedure, let us consider the serdes architecture used in [52], where a deserializer is used to shift patterns at high speed from the tester and the internal scan chain shifts patterns at a slow speed. The shift clock to the internal scan chains is a slow clock with a frequency that is a fourth of the ATE clock. This slow clock could be generated by implementing a simple divide by N counter. With an aperiodic clock, the divide by N counter could be designed to produce the required four clocks. The ATE can then be used to send the bits that set the value of N, and the corresponding output is used as the clock. When the pattern is shifted out, the output from the scan chains is fed to the serializer module, which shifts out the pattern to the ATE at high speed. Since the patterns from the ATE and to the ATE are shifted at the same speed, there should not be any conflict due to the internal aperiodic clock when comparing the output with the expected value. It is to be noted that, while choosing the clock period for the aperiodic test, any additional delay constraints due to long interconnects during scan shift that are considered for timing closure will be added to the structure constraint in the aperiodic clock. Further analysis exploring this feasibility will be done in future work.

6.2 Adapting to Process-Voltage-Temperature Variations

As technology scales to small feature sizes, intra-die and inter-die variations in the operational parameters of the chips are more pronounced. These variations can be due to random or systematic defects from the fabrication process and can often affect the timing during voltage and temperature changes. Testing the circuit under test (CUT) at different process-voltage-temperature (PVT) corners, namely worst, typical, and best cases, identifies the reliability of the chip and its operating range [44]. Such testing normally checks for the functionality of the device at each corner. Designers and process engineers often determine these corners, and the corresponding delays are characterized into the chip's structural constraints [25, 26]. The CUT is tested at these corners for its functionality and therefore corner testing is more critical during timing based tests. Hence, during corner testing the concern is for the CUT to meet the timing specification of the capture cycle. The shift cycle (and the launch cycle in case of launch on shift) must include the interconnect delay and the setup and hold time constraints imposed by these variations. Using the proposed method with corner testing would account for the same constraints when determining the structural constraint for the test clock period and while finding the optimum voltage during scan shift at each of these corners.
The voltage can then be switched to the corner voltage during the capture cycles, either internally if the chip has a dynamic voltage regulator [24] or externally by the ATE. Further analysis will be performed in future extensions of this work to analyze the effect of process variations on obtaining the optimum values of the aperiodic test periods and voltage at each corner, and to explore the feasibility of increasing the supply voltage dynamically for the capture cycle using the ATE.

Chapter 7
Conclusion

Advanced technologies for CMOS VLSI design for low power applications require power constrained tests that could result in longer test time and high testing costs. New methods are required to reduce test time while conforming to the allowable power. In this work, employing scan-based test, ISCAS'89 benchmark circuits were simulated for the maximum energy dissipated using a periodic clock period. Different methodologies, such as scaling down the supply voltage and scaling up the frequency, were shown to help reduce test time. Results have shown that a reduction of more than 50% is attainable in some cases. Maximum reduction in test time is observed when the peak energy dissipated by the circuit is significantly greater than the average energy dissipated. The feasibility of the aperiodic clock test was demonstrated on an ATE. The results presented in this work suggest that a test engineer can have multiple options to reduce test time based on the available hardware, and that improvements in tester framework and hardware can help reap the full benefits of the proposed methods.

It was shown that by reducing the power supply voltage at which a DUT is tested, the test time can be reduced. Future investigations should involve the use of the proposed method for delay testing, with the focus on finding the correlation of low voltage timing measurements with the actual delay at nominal voltage. Effects of leakage power that occur in advanced technologies will also have to be investigated in the future.
We believe that the methods presented here will remain beneficial. The dynamic power considered in this work was the cycle-average peak power, which also includes the instantaneous peak within a clock cycle. Although smoothing due to suitable design of the power grid and decoupling capacitors may justify the use of the cycle average, any possible issues related to power supply noise and coupling effects will have to be examined in the future.

Future analysis would include process variations when finding aperiodic clock periods for a DUT. Such an analysis would be beneficial for the proposed method, and we believe that it will not be a limitation in determining optimum clock periods.

System on chip (SoC) testing could have severe test time and power problems when multiple cores are tested in parallel. There will be benefits if core tests are optimized by aperiodic clock periods. However, the distribution of clocks through the test access mechanism (TAM) of the SoC and the test program details have to be analyzed in the future. Such limitations may not be encountered when using reduced power supply voltage with periodic clock test.

Since the netlist used in this work did not have information on wiring delay after place and route, the work presented in this thesis assumes that the critical path delays of the scan path and the functional path are similar. However, although the critical path in scan mode may have very little logic, it is not optimized for delay in the physical design; future work will have to examine this issue in depth.

The work presented in this thesis uses an ad-hoc method to group the clock periods used on the ATE; future work can investigate different non-linear algorithms that could provide the optimum values for the clock periods. The test programming used in the Advantest T2000GS ATE is not a conventional one, and the method has to be adapted to the STIL format [3, 40], which is more commonly used in the industry.
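As an illustration of the clock period grouping problem mentioned above, the following is a minimal sketch and not the ad-hoc method used in this thesis. It assumes that each test cycle has a known minimum clock period, that the ATE allows only a limited number of distinct periods, and that a cycle may run at any allowed period that is no shorter than its own minimum. Under these assumptions, a dynamic program over the sorted distinct periods finds the set of periods that minimizes total test time. The function name `group_clock_periods` and the parameters `min_periods` and `k` are hypothetical.

```python
from typing import List, Tuple


def group_clock_periods(min_periods: List[float], k: int) -> Tuple[float, List[float]]:
    """Choose at most k clock periods that minimize total test time.

    min_periods: assumed per-cycle minimum clock period (any time unit).
    k: assumed maximum number of distinct periods the tester allows.
    Each cycle runs at the smallest chosen period >= its own minimum.
    Returns (total_time, chosen_periods_in_ascending_order).
    """
    if not min_periods or k < 1:
        raise ValueError("need at least one cycle and k >= 1")

    values = sorted(set(min_periods))      # distinct required periods
    n = len(values)
    k = min(k, n)                          # more groups than values is useless

    # prefix[j] = number of cycles whose minimum period is among values[0..j-1]
    prefix = [0] * (n + 1)
    for j, v in enumerate(values):
        prefix[j + 1] = prefix[j] + min_periods.count(v)

    inf = float("inf")
    # best[g][j]: least time for cycles in values[0..j-1] using exactly g
    # groups, where the last group is clocked at values[j-1].
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    prev = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0

    for g in range(1, k + 1):
        for j in range(1, n + 1):
            for i in range(j):             # previous group ended at values[i-1]
                if best[g - 1][i] == inf:
                    continue
                cost = best[g - 1][i] + (prefix[j] - prefix[i]) * values[j - 1]
                if cost < best[g][j]:
                    best[g][j] = cost
                    prev[g][j] = i

    # Pick the best number of groups, then backtrack to recover the periods.
    g = min(range(1, k + 1), key=lambda groups: best[groups][n])
    total = best[g][n]
    chosen: List[float] = []
    j = n
    while g > 0:
        chosen.append(values[j - 1])
        j = prev[g][j]
        g -= 1
    chosen.reverse()
    return total, chosen
```

For example, with six cycles whose minimum periods are 10, 10, 10, 20, 20 and 40 time units and two allowed periods, the sketch selects periods of 20 and 40 for a total of 140 units, compared with 240 units for a single periodic clock of 40 units. The largest minimum period is always among the chosen values, since the slowest cycle must be accommodated.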
Some circuits include dual-voltage logic that can operate at different voltage levels. With that in mind, a dynamic voltage mechanism can be devised in the same way as the aperiodic clock, where the voltage for each cycle is determined by the amount of energy dissipated during that cycle. At the cost of some added complexity in the test setup, the added degree of freedom will further reduce the test time. It is suggested that such a scheme be studied in the future.

The clock signal during test may be generated in several possible ways. The clock may be supplied directly by the ATE. Alternatively, it may be generated by a phase-locked loop (PLL) circuit on the device under test (DUT) with a synchronizing signal provided from the ATE. In the first case, the clock frequency would be controlled by the test program. However, the manipulation of the test clock frequency would be more complex in the case of test compression, where the test data transmission rate between the ATE and the DUT would differ from the internal clock on the DUT. Such schemes require serious study.

Another important issue is the distribution of the clock on the DUT. Typically, a clock tree is designed to correctly transmit the functional clock or any fixed-rate slower clock. However, a faster clock, as may be used for reduced-voltage periodic test, or the variable clock rates used for aperiodic test would require a careful study of the time constant of the clock distribution network.

Bibliography

[1] "LabVIEW System Design Software, National Instruments." http://www.ni.com/labview/ (accessed Oct. 26, 2012).
[2] "NI ELVIS: Educational Design and Prototyping Platform, National Instruments." http://www.ni.com/nielvis/ (accessed Oct. 26, 2012).
[3] "IEEE Standard Test Interface Language (STIL) for Digital Test Vector Data," IEEE Std 1450-1999, pp. i–, 1999.
[4] Nanosim User Guide. Synopsys, San Jose, CA, 2008.
[5] ATPG and Failure Diagnosis Tools. Mentor Graphics Corp., Wilsonville, OR, 2009.
[6] "NVIDIA Takes Charge for Faulty Graphics," Aug. 2009. Computerworld, IDG News Service, http://www.computerworld.com.
[7] Leonardo Spectrum User Guide. Mentor Graphics Corp., Wilsonville, OR, 2011.
[8] DFTMAX Compression User Guide. Synopsys Incorporated, Mountain View, CA, 2013.
[9] A. Agarwal, D. Blaauw, and V. Zolotov, "Statistical Timing Analysis for Intra-Die Process Variations With Spatial Correlations," in Proc. International Conf. Computer Aided Design, 2003, pp. 900–907.
[10] V. D. Agrawal, "Pre-Computed Asynchronous Scan (Invited Talk)," in 13th IEEE Latin American Test Workshop, Quito, Ecuador, Apr. 2012.
[11] V. D. Agrawal, "Reduced Voltage Test Can be Faster," in Proc. International Test Conf., Nov. 2012. Elevator Talk.
[12] Altera, "DE2 Development and Education Board." http://www.altera.com/education/univ/materials/boards/de2/unv-de2-board.html (accessed Oct. 26, 2012).
[13] Y. Bonhomme, T. Yoneda, H. Fujiwara, and P. Girard, "An Efficient Scan Tree Design for Test Time Reduction," in Proc. 9th IEEE European Test Symp., 2004, pp. 174–179.
[14] K. A. Bowman, B. L. Austin, J. C. Eble, X. Tang, and J. D. Meindl, "A Physical Alpha-Power Law MOSFET Model," IEEE Journal of Solid-State Circuits, vol. 34, no. 10, pp. 1410–1414, Oct. 1999.
[15] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Springer, 2000.
[16] K. Chakravadhanula, V. Chickermane, D. Pearl, A. Garg, R. Khurana, S. Mukherjee, and P. Nagaraj, "SmartScan - Hierarchical Test Compression for Pin-limited Low Power Designs," in Proc. International Test Conference, 2013, pp. 1–9. Paper 4.2.
[17] M. Chalkia and Y. Tsiatouhas, "The Leafs Scan-Chain for Test Application Time and Scan Power Reduction," in Proc. 19th IEEE International Conf. Electronics, Circuits and Systems, Dec. 2012, pp. 749–752.
[18] J. T. Y. Chang and E. J. McCluskey, "Detecting Delay Flaws by Very-Low-Voltage Testing," in Proc. International Test Conference, Oct. 1996, pp. 367–376.
[19] J. T. Y. Chang and E. J. McCluskey, "Quantitative Analysis of Very-Low-Voltage Testing," in Proc. 14th IEEE VLSI Test Symposium, 1996, pp. 332–337.
[20] M. Chloupek, O. Novak, and J. Jenicek, "On Test Time Reduction Using Pattern Overlapping, Broadcasting and On-Chip Decompression," in Proc. IEEE 15th International Symp. on Design and Diagnostics of Electronic Circuits Systems (DDECS), Apr. 2012, pp. 300–305.
[21] R. M. Chou, K. K. Saluja, and V. D. Agrawal, "Power constraint scheduling of tests," in Proc. 7th International Conference VLSI Design, Jan. 1994, pp. 271–274.
[22] R. M. Chou, K. K. Saluja, and V. D. Agrawal, "Scheduling Tests for VLSI Systems Under Power Constraints," IEEE Trans. VLSI Systems, vol. 5, no. 2, pp. 175–185, June 1997.
[23] W. Daehn and J. Mucha, "Hardware Test Pattern Generation for Built-In Testing," in Proc. International Test Conf., 1981, pp. 110–120.
[24] V. R. Devanathan, C. P. Ravikumar, R. Mehrotra, and V. Kamakoti, "PMScan: A Power-Managed Scan for Simultaneous Reduction of Dynamic and Leakage Power During Scan Test," in Proc. IEEE International Test Conf., Oct. 2007. Paper 13.3.
[25] L. G. e Silva, J. Phillips, and L. M. Silveira, "Effective Corner-Based Techniques for Variation-Aware IC Timing Verification," Computer-Aided Design of Integrated Circuits and Systems, IEEE Transactions on, vol. 29, no. 1, pp. 157–162, Jan. 2010.
[26] L. G. e Silva, L. M. Silveira, and J. R. Phillips, "Efficient Computation of the Worst-Delay Corner," in Proc. Design, Automation Test in Europe Conference Exhibition, 2007. DATE '07, Apr. 2007, pp. 1–6.
[27] A. C. Evans, "Applications of Semiconductor Test Economics, and Multisite Testing to Lower Cost of Test," in Proc. International Test Conference, 1999, pp. 113–123.
[28] C. Forzan and D. Pandini, "Why We Need Statistical Static Timing Analysis," in Proc. 25th International Conf. Computer Design, 2007, pp. 91–96.
[29] P. Girard, N. Nicolici, and X. Wen, Power Aware Testing and Test Strategies for Low Power Devices. New Jersey: Prentice-Hall, second edition, 2004.
[30] S. Gunasekar, Algorithms for Finding Optimum Frequencies for Aperiodic Clock Testing. Master's thesis, Auburn University, Auburn, Alabama, USA, May 2014. In preparation.
[31] H. Hao and E. J. McCluskey, "Very-Low-Voltage Testing for Weak CMOS Logic ICs," in Proc. International Test Conference, Oct. 1993, pp. 275–284.
[32] H. Hashempour, F. J. Meyer, and F. Lombardi, "Test Time Reduction in a Manufacturing Environment by Combining BIST and ATE," in Proc. 17th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, 2002, pp. 186–194.
[33] A. Kedia, Design of a Serialized Link for On-chip Global Communication. Master's thesis, University of British Columbia, Canada, 2006.
[34] A. Kedia and R. Saleh, "Power Reduction of On-Chip Serial Links," in Proc. IEEE International Symp. Circuits and Systems, 2007, pp. 865–868.
[35] R. K. Krishnamurthy, A. Alvandpour, V. De, and S. Borkar, "High-Performance and Low-Power Challenges for Sub-70 nm Microprocessor Circuits," in Proc. IEEE Custom Integrated Circuits Conference, 2002, pp. 125–128.
[36] W.-J. Lai, C.-P. Kung, and C.-S. Lin, "Test Time Reduction in Scan Designed Circuits," in Proc. 4th European Conference on Design Automation, Feb. 1993, pp. 489–493.
[37] E. Larsson, Introduction to Advanced System-on-Chip Test Design and Optimization. Springer, 2005.
[38] X. Lin, R. Press, J. Rajski, P. Reuter, T. Rinderknecht, B. Swanson, and N. Tamarapalli, "High-frequency, At-Speed Scan Testing," IEEE Design & Test of Computers, vol. 20, no. 5, pp. 17–25, Sept. 2003.
[39] P. Mangilipally and V. P. Nelson, "Emulation of Slave Serial Mode to Configure the Xilinx Spartan 3 XC3S50 FPGA Using Advantest T2000 Tester," Technical report, Auburn University, 2011.
[40] G. A. Maston, T. R. Taylor, and J. N. Villar, Elements of STIL: Principles and Applications of IEEE Std. 1450. Springer, 2003.
[41] J. Moreau, T. Droniou, P. Lebourg, and P. Armagnat, "Running Scan Test on Three Pins: Yes We Can!," in Proc. International Test Conference, 2009, pp. 1–10. Paper 18.1.
[42] N. Nicolici and B. M. Al-Hashimi, Power Constrained Testing of VLSI Circuits. Springer, 2002.
[43] K. Nose and T. Sakurai, "Optimization of VDD and VTH for Low Power and High-Speed Applications," in Proc. Asia and South Pacific Design Automation Conf., Jan. 2000, pp. 469–474.
[44] S. Pasricha, Y.-H. Park, F. J. Kurdahi, and N. Dutt, "Incorporating PVT Variations in System-Level Power Exploration of On-Chip Communication Architectures," in Proc. 21st International Conference on VLSI Design, Jan. 2008, pp. 363–370.
[45] R. J. Powers, "Throughput Advantages of Asynchronous Prober Control," IEEE Design & Test of Computers, vol. 5, no. 3, pp. 56–63, 1988.
[46] X. Qian, C. Han, and A. D. Singh, "Detection of Gate Oxide Defects with Timing Tests at Reduced Power Supply," in Proc. 30th IEEE VLSI Test Symposium, 2012, pp. 120–126.
[47] S. Ravi, "Power-Aware Test: Challenges and Solutions," in Proc. International Test Conf., Oct. 2007, pp. 1–10. Lecture 2.2.
[48] B. Razavi, Design of Analog CMOS Integrated Circuits. Tata McGraw Hill, 2002.
[49] J. L. Roehr, "Very-Low Voltage (VLV) and VLV Ratio (VLVR) Testing for Quality, Reliability, and Outlier Detection," in Proc. International Test Conference, Oct. 2006, pp. 1–6. Paper 31.1.
[50] T. Sakurai, "Alpha Power-Law MOS Model," Solid-State Circuits Society Newsletter, vol. 9, pp. 4–5, Oct. 2004.
[51] T. Sakurai and A. R. Newton, "Alpha Power-Law MOS Model," IEEE Journal of Solid State Circuits, vol. 25, pp. 584–593, Oct. 1990.
[52] A. Sanghani, B. Yang, K. Natarajan, and C. Liu, "Design and Implementation of a Time-Division Multiplexing Scan Architecture Using Serializer and Deserializer in GPU Chips," in Proc. 29th IEEE VLSI Test Symposium, 2011, pp. 219–224.
[53] P. Shanmugasundaram, Test Time Optimization in Scan Circuits. Master's thesis, Auburn University, Auburn, Alabama, USA, Dec. 2010.
[54] P. Shanmugasundaram and V. D. Agrawal, "Dynamic Scan Clock Control for Test Time Reduction Maintaining Peak Power Limit," in Proc. 29th IEEE VLSI Test Symposium, May 2011, pp. 248–253.
[55] P. Shanmugasundaram and V. D. Agrawal, "Dynamic Scan Clock Control in BIST Circuits," in Proc. 43rd IEEE Southeastern Symp. System Theory, Mar. 2011, pp. 237–242.
[56] P. Shanmugasundaram and V. D. Agrawal, "Externally Tested Scan Circuit with Built-In Activity Monitor and Adaptive Test Clock," in Proc. 25th International Conf. VLSI Design, Jan. 2012, pp. 448–453.
[57] V. Sheshadri, V. D. Agrawal, and P. Agrawal, "Optimal Power-Constrained SoC Test Schedules With Customizable Clock Rates," in Proc. IEEE International SOC Conf. (SOCC), Sept. 2012, pp. 271–276.
[58] V. Sheshadri, V. D. Agrawal, and P. Agrawal, "Optimum Test Schedule for SoC with Specified Clock Frequencies and Supply Voltages," in Proc. 26th International Conf. VLSI Design, Jan. 2013, pp. 267–272.
[59] V. Sheshadri, V. D. Agrawal, and P. Agrawal, "Power-Aware SoC Test Optimization Through Dynamic Voltage and Frequency Scaling," in Proc. 21st IFIP/IEEE International Conf. Very Large Scale Integration (VLSI-SoC), (Istanbul, Turkey), Oct. 2013, pp. 105–110.
[60] C. E. Stroud, Designers Guide to Built-in Self Test. Springer, 2002.
[61] F. N. Taher, A Low-Power Analog Bus for On-Chip Communication. Master's thesis, Auburn University, Auburn, Alabama, USA, June 2013.
[62] L. Van Eck, "IMA: Cost Effective Testing in the New Era," Nov. 2010. http://www.ltxc.com.
[63] P. Venkataramani and V. D. Agrawal, "Reducing ATE Time for Power Constrained Scan Test by Asynchronous Clocking," in Proc. International Test Conf., Nov. 2012. Poster P13.
[64] P. Venkataramani and V. D. Agrawal, "Test-Time Reduction in ATE Using Asynchronous Clocking," in Proc. 6th IEEE International Workshop on Design for Manufacturability and Yield, June 2012. Poster.
[65] P. Venkataramani and V. D. Agrawal, "ATE Test Time Reduction Using Asynchronous Clock Period," in Proc. International Test Conference, Sept. 2013. Paper 15.3.
[66] P. Venkataramani and V. D. Agrawal, "Reducing Test Time of Power Constrained Test by Optimal Selection of Supply Voltage," in Proc. 26th International Conf. VLSI Design, Jan. 2013, pp. 273–278.
[67] P. Venkataramani, S. Sindia, and V. D. Agrawal, "A Test Time Theorem and Its Applications," in Proc. 14th IEEE Latin American Test Workshop (LATW), 2013, pp. 1–5.
[68] P. Venkataramani, S. Sindia, and V. D. Agrawal, "Finding Best Voltage and Frequency to Shorten Power-Constrained Test Time," in Proc. 31st IEEE VLSI Test Symposium, 2013, pp. 19–24.
[69] B. Yang, A. Sanghani, S. Sarangi, and C. Liu, "A Clock-Gating Based Capture Power Droop Reduction Methodology for At-Speed Scan Testing," in Proc. Design, Automation Test in Europe Conf. and Exhibition, 2011, pp. 1–7.
[70] C. Yao, K. K. Saluja, and P. Ramanathan, "Thermal-Aware Test Scheduling Using On-chip Temperature Sensors," in Proc. 24th International Conf. VLSI Design, Jan. 2011, pp. 376–381.