Practically Realizing Random Access Scan
Except where reference is made to the work of others, the work described in this thesis is
my own or was done in collaboration with my advisory committee. This thesis does not
include proprietary or classified information.
Anandshankar S. Mudlapur
Certificate of Approval:
Adit D. Singh
James B. Davis Professor
Electrical and Computer Engineering
Vishwani D. Agrawal, Chair
James J. Danaher Professor
Electrical and Computer Engineering
Victor P. Nelson
Professor
Electrical and Computer Engineering
Stephen L. McFarland
Acting Dean
Graduate School
Practically Realizing Random Access Scan
Anandshankar S. Mudlapur
A Thesis
Submitted to
the Graduate Faculty of
Auburn University
in Partial Fulfillment of the
Requirements for the
Degree of
Master of Science
Auburn, Alabama
May 11, 2006
Practically Realizing Random Access Scan
Anandshankar S. Mudlapur
Permission is granted to Auburn University to make copies of this thesis at its discretion,
upon the request of individuals or institutions and at their expense. The author reserves
all publication rights.
Signature of Author
Date of Graduation
Vita
Anandshankar S. Mudlapur, son of Mrs. Durga Shivakumar and Mr. M. A. Shivakumar,
was born in Bangalore, Karnataka, India. He graduated from Kendriya Vidyalaya
NAL Bangalore in 1999. He earned the degree of Bachelor of Engineering in Electronics and
Communications from Bangalore Institute of Technology, affiliated to Visvesvaraya
Technological University, Bangalore, India, in 2003.
Thesis Abstract
Practically Realizing Random Access Scan
Anandshankar S. Mudlapur
Master of Science, May 11, 2006,
(B.E., Visvesvaraya Technological University, 2003)
81 Typed Pages
Directed by Vishwani D. Agrawal
The number of clock cycles in a serial scan (SS) test is often prohibitive as the number
of flip-flops (FF) increases. Besides, scan-in and scan-out sequences result in unwanted
circuit activity. This increases the test power enormously. The scan process activates all
flip-flops in the scan chain, although very few flip-flops need to be set for a targeted fault
and only a subset of all the flip-flops needs to be observed. A technique known as Random
Access Scan (RAS) can solve these problems. Here every flip-flop is addressed uniquely.
In RAS, only the required number of flip-flops is set or reset for a given test and
this reduces the set up time of flip-flops significantly. Due to the flexibility of setting
the required flip-flops randomly, the test power drastically reduces to a bare minimum.
Thus two complementary problems are addressed using the single technique of RAS. These
advantages come at a cost of increased area overhead and that is often unacceptable.
In this work, we have addressed the problem in such a way that the implementation is
practical and the additional area overhead is justified. We have developed a new RAS cell,
which minimizes the number of signals otherwise routed to it compared to earlier designs.
This improvement saves silicon area.
Another contribution of this work is a new RAS cell without a scan-in signal and an
added toggle feature. This flip-flop toggles its state when addressed and hence any desired
state can be achieved by just addressing it if the current state is known. The scan out
structure is also designed in such a way that when a flip-flop is addressed or toggled, the
existing value of the flip-flop is read out. This is done using a hierarchical bus structure that
drives the data from the addressed flip-flops to a primary output. Considering the limited
drive capability of the flip-flops, the hierarchical bus restricts the load that the addressed
flip-flop must drive.
The flip-flops are addressed using a grid structure controlled by row and column
decoders. We evaluated different decoding schemes and concluded that the grid scheme
requires the least routing overhead. The intersection of the selected row and column address
lines places a flip-flop in the scan mode of operation. The address inputs to the decoders are
provided from primary input pins.
Using this design we have shown that the test cycles can be reduced by 60% compared
to a single chain serial scan and the test power saving can be as high as 99% compared to
the serial scan. We also provide an algorithm to further decrease the test cycles.
Acknowledgments
I would like to thank my advisor, Prof. Vishwani Agrawal, for his guidance and
direction. He has been the major source of inspiration to pursue this work and in life as a
whole. I thank Prof. Adit Singh, who motivated and encouraged me to pursue work related
to electronic testing during my first semester of graduate study. I would also like to thank
Prof. Victor Nelson for being on my committee and helping me with the numerous doubts
I had during the course of my study. My sincere thanks to my parents, without
whose encouragement and numerous sacrifices I wouldn't be what I am today. The other
people whom I would like to thank are my sister Ambika and my friends Srinath, Sunil,
Gowri, Bikram, Ajay, Abhilash, Vidyadharan, Harish, Arun, Rohit and all my friends in
Auburn University.
My special thanks to Mr. Alok Doshi for having taken a summer off and implemented
this work in an industrial circuit at Texas Instruments India Pvt. Ltd.
Style manual or journal used: LaTeX: A Document Preparation System by Leslie
Lamport (together with the style known as "aums").
Computer software used: The document preparation package TeX (specifically LaTeX)
together with the departmental style-file aums.sty. The images were generated using XFig.
Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 Problem Statement
1.2 Contribution of this Thesis
1.3 Organization of Thesis
2 Background
2.1 Need for efficient testability
2.2 Basic Concept of Scan
2.3 Serial Scan Architectures
2.4 Traditional Serial Scan Design Rules
2.5 Limitations of Serial Scan Techniques
2.6 Alternate solutions to Serial Scan
2.7 Why Random Access Scan?
3 Previous Work on Random Access Scan
3.1 Previous and Current Work on Random Access Scan
3.1.1 Ando et al.'s Method
3.1.2 Wagner et al.'s Method
3.1.3 Ito et al.'s Method
3.1.4 Baik et al.'s Initial Method
3.1.5 Baik et al.'s Modified Methods
3.2 Summary
4 Toggle Random Access Scan
4.1 Toggle flip-flop
4.1.1 Design
4.1.2 Working
4.2 Decoder Design
4.3 Routing
4.3.1 Gate area overhead of RAS
4.4 Testing
4.5 Algorithm to compact test vectors
4.6 Results
4.7 Modifying ATPG to Further Decrease the Number of Vectors
4.8 Summary
5 Scan-out Design
5.1 Macro-cell design
6 An Experimental Study
6.1 Physical Design
6.2 Design and Implementation Issues
7 Conclusions
7.1 Future Work
7.1.1 Delay Testing
7.1.2 Random-Pattern BIST using RAS
Bibliography
Appendices
A Description of the programs used to implement the vector compacting algorithm
B Description of the programs used to calculate the power dissipation during test
List of Tables
3.1 Hardware Requirements for ARAS.
3.2 Testability circuit size per chip [32].
3.3 Example vectors [7].
3.4 Peak and average switching activity during scan [7].
3.5 Circuit statistics & test data volume and test application time reduction [7].
4.1 RAS signals.
4.2 Gate overhead of RAS vs Serial Scan.
4.3 Results of Vector Compaction for various Benchmark Circuits.
4.4 Power estimation based on number of transitions at the inputs for various Benchmark Circuits.
A.1 Example vectors
List of Figures
2.1 Sequential system.
2.2 Shift-register modification (from Williams and Angell 1973; ©1973 IEEE).
2.3 Standard D flip-flop (DFF).
2.4 Typical scan flip-flop (SFF).
2.5 A two-clock scan flip-flop.
2.6 A scan design schematic.
2.7 BIST process.
3.1 Input MPX type addressable latch.
3.2 Set/Reset type addressable latch.
3.3 Random access scan-In/Out network.
3.4 Scannable (master) latch as described in [61].
3.5 Delay testing between latches as described in [32].
3.6 Abstracted structure of RAS.
3.7 RAS scan-in operation.
3.8 Test application using RAS [7].
3.9 Mux based RAS [7].
3.10 RAM-based RAS [4].
3.11 Test generation procedure [5].
4.1 Toggle random access scan flip-flop.
4.2 Decoder design.
4.3 Design of RAS as described in [7].
4.4 Decoder built using pass transistors [65].
5.1 Macro level description of scan-out structure.
5.2 Scan-out Macro-cell.
5.3 Hierarchical scan-out illustration.
A.1 Vector compaction program flow.
Chapter 1
Introduction
It is a human tendency to seek perfection, but one seldom achieves it in the first
attempt; attaining perfection is an iterative process. It is no different when it comes to
integrated circuits (ICs), where millions of components have to work together. Guaranteeing
that these components work every time necessitates testing them effectively. This is a
challenge in itself, as the complexity of these ICs is beyond what a human can readily grasp.
Errors may occur at various steps in the life cycle of an IC. For instance, a failure may be
caused by a faulty fabrication process, an incorrect design, an inappropriate test,
invalid specifications, or some other reason that may not be obvious. Testing can be broadly
classified into two types: detection, which verifies whether the device is faulty, and
diagnosis, which determines what exactly went wrong.
As the complexity of a digital circuit increases, so does the difficulty of testing it. Some
crucial factors limiting test effectiveness are increased chip clock rates,
increased transistor density and the integration of analog and digital devices onto one chip.
A good test must ensure that all the parts of the circuit are working correctly. Testing
is usually assisted by adding extra logic. This field of engineering is called design for
testability. A circuit or a design is modified to incorporate the assistance needed for testing
the remaining circuit faster and more accurately.
The extra logic added has to satisfy certain rules. It is a compromise between the ease
of testing and the extra silicon area needed. It is our endeavor here to present a design
that can achieve the two complementary objectives simultaneously.
1.1 Problem Statement
The problem solved in this thesis is: A design and algorithm to practically implement
Random Access Scan.
1.2 Contribution of this Thesis
We have developed a new flip-flop design, known as the "TOGGLE" random access scan
flip-flop, to implement random access scan (RAS). This design eliminates two globally
routed wires, namely Scan-in and Test Control, that earlier RAS designs require. We have
also developed a test vector compaction algorithm that is best suited for the "TOGGLE"
flip-flop.
We have shown that our method reduces the area overhead significantly compared to
other existing RAS designs. An estimate of the increase in area compared to
serial scan is provided. Results of the vector compaction and test power reduction are presented.
Papers describing this work have been accepted at the International Test Conference
(ITC) 2005 [41] and VLSI Design and Test Symposium (VDAT) 2005 [40].
1.3 Organization of Thesis
The thesis is organized as follows. In Chapter 2 we discuss the general concepts
of testing in a broader sense and the need for design for testability. In Chapter 3 we review
in depth the previous and current work on RAS. In Chapter 4 we
describe the design and operation of the "TOGGLE" flip-flop. In Chapter 5 we describe
the scan-out structure. Chapter 6 focuses on the experimental study performed at Texas
Instruments India Pvt. Ltd., and Chapter 7 concludes this work and identifies areas
for future work.
Chapter 2
Background
A digital circuit is made up of combinational logic elements called gates and sequential
logic elements called flip-flops. Any digital system can be represented as shown in Figure
2.1. The outputs of combinational logic elements are independent of past inputs
and depend only on the present inputs. Sequential elements, on the other hand,
store information about previous inputs in some form, so their outputs depend on past
as well as current inputs.
A digital system thus has a set of discrete inputs and discrete outputs. Digital testing
is the engineering discipline of asserting that the outputs obtained are a correct consequence
of the input values. Testing can be defined as [63] "A process of evaluating a circuit/system to detect
the presence of hardware failure due to faults and also to locate such faults to facilitate
repair activities".
Testing involves application of test stimuli, known as vectors or patterns, at the inputs
of a device under test (DUT) and analysis of the corresponding responses
by collecting the data observed at the outputs of the DUT. The expected responses are
matched against the actual responses from the DUT, and once a deviation from the correct
value is observed the device is said to have a fault. A test that produces such a deviation
between the actual and the expected value is said to have detected the fault in the device.
Figure 2.1: Sequential system. [Blocks: combinational logic, sequential logic.]
A fault may be due to a physical failure or defect of one or more components in a
digital circuit/system caused by the manufacturing process, extreme operating conditions,
or wear out of the physical components [63].
2.1 Need for efficient testability
Testing is usually integrated with the design process. It is one of the dominating costs
in an IC design process [10] amounting to 30% or more of the total cost. Hence testing is
of utmost importance in the design process. The objective is to minimize time and money
associated with testing.
A combinational circuit is easier to test than a sequential circuit. All the
output states in a combinational circuit are directly controlled by the stimuli at the inputs
of the circuit. In the case of a sequential circuit, the output also depends on the states of the
internal sequential elements. Controlling the states of all the sequential elements is often
intractable. Hence setting the circuit to any given state requires a greater effort than in
a combinational circuit.
Circuits typically are required to have a test coverage of nearly 100% before they are
shipped. This guarantee depends on the fault models used and how exactly they represent
the various manufacturing defects. To ensure that a circuit passes all tests at an economical
cost a designer may utilize design for testability (DFT).
Electronic systems contain three types of components, namely digital logic, memory
blocks and analog or mixed-signal circuits. There are specific DFT techniques for each
type of component, such as scan and partial scan for digital logic, built-in self-test (BIST)
for memory and digital logic, and boundary scan and analog test bus to provide access to
components embedded in a complex system for system-level DFT.
2.2 Basic Concept of Scan
The main idea of scan design is to obtain controllability and observability of flip-flops. A
test mode is added to the design, in which all the flip-flops functionally form one or more
shift registers. These are known as scan registers. The inputs to these scan registers are
connected to primary inputs, and the outputs of the scan registers are multiplexed with
primary outputs. This way any flip-flop can be set to a desired value during the test
mode by shifting appropriate values from the primary inputs. Similarly the logic states of
the flip-flops are observed by shifting out the values from the scan registers.
All flip-flops can be set or observed in a time (in terms of clock periods) that equals
the number of flip-flops in the longest scan register. These operations can be performed
simultaneously: while one set of values is read from the scan registers, a new set of values,
which relates to the next test to be applied, is shifted in.
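As a rough illustration of this timing argument, the following sketch estimates the number of clock cycles for a scan test in which the scan-out of one response overlaps the scan-in of the next vector. The function name and the accounting (one capture cycle per vector, one final unload) are assumptions made for this example and are not taken from a specific scan implementation.

```python
def serial_scan_cycles(chain_lengths, num_vectors):
    """Estimate clock cycles for a scan test with overlapped scan-in/scan-out.

    Assumptions (illustrative only): every vector needs a full shift of the
    longest chain, one capture cycle follows each shift-in, and one final
    shift unloads the last response.
    """
    if num_vectors <= 0 or not chain_lengths:
        return 0
    longest = max(chain_lengths)
    shift_cycles = (num_vectors + 1) * longest  # n loads plus one final unload
    capture_cycles = num_vectors                # one normal-mode clock per vector
    return shift_cycles + capture_cycles


# 1000 flip-flops and 100 vectors, as one chain or as four balanced chains.
single_chain = serial_scan_cycles([1000], 100)
four_chains = serial_scan_cycles([250, 250, 250, 250], 100)
print(single_chain, four_chains)
```

Under these assumptions the cycle count is dominated by the length of the longest chain, which is why splitting the flip-flops into several chains shortens the test at the cost of extra scan pins.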
The concept of scan for hardware test was first illustrated by Williams et al. [66].
The design is shown in Figure 2.2. In that paper the authors indicate the cost feasibility of
shift-register modifications for synchronous circuits.
The procedure for testing such circuits is as follows: Switch to shift-register mode and
load the initial state for a test pattern into the flip-flops. Return to the normal-function
mode and apply the test input pattern. Switch to the shift-register mode and shift out the
final state while shifting in the starting state for the next test [23]. This way, one can design
a sequential circuit such that it can be treated as a purely combinational circuit, with the
flip-flop inputs and outputs treated as pseudo primary inputs and pseudo primary outputs,
respectively. There are several variations of scan flip-flop/Latch designs illustrated in [22].
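The test procedure described above can be sketched as a small behavioral simulation. This is only a model under stated assumptions: a single scan chain, index 0 of the chain next to the scan input, and two hypothetical callables, next_state and outputs, standing in for the combinational logic. None of these names come from the thesis.

```python
def shift_chain(chain, bits_in):
    """Shift bits_in into the chain (index 0 is next to SCANIN); return bits shifted out."""
    bits_out = []
    for bit in bits_in:
        bits_out.append(chain.pop())  # the last flip-flop drives SCANOUT
        chain.insert(0, bit)          # the new bit enters at SCANIN
    return bits_out


def run_scan_tests(next_state, outputs, tests, num_ffs):
    """Apply (state, primary_inputs) tests; return (primary_outputs, captured_state) pairs."""
    chain = [0] * num_ffs
    responses = []
    pending_po = None
    for state, pi in tests:
        # Shift-register mode: load the new state while unloading the previous capture.
        out = shift_chain(chain, reversed(state))
        if pending_po is not None:
            responses.append((pending_po, tuple(reversed(out))))
        # Normal-function mode: apply the primary inputs and clock once.
        current = tuple(chain)
        pending_po = outputs(current, pi)
        chain[:] = next_state(current, pi)
    # Final shift to unload the last captured state.
    out = shift_chain(chain, [0] * num_ffs)
    if pending_po is not None:
        responses.append((pending_po, tuple(reversed(out))))
    return responses


# Hypothetical two-flip-flop circuit used only to exercise the procedure.
ns = lambda s, pi: (s[1] ^ pi[0], s[0] & pi[0])
po = lambda s, pi: (s[0] & s[1],)
print(run_scan_tests(ns, po, [((1, 0), (1,)), ((1, 1), (0,))], 2))
```

Because the flip-flop contents are fully loaded and unloaded through the chain, the callables only ever see a present state and primary inputs, which is the sense in which the sequential circuit is treated as combinational.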
Figure 2.2: Shift-register modification (from Williams and Angell 1973; ©1973 IEEE).
Figure 2.3: Standard D flip-flop (DFF).
2.3 Serial Scan Architectures
For a circuit to have scan capability, the designer first uses only D-type flip-flops
(DFFs) with one or more clock signals, all of which are controlled from primary inputs. A
typical DFF is shown in Figure 2.3. Once the circuit is functionally verified, the DFFs are
replaced by scan flip-flops (SFFs). One typical SFF is shown in Figure 2.4. Here a multiplexer and
two new signals, scan-data SD and test control TC, are added to the DFF. The original
data input D is stored in the flip-flop when TC is 1 and SD is stored when TC is 0.
Another popular design style, called level-sensitive scan design (LSSD), uses two
non-overlapping clock signals. Figure 2.5 shows a scan flip-flop with two function clocks, MCK
and SCK. When MCK is high, data D is latched in the master latch. When SCK is high,
the state of the master latch is copied to the slave latch. For proper operation of a general
sequential circuit, MCK and SCK are never turned high simultaneously. In the scan mode,
MCK is held low and scan data SD is latched in by using clocks TCK and SCK as master
and slave clocks, respectively [22].
Figure 2.4: Typical scan flip-flop (SFF).
Figure 2.5: A two-clock scan flip-flop.
The TCK (or TC for the single-clock flip-flop of Figure 2.4) inputs of all scan flip-flops
are supplied by a new primary input. The SD input of the first SFF in the chain is supplied
by another new primary input, SCANIN. All scan flip-flops are chained by connecting the Q output of one
SFF to the SD input of the next SFF. The Q output of the last SFF in the chain is a new
primary output, SCANOUT. The complete design is given in Figure 2.6, with the wiring
added for scan design shown in broken lines. This design has the advantage of reducing the
effort of test generation, especially for the case of full-scan, where all flip-flops are scanned.
A combinational ATPG program (much simpler than sequential ATPG) can produce tests
for all stuck-at faults in the circuit. Several results have been presented on enhancing and
reducing the test patterns in LSSD [50].
Figure 2.6: A scan design schematic.
2.4 Traditional Serial Scan Design Rules
A circuit is designed to meet its functional requirements. After the functional correct-
ness of the design is verified, it is modified to include the scan function. In order to be able
to make it scan-testable, the designer must adhere to certain rules during the functional
design. In general, these rules depend upon the specific design environment, which may
dictate choices such as single versus multiple clocks, etc. The following four rules, however,
are found to be useful:
R-1: Only D-type master-slave flip-flops should be used. This rule prohibits the use of other
types of flip-flops (JK, toggle, etc.) or other forms of asynchronous logic (unclocked
RS latches, combinational feedback elements.)
R-2: At least one primary input pin must be available for test. In general, flip-flops can
be connected as multiple scan registers, each of which will require a scan-in and a
scan-out terminal. If extra pins are not available, then any normal primary input
can be used as scan-in and any primary output pin can be multiplexed as scan-out.
This is illustrated in Figure 2.2 where Control P is the only pin added. One ordinary
primary input pin serves as SCANIN and a primary output pin is multiplexed with
SCANOUT.
R-3: All flip-flop clocks must be controllable from primary inputs. This rule is necessary
for flip-flops to function as a scan register. Some violations of this rule, if they exist,
can be removed by a simple work-around.
R-4: Clocks must not feed data inputs of flip-flops. A violation of this rule can potentially
lead to a race condition in the normal mode. Thus the value captured in the flip-flop
cannot be guaranteed to be the state of the signal produced by the combinational
logic. In scan design, flip-flops play a dual role. They capture combinational data in
the normal mode and then carry the data out for observation in the scan mode. The
test procedure relies on the flip-flop correctly capturing data in the normal mode and
hence no race condition is permitted.
2.5 Limitations of Serial Scan Techniques
As the number of flip-flops in a circuit increases, the setup time increases proportionally
in serial scan. The values need to be serially scanned in, and this process takes a very
long time. This problem can be overcome if multiple scan chains are used, but
other constraints then come into the picture; for example, the number of tester pins needs to be
minimized as much as possible. Hence a trade-off has to be made with the usable number
of tester pins. The test data volume that has to be stored in the tester is also a general point of
concern.
Figure 2.7: BIST process. [Blocks: test controller, ROM, hardware pattern generator, input MUX, circuit under test, output response compacter, signature comparator with reference signature, good/faulty result.]
Another disadvantage of scan testing, in general, is that the scan flip-flops increase the
delay through the circuit because of the addition of a multiplexer. The test also has to be
performed at a slow speed in order to have full control over the circuit operation.
Performing delay testing in serial scan circuits is not straightforward and requires
other modifications. Serial scan induces unnecessary circuit activity during scan-in/scan-out.
The continuous change of states causes constant power dissipation in the combinational
logic. This may cause serious problems to the circuit during test. A functionally working
circuit may be subjected to extreme heat dissipation, resulting in defects in the circuit.
2.6 Alternate solutions to Serial Scan
There are several alternative testing methods apart from serial scan. These include built-in
self-test (BIST), random access scan (RAS), boundary scan and variations of these. In
BIST, a linear feedback shift register (LFSR) is built into the circuit and is used to generate
pseudo-exhaustive patterns to test the circuit. The responses are collected in a multiple-input
signature register (MISR) to verify the correctness of the circuit. The block diagram of the
BIST process is shown in Figure 2.7.
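The pattern generation and response compaction steps of BIST can be illustrated with a minimal software model. The sketch below assumes a 4-bit Fibonacci-style LFSR with feedback taken from bit positions 3 and 2, and a MISR built on the same feedback; the function names, the width, the taps and the stand-in "circuit" are illustrative choices and do not describe any particular BIST hardware.

```python
def lfsr_patterns(seed, taps, width, count):
    """Yield `count` successive LFSR states as integers.

    `taps` lists the bit positions XORed together to form the feedback bit.
    """
    state = seed
    mask = (1 << width) - 1
    for _ in range(count):
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = ((state << 1) | feedback) & mask


def misr_signature(responses, taps, width):
    """Compact a sequence of response words into a single signature."""
    sig = 0
    mask = (1 << width) - 1
    for word in responses:
        feedback = 0
        for t in taps:
            feedback ^= (sig >> t) & 1
        sig = (((sig << 1) | feedback) ^ word) & mask
    return sig


TAPS, WIDTH = (3, 2), 4
patterns = list(lfsr_patterns(1, TAPS, WIDTH, 15))

# Stand-in for the circuit under test: any deterministic function of the pattern.
good_responses = [p ^ 0b0101 for p in patterns]
faulty_responses = list(good_responses)
faulty_responses[7] ^= 0b0010  # a single erroneous response bit

reference = misr_signature(good_responses, TAPS, WIDTH)
observed = misr_signature(faulty_responses, TAPS, WIDTH)
print("fault detected:", observed != reference)
```

Only the final signature is compared against the stored reference, so the tester does not need to hold the individual responses.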
In random access scan (RAS), every flip-flop is addressed individually. An address
decoder is used to select a particular flip-flop and the required value is stored in it for
testing. This scheme reduces the test time and test data volume significantly compared to
serial scan. The test power, which is a serious matter of concern, is reduced drastically.
But these advantages are marred because the implementation of RAS has not been practically
feasible. This work is aimed at realizing the objective of RAS with a minimal increase in
hardware overhead. A detailed description of RAS follows in Chapter 3.
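To make the contrast with serial scan concrete, the following minimal sketch lists which flip-flops a random access scheme would have to address when moving from the current state to the state required by the next test. The function name and the use of None for a don't-care bit are our own illustrative conventions, not part of any RAS design discussed in this thesis.

```python
def ras_addresses_to_write(current_state, required_state):
    """Return the addresses of flip-flops whose stored value must change.

    `required_state` may contain None for don't-care positions, which
    never need to be addressed (an assumed convention for this sketch).
    """
    return [
        addr
        for addr, (cur, req) in enumerate(zip(current_state, required_state))
        if req is not None and cur != req
    ]


current = [0, 1, 1, 0, 0, 1, 0, 0]
required = [0, 1, None, 0, 1, 1, None, 0]

writes = ras_addresses_to_write(current, required)
serial_shift_cycles = len(current)  # serial scan shifts the whole chain
ras_write_cycles = len(writes)      # random access touches only differing cells
print(writes, serial_shift_cycles, ras_write_cycles)
```

For these example values only one flip-flop is addressed, whereas a serial scan chain of the same length would shift all eight positions.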
In order to invoke the BIST procedures and facilitate their correct execution at the
board, module or system level, certain design rules must be applied. In 1990, a new testing
standard was adopted by the Institute of Electrical and Electronics Engineers, Inc., and it is
now defined as the IEEE Standard 1149.1, IEEE Standard Test Access Port and Boundary-
Scan Architecture [47]. The details of it can be found in [37].
2.7 Why Random Access Scan?
Although serial scan enables the application of combinational test generation algorithms,
alternative methods are sought because of some inherent drawbacks such as increased test
time and test power consumption. Several methods have been suggested and implemented
to circumvent these problems. A widely successful method is partial scan [1], but it provides
a trade-off between the ease of testing and the costs associated with scan design. The
problem of efficiently selecting the scan registers is still open to research. The cross-check
methodology [14] provides a comprehensive solution to test sequential circuits; it solves almost
all the problems related to test application time and provides massive controllability
and observability.
Power consumption during testing is much higher than during normal circuit operation.
It is important and vital to maintain low power dissipation during testing, since excessive
heat can damage the circuit under test. The long scan-in/scan-out sequences trigger random
circuit activity resulting in high power consumption. Test scheduling is a common approach
to avoid the damage of complex devices, such as SOC [16, 17]. As a result, test parallelism
is reduced and testing time eventually increases. It is a well known fact that serial scan
operation may create unacceptably high activity due to frequent transitions in the scan
chain. To circumvent this problem the scan clock is slowed down [12]. This increases the
test application time, which is undesirable.
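The effect of shifting on circuit activity can be illustrated by counting flip-flop transitions. The sketch below is a simplified model of our own: it counts only changes at flip-flop outputs during scan-in, ignores the combinational logic they drive, and assumes that index 0 is the cell next to the scan input. The function names are hypothetical.

```python
def scan_shift_transitions(chain_state, vector):
    """Count flip-flop output transitions while `vector` is shifted in serially."""
    chain = list(chain_state)
    transitions = 0
    for bit in reversed(vector):
        new_chain = [bit] + chain[:-1]  # every cell takes its neighbor's value
        transitions += sum(a != b for a, b in zip(chain, new_chain))
        chain = new_chain
    return transitions


def direct_write_transitions(chain_state, vector):
    """Count transitions if each differing flip-flop is written exactly once."""
    return sum(a != b for a, b in zip(chain_state, vector))


start = [0, 0, 0, 0]
target = [1, 0, 1, 0]
print(scan_shift_transitions(start, target), direct_write_transitions(start, target))
```

Even in this four-bit example the serial shift causes several times more flip-flop transitions than writing only the cells that differ, and the gap grows with chain length because every shift cycle can disturb every cell.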
ATPG based methods have also been used to target the power issue [64]. However, this
method often results in longer test sequences. Compaction of test vectors can reduce the
length of tests, but the compacted vector set generally induces more activity, resulting in
higher power consumption [51]. To overcome this problem, modification of test vectors for
power saving has also been addressed [33]. Another method studied to reduce test power
and/or test application time is modifying the order of scan cells or inserting inversion logic
between scan cells after the test generation [19]. Seth et al., in [9], describe a double-tree scan
architecture to reduce test power. Although the power saving is quite significant, the test
time and test data volume either remain the same or increase. A modified scan architecture to
reduce test time in full-scan circuits has been addressed in [28]. They illustrate a reduction
of test time by 50%; nevertheless test power still remains a matter of concern.
Testing for path delay faults in non-scan sequential circuits is complicated by the
limited state transitions during normal operation. An accepted method for overcoming
this difficulty is to use a scan chain consisting of enhanced scan flip-flops which makes the
application of arbitrary vector pairs possible. However this technique requires a hold-latch
connected to each flip-flop in addition to a "HOLD" signal that must be routed to every
hold latch. This increases the area overhead and also adds some delay in the scan path [10].
Normal-scan sequential circuits can be tested for delay faults, but the vector-pairs must
be specially generated [15]. Here, the first vector V1 is scanned in (usually with a slow
scan clock) and is then replaced in the scan register by either (a) applying V2 which is
obtained by a one-bit shift to the scan register also known as scan-shift delay test [52, 53],
or (b) propagating V1 through the combinational logic in the normal mode, where the
state portion of V2 may be justified by V1, known as Functional broad-side delay test [54].
However, high fault coverages are dependent on the circuit and cannot be guaranteed due
to the correlation between the two vectors.
All the problems stated above are due to the underlying architecture used, which is
serial scan! Random Access Scan (RAS) [2] is a single concurrent solution to all of them.
As the name implies, each scan-cell is randomly and uniquely addressable. The architecture
described in [7] targets reduction of both test application time as well as power consumption
simultaneously, which are otherwise complementary objectives. A modified scheme of RAS
has been described in [3], although with a different name. Here, the captured response of
the previous pattern in the flip-flops is used as a template and modified by a circular shift
for the subsequent pattern.
Chapter 3
Previous Work on Random Access Scan
The concept of Random Access Scan (RAS) was first proposed by H. Ando [2] in 1980.
Although it was a novel idea, it failed to impress researchers and industry because of the
hardware area overhead associated with it. In this chapter, the literature related to previous
RAS architectures is discussed, starting with the earliest reference.
3.1 Previous and Current Work on Random Access Scan
3.1.1 Ando et al.'s Method
Ando [2] proposed an addressable latch shown in Figure 3.1. An addressable latch or
flip-flop is a basic storage element in a random access scan-in/out network, which serves
as a test input holding latch, test output holding latch and part of an output multiplexer
while testing in addition to information storage function in normal operation.
Figure 3.1: Input MPX type addressable latch. (Signals shown: X-ADR, Y-ADR, DATA, SDI, SCK, -CK, Q, -SDQ.)
Figure 3.2: Set/Reset type addressable latch. (Signals shown: X-ADR, Y-ADR, DATA, -CK, -PR, -CL, Q, -SDQ.)
An addressable latch is a latch whose state can be controlled and observed through
scan-in/out lines only when it is selected by its address. Figure 3.1 is an example of
an addressable latch which selects one of the two inputs and holds it depending upon which
clock is applied. When the latch is selected, the test input on the SDI line is sampled and
held in response to the SCK clock. The state of the latch is observable through the other
address gate. The latch state -SDQ signals from all latches are ANDed together to produce a
chip scan-out signal. Another type of addressable latch is shown in Figure 3.2. A Set/Reset
type addressable latch is made of a latch and two address gates, one for gating a preset
signal and the other for gating the output. Any latch or flip-flop with asynchronous preset
and clear can be used. Prior to the scan-in operation, all latches have to be cleared with a
common -CL signal. Then the latch is selected with the address lines and the PR pulse is
applied to flip the latch state.
To complete the scan-in/out network, a pair of address decoders and an AND gate tree
that combines all latch state signals are necessary. One address decoder is called the X address
decoder and drives the X-ADR lines. The other one is the Y address decoder. An addressable
latch located at the intersection of the selected X and Y address lines is accessed like
a memory cell in an array. Any point in the combinational circuit can be observed with one
additional gate and one address. The general description of a logic circuit with a
scan-in/out network is shown in Figure 3.3. Ando also suggested that the input pin requirement
could be reduced by adding a scan address register which has serial shift input capability
and count-up capability.
Figure 3.3: Random access scan-in/out network. (Blocks shown: combinational circuit with inputs and outputs; addressable storage elements with clear and clocks, SDI, SCK and SDO; X decoder and Y decoder driven by the scan address.)
The shortcomings or investments for the use of a random access
scan network were:
1. Two address gates for each storage element, address decoders and an output AND tree.
These total about 3 to 4 gates of overhead per storage element.
2. Scan control, data and address pins are required. This requirement of 10 to 20 pins can
be cut down to around 6 pins with a serially loadable address counter.
3. Some logic design limitations are imposed, such as the exclusion of asynchronous latch
operation.
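The Set/Reset type network described above can be sketched behaviorally as follows. This is only an illustration: the class and method names are hypothetical and do not come from [2], timing is ignored, and the code assumes that an unselected latch holds its -SDQ line at 1 so that the AND tree passes the selected latch only.

```python
class AddressableLatchArray:
    """Behavioral model of a Set/Reset type random access scan array (illustrative only)."""

    def __init__(self, rows, cols):
        self.rows, self.cols = rows, cols
        self.state = [[0] * cols for _ in range(rows)]

    def clear_all(self):
        # Common -CL signal: every latch is cleared before scan-in.
        self.state = [[0] * self.cols for _ in range(self.rows)]

    def preset(self, x, y):
        # PR pulse gated by the X and Y address lines: only the selected latch is set.
        self.state[x][y] = 1

    def scan_out(self, x, y):
        # Assumption: an unselected latch holds its -SDQ line at 1, so the AND
        # of all -SDQ lines equals the inverted state of the selected latch.
        sdq_n = [
            (1 - q) if (r, c) == (x, y) else 1
            for r, row in enumerate(self.state)
            for c, q in enumerate(row)
        ]
        return 1 - min(sdq_n)  # min over 0/1 values acts as the AND tree

    def scan_in(self, pattern):
        """Load a rows x cols pattern of 0/1 values; return the number of PR pulses."""
        self.clear_all()
        pulses = 0
        for x in range(self.rows):
            for y in range(self.cols):
                if pattern[x][y]:
                    self.preset(x, y)
                    pulses += 1
        return pulses


array = AddressableLatchArray(2, 3)
pulses = array.scan_in([[1, 0, 1], [0, 0, 1]])
```

In this model the cost of a scan-in is one clear plus one PR pulse per latch that must hold a 1, which is why only the latches to be set need to be addressed.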
3.1.2 Wagner et al.'s Method
Wagner [61] extended the idea of Ando [2], and described the implementation of Amdahl
Random Access Scan (ARAS) at the (1) latch, (2) chip, (3) board, and (4) system levels.
A study of the (1) cost, (2) functional value, and (3) testability benefits was performed.
The basic latch used in Amdahl was built using 8 transistors and wired-collector logic.
Two gates were added to the basic latch to perform the scan-in and scan-out functions, as shown
in Figure 3.4.
Within a chip each scannable element is assigned a unique scan address consisting of a
plane (PL), column (COL) and row (ROW) number. Element addresses range from PLane
0, COLumn 0, ROW 0 through PLane 3, COLumn 3, ROW 3. An address space of size 62
results since 2 addresses are not used by real elements but are instead dedicated to control.
To scan out this latch, COL = ROW = 0 is applied and DATAOUT appears at SNOP of
the Scan-Out Gate. To scan in, the clocks must be inactive (CS = 1, CH = 0), SIPH is pulsed
to 1 to set the latch, and the inputs SIPL = COL = ROW = 0 must be applied to the Scan-In
Gate.
Every scannable chip in the system has an on-chip scan machine. The machine receives
64 SCANCLKs accompanied by 64 SCAN DATA bits during a scan operation. The first 2
bits are used for control and the last 62 bits for serial data transfer. The scan-out operation
can be performed concurrently with system operation. All SIPL and SIPH element inputs
remain inactive. The SNOP outputs for all elements in each plane are wire-ORed. At any
address, the DATAOUT for the selected element is placed on its Scan-Out PLane with the
guarantee that no other elements are active on this PLane. A final 4-to-1 selection between
the 4 Scan-Out PLanes is performed to extract the correct SCAN DATA bit for output off
chip. The scan-in operation is more complex and cannot be performed with active system
clocks. Initially, all latches with scan-in receive the SIPH pulse, setting them. The chip
scan machine then distributes the remaining 62-bit serial stream that it receives, activating
the SIPL input of the Scan-In Gate of each latch that must be reset. It thus resets only
selected latches as directed by the SCAN DATA input bit pattern.
Figure 3.4: Scannable (master) latch as described in [61]. (Signals shown: DATA IN, DATA OUT, CS, CH, COL, ROW, SIPH, SIPL and SNOP, with a Scan-In Gate used as input PLane and a Scan-Out Gate used as output PLane. SIPL = Set In-Phase Output (DATA OUT) Low; SIPH = Set In-Phase Output (DATA OUT) High; SNOP = Scan Out-Phase (SNOP = DATA OUT).)
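A rough behavioral sketch of the scan-in sequence just described is given below. The function names are hypothetical, the meaning of the two control bits is not modeled (a placeholder value is used), and the code assumes that a data bit of 0 marks a latch to be reset by SIPL; none of these details are taken from [61].

```python
ADDRESS_SPACE = 64   # 4 planes x 4 columns x 4 rows
CONTROL_BITS = 2     # two addresses are dedicated to control
DATA_BITS = ADDRESS_SPACE - CONTROL_BITS


def full_scan_out(latches):
    """Return a copy of all 62 latch values in address order."""
    return list(latches)


def full_scan_in(latches, stream):
    """Apply a 64-bit stream: 2 control bits followed by 62 data bits."""
    if len(stream) != ADDRESS_SPACE:
        raise ValueError("a scan operation uses exactly 64 SCAN DATA bits")
    data = stream[CONTROL_BITS:]
    # SIPH pulse: every scannable latch is set first.
    for i in range(DATA_BITS):
        latches[i] = 1
    # SIPL: reset only the latches selected by the data stream.
    # Assumption: a data bit of 0 marks a latch to be reset.
    for i, bit in enumerate(data):
        if bit == 0:
            latches[i] = 0


def write_one_latch(latches, index, value, control=(0, 0)):
    """Change one latch via a full scan-out followed by a full scan-in."""
    data = full_scan_out(latches)
    data[index] = value
    full_scan_in(latches, list(control) + data)  # control bits are a placeholder


latches = [i % 2 for i in range(DATA_BITS)]
before = list(latches)
write_one_latch(latches, 5, 0)
```

The sketch makes the cost visible: changing a single latch still moves all 62 data bits out and back in.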
Scan-in of a latch must be preceded by a full scan-out of its chip, followed by a full
scan-in with the same data, altered only in its one bit location. Each chip is assigned a
unique address within the Multi-Chip Carrier (MCC) where it is located. The scan address
hierarchy consists of
• MCC Number
• Chip Number (16 × 8)
• On-Chip Element Address (4 PLanes × 4 COLs × 4 ROWs)
Cost of ARAS
The cost of Amdahl scan is summarized in Table 3.1. From these results, the author
estimated the ARAS overhead per uniprocessor system to be 19.7% in chip space and 3.4%
in chip pins. In conclusion the ARAS method accomplished savings at all levels of system
development.
Table 3.1: Hardware Requirements for ARAS.
Level                  Requirement
LATCH (Master Latch)   1. 2 gates; 2. Added interconnect
CHIP (Scan Machine)    Serial: 2 pins & 50 gates, OR Parallel: 9 pins & 18 gates
MCC                    1 MCC Scan Chip
SYSTEM                 1. Console software; 2. 1/8 console processor; 3. Interface chips (2)
3.1.3 Ito et al.'s Method
In this work [32], the design described in [2] and two other types of testability circuits
are implemented in the Fujitsu VP-2000 series supercomputer and a detailed study was
made. The method of performing delay testing using RAS is described in this work.
A delay fault can be detected by transmitting a signal transition between latches in a
clock cycle of normal operation [29, 34]. In Figure 3.5 the path from a data-out line of latch
L1 to a data-in line of latch L2 is sensitized, and the paths from a clock primary input to
the clock lines of L1 and L2 are sensitized. Then, '0' is scanned into L1 and L2. Next, required
values are scanned into the related latches and placed on the related primary inputs so that
'1' is set on the data-in line of L1. Under these preparations, a clock is issued to L1 in the
first cycle, and to L2 in the second cycle. Then L2 is scanned out. If its value is '0', an
over-delay fault exists between L1 and L2.
Boundary scan was used to enable the chips to be controlled and observed without
direct probing. Scan points with respect to this work were categorized into three types
according to the testing purposes. The first type is a set of scan points which make internal
latches controllable and observable mainly for the static functional testing. These are
determined based on the result of the adequate testability analysis. The second type is
a set of scan points which suppress clocks to latches for the delay fault testing. These are
determined by adding scan-only latches to control the clock enable lines of clock choppers
and latches, which is done after the completion of system logic placement. The third type
is a set of scan points which make chip I/O pins observable for the board testing. Pin
scan-out circuits are added to all the primary inputs and outputs of the chip except scan
address pins and a scan-out pin.
Figure 3.5: Delay testing between latches as described in [32]. (Labels shown: latches L1 and L2, Clock1 and Clock2.)
Table 3.2 shows the gate overhead, where S_ORG is the average number of gates used
per chip before testability circuits were incorporated, S_SCANFF is the average number of
scannable latches or flip-flops per chip before testability circuits were incorporated, S_DELAY
is the average gate count of the incorporated testability circuit for clock suppression and
S_SCAN is the average gate count of the incorporated scan control circuit including the
pin scan-out circuit and reset distribution circuit. The percentage overhead due to the
testability circuits was calculated as per the expression given below:
\[
\frac{S_{DELAY} + S_{SCANFF} + S_{SCAN}}{S_{ORG} + S_{DELAY} + S_{SCAN}} \tag{3.1}
\]
Table 3.2: Testability circuit size per chip [32].
Chip type            S_ORG   S_SCANFF   S_DELAY   S_SCAN
L15K                  9487        621       636     1392
LOGIC + RAM64K bit    1871         25       147     1016
Chip type 3           5341        394       471     1307
Chip type 4           4894        266       380     1177
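As a worked illustration, expression (3.1) can be evaluated directly on the rows of Table 3.2. The sketch below only applies the arithmetic of (3.1) to the tabulated values; the resulting percentages are not quoted from [32], and the function and variable names are chosen here for illustration.

```python
# (S_ORG, S_SCANFF, S_DELAY, S_SCAN) per chip type, copied from Table 3.2.
CHIPS = {
    "L15K": (9487, 621, 636, 1392),
    "LOGIC + RAM64K bit": (1871, 25, 147, 1016),
    "Chip type 3": (5341, 394, 471, 1307),
    "Chip type 4": (4894, 266, 380, 1177),
}


def overhead(s_org, s_scanff, s_delay, s_scan):
    """Evaluate expression (3.1) and return the result as a fraction."""
    return (s_delay + s_scanff + s_scan) / (s_org + s_delay + s_scan)


overheads = {name: overhead(*values) for name, values in CHIPS.items()}
for name, value in overheads.items():
    print(f"{name}: {100 * value:.1f}%")
```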
3.1.4 Baik et al.'s Initial Method
The concept of RAS was shelved for a very long time until it was reinvestigated in 2004
by Baik et al. [7]. In this work the authors investigated the potential of the RAS architecture
and its feasibility in today's technology. They discussed the advantages of
RAS over traditional serial scan and showed that the test power,
test data volume and test application time can be reduced simultaneously in RAS. These
problems have been studied independently and together, in various contexts such as IC test,
microprocessor test, and system-on-a-chip (SOC) test [12, 19, 42, 56]. Due to overheating
of devices such as an SOC during testing, test scheduling methods are used [16, 68]. Serial
scan operations create unacceptably high activity due to frequent transitions in scan chains.
This problem could be solved by either minimizing the flip-flop transitions or slowing down
the scan clock [12, 13, 19, 68], although slowing the clock would increase the test time.
The basic RAS architecture is illustrated in Figure 3.6. The RAS
structure allows reading or writing of any flip-flop using log2 n address bits, where n is the
number of scanned flip-flops. The address can be applied in either a parallel manner using
multiplexed PIs or in a serial manner using an address shift register (ASR). In this work an
ASR (address register) is used, into which the address of the flip-flop is scanned
in serial mode. When the address is applied, the address decoder generates a scan enable
signal to the corresponding flip-flop and the addressed flip-flop is written with a new value from
the scan-in signal.
Figure 3.6: Abstracted structure of RAS. (Blocks shown: CUT, flip-flops, Scan-In, Scan-Enable, address decoder and address register.)
A test application example is illustrated in Figure 3.7. An example circuit CUT with
five flip-flops and the test set T shown in Table 3.3 is considered. Since only flip-flops are
scanned, the table gives values of the pseudo primary input (PPI) part of the input vectors
and the pseudo primary output (PPO) part of the fault-free responses. An application of T
is considered in the sequence t1 → t2 → t3 → t4. After t1 has been applied, the response
is captured as o1. If a fault which can be detected by t1 is present in the circuit, it will
be detected via a PO or the MISR. Otherwise, the value of o1 will be equal to the fault-free
output. The RAS can directly update the fourth bit of o1 for application of t2 as
illustrated in Figure 3.7. Scan operations are required for every flip-flop whose value
differs between o_i and i_(i+1). Figure 3.8 represents the entire test application sequence for
T drawn as a directed graph. Each vertex represents the i_i/o_i pair for t_i and the weights on the edges
are equal to the number of scan operations in the RAS environment. The sum of all the
weights on the edges in the graph equals nT. In this work [7] the authors developed a
scheme to 1) minimize nT and 2) reduce the cost of address scan operations.
Figure 3.7: RAS scan-in operation. (The figure shows the CUT before and after the scan-in operation that changes o1 into i2.)
Table 3.3: Example vectors [7].
Test set   PPI (i_i)   PPO (o_i)
t1         00101       00110
t2         00100       00101
t3         11010       11010
t4         00111       01011
Figure 3.8: Test application using RAS [7]. (Directed graph over the vertices i1/o1, i2/o2, i3/o3 and i4/o4; the numbers of scan operations on the edges are 5, 1, 5 and 4.)
Figure 3.9: Mux based RAS [7]. (Signals shown: Normal, Scan-in, Mode, CLK, ACLK and Scan Enable; blocks shown: ASR, address decoder and a scan cell with D and Q.)
The authors proposed two techniques, namely test vector ordering and Hamming distance
reduction, to minimize the total number of RAS operations (nT). To illustrate this
they have used an example. For the test set T shown in Table 3.3 and for the sequence of
Figure 3.8, nT is 15, including the initialization. However, using test vector ordering, the
vectors can be reordered as t2 → t1 → t4 → t3 and the weights on the edges become 5, 0, 1
and 2, which results in nT = 8.
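The edge weights and nT values quoted above can be reproduced with a few lines of Python. The sketch below is illustrative only: it uses the vectors of Table 3.3, counts a full initialization for the first vector, and finds the best order by exhaustive search, which is feasible only for a toy test set and is not the ordering procedure of [7]. All function names are chosen here for illustration.

```python
from itertools import permutations

# (PPI, PPO) pairs for t1..t4, copied from Table 3.3.
TESTS = {
    "t1": ("00101", "00110"),
    "t2": ("00100", "00101"),
    "t3": ("11010", "11010"),
    "t4": ("00111", "01011"),
}


def hamming(a, b):
    """Number of bit positions in which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def edge_weights(order):
    """Scan operations per vector: full initialization, then Hamming distances."""
    weights = [len(TESTS[order[0]][0])]  # initialization writes every flip-flop
    for prev, nxt in zip(order, order[1:]):
        # Flip-flops hold the response o_prev; only differing bits are rewritten.
        weights.append(hamming(TESTS[prev][1], TESTS[nxt][0]))
    return weights


def n_t(order):
    """Total number of RAS operations for a given vector order."""
    return sum(edge_weights(order))


original = ("t1", "t2", "t3", "t4")
best_order = min(permutations(TESTS), key=n_t)  # exhaustive search, toy sizes only
```

For realistic test sets the number of orderings grows factorially, so a heuristic would be needed in place of the exhaustive search used here.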
Hamming distance reduction was achieved by modifying the test vectors to minimize
nT. In the above example, if the first two bits of i4 can be switched to 1 without losing
fault coverage, then the weight of edge e34 in Figure 3.8 becomes 2 instead of 4, which results
in nT = 13. A method called Don't-care identification [39] has been proposed to identify
x's on specific bits in the test set. However, this algorithm works best when the outputs
are free of x's. This requires a modification of Don't-care identification and iteration of the
Don't-care identification procedure before and after the test vector ordering.
A multiplexor based RAS architecture was used in this work. Figure 3.9 illustrates the
architecture. The Mode input is set to 1 and the address bits are scanned into ASR via
the scan-in port and Address shift Clock (ACLK) to update a value of a flip-flop. After the
address is scanned-in the ACLK is deactivated and the system clock (CLK) is applied to
write to the selected flip-flop. All the unaddressed flip-flops hold their values. The scan-in
port can be shared by both value scan and address scan. This operation can be explained
as follows.
Assume a set of addresses A_ij = {a1,a2,...,a3} to be accessed to apply t_j after t_i. The
scan operations will be repeated for all addresses in A_ij. Once all flip-flops are set to the desired
values, the Mode is set to 0 and CLK is applied to capture the test response. If there are
n_ff flip-flops, the total number of test data bits required to apply t_j is $\lceil \log_2 n_{ff} \rceil \times c_{ij} + c_{ij}$,
where c_ij is the number of RAS operations to change o_i to i_j. This formula holds assuming
that all $\lceil \log_2 n_{ff} \rceil$ bits of the ASR are scanned for each address. However, if the address
scan operation is modified using the scan address ordering method, the number of data bits
required for the RAS can be reduced. The scan address ordering method uses an asymmetric
traveling salesman problem (ATSP) [18] formulation to find the optimum solution. The results for
the total number of clock cycles and test volume reduction for some ISCAS89 and ITC99
benchmark circuits compared to serial scan are given in Table 3.5.
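The bit count for the unoptimized case, in which the full address is scanned for every RAS operation, can be checked with a small sketch. The function names are hypothetical, and the code does not model the scan address ordering optimization.

```python
def asr_width(n_ff):
    """Address bits needed for n_ff flip-flops, i.e. ceil(log2(n_ff))."""
    if n_ff < 1:
        raise ValueError("need at least one flip-flop")
    return (n_ff - 1).bit_length()  # exact integer form of ceil(log2(n_ff))


def data_bits_for_vector(n_ff, c_ij):
    """Test data bits to apply t_j after t_i when every address is scanned in full."""
    # Each RAS operation needs a full address plus one data bit.
    return asr_width(n_ff) * c_ij + c_ij


# Five-flip-flop example of Table 3.3: four operations change o3 into i4.
bits_example = data_bits_for_vector(5, 4)
```

The integer `bit_length` form is used instead of floating-point logarithms so that the width is exact when the number of flip-flops is a power of two.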
Table 3.5 is divided into four blocks. The first block contains the circuit statistics.
The second block lists the ASR width and the number of the RAS operations when this
method was used on the initial test set. The third and the fourth blocks compare the test
data volume and the test application time of this method against conventional serial-scan
method. The peak and average switching activity is compared between conventional serial
scan and RAS in Table 3.4. This work mainly outlined the significant advantages and
potential of RAS compared to serial scan.
Table 3.4: Peak and average switching activity during scan [7].
           Peak switching activity      Average switching activity
Circuit    Serial    RAS      Ratio     Serial    RAS      Ratio
name       (%)       (%)      (%)       (%)       (%)      (%)
s5378      39.76     5.00     12.58     22.79     0.218    0.957
s9234      42.27     10.81    25.57     25.72     0.220    0.857
s13207     38.80     4.15     10.70     24.93     0.052    0.207
s15850     40.75     8.51     20.89     24.55     0.092    0.374
s35932     21.50     0.21     0.96      6.30      0.032    0.506
s38417     34.58     1.46     4.22      23.62     0.001    0.002
s38584     31.31     18.86    60.23     24.23     0.040    0.165
b17s       30.65     5.01     16.34     13.50     0.004    0.033
b20s       37.87     12.37    32.67     24.39     0.006    0.027
b22s       36.52     8.16     22.34     22.67     0.003    0.015
3.1.5 Baik et al.'s Modified Methods
In their subsequent works on RAS, Baik et al. extended the idea
developed in their previous work [7] and gave a practical dimension to the implementation of
their design. They point out that although multiple scan chains can be used to
reduce the length of scan chains, and hence the test application time, the number of scan
chains is limited by the number of test channels/pins on the automatic test equipment (ATE),
whose cost may be prohibitive [10]. Several techniques have been researched
to reduce the test application time for limited scan I/O pins [3, 49], but the test power
has not been considered in these works. One of the most significant
works in data compression is that of Rajski et al. [48] on Embedded Deterministic
Test. However, it does not address the test power related issues. The technique developed
by the authors is called Progressive Random Access Scan (PRAS), and test
application methods for PRAS were proposed with the goal of simultaneous reduction of test
application time, test data volume and test power with relatively small hardware overhead.
Table 3.5: Circuit statistics & test data volume and test application time reduction [7].
          Circuit statistics   RAS properties       Test data volume                 Test application time
Circuit   No.     No.          ASR      No. RAS     Serial    RAS       Reduction    Serial     RAS        Speedup
name      FF      vector       width    operation   (bits)    (bits)    (%)          (cycles)   (cycles)   (ratio)
s5378     179     100          8        2089        17900     7532      57.92        18179      9721       1.87
s9234     228     111          8        3407        25308     11055     56.32        25647      14573      1.76
s13207    669     235          10       5043        157215    27891     82.26        158119     33169      4.77
s15850    597     97           10       4881        57909     23163     60.00        58603      28141      2.08
s35932    1728    12           11       5668        20763     15495     25.27        22476      21175      1.06
s38417    1636    87           11       15203       142332    58905     58.61        144055     74195      1.94
s38584    1452    114          11       13940       165528    57471     65.28        167094     71525      2.34
b17s      1415    617          11       24467       873055    145430    83.34        975087     170514     5.13
b20s      490     438          9        17680       214620    73867     65.58        215548     91985      2.34
b22s      735     481          10       27245       353535    132355    62.56        354751     160081     2.22
The PRAS structure [4] is similar to a static random access memory (SRAM) or a grid
addressable latch [58]. In the PRAS architecture, scan-cells are configured as an m × n SRAM-like
grid structure, and some additional peripheral and test control logic is added. The
number of rows and columns is decided by the geometry of the circuit or the number of
available test pins. During test mode, the scan-cells in one of the m rows are enabled, allowing
them to be read or written, by the horizontal row enable signal available from the row enable
shift register.
Figure 3.10: RAM-based RAS [4]. (Labels shown: master and slave latches with D and Q, driver, RE, WE and SD lines.)
The RAM-based PRAS cell is shown in Figure 3.10. A read operation is performed
when the contents of cells in the enabled row are placed on the vertical bidirectional scan-
data lines and passed to the sense amplifier. The data read from the scan-cells in a row are
passed to a multiple input signature register (MISR) to calculate the signature of the test
responses. The write operation is performed one cell at a time. A read cycle is followed by
the write cycle progressively for every row. Test cost reduction is done in a similar fashion
as in the previous work. The authors have also estimated the routing and area overhead of
this architecture. The PRAS architecture needed only marginal extra routing compared to
the Multiple serial scan (MSS) implementation, and the transistor overhead was negligible
compared to MSS in most cases among the benchmark circuits.
In another recent work, the authors developed a test generation technique
for PRAS [5]. Here the goal is to minimize both nw (the total number of PRAS write operations
needed to apply test sequence T) and N (the number of test patterns in T). The traditional test
generation measures such as SCOAP [27] are used to estimate the difficulty of controlling
or observing each line in the circuit. SCOAP measures the controllability and observability
by approximating the minimum number of lines to be set to control or observe a specific
line. On similar lines, the authors have defined a testability measure that approximates the
minimum number of scan-cells to be set for controlling or observing a specific line in the
circuit. A static component and a dynamic component are used in this testability measure.
The static component is calculated without considering the present state of the circuit and
the dynamic component is calculated taking into account the present state of the circuit.
Hence every time the state of the circuit changes the dynamic component is recalculated.
The algorithm to compact test vectors is given in Figure 3.11. The static testability is
calculated first and the initial states of all the scan-cells are assigned with random values.
Based on the initial state of the scan-cells, the dynamic testability, including DP (Detection
Progress, which represents the approximate percentage of scan-cells that are already set by the
current state of the circuit to detect a certain fault), is assigned to all faults. Then, the
fault with maximum DP is targeted. A test is generated for the targeted fault and is fixed.
The circuit is fault simulated and additional detected faults are dropped. After the fault
simulation, the dynamic testability (DT) and DP are recalculated. Then the faults with
maximum DP are iteratively targeted until all faults are targeted or the maximum DP
falls below a dynamically changing threshold value. Once all the faults above the current
threshold DP are targeted, one test vector is generated and fault simulated to permanently
drop the detected faults. All the scan-cells are updated to the next state.
Figure 3.11: Test generation procedure [5]. (Flowchart steps shown: start; initialize states; calculate static testability (ST); calculate DT and DP; test generation for the fault with maximum DP; on success, fix assigned values and fault simulate for temporal fault drops; update DT and DP; check whether any DP is above the threshold; fault simulation for permanent fault drop; unfix assigned values; update states to PPO values (system clock); check whether all faults are detected; done.)
A further modification to PRAS has been proposed in [6]. The PRAS configuration
uses an m × n grid structure, formed by the distribution of scan-cells, to minimize the routing
overhead. There, the number of columns (n) and the number of address pins (log2 n) are
predetermined by the grid configuration, regardless of the number of available test pins or
test channels. The partitioned grid approach instead takes into account the number of pins available
and structures the grid accordingly. This reconfiguration is done such that it minimizes the
routing overhead and reduces the test application time.
3.2 Summary
In this chapter we have explained most of the previous RAS architectures starting from
the very first reference in the literature. Our work is based on the work of Baik et al. [7].
As we can see from the progress of work on RAS by the research community, RAS may
emerge as a very powerful DFT technique in the near future.
Chapter 4
Toggle Random Access Scan
In this chapter we introduce the new concept of Toggle Random Access Scan. The
motivation and the working of the design are explained in detail. The fundamental objective
while starting the work was to minimize the area and routing overhead associated with
earlier RAS designs and practically implement the design on an industrial circuit. Also the
idea was to motivate the paradigm shift from conventional serial scan toward RAS.
4.1 Toggle flip-flop
In Serial-Scan (SS), flip-flops form a seamless chain from the scan-in pin to the scan-out
pin in the test mode, forming a shift register structure. During normal mode of operation
the inputs to the flip-flops are from the combinational logic. During scan-in/scan-out, every
flip-flop is subject to change in state. This leads to continuous activity in the flip-flops, as
well as the combinational circuits, dissipating a lot of power, which is very undesirable. In
RAS, a decoder is used to address every flip-flop. Hence at any given point of time only one
flip-flop is accessed while the other flip-flops retain their states. This way no activity takes
place in the circuit during the scan mode or the test mode. The architectures described
in the literature [2, 7, 45] mainly consist of a scan-in signal that is broadcast to all the
flip-flops, a test control signal that is also broadcast to all flip-flops and a unique decoder
signal from the decoder to every flip-flop. The output from the flip-flop is either fed into a
MISR or the outputs are ORed to a primary output justifying the logic.
Table 4.1: RAS signals.
Function      Clock      Address decoder outputs
                         Row (x)        Column (y)
Normal data   active     0              0
Toggle data   inactive   1              active clock
              inactive   active clock   1
Hold data     inactive   1              0
              inactive   0              1
              inactive   0              0
4.1.1 Design
The design could become cumbersome if a unique decoder signal is routed to every
flip-flop and the scan-in signal is broadcast. In the design that we have developed, we use a
unique toggling scheme wherein the addressed flip-flop toggles its present state in the test
mode, thereby eliminating a separate globally routed scan-in signal. The output from the
flip-flop is fed into a bus. Thus the addressed flip-flop places its value on the bus in the test
mode providing the necessary observability.
The design of our RAS flip-flop can be described by three operations that are essential
to satisfy the test requirements, which are, to capture the response of the circuit in the
normal mode, to toggle the current state of the flip-flop being addressed and retrieve the
contents simultaneously, and finally to make sure that all unaddressed flip-flops hold their
previous states while one flip-flop is being accessed during test mode. The operations are
summarized in the first column of Table 4.1. The inherent redundancy in the clock signal [38]
is coupled with the signal from the decoder to trigger the latching in the flip-flop. We have
assumed the flip-flop to be made up of a master and a slave latch, as shown in Figure 2.3.
Every flip-flop gets two inputs, one from the row (x) and one from the column (y) decoder.
The other inputs are clock and data from the combinational logic. The combinations
used for the three defined functions are listed in Table 4.1. The operation of the modified
scan-FF can be described using Figure 4.1.
Figure 4.1: Toggle random access scan flip-flop. (The figure shows the master (M) and slave (S) latches with an input MUX fed by the clock and the data from the combinational logic; the output goes to the combinational logic and feeds a bus leading to a primary output; the row and column decoders each take (1/2) log2 nff address lines and drive sqrt(nff) lines, x_i and y_j, where nff is the total number of flip-flops; a control signal goes to the macro cell.)
4.1.2 Working
In the normal mode of operation, the x and y lines are '0's and the decoders are
disabled. The output of the AND gate inside the flip-flop is logic '0', enabling the OR gate
and routing the data from the combinational logic through the multiplexer to be captured
in the flip-flop. The master is latched at the high pulse of the clock and the slave is latched
subsequently in the low pulse. In the test mode, the clock is stopped and the row and
column decoders select one line each to address a flip-flop at their intersection. Hence only
the addressed flip-flop receives a logic '1' on both its x and y lines. The multiplexer
now routes the inverted contents of the flip-flop to the master; we refer to this as the toggle
mode. The signal on x or y is then switched to logic '0', performing the function of a clock
to load the slave latch. This operation can happen at any desired frequency (which may be slower
than the functional clock). Hence the addressed flip-flop toggles its current state and at the
same time the tristate buffer is enabled to route the data previously stored in the flip-flop to
a common bus. Meanwhile, the other flip-flops have to hold their previous states while the
toggle operation is being performed on one flip-flop. Since the output from their AND gate is
a logic '0' and the clock is turned off, the master latch never gets activated. Consequently
the slave latch holds its previous state. One must note that addressing a flip-flop reads the
contents of the flip-flop and also toggles its contents. Hence the contents of the flip-flop
after a read operation are opposite to the value that was read out. Care must be taken
to avoid a race condition in the flip-flop. This can be achieved by inserting appropriate
delays.
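The three functions of the toggle flip-flop (normal capture, toggle with read-out, and hold) can be summarized in a small behavioral model. This is a sketch under stated assumptions rather than the circuit itself: the class and method names are hypothetical, and gate-level timing, the master/slave split and race conditions are not modeled.

```python
class ToggleRAS:
    """Behavioral model of an m x n array of toggle RAS flip-flops (illustrative only)."""

    def __init__(self, m, n):
        self.m, self.n = m, n
        self.q = [[0] * n for _ in range(m)]

    def capture(self, data):
        # Normal mode: x = y = 0 everywhere, clock active, D is captured.
        self.q = [list(row) for row in data]

    def toggle(self, x, y):
        # Test mode: clock stopped, only the flip-flop at (x, y) is addressed.
        # Its old value is placed on the bus and its state is inverted;
        # every other flip-flop holds its state.
        old = self.q[x][y]
        self.q[x][y] = 1 - old
        return old

    def apply(self, target):
        """Toggle only the flip-flops that differ from target; return the count."""
        ops = 0
        for x in range(self.m):
            for y in range(self.n):
                if self.q[x][y] != target[x][y]:
                    self.toggle(x, y)
                    ops += 1
        return ops


ras = ToggleRAS(1, 5)
ras.capture([[0, 0, 1, 1, 0]])        # response o1 from Table 3.3
ops = ras.apply([[0, 0, 1, 0, 0]])    # next vector i2 from Table 3.3
```

Because a write is a toggle, applying the next vector needs one operation per differing bit and no scan-in data line, which matches the motivation given for the design.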
All the flip-flops can be cleared initially by using a built-in circuit, which in the clear
mode would read each flip-flop and, based on its current contents, determine if another read
operation is to be performed to clear it. For example, during the clear mode, if a flip-flop
is read and is found to contain a logic '0', the contents of that flip-flop would have toggled
to logic '1' and the same flip-flop is addressed again to toggle its state to clear the flip-flop
(logic '0' state). This operation requires two clock cycles. In the case when the first read
is a logic '1', the next cycle is a dummy cycle and the flip-flop is left unaddressed, since
it would have toggled to state '0'. Hence, the number of clock cycles to clear all flip-flops
would be twice the number of flip-flops in the circuit. The working of the toggle flip-flop
is explained in detail in our paper presented at the VLSI Design and Test symposium '05
(VDAT '05) [40].
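The clear procedure described above can be sketched as follows. The function name and the flat list of states are assumptions made for illustration; the built-in clear circuit itself is not modeled.

```python
def clear_all(states):
    """Clear a list of toggle flip-flop states in place; return the clock cycles used."""
    cycles = 0
    for addr in range(len(states)):
        read = states[addr]            # first cycle: the flip-flop is read ...
        states[addr] = 1 - read        # ... and the read toggles its contents
        cycles += 1
        if read == 0:
            # The flip-flop now holds 1: address it again to toggle it back to 0.
            states[addr] = 1 - states[addr]
        # Otherwise this is a dummy cycle: the flip-flop already holds 0.
        cycles += 1
    return cycles


states = [0, 1, 1, 0, 1]
cycles = clear_all(states)
```

Every flip-flop costs exactly two cycles in this scheme regardless of its initial value, so the total is twice the number of flip-flops.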
4.2 Decoder Design
The row and column decoders are built in such a way that the row and column lines
intersect to address a flip-flop. This design has the least area and routing overhead compared
to other decoding schemes. One may think of it as a Random Access Memory structure
where the combinational logic is built around the memory element. The total number of
rows and columns depends on the number of flip-flops and the actual layout of the circuit.
The number of horizontal and vertical lines is smallest when the two are equal in
number, each numerically equal to the square root of the number of flip-flops in the circuit.
Let us assume that the row decoder decodes one among 'm' lines and the column
decoder decodes one among 'n' lines, where the total number of flip-flops is m × n. It
is assumed that the inputs to the decoder fan out from the primary inputs of the circuit,
since during test mode there is no activity in the combinational logic. Therefore the number
of inputs to the circuit must be at least log2 m + log2 n.
In comparison with cross-check [14], where an entire row needs to be addressed and a
single flip-flop can be set only if the contents of all other flip-flops in that row are known,
our method can be used to set or observe any flip-flop dynamically. This scheme would
not work if a MISR is used to capture the outputs. In our architecture we can address
any flip-flop without any constraint and read its value. Also, cross-check requires an extra
signal, namely scan-in, to set the desired value of the flip-flop. The decoder logic is purely
combinational. A control signal may be used to enable the decoder during the
test mode and disable it during the normal mode of operation. The macro-level row and column decoder
implementation is shown in Figure 4.2.
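The row and column addressing can be illustrated with a short Python sketch. It assumes one-hot row and column lines and that a flip-flop is selected where an active row line meets an active column line; the function names and the 8 × 8 grid size are illustrative assumptions.

```python
def address_bits(m, n):
    """Address inputs needed for m row lines and n column lines."""
    return (m - 1).bit_length() + (n - 1).bit_length()


def decode(row_addr, col_addr, m, n):
    """Return an m x n grid of select bits for the addressed flip-flop."""
    rows = [int(r == row_addr) for r in range(m)]  # one-hot row lines
    cols = [int(c == col_addr) for c in range(n)]  # one-hot column lines
    # A flip-flop is selected only where an active row meets an active column.
    return [[rows[r] & cols[c] for c in range(n)] for r in range(m)]


select = decode(2, 5, 8, 8)
selected = [(r, c) for r in range(8) for c in range(8) if select[r][c]]
bits = address_bits(8, 8)
```

Exactly one cell of the grid is selected for any valid address pair, and an 8 × 8 grid needs 3 + 3 address inputs.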
Figure 4.2: Decoder design. (Labels: Row Decoder and Column Decoder, each with 3 address inputs.)
4.3 Routing
The architecture described in [7] used three separate signals to control any given flip-
flop, apart from the signal feeding-in from the combinational logic. This design is illustrated
in Figure 4.3. Our design performs the equivalent function using only a decoder signal,
thereby eliminating two globally routed signals to the flip-flop. The output from every
flip-flop is connected to a bus that leads to a primary output pin. This is analogous to the
'Test-control' signal being routed in the serial scan, except that the Test-control signal is
connected to every flip-flop from a primary input pin. The scan-in signal, which forms a
seamless chain from a primary input to a primary output through all the flip-flops in serial
scan, is eliminated and a signal from the decoder to each flip-flop is added. The conventional
decoder scheme used in [7] becomes very complex and cumbersome to implement since a
single wire would have to be routed to every flip-flop. Also the decoder complexity will
grow proportionally. For 65536 (64K) flip-flops, 65536 unique wires will have to be routed
across the circuit and would require 64K 16-input AND gates to decode 16 address lines.
In the previous RAS design by Baik et al., the outputs of the flip-flops are fed to a MISR,
i.e., every flip-flop feeds a MISR.
The grid architecture shown in Figure 4.2 was found to be the most efficient way to lay
out the decoders. The total number of extra routes added is m + n, where 'm' and 'n' are
the number of row lines and column lines, respectively. With a minimum of two layers
of metal routing, the row wires can be accommodated within the channels between the
cell rows and the column wires can be routed over the cells in the next metal layer. Hence
there will be an increase of one track per channel (assuming 'm' channels) and 'n' tracks
that are routed on the next metal layer. Let us assume a circuit with 65536 (64K) flip-flops
as before. Let us also assume a square layout that has 256 routing channels. Hence every
row will contain 256 flip-flops, i.e., m = 256 and n = 256. The total number of additional
tracks will be 256 + 256 = 512. Let the length of every channel be 'l' µm and assume
the vertical dimension to be a linear multiple of the channel length, i.e., (q × l) µm; then
the increase in length of routes is (q + 1) × l µm. Hence 65536 wires have been reduced to
512 wires.
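The wire-count comparison can be reproduced with a small Python sketch. It assumes a near-square grid (rows chosen as the ceiling of the square root) and compares the m + n select lines of the grid with one wire per flip-flop in a flat decoder; the function name and the returned dictionary keys are illustrative.

```python
from math import isqrt


def grid_routing(n_ff):
    """Compare grid select wires (m + n) with one wire per flip-flop."""
    m = isqrt(n_ff)
    if m * m < n_ff:
        m += 1               # round the row count up for non-square counts
    n = -(-n_ff // m)        # ceiling division: columns needed to cover n_ff
    return {
        "rows": m,
        "cols": n,
        "grid_wires": m + n,
        "flat_wires": n_ff,
        "address_bits": (m - 1).bit_length() + (n - 1).bit_length(),
    }


routing = grid_routing(65536)
```

For 65536 flip-flops this gives a 256 × 256 grid with 512 select wires and 16 address bits.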
4.3.1 Gate area overhead of RAS
Assume a circuit with 'ng' gates and 'nff' flip-flops, each consisting of 10 gates. If
the scan flip-flop is designed as shown in Figure 2.4, then the gate overheads of serial scan [10]
and RAS are given by equations (4.1) and (4.2), respectively.
\[ \text{Gate overhead of scan} = \frac{4 \times n_{ff}}{n_g + 10 \times n_{ff}} \times 100\% \qquad (4.1) \]
Figure 4.3: Design of RAS as described in [7]. (Labels: address decoder with n address wires to n flip-flops; Address, Clock, Mode and Scan-in inputs; scan flip-flop with multiplexers; data from and to combinational logic; output to MISR.)
Like the scan flip-flop in Figure 2.4, the RAS flip-flop has the 4 gates of the multiplexer.
The additional gates are one AND-OR-INVERT (AOI) gate and
a tri-state buffer, as shown in Figure 4.1; i.e., the logic can be minimized by using one
complex gate (AOI) and reusing the inverter that already inverts the clock in a
flip-flop. The logic shown within the dotted box in Figure 4.1 can be further minimized. For
the number of gates added by the decoder, let us assume a decoder structure built using
pass transistors, as shown in Figure 4.4. The number of transistors required to decode 'log2 c'
lines to 'c' lines approximately equals 2 × c. Let us assume that a gate is made up of 4
transistors and nff = c (horizontal lines) × d (vertical lines). The gate overhead of RAS
can be approximated by the following equation:
\[ \text{Gate overhead of RAS} = \frac{6 \times n_{ff} + \sqrt{n_{ff}}}{n_g + 10 \times n_{ff}} \times 100\% \qquad (4.2) \]
Let us consider a circuit with 5,120 gates and assume that there are 512 flip-flops in
the circuit. The gate overhead of serial scan is 20% from Equation (4.1) and the gate overhead
of RAS is 30.2% from Equation (4.2). Hence there is an increase of 10% in the x dimension of
the layout.
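The worked example can be checked numerically. The sketch below takes the overhead of serial scan as 4·nff / (ng + 10·nff) and that of RAS as (6·nff + √nff) / (ng + 10·nff), both times 100%, which is how Equations (4.1) and (4.2) read when matched against the numbers in the text; the function names are illustrative.

```python
from math import sqrt


def scan_gate_overhead(n_g, n_ff):
    """Percent gate overhead of serial scan: 4 extra gates per flip-flop."""
    return 4 * n_ff / (n_g + 10 * n_ff) * 100


def ras_gate_overhead(n_g, n_ff):
    """Percent gate overhead of RAS: 6 gates per flip-flop plus decoders."""
    return (6 * n_ff + sqrt(n_ff)) / (n_g + 10 * n_ff) * 100


scan = scan_gate_overhead(5120, 512)
ras = ras_gate_overhead(5120, 512)
difference = ras - scan
```

With 5,120 gates and 512 flip-flops the two values round to 20% and 30.2%, a difference of roughly 10 percentage points.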
Figure 4.4: Decoder built using pass transistors [65]. (Labels: bit<0> through bit<7>.)
Table 4.2: Gate overhead of RAS vs Serial Scan.
Circuit   No. of    No. of    Gate overhead,   Gate overhead,   Increase in gate
          combi.    flip-     serial scan      RAS (%)          area over serial
          gates     flops     (%)                               scan (%)
s208      96        8         18.18            28.88            10.7
s349      161       11        19.29            30.18            10.89
s386      159       6         10.96            17.56            6.6
s420      196       16        17.98            28.09            10.11
s510      211       6         8.86             14.19            5.33
s641      379       19        13.36            20.80            7.44
s838      390       32        18.03            27.84            9.81
s1196     529       18        10.16            15.83            5.67
s1269     569       37        15.76            24.29            8.53
s3271     1572      116       16.98            25.87            8.89
s3384     1685      183       20.83            31.62            10.79
s5378     2779      179       15.67            23.80            8.13
s13207    7951      638       17.80            26.89            9.09
Comparing the transistor level implementations of serial scan and RAS from the
synthesized schematics obtained from the Design Architect® tool by Mentor Graphics® in 0.5
µm CMOS technology, the RAS flip-flop design had an addition of 16 transistors compared
to serial scan. Hence we can formulate the transistor overhead similar to the gate overhead
calculation as follows:
\[ \text{Transistor overhead of serial scan} = \frac{10 \times n_{ff}}{n_t + 28 \times n_{ff}} \times 100\% \qquad (4.3) \]
Here 'nt' is the number of transistors in the circuit without the flip-flops and each
flip-flop is made up of 28 transistors. There are 16 extra transistors in RAS compared to
serial scan, hence the equation becomes:
\[ \text{Transistor overhead of RAS} = \frac{26 \times n_{ff} + 4 \times \sqrt{n_{ff}}}{n_t + 28 \times n_{ff}} \times 100\% \qquad (4.4) \]
4.4 Testing
The tests target all the stuck-at faults in the CUT. Consistently dominant faults are
modeled on the tri-state buffers in the circuit [46, 11, 35, 55, 60, 31]. The decoder is first
tested using the MATS++ [59] test. The flip-flops are cleared initially, since it is assumed
that a clear operation is possible on all the flip-flops to initialize them, and then the test is
performed.
{ ⇕(w0); ⇑(r0,w1); ⇓(r1,w0,r0) }
where:
⇕ - Addressing order can be either increasing or decreasing
⇑ - Increasing memory addressing order
⇓ - Decreasing memory addressing order
This test adequately tests for address decoder faults (AF) unlinked with transition
faults (TF) and all AFs linked with TFs. All the stuck-at faults (SAF) are detected
because a '0' and a '1' are each read from every cell.
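The MATS++ march sequence can be sketched on a simple memory model. The Python below assumes a conventional bit-addressable memory with an optional injected stuck-at fault, not the toggle flip-flop hardware itself; the `Memory` class and `mats_plus_plus` function are illustrative names.

```python
class Memory:
    """Bit-addressable memory with an optional stuck-at cell (illustrative)."""

    def __init__(self, size, stuck_cell=None, stuck_value=0):
        self.cells = [0] * size
        self.stuck_cell = stuck_cell
        self.stuck_value = stuck_value

    def write(self, addr, value):
        self.cells[addr] = value

    def read(self, addr):
        if addr == self.stuck_cell:
            return self.stuck_value  # a stuck cell ignores what was written
        return self.cells[addr]


def mats_plus_plus(mem):
    """Run MATS++; return True if every read matches its expected value."""
    n = len(mem.cells)
    for a in range(n):                # either order: (w0)
        mem.write(a, 0)
    for a in range(n):                # increasing order: (r0, w1)
        if mem.read(a) != 0:
            return False
        mem.write(a, 1)
    for a in reversed(range(n)):      # decreasing order: (r1, w0, r0)
        if mem.read(a) != 1:
            return False
        mem.write(a, 0)
        if mem.read(a) != 0:
            return False
    return True


fault_free = mats_plus_plus(Memory(8))
stuck_at_0 = mats_plus_plus(Memory(8, stuck_cell=3, stuck_value=0))
stuck_at_1 = mats_plus_plus(Memory(8, stuck_cell=3, stuck_value=1))
```

The test performs six operations per cell, so its length grows linearly with the number of cells.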
After the test circuitry is tested for fault-free operation, the flip-flops are set up to
perform the routine tests. The initial states are loaded into the flip-flops and the
combinational inputs are applied at the primary inputs. The length of the vector sequence required to test the
decoder and flip-flops is linearly proportional to the number of flip-flops in the circuit.
4.5 Algorithm to compact test vectors
The 'toggle' RAS is a new method to implement DFT, and the maximum compaction
of test vectors may only be possible by using an algorithm specifically suited to it. A
greedy algorithm has been developed to compact the test vectors. Here the vectors for the
combinational circuit are obtained using an ATPG¹. The vectors are sequenced based on
the response captured by the flip-flops for an input vector, along with the change in state
of those flip-flops that are read because faults propagated to them during the application of
the previous vector. The algorithm is as follows:
1. Obtain the combinational vectors along with good circuit responses and store the results
in a stack
2. Find the flip-flops where faults are propagated at each vector
3. While number of vectors > 0
(a) Read all the flip-flops where the faults are detected
(b) Choose the next vector from the stack that has the least Hamming distance from
current flip-flop states
4. End While
The algorithm can be explained with an example as follows. First the compacted test
set is obtained using a combinational ATPG. A list of all the flip-flops where the faults are
propagated is stored for every vector in the test set. An initial vector is then selected whose
flip-flop states are close to the circuit start-up state or clear state. The vector
is applied and the response is captured in the flip-flops. The flip-flops to which
faults have propagated are then read. The present states of the flip-flops in the circuit are now those of the
response captured from the previous vector, except for those whose values were toggled by a
read operation performed on them. A search is then performed on the remaining vectors to
determine the vector that has the least Hamming distance from the present state of the
circuit. This vector needs the minimum number of clock operations to set up. The same
procedure is followed until all the vectors have been applied.
¹ Vectors were obtained from HITEC/PROOFS [44, 43], and the circuit responses and the outputs where faults
were detected on each vector were obtained using AUSIM [57].
Table 4.3: Results of Vector Compaction for various Benchmark Circuits.
Circuit   No. of   No. of combi.   No. of SS   No. of RAS   Test time
          FFs      vectors         vectors     vectors      red. (%)
s208      8        64              584         301          48.46
s349      11       42              687         366          46.72
s386      6        138             972         450          53.70
s420      16       128             2192        1056         51.82
s510      6        110             776         344          55.67
s641      19       142             2859        1148         59.85
s838      32       240             7952        3595         54.79
s1196     18       344             6554        2447         62.66
s1269     37       118             4521        1981         56.18
s3271     116      264             31004       12540        59.55
s3384     183      260             48759       21119        56.69
s5378     179      618             111419      48677        56.31
s13207    638      1138            727820      309132       57.53
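The greedy ordering of Section 4.5 can be sketched in Python as follows. This is an illustrative model under stated assumptions: each vector carries the flip-flop state it needs, the fault-free response it leaves behind, and the set of flip-flops to be read; setting up a vector costs one clock per differing flip-flop, capture costs one clock, and each read costs one clock and toggles the flip-flop. The `Vector` class, its field names and the three example vectors are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    name: str
    state_in: tuple      # flip-flop values the vector needs before capture
    response: tuple      # fault-free flip-flop values after capture
    observe: frozenset   # indices of flip-flops where faults propagate


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def greedy_order(vectors, n_ff):
    """Order vectors by least Hamming distance; return (order, clock cycles)."""
    state = (0,) * n_ff                      # start from the clear state
    remaining = list(vectors)
    order, cycles = [], 0
    while remaining:
        nxt = min(remaining, key=lambda v: hamming(state, v.state_in))
        remaining.remove(nxt)
        cycles += hamming(state, nxt.state_in)   # one toggle per differing bit
        cycles += 1                              # capture clock
        new_state = list(nxt.response)
        for i in nxt.observe:                    # each read toggles the cell
            new_state[i] ^= 1
        cycles += len(nxt.observe)
        state = tuple(new_state)
        order.append(nxt.name)
    return order, cycles


vectors = [
    Vector("v3", (1, 1, 1), (1, 0, 0), frozenset({1})),
    Vector("v1", (0, 0, 0), (1, 1, 0), frozenset({0})),
    Vector("v2", (0, 1, 0), (0, 0, 1), frozenset({2})),
]
order, cycles = greedy_order(vectors, 3)
```

In this small example the reads after v1 leave the flip-flops exactly in the state v2 needs, so v2 is applied with no setup toggles at all.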
4.6 Results
The proposed architecture was modeled and tested on ISCAS'89 [8] benchmark circuits.
The algorithm was implemented and the fault coverage was observed to be the same as
that of serial scan. A reduction in test vectors of up to 60% can be observed (Table 4.3) in most
of the circuits. Maximum reduction is achieved when the average number of faults per
combinational vector is small and the number of flip-flops is proportionally higher, since in
these cases the setup time of scan flip-flops would increase compared to RAS. The reduction
in test time is slightly lower than that described in [7]. This is because of the improvement
that we made in the design by minimizing the number of signals that need to be routed
to every flip-flop.
During scan-in, the CUT is subject to unnecessary activity and all the flip-flops are
subject to change of state. Various methods are presented in the literature to mask the
flip-flop transitions during test mode [24, 67]. Let us assume that the power dissipation in the
CUT is directly proportional to the number of transitions in the primary inputs and the
transitions in the states of flip-flops. The power dissipation in RAS is reduced drastically,
since the only activity during scan mode is a transition in the state of the single flip-flop
under consideration and transitions at the primary input pins that control the decoder.
The relative reduction of power dissipation in the circuit is calculated assuming that the
power dissipated is directly proportional to the number of transitions in the primary inputs
and states of flip-flops. The results were obtained for both serial scan and RAS (Table 4.4).
It can be observed that, as the size of the circuits increases, a reduction in power dissipation
of up to 99% is achieved using RAS.
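The difference in switching activity can be illustrated with a small transition count. The Python sketch below counts flip-flop state transitions only (the estimate in Table 4.4 also counts primary-input transitions, which are omitted here), and the function names and the 4-bit example are illustrative assumptions.

```python
def scan_shift_transitions(chain, vector):
    """Shift `vector` into a scan chain; return (final chain, transitions)."""
    chain = list(chain)
    count = 0
    # The bit that must end up deepest in the chain is shifted in first.
    for bit in reversed(vector):
        shifted = [bit] + chain[:-1]
        count += sum(a != b for a, b in zip(chain, shifted))
        chain = shifted
    return chain, count


def ras_transitions(state, vector):
    """RAS toggles only the flip-flops whose value must change."""
    return sum(a != b for a, b in zip(state, vector))


start = [0, 0, 0, 0]
target = [1, 0, 1, 0]
final_chain, ss_count = scan_shift_transitions(start, target)
ras_count = ras_transitions(start, target)
saving = (ss_count - ras_count) / ss_count * 100
```

Even for four flip-flops, shifting causes three times as many state transitions as toggling the two flip-flops that differ, and the gap widens as the chain grows.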
4.7 Modifying ATPG to Further Decrease the Number of Vectors
The results presented in this chapter are based on the vectors obtained using existing
ATPG algorithms. A slight modification in the form of an added constraint in the ATPG
algorithms can further decrease the number of test vectors needed using RAS. The following
algorithm can be employed to obtain this further compaction of test vectors:
1. Set the cost function of modifying the value of a flip-flop to be the highest
2. Generate a vector to target a fault
3. Perform Fault simulation
4. While the number of faults > 0
(a) Read all the flip-flops where the faults are detected
(b) Target a fault and Generate the next vector with minimum changes to be made
in the flip-flops from the current states considering the change of state due to a
read operation.
(c) Perform Fault simulation
5. End While
Consider modifying PODEM's [26] back-trace algorithm such that the pseudo-controllability
of the flip-flops (pseudo primary inputs) is set very high. Thus during back-trace, a
minimal set of flip-flops is assigned for each targeted fault. This will require the least number
of flip-flops to be set at test. Furthermore, the test for the next fault is generated with
minimum changes to the test response captured in the scan chain from the current test,
again to minimize test application time. Early experimentation on the smaller benchmark
circuits indicates that such a strategy can show a 30-40% improvement in test time. It is
worth noting that better vector compaction can be achieved for larger circuits using
this algorithm.
4.8 Summary
In this chapter we introduced the concept of 'toggle' RAS. The design and its working were
explained in detail. The routing and area overheads of the proposed architecture were derived
analytically. The decoder design was described in detail along with the method to test the
circuitry. We have presented the results of the experiments we performed on benchmark
circuits implementing this architecture.
Table 4.4: Power estimation based on number of transitions at the inputs for various
Benchmark Circuits.
Circuit   No. of transitions   No. of transitions   Test power
          in SS tests          in RAS tests         saving (%)
s208      1866                 1209                 35.21
s349      4755                 1233                 74.07
s386      2495                 1515                 39.28
s420      11587                4708                 59.37
s510      3141                 2382                 24.16
s641      27715                7924                 71.41
s838      72914                17782                75.61
s1196     57409                10601                81.53
s1269     77755                7880                 89.87
s3271     1744149              45971                97.36
s3384     4299362              77665                98.19
s5378     8947677              175710               98.04
s13207    230176409            211048               99.91
Chapter 5
Scan-out Design
We have designed a novel mechanism for the scan-out of the flip-flops. This is a
hierarchical structure that ensures there is no loading on the flip-flops while driving the
output bus. The idea is illustrated in Figure 5.1. A cluster of flip-flops in close proximity
feed a common bus. The bus control signals from the flip-flops are ORed together to produce
a signal to control the next stage of the bus. This function is performed by the scan-out
macro-cell.
5.1 Macro-cell design
The design of the scan-out macro cell is given in Figure 5.2. The tri-stated signal
from each flip-flop feeds a bus. The maximum number of signals that can be placed on a
single bus depends on the specific technology that is used to implement the design. The
bus control signals from the flip-flops are ORed together and used to control the next level
tri-state buffer. These scan-out macro-cells can be replicated at several stages before the
bus signal reaches the primary output. This is illustrated in Figure 5.3, where a single 4
× 4 block is a structure similar to the one shown in Figure 5.1. The 4 × 4 size is shown only as an
illustration; the number could be as large as the maximum number of tri-state buffers that
can be placed on a bus in that particular technology.
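The behavior of the hierarchical scan-out can be modeled in a few lines of Python. The sketch assumes each macro-cell receives (enable, data) pairs from tri-state drivers, ORs the enables and forwards the data of the single enabled driver; `macro_cell`, `scan_out` and the 16-bit example are illustrative names and values.

```python
def macro_cell(inputs):
    """inputs: list of (enable, data) pairs from tri-state drivers on one bus.

    Returns (enable_out, data_out); data_out is None when the bus floats.
    """
    driven = [data for enable, data in inputs if enable]
    if len(driven) > 1:
        raise ValueError("bus contention: more than one driver enabled")
    enable_out = len(driven) == 1        # OR of the bus control signals
    data_out = driven[0] if driven else None
    return enable_out, data_out


def scan_out(values, addressed, fan_in=4):
    """Propagate the addressed flip-flop's value through macro-cell stages."""
    level = [(i == addressed, v) for i, v in enumerate(values)]
    while len(level) > 1:
        level = [macro_cell(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0][1]


values = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
read_back = [scan_out(values, i) for i in range(len(values))]
```

With 16 flip-flops and four drivers per bus, the value passes through two stages: four first-level macro-cells and one second-level macro-cell.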
To avoid a slow read during test, normal D flip-flops can be inserted after a given
number of stages of scan-out macro-cells so that the values are preserved for a multi-cycle
read out. This novel scan-out architecture is not
limited to this design but may be extended to SOC cores as well.
Figure 5.1: Macro level description of scan-out structure. (Labels: RAS cells R11 to R44; lines x1 to x4 and y1 to y4; five Scan-out Macro-Cell blocks.)
Figure 5.2: Scan-out Macro-cell. (Labels: BUS; BUS control signals; to next stage.)
As a further refinement, the scan-out structure could instead be implemented
with sense amplifiers and pre-charged lines, as in conventional memory, to read
the contents of the flip-flops.
Figure 5.3: Hierarchical scan-out illustration. (Labels: four 4 x 4 blocks; 4-to-1 scan-out macro-cell; output bus; to next level.)
Chapter 6
An Experimental Study
An experimental study was performed at Texas Instruments India Pvt. Ltd. to
implement this work in an industrial circuit. The duration of the work was 3 months and the
outcome was very promising. The most significant results of the work are described in this
chapter.
The circuit used for the experiment was a module belonging to a Texas Instruments
(TI) System On Chip (SOC). The SOC targets high performance
video applications such as videophones, image processing, video CODECs and streaming
media. It is one of the fastest DSP chips made at TI.
A synthesized Verilog netlist was used, which was originally DFT-ready with scan
flip-flops inserted. The synthesis was performed using TI's internal 90 nm technology component
library. Synopsys® Design Compiler (DC) was used for synthesis.
Scan stitching was performed in DC on the original netlist to have one scan chain. This
netlist had 5,321 scan flip-flops contributing to 31,321 NAND gate equivalents, while the
total NAND gate equivalent count for the entire circuit was 107,315. This Verilog netlist
with a single scan chain will be referred to as the Serial Scan netlist.
Initially the RTL models for the RAS-cell, scan-out structure and row and column
decoders were synthesized using the same library. Tcl scripts were written and run in DC
to replace every scan flop by the RAS-cell from the original netlist. Tcl scripts were also
written to insert the scan-out structure and the row and column decoders. The final netlist
was saved in Verilog format. This Verilog netlist will be referred to as the RAS netlist from
here on.
The row and column decoders in the RAS netlist contributed to 507 NAND equivalent
gates. The 5,321 RAS-cells contributed to 83,805 NAND gates, while the scan-out structure
contributed to 20,919 NAND gates. The total NAND gate equivalent count for the entire
circuit was 181,224.
The scan-out structure was designed assuming only 4 drivers could drive a single bus.
But, for the module under consideration, which runs at 333 MHz, we could have 25 drivers
driving a single bus in the scan-out structure. So, instead of contributing 20,919 NAND
gate equivalents, it would contribute just 2,636. This would
decrease the area overhead.
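The effect of the bus fan-in on the size of the scan-out hierarchy can be estimated with a short Python sketch. It only counts macro-cells per level, assuming each macro-cell merges up to `fan_in` signals; it says nothing about the gate count of an individual macro-cell, and the function name is illustrative.

```python
def scan_out_levels(n_ff, fan_in):
    """Macro-cell count at each level until a single bus signal remains."""
    if n_ff < 1 or fan_in < 2:
        raise ValueError("need at least one flip-flop and a fan-in of 2")
    counts = []
    signals = n_ff
    while signals > 1:
        signals = -(-signals // fan_in)  # ceiling division
        counts.append(signals)
    return counts


levels_4 = scan_out_levels(5321, 4)
levels_25 = scan_out_levels(5321, 25)
```

Under these assumptions, 5,321 flip-flops need seven levels and 1,778 macro-cells with 4 drivers per bus, but only three levels and 223 macro-cells with 25 drivers per bus.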
A RAS standard cell could not be built, but it would have reduced the area overhead
significantly, as justified below.
The scan flip-flop, if implemented using standard gates, resulted in 9.75 NAND
equivalent gates, where 2 AND gates equal 2.5 NAND gates, 1 OR gate equals 1.25 NAND gates,
2 NOT gates equal 1.5 NAND gates and 2 LATCHES equal 4.5 NAND gates.
The scan flip-flop standard cell, by contrast, used only 5.75 NAND equivalent gates, which is a
41% reduction.
In the case of RAS, a standard gate implementation used 15.75 NAND equivalent
gates. The split is 3 AND gates, which equal 3.75 NAND gates; 2 OR gates, which equal
2.5 NAND gates; 4 NOT gates, which equal 3 NAND gates; 2 LATCHES, which equal 4.5
NAND gates; and 1 BUFFER, which equals 2 NAND gates. An extra inverter was used in
synthesis since the library did not contain an active-high enable BUFFER. Assuming that
extra inverter is removed, the gate count would be 15.
Also, let us assume that a 41% reduction is possible for a RAS standard cell. The
gate count would then be down to a mere 8.85. We chose 9 to be a little pessimistic in our
estimation.
Using these values, we could calculate the gate area overhead of the circuit. The 5,321
flip-flops were replaced by RAS cells, which would contribute 47,889 NAND equivalent
gates. The scan-out structure had 25 drivers on a single bus. The total
gate count would then be 127,026. Hence the gate overhead of RAS over serial scan could be
summarized as:
\[ \text{Gate overhead} = \frac{127{,}026 - 107{,}315}{107{,}315} \times 100\% = 18.4\% \qquad (6.1) \]
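The arithmetic behind these estimates can be retraced in Python. The per-gate NAND equivalents are derived from the splits quoted in the text; the decomposition of the 127,026 total (rest of the circuit, plus RAS cells, decoders and the 25-driver scan-out structure) is an assumption that happens to reproduce the quoted figure, since the text does not spell out the sum.

```python
# NAND equivalents per gate type, derived from the splits quoted in the text.
NAND_EQUIV = {"AND": 1.25, "OR": 1.25, "NOT": 0.75, "LATCH": 2.25, "BUFFER": 2.0}


def nand_equivalents(gates):
    return sum(NAND_EQUIV[kind] * count for kind, count in gates.items())


scan_gates = nand_equivalents({"AND": 2, "OR": 1, "NOT": 2, "LATCH": 2})
ras_gates = nand_equivalents(
    {"AND": 3, "OR": 2, "NOT": 4, "LATCH": 2, "BUFFER": 1})

# Standard-cell saving observed for the scan flip-flop (5.75 vs 9.75).
std_cell_reduction = 1 - 5.75 / scan_gates
# RAS cell without the extra inverter, scaled by the same 41% reduction.
ras_std_cell = (ras_gates - NAND_EQUIV["NOT"]) * (1 - 0.41)

# Assumed decomposition of the total gate count (not stated in the text).
rest_of_circuit = 107_315 - 31_321           # serial scan netlist minus scan flops
total = rest_of_circuit + 5_321 * 9 + 507 + 2_636
overhead = (total - 107_315) / 107_315 * 100
```

The sketch reproduces 9.75 and 15.75 NAND equivalents, the 41% reduction, the 8.85 estimate and the 18.4% overhead.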
Synopsys® Tetramax® was used for generating the test patterns. DC was used to
generate the STIL files required by Tetramax® for test generation.
For the serial scan netlist, Tetramax® generated 612 patterns. The X-fill option was
used for test pattern generation, which filled the unspecified bits in the test pattern with
Xs and simulated the patterns using these X-filled vectors. To convert these serial scan
vectors to RAS vectors, we needed to know which flip-flops captured a useful value for a
given pattern, i.e., the flip-flops that assisted in fault detection. However, no commercial tool
was able to give us this information. Hence, we assumed that all flip-flops capturing a value
other than an X were useful (please note that this is a pessimistic estimate). A vector is
applied and its response is captured. The useful flip-flops are then toggled (scanned out).
The next vector to be applied is chosen to be at a minimum Hamming distance from
the new state of the flip-flops, to reduce the test time. A script was written to order the
vectors using this strategy.
The total test time for serial scan and RAS, in clock cycles, is calculated as follows:
\[
\begin{aligned}
\text{Total test time for serial scan} &= \text{No. of vectors} \times (\text{No. of scan flip-flops} + 1) + \text{No. of scan flip-flops} \\
&= 612 \times (5{,}321 + 1) + 5{,}321 \\
&= 3{,}262{,}385
\end{aligned}
\qquad (6.2)
\]
\[
\begin{aligned}
\text{Total test time for RAS} &= \text{No. of scan-out operations (i.e., cumulative no. of useful flip-flops)} \\
&\quad + \text{cumulative no. of toggles performed to apply new vectors after vector reordering} \\
&\quad + 612 \ \text{(test mode clock cycles)} \\
&= 363{,}584 + 456{,}989 + 612 \\
&= 821{,}185
\end{aligned}
\qquad (6.3)
\]
\[ \text{Test time reduction} = \frac{3{,}262{,}385 - 821{,}185}{3{,}262{,}385} \times 100\% = 74.82\% \]
This amounts to a speedup of about 4X. This number can be increased with careful analysis.
We found that after collapsing, the faults are reduced to 166,261. So a maximum of 166,261
scan-out operations will be required instead of 363,584. This can only be achieved if the
ATPG tool gives us the useful flip-flop information. With this new number, the speedup
would be 5.3X.
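The test time figures in (6.2) and (6.3) can be recomputed with a short Python sketch. The two helper functions simply encode the cycle counts as described in the text; their names are illustrative.

```python
def serial_scan_cycles(n_vectors, n_ff):
    """Each vector needs n_ff shift clocks plus one capture clock;
    the last response needs n_ff more clocks to shift out."""
    return n_vectors * (n_ff + 1) + n_ff


def ras_cycles(scan_out_ops, setup_toggles, n_vectors):
    """Reads of useful flip-flops, toggles to set up vectors, one capture each."""
    return scan_out_ops + setup_toggles + n_vectors


ss = serial_scan_cycles(612, 5_321)
ras = ras_cycles(363_584, 456_989, 612)
reduction = (ss - ras) / ss * 100
speedup = ss / ras
```

The totals match 3,262,385 and 821,185 clock cycles, giving a reduction of just under 75% and a speedup close to 4X.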
6.1 Physical Design
Magma® Design Automation's Blast Fusion® was used for the physical design.
The floor plan of serial scan occupied a total floor area of 1.125 sq. mm. The total area
used was 0.970 sq. mm., and the utilization was 86.2%. For RAS, the same floor plan
was used with a larger floor area. The total floor area was 1.563 sq. mm. and the
area used was 1.382 sq. mm., which gives a utilization factor of 88.4%.
\[ \text{Area and routing overhead} = \frac{1.563 - 1.125}{1.125} \times 100\% = 39\% \]
So a 68.9% area overhead after synthesis translated to a 39% area and routing
overhead after physical design. If we instead consider the 18.4% area overhead after
synthesis obtained with the RAS standard cell and the new scan-out structure, we can say that
this could effectively translate to an area and routing overhead of just 10.4% after physical
design. Sometimes an increased area overhead after
synthesis may even result in zero overhead after physical design, as all of the increased gate
overhead could be accommodated in the same floor plan with an increased utilization.
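The 10.4% figure is consistent with scaling the synthesis overhead in proportion to the measured layout result. The Python sketch below assumes exactly that proportional scaling, which is an interpretation of the estimate rather than a stated method; the variable names are illustrative.

```python
# Overhead after synthesis, from the NAND-equivalent gate counts.
synth_overhead = (181_224 - 107_315) / 107_315 * 100
# Overhead after physical design, from the floor areas in sq. mm.
layout_overhead = (1.563 - 1.125) / 1.125 * 100

# Assumption: layout overhead scales linearly with synthesis overhead.
scaled_layout_overhead = 18.4 * layout_overhead / synth_overhead
```

Under this assumption the 18.4% synthesis overhead maps to about 10.4% after physical design.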
6.2 Design and Implementation Issues
Due to the toggle mechanism used in the design, there is difficulty in loading the first
pattern. Either there has to be some added circuitry to clear or set the flops initially or,
there needs to be some kind of feedback loop during testing which will allow the first pattern
to be loaded after reading out the contents of the flip-flops.
In the test mode, the clock is suppressed to the scan flip-flops while it is still applied
to the rest of the circuit (the scan-out structure). This could result in some added circuitry for
suppressing the clock or some routing overhead, as we may have to route two separate clock
trees.
We were not able to get an exact estimate of the area overhead as we did not have a
RAS standard cell in the library. Also none of the commercial ATPG tools allowed us to
get the information about the useful flip-flops assisting in fault detection. Finally, we were
expecting to have an extra metal layer to route the column decoder signals, but the metal
layers for a certain library and certain technology were fixed and hence we did not have any
control over it.
Chapter 7
Conclusions
In conclusion, we have designed a novel toggle RAS flip-flop which eliminates two
broadcast signals, namely the scan-in and test control (mode control) signals, from earlier RAS
designs. The main constraint today is routing within the circuit. Transistors can be added
at a very low cost in today's technology. We have met our target of reducing the routing
overhead by eliminating the two broadcast signals.
We have shown the advantages of RAS over single chain serial scan by reducing the
test time by 60% and reducing the test power by three orders of magnitude. We have also
derived an analytical expression for the increase in gate area overhead compared to serial
scan. This expression agrees well with the experiment performed on an industrial circuit.
We have developed a unique scan-out structure through which the values can be scanned
out dynamically through a primary output pin. This structure can also be used for a
multi-cycle readout to reduce the slow scan-out time.
7.1 Future Work
An ATPG needs to be built specifically for toggle RAS. Experiments related to delay testing
need to be performed. The design needs to be implemented and studied in its entirety.
7.1.1 Delay Testing
Delay testing in serial scan circuits is very constrained. The scan-FFs are modified
and HOLD latches [20, 21] are often inserted between the FFs and the combinational logic.
The latches insert excess delays in the path and increase area overhead due to routing of
an additional control (HOLD) signal. A one-bit change between consecutive vectors can be
obtained very easily using RAS, which is vital for delay testing. A vector V1
is set up and vector V2 with a one-bit change is applied. It is known that any testable path
can be tested by a single input change vector pair [25]. These tests are easy to apply in RAS
but cannot be guaranteed in serial scan. A change in state of a flip-flop needs only one clock
and the circuit response is captured in the next clock cycle, thereby testing a desired path
for delay. Hence delay testing can be performed using RAS with no additional hardware,
and any combinationally generated delay test vector will work for sequential circuits using
RAS.
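The single input change vector pairs mentioned above are easy to enumerate. The Python sketch below generates, for a given flip-flop state V1, every V2 that differs in exactly one bit, which in RAS corresponds to addressing (toggling) one flip-flop; the function name and the 4-bit example are illustrative.

```python
def single_input_change_pairs(v1):
    """Return (toggled index, V1, V2) for every one-bit change of V1."""
    pairs = []
    for i in range(len(v1)):
        v2 = list(v1)
        v2[i] ^= 1               # toggling flip-flop i takes one clock in RAS
        pairs.append((i, tuple(v1), tuple(v2)))
    return pairs


pairs = single_input_change_pairs((0, 1, 1, 0))
distances = [sum(a != b for a, b in zip(v1, v2)) for _, v1, v2 in pairs]
```

Each pair then needs one clock to launch the transition and one clock to capture the response.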
7.1.2 Random-Pattern BIST using RAS
With the ability to control any flip-flop in the circuit, random patterns can be applied
by just addressing any flip-flop through the primary inputs. While the flip-flops and
decoder are initially tested for faults using the march test, a fault simulation will result in random
pattern testing of the circuit, and the results may be interesting to observe. BIST circuits
that implement march tests are relatively easy to build and are commonly used to
test random access memory [30, 36, 62, 59].
Error diagnosis, which is a lengthy process for serial scan, can be very efficient with
RAS.
Bibliography
[1] V. D. Agrawal, K.-T. Cheng, D. D. Johnson, and T. Lin, ?Designing Circuits with Partial
Scan,? IEEE Design & Test of Computers, vol. 5, pp. 8?15, Apr. 1988.
[2] H. Ando, ?Testing VLSI with Random Access Scan,? in Digest COMPCON, Feb. 1980, pp.
50?52.
[3] B. Arslanand A. Orailoglu,?Test CostReduction through a ReconfigurableScan Architecture,?
in Proc. International Test Conf., Oct. 2004, pp. 945?952.
[4] D. H. Baik and K. K. Saluja, ?Progressive Random Access Scan: A Simultaneous Solution to
Test Power, Test Data Volume and Test Time,? in Proc. International Test Conf., Nov. 2005.
[5] D. H. Baik and K. K. Saluja, ?State-reuse Test Generation for Progressive Random Access
Scan: Solution to Test Power, Application time and Data Size,? in Proc. 14th IEEE Asian
Test Symp., Dec. 2005.
[6] D. H. Baik and K. K. Saluja, ?Test Cost Reduction Using Partitioned Grid Random Access
Scan,? in Proc. 19th International Conf. VLSI Design, Jan. 2006.
[7] D. H. Baik, K. K. Saluja, and S. Kajihara, ?Random Access Scan: A Solution to Test Power,
Test Data Volume and Test Time,? in Proc. 17th International Conf. VLSI Design, Jan. 2004,
pp. 883?888.
[8] F. Beglez, D. Bryan, and K. Komzminski, ?Combinational Profiles of Sequential Benchmark
Circuits,? in Proc. IEEE International Symp. on Circuits and Systems, 1989, pp. 1929?1934.
[9] B. Bhattacharya, S. Seth, and S. Zhang, ?Double-Tree Scan: A Novel Low-Power Scan-Path
Architecture,? in Proc. International Test Conf., 2003, pp. 470?479.
[10] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory and
Mixed-Signal VLSI Circuits. Boston, MA: Kluwer Academic Publishers, 2000.
[11] S. T. Chakradhar, S. G. Rothweiler, and V. D. Agrawal, ?Redundancy Removal and Test
Generation for Circuits with Non-Boolean Primitives,? IEEE Trans. on CAD, vol. 16, no. 11,
pp. 1370?1377, Nov. 1997.
[12] A. Chandra and K. Chakrabarty,?Combining Low-PowerScan Testing and Test Data Compres-
sion for System-on-a-chip,? in Proc. ACM/IEEE Design Automation Conf., 2001, pp. 166?169.
[13] A. Chandra and K. Chakrabarty, ?System-on-a-Chip test data compression and decompression
architectures based on Golomb codes,? IEEE Trans. Computer-Aided Design of Integrated
Circuits and Systems, vol. 20, no. 3, pp. 355?368, Mar. 2001.
[14] S. J. Chandra, T. Ferry, T. Gheewala, and K.Pierce, ?ATPGbased ona NovelGrid Addressable
Latch Element,? in Proc. ACM/IEEE Design Automation Conf., 1991, pp. 282?286.
[15] K.-T. Cheng, S. Devadas, and K. Keutzer, ?Delay Fault Test Generation and Synthesis for
Testability Under a Standard Scan Design Methodology,? IEEE Trans. on Computer-Aided
Design, vol. 12, pp. 1217?1231, Aug. 1993.
60
[16] R. M. Chou, K. K. Saluja, and V. D. Agrawal, ?Power Constraint Scheduling of Tests,? in
Proc. 7th International Conf. VLSI Design, Jan. 1994, pp. 271?274.
[17] R. M. Chou, K. K. Saluja, and V. D. Agrawal, ?Scheduling Tests for VLSI Systems Under
Power Constraints,? IEEE Trans. VLSI Systems, vol. 5, no. 2, pp. 175?185, June 1997.
[18] T. H. Cormen, C. E. Leiserson, and R. L. Rivest, Introduction to algorithms. New York:
McGraw Hill, 2000.
[19] V. Dabholkar, S. Chakravarty, I. Pomeranz, and S. M. Reddy, “Techniques for Minimizing
Power Dissipation in Scan and Combinational Circuits During Test Application,” in IEEE
Trans. on Computer-Aided Design of Integrated Circuits and Systems, 1998, pp. 1325–1333.
[20] S. DasGupta, P. Goel, R. G. Walther, and T. W. Williams, “A Variation of LSSD and its
Implications on Design and Test Pattern Generation in VLSI,” in Proc. International Test
Conf., 1982, pp. 216–219.
[21] S. DasGupta, R. G. Walther, T. W. Williams, and E. B. Eichelberger, “An Enhancement to
LSSD and Some Applications of LSSD in Reliability, Availability and Serviceability,” in Proc.
International Fault-Tolerant Computing Symp., 1981, pp. 32–34.
[22] E. B. Eichelberger, E. Lindbloom, J. A. Waicukauski, and T. W. Williams, Structured Logic
Testing. Englewood Cliffs, NJ: Prentice-Hall, Inc., 1991.
[23] H. Fujiwara, Logic Testing and Design for Testability. Cambridge, MA: The MIT Press, 1985.
[24] S. Gerstendorfer and H. J. Wunderlich, “Minimized Power Consumption for Scan-based BIST,”
in Proc. International Test Conf., 1999, pp. 77–84.
[25] M. A. Gharaybeh, M. L. Bushnell, and V. D. Agrawal, “Classification and Test Generation
for Path-Delay Faults Using Single Stuck-at Fault Tests,” J. Electronic Testing: Theory and
Applications, vol. 11, no. 1, pp. 55–67, Aug. 1997.
[26] P. Goel, “An Implicit Enumeration Algorithm to Generate Tests for Combinational Logic
Circuits,” IEEE Trans. on Computers, vol. C-30, no. 3, pp. 215–222, Mar. 1981.
[27] L. H. Goldstein, “Controllability/observability analysis of digital circuits,” IEEE Trans.
Circuits and Systems, vol. CAS-26, no. 9, pp. 685–693, 1979.
[28] I. Hamzaoglu and J. Patel, “Reducing Test Application Time for Full Scan Embedded Cores,”
in FTCS, 1999, pp. 260–267.
[29] E. P. Hsieh, R. A. Rasmussen, L. J. Vidunas, and W. T. Davis, “Delay test generation,” in
Proc. 14th ACM/IEEE Design Automation Conf., 1977, pp. 486–491.
[30] C.-T. Huang, J.-R. Huang, C.-F. Wu, C.-W. Wu, and T.-Y. Chang, “A Programmable BIST
Core for Embedded DRAM,” IEEE Design & Test of Computers, vol. 16, no. 1, pp. 59–70,
Jan. 1999.
[31] N. Itazaki and K. Kinoshita, “Test Pattern Generation for Circuits with Tri-State Modules by
Z-Algorithm,” IEEE Trans. Computer-Aided Design, vol. 8, no. 12, pp. 1327–1334, Dec. 1989.
[32] N. Ito, “Automatic incorporation of on-chip testability circuits,” in Proc. 27th ACM/IEEE
Design Automation Conf., 1990, pp. 529–535.
[33] S. Kajihara, K. Ishida, and K. Miyase, “Test Vector Modification for Power Reduction During
Scan Testing,” in Proc. VLSI Test Symp., 2002, pp. 160–165.
[34] K. Kishida, F. Shirotori, Y. Ikemoto, S. Isiyama, and Y. Hayashi, “A delay test system for
high-speed logic lsi’s,” in Proc. 23rd ACM/IEEE Design Automation Conf., 1986, pp. 786–790.
[35] Y. Koseko, T. Ogihara, and S. Murai, “Tri-state bus conflict checking method for atpg using
bdd,” in Proc. International Conf. Computer Aided Design, 1993, pp. 512–515.
[36] K.-J. Lin and C.-W. Wu, “Testing Content-Addressable Memories Using Functional Fault
Models and March-like Algorithms,” IEEE Trans. Computer-Aided Design of Integrated Circuits
and Systems, vol. 19, no. 5, pp. 577–588, May 2000.
[37] C. M. Maunder and R. E. Tulloss, The Test Access Port and Boundary-Scan Architecture.
IEEE Computer Society Press, 1990.
[38] M. R. Mercer and V. D. Agrawal, “A Novel Clocking Technique for VLSI Circuit Testability,”
IEEE J. Sol. St. Circ., vol. SC-19, pp. 207–212, Apr. 1984.
[39] K. Miyase, S. Kajihara, I. Pomeranz, and S. M. Reddy, “Don’t-Care Identification on Specific
Bits of Test Patterns,” in Proc. VLSI Test Symp., Sept. 2002, pp. 194–200.
[40] A. S. Mudlapur, V. D. Agrawal, and A. D. Singh, “A novel random access scan flip-flop design,”
in Proc. 9th VLSI Design & Test Symp. (VDAT’05), Aug. 2005, pp. 226–236.
[41] A. S. Mudlapur, V. D. Agrawal, and A. D. Singh, “A random access scan architecture to reduce
hardware overhead,” in Proc. International Test Conf., Nov. 2005.
[42] S. Narayanan and M. A. Breuer, “Reconfigurable scan chains: A novel approach to reduce test
application time,” in Proc. International Conf. Computer Aided Design, 1994, pp. 271–274.
[43] T. M. Niermann, W.-T. Cheng, and J. H. Patel, “PROOFS: A Fast, Memory-Efficient
Sequential Circuit Fault Simulator,” IEEE Trans. Computer-Aided Design of Integrated Circuits and
Systems, vol. 11, no. 2, pp. 198–207, Feb. 1992.
[44] T. M. Niermann and J. H. Patel, “HITEC: A Test Generation Package for Sequential Circuits,”
in Proc. European Design Automation Conf., 1991, pp. 214–218.
[45] Z. Plíva, O. Novák, and P. B. d’Aguerre, “Hardware Overhead of Boundary Scan and RAS
Design Methodologies.” http://www.fm.vslib.cz/ kes/pub/ecms03.pdf.
[46] T. J. Powell, “Consistently Dominant Fault Model for Tristate Buffer Nets,” in Proc. VLSI
Test Symp., 1996, pp. 400–404.
[47] J. Rajski and J. Tyszer, Arithmetic Built-In Self-Test. Upper Saddle River, NJ: Prentice-Hall,
Inc., 1998.
[48] J. Rajski, J. Tyszer, M. Kassab, and N. Mukherjee, “Embedded Deterministic Test,” IEEE
Trans. on CAD, vol. 23, pp. 776–792, May 2004.
[49] S. Reda and A. Orailoglu, “CircularScan: A Scan Architecture for Test Cost Reduction,” in
Proc. Design, Automation and Test in Europe (DATE’04).
[50] K. Saluja, “An Enhancement of LSSD to reduce test pattern Generation Effort and Increase
Fault Coverage,” in Proc. ACM/IEEE Design Automation Conf., 1982, pp. 489–494.
[51] R. Sankaralingam, R. R. Oruganti, and N. A. Touba, “Static Compaction Techniques to Control
Scan Vector Power Dissipation,” in Proc. VLSI Test Symp., 2000, pp. 35–40.
[52] J. Savir, “Skewed-Load Transition Test: Part I, Calculus,” in Proc. International Test Conf.,
1992, pp. 705–713.
[53] J. Savir, “Skewed-Load Transition Test: Part II, Coverage,” in Proc. International Test Conf.,
1992, pp. 714–722.
[54] J. Savir, “On Broad-Side Delay Testing,” in Proc. 12th VLSI Test Symp., 1994, pp. 284–290.
[55] R. Schrift, “Digital Bus Faults Measuring Techniques,” in Proc. International Test Conf., 1998,
pp. 382–387.
[56] O. Sinanoglu, I. Bayraktaroglu, and A. Orailoglu, “Test power reduction through minimization
of scan chain transitions,” in Proc. VLSI Test Symp., 2002, pp. 166–171.
[57] C. E. Stroud, “AUSIM: Auburn University SIMulator - Version L2.2.” Dept. of Electrical &
Computer Engineering, Auburn University, Jan. 2004.
[58] T. G. Susheel, J. Chandra, T. Ferry, and K. Pierce, “ATPG Based on A Novel Grid-Addressable
Latch Element,” in Proc. ACM/IEEE Design Automation Conf., 2002, pp. 282–286.
[59] A. J. van de Goor, Testing Semiconductor Memories: Theory and Practice. Chichester, UK:
John Wiley & Sons, Inc., 1991.
[60] J. T. van der Linden, M. H. Konijenburg, and A. J. van de Goor, “Circuit Partitioned Automatic
Test Pattern Generation Constrained by Three-State Buses and Restrictors,” in Proc. Asian
Test Symp., 1996, pp. 29–33.
[61] K. D. Wagner, “Design for testability in the amdahl 580,” in Digest COMPCON, 1983, pp.
384–388.
[62] C.-W. Wang, C.-F. Wu, J.-F. Li, C.-W. Wu, T. Teng, K. Chiu, and H.-P. Lin, “A Built-In
Self-Test and Self-Diagnosis scheme for embedded SRAM,” in Proc. Asian Test Symp., 2000,
pp. 45–50.
[63] F. C. Wang, Digital Circuit Testing. San Diego, CA: Academic Press, Inc., 1991.
[64] S. Wang and S. K. Gupta, “ATPG for Heat Dissipation Minimization During Scan Testing,”
in Proc. ACM/IEEE Design Automation Conf., 1997, pp. 614–619.
[65] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design. Reading, MA: Addison-Wesley,
2nd edition, 1992.
[66] M. J. Y. Williams and J. B. Angell, “Enhancing Testability of Large-Scale Integrated Circuits
via Test Points and Additional Logic,” IEEE Trans. on Computers, vol. C-22, no. 1, pp. 46–60,
Jan. 1973.
[67] X. Zhang and K. Roy, “Power Reduction in Test-Per-Scan BIST,” in International On Line
Testing Workshop, 2000, pp. 133–138.
[68] Y. Zorian, E. J. Marinissen, and S. Dey, “Testing embedded-core-based system chips,” IEEE
Trans. Computer, vol. 32, no. 6, pp. 52–60, 1999.
Appendices
Appendix A
Description of the programs used to implement the vector compacting
algorithm
This appendix describes the programs implemented to perform vector compaction.
The programs were tied together using a script to automate the procedure.
The flow of the programs is shown in Figure A.1.
The sequential circuit netlist is transformed to a combinational circuit by removing the
flip-flops from the netlist and adding pseudo primary inputs and pseudo primary outputs.
A compacted test set is obtained for the combinational circuit using an ATPG such as
HITEC. Using the vectors obtained, fault simulation is performed on a fault simulator that
can provide detailed information about the detected faults, such as the vector that detected
each fault and the primary output (or pseudo primary output) at which it was detected.
A vector re-ordering program is then executed to minimize the number of scan operations.
The vector re-ordering program can be explained using the following example:
Consider a circuit with three flip-flops. Assume the compacted vectors obtained from
the ATPG are as shown in Table A.1. The first column indicates the test vector number, the
second column indicates the values of the pseudo primary inputs (PPI), the third column
indicates the values of the pseudo primary outputs (PPO), the fourth column indicates the
flip-flops or PPOs where the faults have been detected and for which a read is to be performed,
and the fifth column indicates the value of the PPOs after a read at the respective flip-flops.

Table A.1: Example vectors
Test set   PPI (i_i)   PPO (o_i)   Faults detected at   Modified PPO after read
t1         000         111         FF1, FF3             010
t2         011         101         FF2, FF3             110
t3         110         110         FF1                  010
t4         010         001         -                    001

[Figure A.1 is a flowchart with the following steps, in order: Start; convert the sequential
circuits to combinational circuits; obtain compacted test vectors from an ATPG (HITEC);
perform fault simulation on any fault simulator that provides details of fault propagation to
POs; run the vector re-ordering program; insert RAS and decoder logic, replacing the flip-flops
from the original sequential circuit; perform fault simulation with the newly ordered vectors;
for verification, compare the undetected faults after fault simulation before and after
implementing RAS; Done.]
Figure A.1: Vector compaction program flow.
The first vector is chosen to be the one closest to the all-zero state, which is t1. The
program then searches the remaining vectors for the PPI state that most closely matches the
modified PPO value (“010” in this case); here the match is t4. The program
stops searching when the first exact match is found. If an exact match is not found, the
PPI with the least Hamming distance from the modified PPO state is chosen. By the same
rule, the vector following t4 is t2 (its PPI, 011, is at Hamming distance 1 from the modified
PPO 001, whereas the PPI of t3 is at distance 3), and the last vector is t3. It is worth noting
that we have reduced the number of scan operations from 2 to 1.
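The greedy selection described in this example can be sketched in a few lines of Python. This is only an illustrative sketch, not the program used in this work: the function names (hamming, modified_ppo, reorder) and the data layout are hypothetical. It also assumes that a read complements the addressed flip-flop, which is inferred from the rows of Table A.1 (each modified PPO equals the PPO with the read bits inverted, taking FF1 as the leftmost bit).

```python
def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def modified_ppo(ppo: str, read_ffs: list[int]) -> str:
    """Complement the PPO bits at the flip-flops that are read (FF1 = leftmost bit).

    Assumption: this rule is inferred from Table A.1 for illustration only.
    """
    bits = list(ppo)
    for ff in read_ffs:
        bits[ff - 1] = "1" if bits[ff - 1] == "0" else "0"
    return "".join(bits)


def reorder(vectors: dict[str, tuple[str, str]]) -> list[str]:
    """Greedy ordering of vectors given as name -> (PPI, modified PPO).

    Start with the PPI closest to the all-zero state, then repeatedly pick the
    remaining vector whose PPI has the least Hamming distance from the current
    modified PPO. An exact match (distance 0) is always preferred.
    """
    remaining = dict(vectors)
    width = len(next(iter(vectors.values()))[0])
    state = "0" * width  # circuit assumed to start in the all-zero state
    order = []
    while remaining:
        # min() returns the first vector with the smallest distance.
        name = min(remaining, key=lambda v: hamming(remaining[v][0], state))
        order.append(name)
        state = remaining.pop(name)[1]  # flip-flop contents after the reads
    return order


# Rows of Table A.1: name -> (PPI, PPO, flip-flops read).
TABLE_A1 = {
    "t1": ("000", "111", [1, 3]),
    "t2": ("011", "101", [2, 3]),
    "t3": ("110", "110", [1]),
    "t4": ("010", "001", []),
}

vectors = {
    name: (ppi, modified_ppo(ppo, reads))
    for name, (ppi, ppo, reads) in TABLE_A1.items()
}
order = reorder(vectors)
print(order)
```

With the rows of Table A.1 as input, the sketch selects the vectors in the order t1, t4, t2, t3, the same order as in the example above.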
The next step in the flow is to remove the flip-flops from the netlist and add the
RAS flip-flop and the decoders. A fault simulation is performed with the newly ordered
vectors. To verify that all the combinational faults in the circuit are detected using the
RAS architecture, the undetected faults from the two simulations (with and without RAS)
are compared.
Appendix B
Description of the programs used to calculate the power dissipation
during test
This appendix describes the program used to calculate the power dissipation during
scan-in. We have assumed that the number of state changes at the inputs of a circuit is directly
proportional to the activity in the circuit, and hence to the power dissipated
by the circuit. A program was developed to count the number of bit changes relative to the
previous input during the serial scan-in process. For RAS, this number is the change
in the address bits during the scan-in process. Essentially, the program counts the number of
transitions occurring at the primary and pseudo primary inputs of the circuit.
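The counting idea can be illustrated with a short Python sketch. It is not the program used in this work, and it rests on several assumptions made only for illustration: the scan chain starts in the all-zero state, one bit is shifted in per clock, primary inputs are ignored, and RAS addresses are plain binary numbers starting from address 0. The function names are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of bit positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def serial_scan_transitions(vectors: list[str]) -> int:
    """Count flip-flop bit changes, clock by clock, while vectors are shifted in.

    Assumption: a single scan chain that starts at all zeros; bits enter at the
    left end and the last bit of each vector is shifted in first, so that the
    chain holds the whole vector after len(vector) clocks.
    """
    state = "0" * len(vectors[0])
    total = 0
    for vec in vectors:
        for bit in reversed(vec):
            new_state = bit + state[:-1]  # shift right by one position
            total += hamming(state, new_state)
            state = new_state
    return total


def ras_address_transitions(addresses: list[int]) -> int:
    """Count address-bit changes between consecutively addressed flip-flops.

    Assumption: the address lines start at 0 and use plain binary encoding.
    """
    total = 0
    previous = 0
    for address in addresses:
        total += bin(previous ^ address).count("1")  # bits that toggled
        previous = address
    return total


print(serial_scan_transitions(["101"]))
print(ras_address_transitions([1, 2, 3]))
```

For the single vector 101, the chain passes through 100, 010 and 101, which gives 1 + 2 + 3 = 6 bit changes under this model. The address sequence 1, 2, 3 toggles 1 + 2 + 1 = 4 address bits.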