Built-In Self Test for Regular Structure Embedded Cores in System-on-Chip

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee. This thesis does not include proprietary or classified information.

Srinivas Murthy Garimella

Certificate of Approval:

Victor P. Nelson, Professor, Electrical and Computer Engineering
Charles E. Stroud (Chair), Professor, Electrical and Computer Engineering
Adit D. Singh, Professor, Electrical and Computer Engineering
Stephen L. McFarland, Acting Dean, Graduate School

Built-In Self Test for Regular Structure Embedded Cores in System-on-Chip

Srinivas Murthy Garimella

A Thesis Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Master of Science

Auburn, Alabama
May 13, 2005

Built-In Self Test for Regular Structure Embedded Cores in System-on-Chip

Srinivas Murthy Garimella

Permission is granted to Auburn University to make copies of this thesis at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author    Date

Copy sent to: Name    Date

Vita

Srinivas Murthy Garimella, son of Satyanarayana and Subhadra Garimella, was born on August 29, 1980 in Vijayawada, India. He graduated with distinction with a Bachelor of Technology in Electronics and Communications Engineering degree in May 2002 from Jawaharlal Nehru Technological University, Hyderabad, India. After completing his undergraduate degree, he joined Tata Consultancy Services (TCS), India as an Assistant Systems Engineer in June 2002. He entered the graduate program in Electrical and Computer Engineering at Auburn University in August 2003. While pursuing his Master of Science degree at Auburn University, he worked under the guidance of Dr. Charles E. Stroud as a graduate student assistant in the Electrical and Computer Engineering Department.

Thesis Abstract

Built-In Self Test for Regular Structure Embedded Cores in System-on-Chip

Srinivas Murthy Garimella

Master of Science, May 13, 2005
(B.Tech., Jawaharlal Nehru Technological University, Hyderabad, India, May 2002)

109 Typed Pages

Directed by Charles Stroud

Miniaturization and the integration of different cores onto a single chip are increasing the complexity of VLSI chips. To ensure that these chips operate as desired, they have to be tested at various phases of their development. Built-In Self-Test (BIST) is one technique which allows testing of VLSI chips from wafer-level to system-level. The basic idea of BIST is to build test circuitry inside the chip so that the chip tests itself along with the BIST circuitry. The aim of the current research is to develop BIST configurations for testing memory cores and other regular structure cores in Field Programmable Gate Arrays (FPGAs) and System-on-Chips (SoCs).

An FPGA-independent BIST approach for testing memory cores and other regular structure cores in FPGAs is described in this thesis. BIST configurations were developed to test memory cores in Atmel and Xilinx FPGAs using this approach. Another approach, which takes advantage of some of the architectural capabilities of Atmel SoCs to reduce test time, is also described in this thesis.

Acknowledgments

I would like to thank Dr. Stroud for his support and advice throughout my research at Auburn University. I would also like to thank Dr. Nelson and Dr. Singh for being on my graduate committee and for their contribution to my thesis.
I would like to acknowledge my research colleagues John, Jonathan, Sachin and Sudheer for their help and inspirational discussions during my research. Finally, I would like to express my deepest gratitude to my parents, whose love and encouragement inspire me to achieve my goals.

Style manual or journal used: LaTeX: A Document Preparation System, Leslie Lamport, Addison-Wesley Publishing Company, 2nd edition (1994). Bibliography follows IEEE Transactions.

Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style file aums.sty. The plots were generated using Microsoft Excel®. Images were drawn using Microsoft® Visio®.

Table of Contents

List of Figures
List of Tables

1 Introduction
  1.1 System-on-Chip (SoC)
  1.2 CSOC Architecture
  1.3 FPGAs
  1.4 Embedded Memories
  1.5 Built-In Self-Test (BIST)
  1.6 BIST for SoC
  1.7 Thesis Statement

2 Background
  2.1 System on a Chip (SoC)
  2.2 FPGA Architectures
    2.2.1 Switching Elements in FPGAs
    2.2.2 PLB Architecture
  2.3 Embedded Memories
    2.3.1 Memory Types
    2.3.2 Embedded Memories in FPGAs
    2.3.3 Embedded Memories and FPGAs in SoCs and their Interfacing
  2.4 BIST for Memories
    2.4.1 Present Methods for Testing FPGAs and SoCs
  2.5 Thesis Restatement

3 Implementation of BIST on Atmel FPGAs and SoCs
  3.1 RAM BIST Approaches
    3.1.1 BIST Approach for Free RAMs Using FPGA Logic
      3.1.1.1 BIST Architecture for Dual-port Synchronous Mode
      3.1.1.2 BIST Architecture for Single-port Modes
    3.1.2 Advantages and Limitations of Using VHDL
    3.1.3 BIST Approach for Free RAMs Using Embedded Processor Core
      3.1.3.1 AVR-FPGA Interface Description
      3.1.3.2 BIST Architecture
      3.1.3.3 Implementation of BIST Approach in FPSLIC
    3.1.4 On-Chip Diagnostics
  3.2 Data SRAM Testing
  3.3 Summary

4 Implementation of BIST on Xilinx FPGAs
  4.1 Motivation
  4.2 PLB and Routing Architecture
  4.3 Embedded Block RAM Architecture
  4.4 Block RAM Testing
    4.4.1 Block RAM Testing in Single-port Mode
      4.4.1.1 BIST Implementation
      4.4.1.2 Diagnosis
    4.4.2 Block RAM Testing in Dual-port Mode
  4.5 Summary of Block RAM Testing
  4.6 LUT RAM Testing
    4.6.1 BIST Implementation
  4.7 Multiplier BIST

5 Summary and Conclusions
  5.1 Summary
  5.2 Observations
  5.3 Future Research

Bibliography

Appendices
A ASL Code for Free RAM
B VHDL Code for March Y Algorithm
C March LR Algorithm for Block RAMs
  C.1 March LR Algorithm with BDS for 16-bit Wide RAMs
  C.2 RAMBISTGEN Input File Format for Generating VHDL Code

List of Figures

1.1 Evolutionary Stages of System-on-Chip Products
1.2 Typical Architecture of a CSOC
1.3 Architecture of a Typical FPGA
1.4 BIST Architecture
2.1 Switching Elements Used in FPGAs
2.2 FPGA Programming Controlled by SRAM Cells
2.3 Atmel AT40K Series PLB
2.4 PLB Array Interconnection in Atmel AT40K Series FPGAs
2.5 Structure of Memory Cells
2.6 Arrangement of Free RAMs in Atmel AT40K Series FPGAs
2.7 Architecture of a Free RAM Block
2.8 Block Diagram of Spartan II Family FPGAs
2.9 Block Diagram of a Block RAM
2.10 Embedded SRAM in Atmel's FPSLIC
2.11 Partitioning of Embedded SRAM in Atmel's FPSLIC
2.12 AVR-FPGA-RAM Interface in Atmel's FPSLIC
2.13 AVR-FPGA Cache Logic in Atmel's FPSLIC
2.14 BIST Architecture for Testing PLBs in FPGAs
3.1 Dual-port Free RAM BIST Architecture and ORA Design
3.2 Single-port Free RAM BIST Architecture and ORA Design
3.3 Fault Simulation Results for Free RAM
3.4 Snapshot of the RAMBISTGEN Tool
3.5 AVR-FPGA Interface
3.6 Architecture of RAM BIST from AVR
3.7 RAM BIST Implementation from AVR
3.8 Three Configurations for Data SRAM Testing
4.1 Architecture of a Slice in Virtex and Spartan FPGAs
4.2 Organization of Block RAMs in Various Xilinx FPGAs
4.3 Block Diagram of a Block RAM
4.4 BIST Architecture for Block RAM Testing
4.5 Block RAM Configuration for Testing both Ports in Single-port Mode
4.6 Design of a Single-bit ORA for Block RAM Testing
4.7 Programmable Logic Resources in Xilinx FPGAs
4.8 ORA Designs Used for LUT RAM Testing
4.9 Multiplier Modes

List of Tables

3.1 Timing Analysis Results for Three RAM BIST Configurations
3.2 RAMBISTGEN Input File Format for March Y and March LR
3.3 Function of IOSEL Lines
3.4 Contents of Reg1
3.5 Contents of Reg2
3.6 BIST and Diagnosis Summary
3.7 Contents of Registers Used for Testing Data SRAM
3.8 Summary of RAM BIST Configurations for FPSLIC
3.9 Memory Storage Requirements for BIST Configurations
4.1 PLB Array Size Bounds for Xilinx Family FPGAs
4.2 BIST PLB Count for Virtex I and Spartan II
4.3 Function of Xilinx JTAG Pins
4.4 TPG and ORA Counts for Testing Block RAMs in Dual-port Mode
4.5 TPG and ORA Counts for Testing LUT RAMs
4.6 Multiplier BIST Slice Count

Chapter 1
Introduction

Since the arrival of the first transistor-based computer, high-scale integration has been one of the main concerns in hardware design. In the early 1970s relatively high levels of integration were achieved, but the continuing effort to miniaturize and build more complex digital circuitry remained one of the goals in leading computer construction and chip design [1]. As a result, semiconductor integration has progressed from Small Scale Integration (SSI) to Very Large Scale Integration (VLSI) and now to System Level Integration (SLI) or System-on-Chip (SoC) [1].

1.1 System-on-Chip (SoC)

SoC technologies are the consequent continuation of Application Specific Integrated Circuit (ASIC) technology, wherein complex functions that previously required heterogeneous components to be merged onto a printed circuit board are now integrated within one single silicon IC or chip [2]. As device integration scales grew, the enhanced performance of memory, microprocessors and logic devices boosted the performance of the digital systems they constituted. However, performance increases in larger systems were hampered by speed limitations associated with the long and numerous interconnects between devices on the printed circuit board (PCB) and the associated input/output (I/O) buffers on the chips. Closely related system functions must be combined on a single chip to eliminate this bottleneck and take full advantage of improvements in transistor switching speeds and higher integration scales. This is precisely the capability that SoC technology provides. Rapid advances in semiconductor processing technologies have allowed the realization of complicated designs on the same IC. Figure 1.1 illustrates the evolutionary stages toward SoC products.

[Figure 1.1: Evolutionary Stages of System-on-Chip Products: (a) multi-board systems, (b) single-board systems, (c) System-on-Chip products]

SoCs can be broadly classified into two categories: ASIC-based and Configurable or Programmable.
While Configurable SoCs (CSoCs) can be customized to different applications through embedded reconfigurable logic cores, ASIC-based SoCs cannot be customized. CSoCs combine the advantages of both ASIC-based SoCs and multi-chip board development using standard components [1]. The major general goal for the development of such application-tailored reconfigurable architectures is to realize adaptivity vs. power/performance/cost trade-offs by migrating functionality from ASICs to multi-granularity reconfigurable hardware [3].

1.2 CSOC Architecture

The typical architecture of a CSoC is shown in Figure 1.2. A CSoC consists of a dedicated microcontroller core and other components built around a common bus system. Required applications can be designed using the microcontroller, a DSP core or other Intellectual Property (IP) cores. The reconfigurable logic core typically consists of a low power Field Programmable Gate Array (FPGA). Embedded memories also form a large portion of the CSoC [4] [5].

[Figure 1.2: Typical Architecture of a CSOC: reconfigurable logic (FPGA), on-chip memory, microcontroller, DSP core and other IP cores on a common bus system]

1.3 FPGAs

FPGAs are flexible alternatives to custom ICs. FPGAs can be programmed by the end users at their site; moreover, they can be reprogrammed any number of times. Since FPGAs bring short time-to-market and flexibility to systems using digital logic circuits, many applications have been developed to make the best use of FPGA reprogrammability. FPGAs can implement both combinational and sequential logic of tens of thousands of gates.

[Figure 1.3: Architecture of a Typical FPGA: an array of PLBs surrounded by IOBs and an interconnect network]

A typical FPGA architecture usually includes three categories of user-programmable elements, as shown in Figure 1.3: Programmable Logic Blocks (PLBs), Input/Output Blocks (IOBs) and a programmable interconnection network. PLBs are sometimes called Configurable Logic Blocks (CLBs). An interior array of PLBs provides the functional elements from which the user's logic is constructed, while IOBs provide an interface between the logic array and the device package pins. The programmable interconnect network provides routing paths to connect the inputs and outputs of the PLBs and IOBs [6]. The functionality of these three types of programmable elements is controlled by the configuration memory in the FPGA.

The FPGA provides a generic chip that can be programmed for any application by downloading a desired configuration into the configuration memory of the FPGA. This dictates the behavior of the underlying hardware (PLBs, IOBs and interconnect network). The programming data takes the form of a bit-stream consisting of a string of binary 1s and 0s and is stored in the configuration memory. These configuration memory bits are then used on-board the FPGA to control the on-off state of various pass transistors and multiplexers to program the PLBs, IOBs and interconnect elements [7].

Improvements in process technology have had a significant impact on the architecture of FPGAs. Traditionally, FPGAs were targeted at implementing smaller logic circuits. Recently, FPGAs have been used to implement relatively large circuits and systems. Since large systems often require data storage, large on-chip memories have become an essential component of high-density FPGAs [8] [9]. These memory arrays can also be configured as Read Only Memories (ROMs) to implement large combinational logic functions.
1.4 Embedded Memories

SoCs are moving from logic-dominant chips to memory-dominant chips. Large amounts of Static Random Access Memory (SRAM), ROM, Erasable Programmable Read-Only Memory (EPROM) and multi-port RAMs are finding their way on board. According to the International Technology Roadmap for Semiconductors [10], memories will cover 90 percent of the SoC die area by 2010. Because of their high density, embedded memories are more prone to defects that already exist in silicon than any other component on the chip [11]. Increasing the memory on a SoC complicates the manufacturing processes and reduces yield, adding to the cost of the SoC. Therefore, from a testability point of view, it is essential to thoroughly test memories in SoCs [12].

1.5 Built-In Self-Test (BIST)

Traditionally, chips were tested using Automatic Test Equipment (ATE). Tests ranged from those developed manually to those generated automatically for scan-based designs. Scan is a Design for Test (DFT) technique whereby all internal storage elements are modified so that in test mode they form individual stages of a shift register for scanning test data in and test responses out. The use of Automatic Test Pattern Generation (ATPG) programs to generate manufacturing tests for VLSI designs became popular in the early 1980s. Soon, it was also recognized that test circuitry must be added to a design to simplify ATPG [13].

As the complexity and size of ICs grew, test equipment became more sophisticated, increasing the manufacturing cost to as much as 30 to 40 percent of the cost of production [14]. Because of the limitations of conventional testing techniques, a new DFT technique called Built-In Self-Test (BIST) was developed.

[Figure 1.4: BIST Architecture: BIST controller, Test Pattern Generator (TPG), Circuit Under Test (CUT) and Output Response Analyzer (ORA)]

BIST is a DFT technique in which testing is accomplished through built-in hardware features [15]. The basic idea is to have a VLSI chip that tests itself. The typical BIST architecture is composed of three hardware modules in addition to the circuit under test (CUT), as shown in Figure 1.4. The Test Pattern Generator (TPG) generates the test patterns for the CUT. The Output Response Analyzer (ORA) compares or analyzes the test responses to determine the correctness of the CUT. The BIST controller is the central unit that controls all BIST operations, including initialization and the length of the BIST sequence. In a BIST system hierarchy, there are BIST controllers at each level of the circuit hierarchy, such as the module, chip, board, and system levels. Each BIST controller is responsible for the self-test at that particular level, the control of BIST operations for the lower level BIST, and the reporting of the test results to the upper level. The design of a TPG is determined by the test strategy being deployed. The test strategy selected is determined by the fault coverage, test hardware overhead, and testing time [15].
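To make this architecture concrete, the following behavioral VHDL sketch combines a counter-based TPG, a placeholder CUT and a comparison ORA under one controller process. This is a minimal illustration of the generic architecture of Figure 1.4, not a design from this thesis; all entity, signal and function names are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Minimal behavioral sketch of the BIST architecture of Figure 1.4:
-- a counter-based TPG, a placeholder CUT, and a comparison ORA,
-- sequenced by one controller process. All names are hypothetical.
entity bist_sketch is
  port (clk, reset : in  std_logic;
        done, fail : out std_logic);  -- fail latches '1' on a mismatch
end entity;

architecture behavioral of bist_sketch is
  signal pattern : unsigned(7 downto 0) := (others => '0');
  signal done_i  : std_logic := '0';

  -- CUT placeholder: in a real BIST this is the hardware under test.
  function cut(x : unsigned(7 downto 0)) return unsigned is
  begin
    return x;  -- identity here, so a fault-free run passes
  end function;
begin
  controller: process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        pattern <= (others => '0');
        done_i  <= '0';
        fail    <= '0';
      elsif done_i = '0' then
        -- ORA: compare the CUT response with its expected value
        if cut(pattern) /= pattern then
          fail <= '1';
        end if;
        -- TPG: exhaustive 8-bit counter; the controller fixes the
        -- length of the BIST sequence and flags completion
        if pattern = x"FF" then
          done_i <= '1';
        else
          pattern <= pattern + 1;
        end if;
      end if;
    end if;
  end process;
  done <= done_i;
end architecture;
```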
1.6 BIST for SoC

The major advantages of the Configurable SoC (CSoC) technique are a short time to market due to pre-designed cores, lower cost due to the reusability of cores, higher performance using optimized algorithms and less hardware area using optimized designs. But the SoC technique also introduces new difficulties into the test process, caused by the increased complexity of the chip, the reduced accessibility of the cores and the higher heterogeneity of the modules. In the SoC test process, a core test strategy has to be determined first. Then a SoC test strategy has to be selected, where the test access for individual cores is determined and the tests are integrated at the system level. All these tasks are simplified if the cores and the entire system support a BIST strategy [16].

1.7 Thesis Statement

Many of today's chips demand more embedded memory than ever before. SoCs and FPGAs are also moving from logic-dominant to memory-dominant chips. The addition of memory, while it creates a more powerful chip, increases die size and results in poor yield. As the percentage of embedded memory continues to increase, so do the chip's complexity, density, speed and, of course, the probability of failures due to wafer defects. For SoCs to keep up their momentum and remain a viable option for improving system integration and performance, the problems relating to testing multiple high-density, multi-megabit memories must be solved [11].

Embedded memories placed on a single chip are scattered around the device and typically have different types (SRAM, DRAM), sizes, access protocols, and timing. Since on-chip field-configurable memory provides significant memory bandwidth compared to off-chip memory, memories are embedded into more recent FPGAs as well as into CSoCs. Typically, these FPGAs contain a heterogeneous memory architecture, that is, an architecture with more than one size of memory array. Due to the reprogrammability of FPGAs, it has been proposed that BIST capabilities can be configured in an FPGA to completely test the embedded memories in FPGAs and other memory cores shared by the FPGAs [17].

In this thesis, two approaches are described for testing embedded memories in FPGAs and SoCs. The first approach aims at reducing the BIST development time when generating BIST configurations for testing memories in different FPGA devices. The second approach aims at reducing the total test time.

The first approach is partly based on the BIST for FPGAs method in [18] [19]. In this approach, some of the PLBs of the FPGA are configured as TPGs and ORAs to test the embedded memory. Unlike traditional BIST for FPGAs, the basic approach here consists of developing parameterized VHDL code for testing embedded memories of various sizes and types. The VHDL code is then synthesized using Computer Aided Design (CAD) tools to generate bit-streams. The bit-streams are then downloaded to configure the FPGA to test the embedded memories. This approach is used in stand-alone FPGAs which do not have the capability of dynamic partial reconfiguration. The VHDL approach has the added advantage of portability, which reduces the BIST development time for generating BIST configurations for testing different types and sizes of memory cores in different FPGAs. The VHDL approach was applied to test memory modules in Atmel AT40K series FPGAs, and the same VHDL code was used with minimal changes for testing memory modules in Xilinx Virtex and Spartan series FPGAs. Similar approaches can be used for testing other regular structure embedded cores in FPGAs; this approach was used to test embedded multipliers in Xilinx Virtex and Spartan series FPGAs.

For FPGA cores embedded in SoCs, which can be dynamically configured, a different approach is adopted. The embedded microcontroller can be used to test the embedded memories in FPGAs. The microcontroller can dynamically reconfigure the memories to a different configuration mode and apply test patterns while PLBs are used to perform the ORA functions.
This process is repeated until all memories are tested in all possible configurations. For testing other memory cores accessible only by the microcontroller, the microcontroller can be used to perform both TPG and ORA functions. The proposed BIST methodologies are verified by testing embedded memories in Atmel's Field Programmable System Level Integrated Circuit (FPSLIC) and in Xilinx Virtex and Spartan series FPGAs.

This thesis is organized as follows: Chapter 2 gives a more detailed description of the architecture of FPGAs and memories, as well as existing BIST methodologies for testing FPGAs and embedded memories. Chapter 3 presents the architecture, implementation details and experimental results of the proposed BIST approaches applied to test RAMs in the Atmel FPSLIC. Chapter 4 gives implementation details and experimental results of the proposed BIST method applied to test RAMs in Virtex and Spartan series FPGAs. The thesis is summarized in Chapter 5 with suggestions for future research.

Chapter 2
Background

This chapter presents an overview of SoCs and of the architecture of the FPGAs and memories that served as targets for this thesis research. The interface of the FPGA core, memory core and processor core in the Atmel AT94K SoC is described, and the architecture of the RAMs in Virtex and Spartan FPGAs is discussed. Finally, previous BIST methodologies for testing FPGAs and embedded memories are presented.

2.1 System on a Chip (SoC)

Since its introduction in the 1990s, the SoC has gone through many phases. Early SoCs consisted of a central processor, memory, and random or glue logic. Glue logic was used by designers to connect the cores to make the SoC meet a set of design specifications. Current SoCs comprise one or more processing blocks (microprocessors, DSP cores), communication cores, memory blocks (SRAM, DRAM, flash, etc.), random logic, analog functions and often configurable logic [20].

The architecture of most current SoCs is processor driven. The central processor in a SoC manages IP cores, on-chip memory and I/O, and is thus responsible for overall system supervision [5]. The microprocessor communicates with all other cores through one or more on-chip busses. An alternative concept, the logic-centric architecture, is discussed in [21], wherein an embedded processor is an additional system component rather than the central component. The logic-centric architecture focuses on making programmable logic a central architectural feature.

Most configurable SoCs, also called CSoCs, support programmable logic in the form of an embedded FPGA core. Embedded FPGAs can be used to reconfigure on-chip functionality after chip fabrication. FPGAs can be used to correct design errors that may have occurred during chip development and also to upgrade products to adapt to changing requirements. FPGAs are thus becoming essential components of current SoCs. Different kinds of stand-alone FPGAs and their architectures are discussed in the subsequent section [21].

2.2 FPGA Architectures

Digital logic can be implemented using discrete logic devices (often called Small-Scale Integrated circuits or SSI), Programmable Logic Devices (PLDs), Mask-Programmed Gate Arrays (MPGAs), or FPGAs. SSI is used for implementing small amounts of logic. A PLD is a general purpose device capable of implementing logic as a two-level sum-of-products of its inputs. Power consumption and delay typically limit its usage to implementations of eight to sixteen product terms.
To implement designs with thousands or tens of thousands of gates on a single IC, MPGAs (commonly called gate arrays) can be used. An MPGA consists of a base of pre-designed transistors with customized wiring added for each design. The wiring is built during the manufacturing process, such that each design requires custom masks for the wiring. The mask-making charges make low-volume MPGAs expensive [20].

FPGAs offer the benefits of both PLDs and gate arrays. Like MPGAs, FPGAs can implement large designs on a single IC. FPGAs, however, eliminate each design's custom masking, manufacturing, test pattern generation, wafer fabrication, packaging and testing when compared to MPGAs [20]. Like PLDs, FPGAs are programmable by designers at their site. FPGAs are, however, a step above PLDs in complexity [7], because FPGAs can implement multilevel logic, while most PLDs are optimized for two-level logic [22]. Thus FPGAs offer the advantages over MPGAs of low Non-Recurring Engineering (NRE) costs and rapid turn-around time. However, the overhead of the programming circuitry that manages the programming of the FPGA reduces its density, and the programmable switches in the FPGA increase signal delay. As a result, FPGAs are larger and slower than equivalent MPGAs [20].

FPGAs are composed of an array of PLBs interconnected with a programmable routing network. The size, structure and number of PLBs, as well as the amount of interconnect, vary considerably among FPGA architectures. This difference is governed by different programming technologies and different target applications of the devices. The switching elements used for programming determine whether the FPGA is antifuse-programmed, EPROM-programmed or SRAM-programmed. Depending on the routing structure, FPGAs can be further classified as symmetrical style, island style or cellular style [22]. Depending on cell granularity, FPGAs can be classified as either coarse grained or fine grained. The granularity of a PLB can be defined in many ways: the number of Boolean functions it can implement, the total number of transistors it uses, the total number of inputs and outputs, the total normalized area, etc. Since the switching elements are the driving force in determining the choice of logic modules and interconnect for an FPGA, they are a key to FPGA architecture [20]. The different switching elements used in manufacturing FPGAs are examined in the next subsection.

2.2.1 Switching Elements in FPGAs

In antifuse-programmed FPGAs, antifuses are used as switching elements. An antifuse, as shown in Figure 2.1(a), is a two-terminal device that changes irreversibly from a high to a low resistance state when a programming voltage is applied across its terminals [23]. Antifuses fall into two categories: amorphous silicon and dielectric. The major advantages of the antifuse are its small size, relatively low on-resistance and low parasitic capacitance [24]. The major disadvantages of the antifuse are that it is not reprogrammable and that it requires extra fabrication steps [24].

[Figure 2.1: Switching Elements Used in FPGAs [24]: (a) antifuse, (b) EPROM transistor with floating and select gates, (c) SRAM cell controlling a pass gate]

The switching elements used by EPROM-programmed FPGAs are similar to the ones used in EPROM memories, as shown in Figure 2.1(b). Unlike a simple Metal Oxide Semiconductor (MOS) transistor, an EPROM transistor comprises two gates, a floating gate and a select gate.
In the un-programmed state, no charge exists on the floating gate and the transistor behaves like a normal MOS transistor. When the transistor is programmed by causing a large current to flow between source and drain, a charge is trapped under the floating gate which permanently turns the transistor off. The EPROM transistor can be reprogrammed by first removing the trapped charge from the floating gate by exposing the gate to ultraviolet light [23]. A major advantage of this technology compared to the antifuse is its reprogrammability. An additional advantage is that it is nonvolatile, so no external permanent memory is needed to program the chip on power-up. Disadvantages associated with this technology include a relatively high on-resistance and the requirement of additional fabrication steps over the ordinary CMOS fabrication process [24]. EPROM transistors, however, cannot be reprogrammed in-circuit. Electrically Erasable and Programmable ROM (EEPROM) technology, which is similar to EPROM technology, can be reprogrammed in-circuit. EEPROM technology, however, consumes about twice the chip area of EPROM transistors and requires multiple voltage sources for reprogramming [25].

The Static RAM (SRAM) programming technology uses SRAM cells to control pass gates and multiplexers, as shown in Figure 2.1(c). A logic one stored in an SRAM cell closes the pass gate and a logic zero opens it. A major advantage of this approach is that SRAM cells can be programmed in-circuit and require only standard integrated circuit process technology [24]. Thus SRAM-programmable FPGAs take advantage of process improvements driven by semiconductor memories. A major disadvantage of SRAM programming technology is its large area: it takes at least five transistors to implement an SRAM cell, plus at least one transistor to serve as a programmable switch. SRAM is also volatile and thus must be programmed or configured at power-up, which requires external permanent memory to provide the programming bit storage. Since SRAM-based FPGAs implement logic in static gates, they consume very low power even for large amounts of logic and have very low standby current. All these factors have made SRAM-programmed FPGAs quite popular, and as a result, they have become the largest selling FPGAs in the commercial market [26]. The FPGA architectures discussed in the remainder of this thesis are assumed to be SRAM-programmed unless otherwise specified.

2.2.2 PLB Architecture

PLBs, which form an important building block of the FPGA device, are capable of implementing both combinational and sequential logic. Combinational logic is commonly implemented by an array of SRAM cells called a lookup table or LUT. A LUT made of 2^n SRAM cells is addressed by n inputs. The LUT shown in Figure 2.2(a) has 3 inputs and is capable of implementing any of the 2^8 = 256 different Boolean functions of its inputs. When the FPGA is programmed, the truth table corresponding to the Boolean function to be implemented is loaded into the LUT. For example, the LUT shown in Figure 2.2(a) would implement a 3-input XOR function, assuming the topmost location corresponds to the highest address. The inputs to the LUT are logically equivalent, such that changing the pin to which a signal is connected may require rearrangement of the bits in the LUT.
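As a concrete illustration of this behavior, the following behavioral VHDL sketch models a 3-input LUT as an 8-bit configuration vector indexed by its inputs. The entity name is hypothetical; the default configuration constant realizes the 3-input XOR of the Figure 2.2(a) example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioral sketch of a 3-input LUT: 2^3 = 8 configuration bits,
-- addressed by the inputs a, b, c. Names are hypothetical.
entity lut3 is
  generic (
    -- Truth table loaded at configuration time; bit 7 corresponds to
    -- address "111", so "10010110" realizes a 3-input XOR.
    config : std_logic_vector(7 downto 0) := "10010110"
  );
  port (a, b, c : in  std_logic;
        z       : out std_logic);
end entity;

architecture behavioral of lut3 is
begin
  process (a, b, c)
    variable addr : std_logic_vector(2 downto 0);
  begin
    addr := a & b & c;                            -- inputs form the address
    z <= config(to_integer(unsigned(addr)));      -- read the addressed bit
  end process;
end architecture;
```

Rearranging the bits of the generic, rather than rewiring the ports, is exactly the freedom the logically equivalent LUT inputs provide.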
Multiplexers (MUXs) are often placed at the inputs of the LUT, so that the inputs to the LUT can come from any of the routing resources. A 2-input MUX controlled by an SRAM cell is shown in Figure 2.2(b). The number of inputs to the MUX can be increased by using more SRAM cells as select controls.

[Figure 2.2: FPGA Programming Controlled by SRAM Cells: (a) lookup table, (b) multiplexer]

For implementing sequential logic, storage elements like edge-triggered flip-flops or level-sensitive D latches are included. MUXs are included to control routing and additional functionality inside the PLB. Thus a PLB generally consists of LUT(s), MUX(s) and storage elements. The size and number of LUTs, along with the number of storage elements, define the granularity of a PLB. More complex PLBs with large LUTs and greater numbers of MUXs and storage elements comprise coarse grained FPGAs; FPGAs with simpler PLBs are fine grained. An investigation of a range of LUT sizes and their effect on the overall chip revealed that 3-input or 4-input LUTs give the best density for a wide range of PLBs [27].

The PLB inside Atmel's AT40K series FPGAs is shown in Figure 2.3 [28] [29]. The PLB consists of two 3-input LUTs, called the X LUT and the Y LUT. Functions of up to four inputs can be implemented using the LUTs and MUXs. A Set/Reset D flip-flop is provided for implementing sequential logic. Multiplexers are included to provide a variety of functionalities like combinational logic, sequential logic, arithmetic and DSP/multiplier modes [29].

[Figure 2.3: Atmel AT40K Series PLB [29]: two 8 x 1 LUTs (X and Y), a D flip-flop with clock and reset, and connections to/from the global busses]

PLBs inside the FPGA are typically arranged in the form of an array which is repeated over the entire FPGA. Each cell in the array can be directly connected to a particular set of interconnect lines, called local lines. Interconnect lines to which many PLBs in the FPGA can be connected are called global lines. In order to speed up signal communication through longer or heavily loaded segments of interconnect, repowering buffers are typically provided. In Atmel AT40K series FPGAs, an array of 4 x 4 PLBs is repeated over the entire FPGA, as shown in Figure 2.4. Five vertical and five horizontal global busing planes are associated with all PLBs, as illustrated in Figure 2.4. Four inputs to and one output from the PLB can access any of the five global busing planes associated with the PLB. For every 4 x 4 array of PLBs, bus repeaters (repowering buffers) are placed within the global routing resources to prevent signal degradation in the process of sending signals to distant or heavily loaded nets. Every 4 x 4 array of PLBs also shares an embedded memory block called a free RAM. Details of these embedded RAMs are discussed in Section 2.3.

The configuration memory in the FPGA dictates the behavior of each resource in the FPGA. The FPGA is programmed by writing bits into the configuration memory, as required by the application. The FPGA can be reconfigured for another application by writing the appropriate new bits into the configuration memory. While the FPGA is operating, the inactive regions can be reconfigured to perform different operations without disturbing the active regions of operation. This type of reconfiguration is called partial reconfiguration. Dynamic partial reconfiguration is the process of reconfiguring the active regions of the FPGA to perform a different function.
[Figure 2.4: PLB Array Interconnection in Atmel AT40K Series FPGAs [29]: logic cells, horizontal and vertical busing planes, repeater rows and columns, free RAMs and I/O pads]

In general, only small portions of the logic circuitry are active at any given time. By loading logic functions into the FPGA as required, replacing or complementing the logic already present, logic can be implemented efficiently. This concept is called Cache Logic [30]. Cache Logic thus operates similarly to cache memory: active functions are loaded into the logic cache at any given time, while unused functions or variations are stored in low cost memory.

2.3 Embedded Memories

Memory is often integrated on the chip rather than off chip for significant reductions in cost and size. An on-chip memory interface reduces capacitive load, power and heat dissipation, and helps in achieving higher speeds [31]. For similar reasons, memories are embedded in both FPGAs and SoCs. SoCs typically contain different types of memories like SRAMs, ROMs, DRAMs and flash memory blocks. FPGAs typically contain heterogeneous memories, which can have different array sizes and depths. They can also function in different modes, like synchronous or asynchronous and single-port or multi-port. The different kinds of memory technologies that exist are discussed in the next subsection.

2.3.1 Memory Types

Memory cells can be designed in a number of ways, and the structure of the memory cells determines the type of memory chip. Figure 2.5(a-c) shows the basic structures of some memory cells; the cells are shown in order of decreasing area and decreasing speed. Figure 2.5(d) shows a memory cell in a two-port memory block. By adding more pass transistors and bit lines, a multi-port memory array can be created. The type of memory embedded depends on the intended application of the chip. SRAM technology is used for high speed applications, while in applications requiring large amounts of memory, DRAM technology is employed. While SRAM memories are commonly embedded in FPGAs, a SoC can contain other kinds of memories. Most FPGAs contain SRAM memories, as they are compatible with the process used to fabricate logic on the chip.

[Figure 2.5: Structure of Memory Cells: (a) six-transistor SRAM cell, (b) three-transistor DRAM cell, (c) one-transistor DRAM cell, (d) two-port memory cell]

2.3.2 Embedded Memories in FPGAs

As shown in Figure 2.4, each 4 x 4 PLB array in the AT40K series FPGA shares a memory block called a free RAM. These 32 x 4 dual-ported RAMs, dispersed over the entire array, can be configured to operate in four different modes: single-port synchronous, dual-port synchronous, single-port asynchronous and dual-port asynchronous. All the RAMs except those in the rightmost column of the array can operate in all four modes; RAMs in the rightmost column can operate only in the single-port modes. Free RAMs are not true dual-port RAMs, as they have separate read and write ports instead of two ports that can each be used for both reading and writing [29]. A RAM in the leftmost column has its read address lines to its right, while the RAM in the column adjacent to the right has its read address lines to its left. This arrangement causes each RAM to share its read address with one of its adjacent RAMs and its write address with the other, as shown in Figure 2.6.
This arrangement provides easier memory-to-memory interconnect interfaces to increase the width (number of bits used) and/or height (number of words) of the overall memory. When embedding memory into an FPGA, a good memory/logic interface is critical [8], so dedicated routing resources are provided for the data, address and control signals of each free RAM.

[Figure 2.6: Arrangement of Free RAMs in Atmel AT40K Series FPGAs: adjacent free RAMs sharing read (Aout) and write (Ain) address lines]

The architecture of a free RAM is shown in Figure 2.7. In single-port mode, the write address (Ain) lines are disconnected by opening switch S1, and switch S2 is closed so that the read address (Aout) lines provide both the read address and the write address [29]. The data output (Dout) lines are disconnected by opening switch S4, and switch S3 is closed so that the data output is read from the data input (Din) lines, making the data bus bidirectional. The tri-state buffer is enabled when output enable (OE) is active low, and data can then be read out of the Din lines. In dual-port mode, switches S1 and S4 are closed while switches S2 and S3 are opened. This enables two sets of address lines and two sets of data lines for reading from and writing into the RAM independently.

[Figure 2.7: Architecture of a Free RAM Block [29]: a 32 x 4 dual-port free RAM with front-end latches on the write address, data and write enable, mode switches S1-S4, and a RAM-Clear input]

As shown in Figure 2.7, latches are used for synchronizing the write address, data and write enable. Reading of the RAM is always asynchronous. Both clock multiplexers select the clock input in synchronous mode and select logic '1' in asynchronous mode. The Load input is connected to each bit in the RAM. In synchronous mode, the Clock input is connected to each bit in the RAM, while the inverted Clock input is connected to the front-end latches. When the Load input is logic '1', the latches are transparent; they latch the data when Load is logic '0'. Each bit in the RAM is also a transparent latch. Thus the front-end latches and the RAM together form an edge-triggered flip-flop in synchronous mode and a transparent latch in asynchronous mode. A RAM-Clear byte is used to clear the contents of the RAMs during configuration [29].

There are two different implementations of on-chip memory in FPGAs: fine-grained and coarse-grained. In the fine-grained approach, each LUT can be configured as RAM to implement large memories. In the coarse-grained approach, large memories are embedded inside the FPGA, like the free RAMs in AT40K series FPGAs. This approach results in a denser memory implementation, but requires memory and logic partitioning during FPGA design. Because different applications have widely varying memory requirements, memory/logic partitioning might result in poor utilization of either logic or memory. To avoid poor memory utilization, memory arrays should be designed so that they can be used for logic implementation when unused [9]. This is possible by configuring the memory as a ROM (by disabling write enable), allowing it to function as a LUT for implementing large combinational logic. The on-chip memory (block RAMs) in Virtex and Spartan series FPGAs adopts this strategy.
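Returning to the free RAM of Figure 2.7, its synchronous-mode port behavior (latched write-side inputs with an always-asynchronous read) can be summarized in a short behavioral VHDL model. This is a simplified simulation sketch only, with a hypothetical entity name, an active-high write enable, and the mode switches and RAM-Clear logic omitted.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Simplified behavioral model of a 32 x 4 free RAM in dual-port
-- synchronous mode: the write is clocked, the read is asynchronous.
entity free_ram_32x4 is
  port (clk  : in  std_logic;
        wen  : in  std_logic;                     -- write enable (active high here)
        ain  : in  std_logic_vector(4 downto 0);  -- write address
        aout : in  std_logic_vector(4 downto 0);  -- read address
        din  : in  std_logic_vector(3 downto 0);
        dout : out std_logic_vector(3 downto 0));
end entity;

architecture behavioral of free_ram_32x4 is
  type ram_t is array (0 to 31) of std_logic_vector(3 downto 0);
  signal ram : ram_t := (others => (others => '0'));
begin
  -- The front-end latches plus the RAM latches behave as an
  -- edge-triggered write in synchronous mode.
  write_port: process (clk)
  begin
    if rising_edge(clk) then
      if wen = '1' then
        ram(to_integer(unsigned(ain))) <= din;
      end if;
    end if;
  end process;

  -- Reading is always asynchronous: dout follows the read address.
  dout <= ram(to_integer(unsigned(aout)));
end architecture;
```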
Another important factor to be considered when embedding memory into an FPGA is flexibility. Some applications might require a single large block of memory directly connected to logic, while others might require smaller memories connected to a common bus or smaller memories distributed over the entire logic. Therefore, embedded memories must be flexible enough to operate with different sizes and widths. However, the more flexible the FPGA architecture is, the more programmable switches and programming bits are required, and programmable switches can add delay to critical paths within a circuit implementation. In [32] it is shown that a memory array containing between 512 and 2048 bits, configurable for a word size of 1, 2, 4 or 8, results in optimum flexibility and optimum logic and storage implementation for many applications.

The block diagram of Virtex and Spartan II family FPGAs is shown in Figure 2.8. PLBs are arranged in a 4 x 6 tile repeated over the entire array. The array is surrounded by IOBs, and large embedded block RAMs are present on either side of the FPGA.

[Figure 2.8: Block Diagram of Spartan II Family FPGAs: an array of PLBs surrounded by IOBs, with columns of block RAMs on either side]

Each PLB consists of two identical slices. A slice consists of two 4-input LUTs, two storage elements, and carry logic. Each LUT can be configured as a 16 x 1-bit synchronous RAM. The two LUTs in a slice can function as 16 x 2 or 32 x 1 synchronous RAMs or as a 16 x 1 dual-port synchronous RAM. Thus Virtex and Spartan II series FPGAs adopt both the fine grained and the coarse grained approach for on-chip memory: large block RAMs complement the small memory structures implemented in PLBs. These block RAMs are 4 PLBs high and are present on either side of the chip.

Figure 2.9 shows the functional block diagram of a block RAM, where n = 12 and m = 16 in Virtex and Spartan II FPGAs [33]. The block RAM is a true dual read/write port, fully synchronous RAM with 4K memory cells. Each port of the block RAM can be independently configured as a read/write port, a read port or a write port, and can be configured to a specific data width. Each port can independently access the same 4096 locations and can be independently configured to have a data width of 1, 2, 4, 8 or 16 bits. The four control signals (CLK, WE, EN, RST) for each port have independent inversion control as a configuration option [33].

[Figure 2.9: Block Diagram of a Block RAM [33]: ports A and B, each with address, data in/out, and WE/EN/RST/CLK controls]
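A behavioral sketch of such a true dual-port synchronous RAM is given below. It models only the common behavior of Figure 2.9 (one shared storage array with two independently clocked read/write ports), not the configurable widths per port or the control-signal inversion; the entity and generic names are hypothetical, and the VHDL-93 style shared variable is the usual idiom for describing two ports over one array.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Behavioral sketch of a true dual-port synchronous RAM in the style
-- of Figure 2.9: two clocked ports sharing one storage array.
entity dual_port_ram is
  generic (addr_bits : natural := 12;   -- 2^12 = 4096 locations
           data_bits : natural := 1);   -- width, e.g. 1, 2, 4, 8 or 16
  port (clka, clkb   : in  std_logic;
        wea,  web    : in  std_logic;
        addra, addrb : in  std_logic_vector(addr_bits - 1 downto 0);
        dina,  dinb  : in  std_logic_vector(data_bits - 1 downto 0);
        douta, doutb : out std_logic_vector(data_bits - 1 downto 0));
end entity;

architecture behavioral of dual_port_ram is
  type ram_t is array (0 to 2**addr_bits - 1)
    of std_logic_vector(data_bits - 1 downto 0);
  shared variable ram : ram_t;          -- one array, visible to both ports
begin
  port_a: process (clka)
  begin
    if rising_edge(clka) then
      if wea = '1' then
        ram(to_integer(unsigned(addra))) := dina;
      end if;
      douta <= ram(to_integer(unsigned(addra)));  -- synchronous read
    end if;
  end process;

  port_b: process (clkb)
  begin
    if rising_edge(clkb) then
      if web = '1' then
        ram(to_integer(unsigned(addrb))) := dinb;
      end if;
      doutb <= ram(to_integer(unsigned(addrb)));  -- synchronous read
    end if;
  end process;
end architecture;
```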
2.3.3 Embedded Memories and FPGAs in SoCs and their Interfacing

Almost all SoCs contain some form of embedded memory, and embedded memories typically occupy about 70% of the total chip area [34]. Embedded SRAMs are widely used because, by merging memory with logic, data bandwidth can be increased and hardware cost can be reduced. However, with pad-limited, multi-million gate designs, other types of embedded RAMs are also being used [35]. The following example illustrates the use of other types of embedded RAMs.

Figure 2.10 shows the connections of the data SRAM embedded in Atmel's AT94K series SoC, called the FPSLIC, and Figure 2.11 shows the partitioning of the complete embedded SRAM in Atmel's FPSLIC. This dual-ported SRAM is 36K bytes in size and is shared by both the FPGA and the AVR (Advanced Virtual Reduced Instruction Set Computer) microcontroller core. The embedded SRAM is partitioned into data SRAM and program SRAM. While the program SRAM is accessible only from the AVR core, the data SRAM is accessible from both the AVR core and the FPGA core. The memory block consists of 20 Kbytes of fixed program SRAM and 4 Kbytes of fixed data SRAM. The remaining 12 Kbytes of memory are partitioned into three 4K x 8-bit blocks, and these blocks can be configured for use as program SRAM or as data SRAM. The "SOFT BOOT BLOCK" at the top of the program memory is used by the chip on power-up. The lower portion of the data SRAM (96 bytes) is not shared between the AVR and the FPGA; the AVR uses it for CPU general working registers and for memory-mapped I/O. Therefore, on the FPGA side those bytes are available for data that is only needed by the FPGA [29].

[Figure 2.10: Embedded SRAM in Atmel's FPSLIC [29]: a 4K x 8 to 16K x 8 dual-port data SRAM between the FPGA core (A side) and the AVR core (B side)]

[Figure 2.11: Partitioning of Embedded SRAM in Atmel's FPSLIC: fixed and optional program SRAM (including the soft "boot block") and fixed and optional data SRAM]

All cores in an SoC are connected with one or more bus structures. Bus-based designs are easy to manage, primarily because on-chip busses provide a common interface by which cores can be connected [31]. Because of the diversity of embedded cores, a segmented bus architecture is generally used [36]. The FPSLIC uses a bus-based interconnection structure. The interfacing between the FPGA core, memory core and microprocessor core in Atmel's FPSLIC is shown in Figure 2.12.

The dual-port data SRAM core resides between the FPGA and AVR cores, enabling data exchange between them. Access by either core is via a 16-bit address bus and an 8-bit bidirectional data bus associated with each port. The FPGA core can also be directly accessed by the AVR core. An 8-bit bus interconnects the FPGA core and the AVR, allowing interactive communication. Up to 16 decoded address lines available from the AVR into the FPGA interface directly into the FPGA global busing resources, and up to 16 interrupts are available from the FPGA to the AVR. The AVR core can also write into the configuration memory of the FPGA core, such that the FPGA can be dynamically reconfigured by the AVR during system operation, without re-downloading configuration data externally. This access is illustrated in Figure 2.13, where FPGAX, FPGAY and FPGAZ specify the 24-bit address of the target configuration memory byte of the FPGA to be reconfigured, while FPGAD specifies the byte of configuration data to be written into the configuration memory. The X and Y address values correspond to the horizontal and vertical location of the PLB, RAM or routing resource to be reconfigured, and the Z address value corresponds to the specific logic, RAM or routing resource being configured.

[Figure 2.12: AVR-FPGA-RAM Interface in Atmel's FPSLIC: the data SRAM between the AVR core (A side) and the FPGA core (B side), each side with a 16-bit address bus, an 8-bit data bus and read/write enables]

[Figure 2.13: AVR-FPGA Cache Logic in Atmel's FPSLIC: the AVR writes a 24-bit address (FPGAX, FPGAY, FPGAZ) plus 8 bits of data (FPGAD) to reconfigure the FPGA]

2.4 BIST for Memories

BIST was initially developed for random logic. Later, it was used to test ROMs, RAMs and other structured logic. The regularity of these structures leads to more efficient test generation and fault detection algorithms than for random logic [37].
Fault models used for memories are different from those used for digital logic. In addition to stuck-at, bridging and stuck-on/off faults, fault models like coupling faults, pattern sensitive faults and address decoder faults are defined for memories. Because of the modular nature of memory, BIST is suitable for testing both stand-alone memories and embedded memories [38]. BIST has proven to be one of the most cost-effective and widely used solutions for memory testing for many reasons, including at-speed testing, on-chip pattern generation for higher controllability, on-line or off-line testing, adaptability to engineering changes, etc. [11] [31] [39].

A large number of test algorithms have been reported in the literature for testing memories [39]. These algorithms are called march tests; they test the memories functionally by writing patterns into the memories and reading those patterns back. Many variations of these march tests have been developed, taking into account the various fault models that have emerged for different kinds of memories, and thus they have different fault detection capabilities. March tests are modified accordingly for testing multi-port memories and word-oriented memories.

March tests use functional fault models for the RAM and therefore do not require knowledge of the memory chip at the circuit level, which would otherwise complicate the model and increase the test time [40]. The faults detected by march tests include faults in the address decoder, data lines and refresh logic, along with faults in the memory array cells. Typical faults covered by most of these tests include Stuck-at Faults (SAFs), Transition Faults (TFs), Coupling Faults (CFs) and Neighborhood Pattern Sensitive Faults (NPSFs) [39]. The notation used for march tests is shown below for the example of the March Y algorithm:

March Y: ⇕(w0); ⇑(r0, w1, r1); ⇓(r1, w0, r0); ⇕(r0)

The symbol ⇑ indicates RAM addressing in ascending order, the symbol ⇓ indicates RAM addressing in descending order, and the symbol ⇕ indicates RAM addressing in either ascending or descending order. The notation w0 (w1) indicates writing all 0s (all 1s), and r0 (r1) indicates reading all 0s (all 1s). March tests are composed of march elements, separated by semicolons. The length of the March Y test sequence is 8N, where N is the number of words in the RAM, since the test performs eight read or write operations at each address.

The March Y algorithm detects SAFs, TFs and address decoder faults, but it does not detect all CFs. Moreover, the use of all-0s and all-1s input patterns is not sufficient to completely detect CFs and NPSFs in word-oriented memories. To ensure that pattern sensitive faults and CFs (both inter-word and intra-word) are detected, modifications are made to the march algorithms. The modifications consist of running the algorithm with Background Data Sequences (BDS), as described in [41]. For example, the BDS for a 4-bit memory are: 0000 (1111), 0101 (1010) and 0011 (1100). The number of BDS required is log2(K)+1, where K is the number of bits in a memory word.
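As an illustration of how a march test operates, the following behavioral VHDL process applies March Y to a small single-port RAM model and flags any read that returns an unexpected value. This is a simulation-only sketch with hypothetical names (the complete, parameterized March Y TPG developed in this thesis appears in Appendix B); note that the four loops together perform the 8N operations counted above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Simulation-only sketch of March Y on a RAM model:
-- (w0); up(r0,w1,r1); down(r1,w0,r0); (r0)  -- 8N operations in all.
entity march_y_tb is
end entity;

architecture sim of march_y_tb is
  constant n : natural := 32;                    -- words in the RAM
  type ram_t is array (0 to n - 1) of std_logic_vector(3 downto 0);
  signal ram : ram_t;                            -- RAM model under test
  constant zeros : std_logic_vector(3 downto 0) := "0000";
  constant ones  : std_logic_vector(3 downto 0) := "1111";
begin
  process
  begin
    for i in 0 to n - 1 loop                     -- march element: (w0)
      ram(i) <= zeros; wait for 1 ns;
    end loop;
    for i in 0 to n - 1 loop                     -- up(r0, w1, r1)
      assert ram(i) = zeros report "fault at " & integer'image(i) severity error;
      ram(i) <= ones; wait for 1 ns;
      assert ram(i) = ones  report "fault at " & integer'image(i) severity error;
    end loop;
    for i in n - 1 downto 0 loop                 -- down(r1, w0, r0)
      assert ram(i) = ones  report "fault at " & integer'image(i) severity error;
      ram(i) <= zeros; wait for 1 ns;
      assert ram(i) = zeros report "fault at " & integer'image(i) severity error;
    end loop;
    for i in 0 to n - 1 loop                     -- march element: (r0)
      assert ram(i) = zeros report "fault at " & integer'image(i) severity error;
    end loop;
    wait;                                        -- test complete
  end process;
end architecture;
```

Running the same four elements again with each background data word (e.g. "0101" and "0011" in place of "0000") is how the BDS modification described above is applied.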
2.4.1 Present Methods for Testing FPGAs and SoCs

Different approaches exist in the literature for testing FPGAs [18] [19] [42] [43] [44] [45]. In [44], an approach for testing the PLBs of an FPGA is presented. An external memory is used for storing test configurations as well as test patterns. This approach depends on the number of inputs and outputs of a PLB and also on the nature of the PLB (combinational or sequential). Test configurations are developed after partitioning the PLBs into modules: a combinational module and a sequential module. PLBs are connected to form one-dimensional arrays and are tested in parallel. This approach was applied to testing the PLBs in Xilinx 4000 series FPGAs; it needed 21 test phases and around 102 test vectors to completely test the PLBs, including their RAM modes of operation [44]. Each time the FPGA is reconfigured to test a resource is referred to as a test phase.

A BIST approach for testing PLBs in SRAM-based FPGAs was proposed in [18]. In this approach, the BIST logic is created using the FPGA logic resources during off-line testing, which takes advantage of the in-system reprogrammability of SRAM-based FPGAs and thus eliminates the area overhead for BIST circuitry. This BIST approach is applicable at all levels of testing (wafer, package, board and system) and also provides at-speed testing [18]. Unlike the previous approach, the Test Pattern Generator (TPG) is built inside the FPGA. This approach, however, requires storage space for the BIST configuration files. The approach involves using the PLBs as TPGs, Blocks Under Test (BUTs) and Output Response Analyzers (ORAs), as shown in Figure 2.14. The functionality of the BUTs is changed during each configuration until all the logic resources in the PLBs are tested. Each configuration is downloaded into the FPGA and the resulting ORA responses are obtained. Due to the penalties involved in storing expected responses, the ORA compares test responses from two adjacent BUTs. For reliability reasons, each PLB is monitored by two different ORAs and compared with two different BUTs. This approach yields correct results as long as all the PLBs being compared do not contain functionally equivalent faults. For completely testing the PLBs, including the LUTs in RAM mode, using this approach, a total of 9 BIST configurations were required for ORCA2C series FPGAs and 14 BIST configurations for ORCA2CA series FPGAs [45].

[Figure 2.14: BIST Architecture for Testing PLBs in FPGAs [18]: rows of TPGs, BUTs and ORAs, with each ORA comparing two adjacent BUTs]

A similar approach was adopted to test all logic resources in Xilinx 4000 and Spartan series FPGAs in [19]. LUTs that can be configured as RAMs in these FPGAs are tested in [19] using the approach described in [18]. For testing all the logic resources, including the LUTs in RAM mode, a total of 12 configurations were required for Xilinx 4000 and Spartan series FPGAs. A similar approach can be applied for testing other resources in the FPGA, like embedded memories and interconnect, without any area overhead.

An approach to test the memory modules (LUTs in RAM mode) in SRAM-based FPGAs is presented in [46]. The approach aims at reducing the number of test configurations by taking into account the fact that the number of cells in the memory modules of a PLB is very small. A memory module with n inputs and 2 memory modes (ROM and RAM) can be tested with 3n configurations and 8n x 2^n test patterns using this approach.

The concept of configuration-dependent testing was introduced in [47].
In [47], the logic function of the original application is modified to test the configured interconnect structure. Since only logic functions are changed, time-consuming placement and routing is avoided between test configurations. A similar approach for testing the interconnect structure, presented in [48], reduces the number of test configurations to 20 for testing the largest mapped design in the largest commercially available FPGA.

One limitation of BIST for FPGAs is that, though the BIST architecture is generic, specific test configurations are not [49]. The BIST configurations have to be redeveloped for every new PLB and/or interconnect architecture. Therefore, even though any of the above-mentioned approaches can be applied to test all resources in FPGAs, different test configurations have to be developed for different FPGA architectures. Most of the approaches mentioned above are similar in the sense that they make use of the reconfigurability of FPGAs to test the FPGA. Each of these approaches, however, aims at reducing the total number of test configurations so that the number of downloads, and with it the total test time, can be reduced, since the downloading process is the major component of FPGA test time and cost.

2.5 Thesis Restatement

Existing methods for testing stand-alone FPGAs can also be applied to testing FPGAs embedded in SoCs. However, utilization of some SoC features (such as accessibility of all cores by the embedded microcontroller) can help in developing a different test strategy that reduces total test time. Techniques used to test embedded cores in FPGAs are described in [50] [51] [52] [53] [54]. The use of wrappers around memory and other cores for testing is described in [55]. In [54], the possible use of an embedded microcontroller core for testing all the accessible cores in an SoC is discussed. This approach of using the microcontroller core to test other embedded cores forms the basis for one of the test techniques presented in this thesis. Most current SoCs contain FPGA cores and memory cores. Moreover, the FPGA cores can be reconfigured at run-time by the microcontroller core, which is generally the central core in an SoC. The microcontroller can be used to test the FPGA cores, and the dynamic reconfiguration feature can be used to reduce the number of downloads and hence the overall test time. The implementation details and results of this approach as applied to testing memory modules in Atmel's AT94K SoC are discussed in Chapter 3. Test development time can be reduced significantly if the BIST configurations developed are portable. VHDL can be used to develop portable code for testing embedded memories in any FPGA. The other technique presented in this thesis is the development of portable VHDL code for testing both embedded RAMs and distributed RAMs in FPGAs. This approach uses the FPGA logic to test the memory components. The details of this approach as applied to testing memory cores in Atmel's AT40K FPGAs are presented in Chapter 3. Chapter 4 explains how this approach is used to test embedded memories and other regular structure cores, like multipliers, in Xilinx Spartan and Virtex FPGAs with minimal changes.

Chapter 3
Implementation Of BIST On ATMEL FPGAs And SoCs

The BIST approaches developed for testing RAMs in Atmel's AT40K series FPGAs and AT94K series SoCs are discussed in this chapter. The BIST architectures and their implementation details are presented along with results from actual testing of two different SoCs in the AT94K series.
Finally, improvements to the performance of BIST for RAMs in SoCs are also discussed.

3.1 RAM BIST Approaches

Two different approaches are followed for testing free RAMs in Atmel's AT40K series FPGAs and AT94K series FPSLIC. While the first approach is applicable to both devices, the second approach is applicable only to the FPSLIC. In the first approach, all the BIST circuitry (TPG and ORAs) is built using FPGA logic resources. This approach is suitable for testing RAMs in stand-alone FPGAs (which do not have an embedded processor with partial reconfiguration capability) like the AT40K series. In the second approach, the TPG signals are generated by the embedded microcontroller core (AVR) and the ORA is built using the FPGA logic. This approach is better suited to testing embedded RAMs in SoCs and embedded FPGAs where the FPGA can be accessed, and partially reconfigured, from an embedded microcontroller core; it is therefore applicable only to the SoCs. A mixture of these two approaches is used for testing the data SRAM shared by the FPGA and the AVR in the FPSLIC.

Free RAMs in AT94K and AT40K series FPGAs can be configured to operate as single-port RAMs or dual-port RAMs in both synchronous and asynchronous modes, and they have to be tested in all modes of operation. Only three modes are sufficient to test the free RAMs completely: single-port synchronous mode, single-port asynchronous mode and dual-port synchronous mode. Free RAMs are not truly dual-ported and their read port is asynchronous; as a result, free RAMs need not be tested in dual-port asynchronous mode. BDS are employed only in the single-port synchronous mode of testing, because the coupling faults and neighborhood pattern sensitive faults detected using BDS are memory specific and need not be detected again.

3.1.1 BIST Approach for Free RAMs Using FPGA Logic

In this approach, the TPG which generates the march sequences is built using FPGA logic resources. The ORA, responsible for comparing output responses and storing BIST results, is also built using FPGA logic resources. The march algorithms used for testing free RAMs and the BIST architectures used in this approach are explained in the following subsections.

3.1.1.1 BIST Architecture for Dual-port Synchronous Mode

The BIST architecture used for testing free RAMs in dual-port synchronous mode is shown in Figure 3.1(a). All RAMs are tested in parallel using a single TPG, and the ORA is designed to compare the outputs of two adjacent RAMs. All RAMs except those in the rightmost and leftmost columns are compared by two ORAs. Two TPGs are generally used in this kind of BIST architecture to make sure that the TPG itself is not faulty, but the Finite State Machine (FSM) based TPG is too large to replicate and fit inside the device. Therefore, the logic and routing resources are assumed to be fault-free as a result of previously executed BIST for the programmable logic and routing resources [56].

Figure 3.1: (a) Dual-port Free RAM BIST Architecture (b) ORA Design

The design of a single-bit ORA, which uses two PLBs, is shown in Figure 3.1(b). The ORA latches a logic '1' if any mismatch occurs at the RAM outputs during the march sequence. All the ORAs are connected in the form of a scan chain to shift the BIST results out. At the end of the BIST sequence, when the shift control pin is high, the ORA acts as a shift register.
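The function of this single-bit ORA can be summarized in a few lines of C. The model below is only a behavioral illustration of the latch-mismatch-or-shift operation just described, not the two-PLB implementation; the signal names are illustrative.

#include <stdio.h>

/* Behavioral model of the single-bit ORA: during BIST it latches a 1 on
 * any mismatch between the two RAM outputs; when the shift control is
 * high it behaves as one stage of the result scan chain. */
typedef struct { int q; } ora_t;

/* one rising clock edge */
static void ora_clock(ora_t *o, int reset, int shift,
                      int ram1_bit, int ram2_bit, int shift_in)
{
    if (reset)
        o->q = 0;                       /* clear before running BIST */
    else if (shift)
        o->q = shift_in;                /* scan-chain mode after BIST */
    else
        o->q |= (ram1_bit ^ ram2_bit);  /* latch any mismatch */
}

int main(void)
{
    ora_t o = {0};
    ora_clock(&o, 1, 0, 0, 0, 0);       /* reset */
    ora_clock(&o, 0, 0, 1, 1, 0);       /* matching outputs: q stays 0 */
    ora_clock(&o, 0, 0, 1, 0, 0);       /* mismatch: q latches 1 */
    ora_clock(&o, 0, 0, 0, 0, 0);       /* q remains 1 (sticky) */
    printf("ORA result: %d (1 = mismatch seen)\n", o.q);
    return 0;
}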
Four single-bit ORAs are associated with each RAM. In an N×N device, where N is the number of PLBs along one dimension of the FPGA, there are (N/4)×((N/4)−1) dual-port RAMs, as the RAMs in the rightmost column cannot act as dual-port RAMs. Since each bit in the leftmost and rightmost columns of the dual-port RAMs is compared by only one ORA, the number of PLBs used for the ORAs is N×((N/4)−2)×2, consistent with the ORA PLB counts reported in Table 3.8.

The TPG generates a march sequence to supply the RAMs with data, address and control signals. The DPR march algorithm used to test dual-port RAMs in [19] is slightly modified and used to test the dual-port free RAMs. The modified DPR march sequence is as follows:

DPR test: ⇕(w0:n); ⇓(n:r0); ⇑(w1:⇓r1); ⇓(w0:⇑r0)

The notation used is as described in Chapter 2. Here, 'n' indicates no operation on that particular port and the colon separates the operations performed on the write and read ports. The TPG is implemented as an FSM in VHDL and synthesizes to 66 PLBs. Four I/O pins are used: a CLK input for running BIST and scanning results out, a RESET input for resetting the TPG and the ORAs, a SHIFT input for shifting results out and a SCANOUT data output for reading the results. The SHIFT pin also drives the Shift Data input (shown in Figure 3.1(b)) of the last ORA in the chain and thus produces all 1s at the end of the scan chain when shifting out the BIST results. This provides a sanity check on the ORA data and assists in detecting certain faults in the ORA scan chain [68]. The total number of clock cycles required for running the BIST and retrieving the results is 2112 + N×((N/4)−2), where N indicates the number of PLBs along one dimension of the array.

3.1.1.2 BIST Architecture for Single-port Modes

The BIST architectures for testing free RAMs in single-port synchronous and asynchronous modes are similar and are shown in Figure 3.2(a). All RAMs are tested in parallel using a single TPG, and the ORA compares the data from the RAMs with the expected read data generated by the TPG. The design of the single-bit ORA is shown in Figure 3.2(b). A tri-state buffer is required in this design because the write-data lines are used for both reading and writing data in single-port mode. The active-high tri-state buffer in the ORA passes TPG data through when writing into the RAM and is tri-stated when reading from the RAM, which allows the read data to be compared with the expected data from the TPG. The tri-state buffer is controlled by the OEN signal, which also drives the active-low Output Enable signal of the RAM. The ORA design for single-port mode, though not as simple as the dual-port design, makes diagnosis of the RAMs much simpler. Such a design is not used in dual-port mode because the generation of expected results by the TPG is more complicated, as data can be read and written at the same time, and the routing resources are not sufficient to implement such a design. For an N×N device, a total of N×N/2 PLBs are used for the ORAs.

Figure 3.2: (a) Single-port Free RAM BIST Architecture (b) ORA Design

In synchronous single-port mode of operation, the March LR [57] algorithm is used to test the free RAMs. The algorithm is modified by including BDS to test for intra-word CFs and neighborhood pattern sensitive faults [41]. The length of the test sequence is 30N, where N = 32 for a free RAM. The TPG is implemented in VHDL and synthesizes to 123 PLBs.
The sequence is as follows:

March LR test: ⇕(w0000); ⇓(r0000,w1111); ⇑(r1111,w0000,r0000,r0000,w1111); ⇑(r1111,w0000); ⇑(r0000,w1111,r1111,r1111,w0000); ⇑(r0000,w0101,w1010,r1010); ⇓(r1010,w0101,r0101); ⇑(r0101,w0011,w1100,r1100); ⇓(r1100,w0011,r0011); ⇑(r0011)

The same four I/O pins used in dual-port mode are used in single-port mode. The total number of clock cycles required for running the BIST and retrieving the results is 960 + N×N/4, where N indicates the number of PLBs along one dimension of the array.

In asynchronous mode of operation, the March Y [39] algorithm with no BDS is used. BDS are primarily used for detecting intra-word CFs and neighborhood pattern sensitive faults; since these have already been tested in single-port synchronous mode, BDS are not used in the asynchronous mode of testing. The TPG is implemented in VHDL and synthesizes to 18 PLBs. The sequence is as follows:

March Y test: ⇕(w0); ⇑(r0,w1,r1); ⇓(r1,w0,r0); ⇑(r0)

The length of the test sequence is 8N, where N = 32 for a free RAM. The total number of clock cycles required for running the BIST and retrieving the results is 256 + N×N/4, where N indicates the number of PLBs along one dimension of the array. Table 3.1 shows the results of timing analysis performed for the three BIST configurations on the AT94K40 device, which contains an array of 48×48 PLBs.

Table 3.1: Timing Analysis Results for Three RAM BIST Configurations

  Mode                       Maximum Clock Frequency
  Dual-port synchronous      17.7 MHz
  Single-port synchronous    12.3 MHz
  Single-port asynchronous   21.4 MHz

Fault simulation was carried out on a gate-level model of the free RAM developed for stuck-at fault coverage using AUSIM [58]. The model is described in ASL and is listed in Appendix A. The individual fault coverage for dual-port, single-port synchronous and single-port asynchronous modes was found to be 75.74%, 81.79% and 75.56%, respectively, as shown in Figure 3.3. A cumulative fault coverage of 99.81% was obtained for all three test configurations. A total of 1870 single stuck-at faults exist in the model, and 6 faults were found to be undetected.

Figure 3.3: Fault Simulation Results for Free RAM (individual and cumulative fault coverage for the dual-port synchronous, single-port synchronous and single-port asynchronous configurations)

3.1.2 Advantages and Limitations of Using VHDL

Parameterized VHDL code is used to implement the BIST logic. This makes the design portable, so it can be migrated onto other chips with minimal changes. But for diagnosis of faults based on BIST results, some support is needed from the synthesis tool. Unless the placement of the RAMs with respect to the ORAs can be controlled during synthesis, faulty RAMs cannot be identified from the BIST results. Placement cannot be controlled from Atmel's synthesis tool (called Figaro) if VHDL modeling is used. The solution is either to manually place the RAMs or to maintain mapping information for identifying the physical locations of faulty RAMs from the BIST results. As a design gets larger, manual placement becomes tedious. Also, the mapping information may change every time the design is synthesized. As a result, the VHDL-only approach did not prove to be beneficial for Atmel's FPGAs. A proprietary HDL is provided by Atmel for AT40K and AT94K devices. This language, called Macro Generation Language (MGL) [59], has features similar to VHDL and also has features that allow placement of logic blocks and definition of the routing of interconnect resources. As a result, a mixed approach is used, making use of both VHDL and MGL.
While MGL is used to define the placement of RAMs and ORAs and their interconnection, VHDL is used for the TPG. Since MGL does not support behavioral description, designing the TPG in MGL would imply transforming its netlist into MGL, which would not be an easy task due to the complexity of the TPG. In order to reduce development time, the TPG is modeled in VHDL.

The fact that RAMs embedded in FPGAs can operate in different modes affects the portability of the VHDL. Furthermore, memories may have to be tested with different march sequences if the memory technology changes: with a change in memory technology, the fault models adopted for testing may change, and this results in redevelopment of the VHDL code. To avoid this problem to some extent, a tool was created which automatically generates VHDL code for a particular march sequence. This tool, called RAMBISTGEN, generates VHDL code for any size of memory, for any active edge of the clock and for any active levels of the control signals. The tool is currently capable of generating code only for single-port march sequences. A snapshot of the tool is shown in Figure 3.4.

Figure 3.4: Snapshot of the RAMBISTGEN Tool

The tool takes an input file containing the march sequence and produces an output file containing the resulting VHDL code. Each line of the input file represents one march element and has the following format:

<u|d> <r|w> <data> [, <r|w> <data>, ...]

Each line starts with 'u' or 'd', indicating the addressing order in the up or down direction, respectively. This is followed by 'r' or 'w', indicating a read or write operation, respectively, and by the data to be read or written. Different read and write operations are separated by a comma. The address bus width is specified by the user in the GUI, and the data bus width is interpreted from the read/write data in the input file. The format of the input files for the March Y and March LR algorithms is shown in Table 3.2.

Table 3.2: RAMBISTGEN Input File Format for March Y and March LR

March Y algorithm: ⇕(w0); ⇑(r0,w1,r1); ⇓(r1,w0,r0); ⇑(r0)
Input file:
  d w 0
  u r 0, w 1, r 1
  d r 1, w 0, r 0
  u r 0

March LR algorithm: ⇕(w0000); ⇓(r0000,w1111); ⇑(r1111,w0000,r0000,r0000,w1111); ⇑(r1111,w0000); ⇑(r0000,w1111,r1111,r1111,w0000); ⇑(r0000,w0101,w1010,r1010); ⇓(r1010,w0101,r0101); ⇑(r0101,w0011,w1100,r1100); ⇓(r1100,w0011,r0011); ⇑(r0011)
Input file:
  u w 0000
  d r 0000, w 1111
  u r 1111, w 0000, r 0000, r 0000, w 1111
  u r 1111, w 0000
  u r 0000, w 1111, r 1111, r 1111, w 0000
  u r 0000, w 0101, w 1010, r 1010
  d r 1010, w 0101, r 0101
  u r 0101, w 0011, w 1100, r 1100
  d r 1100, w 0011, r 0011
  u r 0011

The input is not case sensitive. The tool generates approximately 140 and 300 lines of VHDL code for the March Y and March LR algorithms, respectively. The tool interprets the input file as follows:

1. Each line of the file is categorized as a phase.
2. All the words separated by commas are treated as different elements of that phase. For example, in "u r 1111, w 0000" there are two elements: "r 1111" and "w 0000".
3. During FSM implementation, each phase is treated as a separate state and each element of that phase forms a sub-state of that phase.

The resulting VHDL code for the above march sequences is given in Appendix B. The tool was developed using Tool Control Language/Tool Kit (Tcl/Tk) and is compatible with Windows and Linux environments. The line count for the source code is 400.
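The interpretation rules above amount to a small parser. The C fragment below re-expresses them for a single input line; the actual tool is written in Tcl/Tk, so this is an illustrative translation of the format, not the tool's source.

#include <stdio.h>

/* Illustrative parser for one line of the RAMBISTGEN input format
 * ("<u|d> <r|w> <data>[, <r|w> <data>, ...]"): extract the addressing
 * direction, then each read/write operation of the march element. */
static void parse_march_element(const char *line)
{
    char dir, op[8], data[40];
    int offset = 0, n;

    if (sscanf(line, " %c%n", &dir, &offset) != 1) return;
    printf("direction: %s\n", dir == 'u' ? "ascending" : "descending");

    const char *p = line + offset;
    while (sscanf(p, " %1[rw] %39[01]%n", op, data, &n) == 2) {
        printf("  %s %s\n", op[0] == 'r' ? "read " : "write", data);
        p += n;
        while (*p == ' ' || *p == ',') p++;   /* skip separators */
    }
}

int main(void)
{
    parse_march_element("u r 1111, w 0000, r 0000, r 0000, w 1111");
    return 0;
}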
3.1.3 BIST Approach for Free RAMs Using Embedded Processor Core

The idea of this approach is to generate the TPG signals from the embedded processor core; as a result, this approach is applicable only to the FPSLIC. The processor is also responsible for running the BIST, retrieving the BIST results, diagnosing the results and reporting the diagnostic results back to a higher controlling device (a PC, for example). The embedded processor in the FPSLIC can write into the configuration memory of the FPGA. This capability of the processor is used to combine the three RAM BIST configurations into one. The free RAMs are initially configured in dual-port synchronous mode for running BIST. Then the RAMs and FPGA logic are reconfigured to test the RAMs in single-port synchronous and asynchronous modes. Thus, by avoiding two of the three downloads, testing time can be reduced significantly (by approximately 3 times). Since only one bit-stream has to be stored instead of three, memory requirements are also reduced by a factor of three.

The TPG is very irregular in structure; the rest of the circuit, containing the ORAs and RAMs, can be made regular. By making the BIST circuitry inside the FPGA regular, the entire BIST logic to be built inside the FPGA (RAMs, ORAs and interconnections) can be algorithmically configured by the processor. This further reduces testing time because no bit-stream needs to be downloaded into the FPGA; only a download into the program memory of the AVR is required.

3.1.3.1 AVR-FPGA Interface Description

Before describing the actual implementation, the AVR-FPGA interface has to be reviewed. The interface is illustrated in Figure 3.5. Data can be written into the FPGA from the AVR through the AVR data bus using any of the 16 IOSEL lines. Whenever data is written into the FPGA over the AVR data bus using one of the IOSELn lines, the FPGAWE line and the corresponding IOSELn line are asserted high for one AVR clock cycle after stable data is present on the AVR data bus. Data can likewise be read from the FPGA through the AVR data bus using any of the 16 IOSEL lines. When reading data from the FPGA onto the AVR data bus, the FPGARE line and the corresponding IOSELn line are asserted high for one AVR clock cycle before stable data is produced on the AVR data bus. According to the FPSLIC datasheet [29], in order to use the IOSELn lines as a clock inside the FPGA, they have to be qualified with the FPGAWE or FPGARE line.

Figure 3.5: AVR-FPGA Interface

3.1.3.2 BIST Architecture

The architecture used is similar to the one used in the previous approach except that the TPG signals are generated by the processor. In dual-port mode, as in the previous approach, each ORA compares two adjacent RAMs, as shown in Figure 3.6(a). In single-port mode, each ORA compares the data from a RAM with the expected data generated by the processor, as shown in Figure 3.6(b).

Figure 3.6: Architecture of RAM BIST from the AVR (a) Dual-port Mode (b) Single-port Mode

3.1.3.3 Implementation of BIST Approach in FPSLIC

Initially, the free RAMs are configured to be tested in dual-port mode. The FPGAWE and FPGARE lines are used as clocks for running BIST and for retrieving BIST results, respectively. The AVR data bus is used to provide the address, data and output enable signals to the free RAMs. Since the 8-bit wide data bus is not sufficient to provide all required signals, the signals are registered as shown in Figure 3.7.
The 53 IOSEL lines are used as enable signals for the registers. The function of each IOSELn is shown in Table 3.3. IOSEL0 is used as global reset signal for clearing the ORAs. The two registers are selected by IOSEL1 and IOSEL2 lines respectively. IOSEL3 line is used as clock enable for running BIST. AVR FPGA R e g 1 R e g 2 ORA and RUTs TPG Data Figure 3.7: RAMBIST Implementation from AVR Table 3.3: Function of IOSEL Lines IOSEL Line Function IOSEL0 Reset IOSEL1 Reg1 Enable IOSEL2 Reg2 Enable IOSEL3 BIST CLK Enable As shown in Figure 3.7, two registers are used inside the FPGA for storing TPG signals: an 8-bit wide register(Reg1) and a 5-bit wide register(Reg2). The contents 54 of Reg1 and Reg2 are shown in Table 3.4 and Table 3.5, respectively. In single-port mode, bits 0-3 of Reg1 provide BDS for the RAM. Since no BDS are used in dual-port mode, bit 0 is used to de ne whether all 0s or all 1s are written into RAM. Bit 7 of Reg1 is used to control the data and OEN that to goes to each RAM in all modes of testing. Bits 1-5 of Reg1 provide read address and bits 0-4 of Reg2 provide write address in dual-port mode. In single-port mode, bits 0-4 of Reg2 provide address for RAMs. Bit 6 of Reg1 is used as shift signal to read the contents of the ORA after testing is completed. Table 3.4: Contents of Reg1 B7 B6 B5 B4 B3 B2 B1 B0 Dual/Single Port Shift RAddr4 RAddr3 RAddr2/ Data3 RAddr1/ Data2 RAddr0/ Data1 Data0 Table 3.5: Contents of Reg2 B5 B4 B3 B2 B1 B0 OEN WAddr4 WAddr3 WAddr2 WAddr1 WAddr0 MGL is used to de ne the placement of RAMs and ORAs and interconnection between them. VHDL is used de ne the registers and also to activate the AVR- FPGA interface. The AVR can write into FPGA con guration memory but, cannot read back data from con guration memory. Moreover, the AVR can only write each byte in the con guration memory. Some bytes in the con guration memory are shared by both logic and routing resources. Therefore, it is important to know underlying routing architecture when recon guring the FPGA from the AVR for testing RAMs 55 in a di erent mode. Therefore, MGL was used not only for placing RAMs and ORAs but also for controlling the routing. The program in the AVR memory for running the BIST is implemented in C lan- guage. Once the program is downloaded, the AVR waits for a valid instruction from a higher controlling device (PC in our case). AVR can be instructed to either run BIST or to run diagnostics. If instructed to run BIST, AVR would return pass/fail status to the PC after running BIST for a particular mode. If instructed to run diag- nostics, AVR would return diagnostic results to the PC after executing the diagnostic algorithm on the test results. A four-wire serial communication protocol is used for communication between the PC and AVR. 3.1.4 On-Chip Diagnostics AVR is not only capable of executing the BIST sequence and retrieving the BIST results but also capable of performing diagnostic procedures based on the BIST results for the identi cation of faulty RAMs in the FPGA core. The AVR, after running diagnostic procedures, identi es the location of the faulty RAM in terms of it?s X (column) and Y (row) coordinates. The AVR also identi es which bit(s) of the RAM is faulty. Since two di erent BIST architectures are used for testing free RAMs, two di erent diagnostic procedures were developed. In single-port test con guration, the ORA compares the expected results generated by the AVR with the data read from the RAMs Under Test (RUTs). 
3.1.4 On-Chip Diagnostics

The AVR is capable not only of executing the BIST sequence and retrieving the BIST results but also of performing diagnostic procedures on those results to identify faulty RAMs in the FPGA core. After running the diagnostic procedures, the AVR identifies the location of a faulty RAM in terms of its X (column) and Y (row) coordinates, and also identifies which bit(s) of the RAM are faulty. Since two different BIST architectures are used for testing free RAMs, two different diagnostic procedures were developed.

In the single-port test configuration, the ORA compares the expected results generated by the AVR with the data read from the RAMs Under Test (RUTs). Since the ORAs incorporate a shift register, the BIST results latched in the ORAs are retrieved by the AVR. Each bit retrieved corresponds to a single bit of the 4-bit words of a RAM. The position of an ORA in the FPGA array, and the corresponding RAM with which it is associated, is determined by the ORA's position in the shift register. Because the ORAs compare the RUT outputs with the expected read results produced by the TPG, the diagnostic procedure for the single-port RAM modes of operation is straightforward: it looks for ORA failure indications (logic 1s) and translates their positions, based on the shift register order, to identify not only which RAMs are faulty but also which bits in a given RAM are faulty. A faulty ORA can mimic a fault in its corresponding RAM; this can be identified when the PLBs are tested.

In the dual-port test configuration, since each ORA compares two adjacent RAMs, a different diagnostic approach is used. The Multiple Faulty Cell Locator (MULTICELLO) algorithm, originally developed for diagnosing faulty PLBs in FPGAs [45], is used for the diagnosis of dual-port RAMs. This procedure is more complicated because equivalent faults in two RAMs compared by the same ORA can go undetected. Since all the RAMs, except those at the leftmost and rightmost edges of the FPGA, are observed by two sets of ORAs and compared with a different RAM in each set, it is highly improbable for faulty RAMs to go undetected. This approach, however, loses diagnostic resolution for the RAMs at the leftmost and rightmost edges of the FPGA. The MULTICELLO algorithm marks the fault status of a RAM as unknown if the results indicate any possibility of faults. These ambiguities in the diagnosis can be overcome by rotating the RAM BIST architecture by 90° such that rows of ORAs compare rows of RAMs, and applying the diagnostic procedure to the new BIST results. This procedure of rotating and re-running the BIST comes at the cost of increased testing time. The MULTICELLO algorithm as applied to the dual-port RAMs is described in [60].

The diagnostic procedures were implemented and verified in compiled C programs that were downloaded into and executed by the AVR. These diagnostic procedures require around 1.3K bytes of program memory irrespective of the device size. The amount of data memory required, however, changes linearly with the size of the device because of the change in the amount of ORA data. Table 3.6 summarizes the program and data memory requirements for the AT94K40 device, which contains a 12×12 array of RAMs (the largest in the AT94K series). Memory requirements for carrying out BIST and diagnosis are listed individually. BIST and diagnosis are run at 20 MHz and the number of clock cycles required is also listed in Table 3.6. Running diagnostics increases the testing time by 21%; however, the AVR executes the diagnostic procedures only when instructed to do so by the higher controlling device, that is, only when the BIST results indicate a failure and the failure analysis is of interest, so the common-case testing time is unaffected.

Table 3.6: BIST and Diagnosis Summary

  Function      Execution Cycles   Program Memory (bytes)   Data Memory (bytes)
  BIST          398,100            1,860                    72
  Diagnostics   110,000            1,330                    132
  Total         508,100            3,190                    204
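For the single-port case, the translation from scan-chain position to faulty RAM location reduces to simple index arithmetic. The sketch below assumes a row-major scan order over the 12×12 RAM array with 4 bits per RAM; the actual ordering is fixed by the scan-chain routing of the implementation, and the sample failure data is made up.

#include <stdio.h>

#define RAM_COLS 12
#define RAM_ROWS 12
#define BITS_PER_RAM 4          /* 32x4 free RAMs */

int main(void)
{
    /* ora_bits[] stands in for the results shifted out of the ORA chain;
     * a 1 marks a mismatch seen by that single-bit ORA. */
    static unsigned char ora_bits[RAM_ROWS * RAM_COLS * BITS_PER_RAM];
    ora_bits[(5 * RAM_COLS + 3) * BITS_PER_RAM + 2] = 1;  /* example fault */

    for (int i = 0; i < RAM_ROWS * RAM_COLS * BITS_PER_RAM; i++) {
        if (!ora_bits[i]) continue;
        int ram = i / BITS_PER_RAM;             /* which RAM in scan order */
        printf("faulty RAM at X=%d Y=%d, bit %d\n",
               ram % RAM_COLS, ram / RAM_COLS, i % BITS_PER_RAM);
    }
    return 0;
}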
3.2 Data SRAM Testing

Apart from the free RAMs embedded in the FPGA, there is a 36K-byte SRAM serving as data and program memory. The size of the data SRAM can vary from 4K bytes to 16K bytes, with the rest of the memory acting as program memory for the AVR. The data SRAM is a dual-port RAM accessed by the FPGA and the AVR from different ports, except for the lower 4K-byte portion, which is accessible only by the FPGA. The program SRAM can be accessed only by the AVR, but it cannot be directly written or read by the AVR; therefore, the program SRAM cannot be tested from the AVR.

The dual-port data SRAM has to be tested for both cell-related faults and port-related faults. Therefore, the data SRAM is tested in three different modes, as shown in Figure 3.8. In the first testing mode, the data SRAM is treated as a single-port RAM accessible by the FPGA and is tested from the FPGA. In the second testing mode, the data SRAM is treated as a single-port RAM accessible by the AVR and is tested from the AVR. In the third testing mode, the data SRAM is tested for port-related faults with assistance from both the FPGA and the AVR.

Figure 3.8: Three Configurations for Data SRAM Testing (a) for Single-port Faults from the FPGA (b) for Single-port Faults from the AVR (c) for Dual-port Faults from both AVR and FPGA

While testing from the FPGA, the data SRAM is configured to be 16K bytes in size. Since the data SRAM cannot be configured to be a true single-port RAM, i.e., accessible only from one side at all times, care has to be taken that the contents of the RAM are not modified from one port while testing from the other port. The AVR uses some portion of the data SRAM as a data segment for storing stack data and other temporary variables. Therefore, when testing with the BIST circuitry inside the FPGA, there is a possibility that previously stored data in the program memory of the AVR results in the AVR stacking data in the data SRAM. To avoid failing results in such a case, the AVR has to be prevented from writing into the data SRAM. In the first mode of testing, this was achieved by having the AVR execute an instruction which always branches to the same location.

March LR with BDS is used to test the data SRAM in the first mode of testing. VHDL was used to implement March LR as an FSM; when synthesized, 230 PLBs are used for the TPG in the FPGA and 16 PLBs for the ORA. The ORA is configured as a scan chain for reading the BIST results. Diagnosis is simple and is limited to an indication of the faulty bit(s) of the RAM.

March LR with BDS is also used for testing the 12K-byte portion of the data SRAM accessible from the AVR. Since some portion of the data SRAM is used by the AVR for stacking data, two BIST configurations are required to completely test the data SRAM from the AVR: the data segment is relocated in the second configuration to test the portion of the RAM not tested in the first configuration.

The March s2pf and March d2pf algorithms [61] are used for testing the data SRAM from both ports. The notation for these algorithms is shown below, with 'n' again indicating no operation on a port.

March s2pf: ⇕(w0:n); ⇑(r0:r0, r0:n, w1:r0); ⇑(r1:r1, r1:n, w0:r1); ⇓(r0:r0, r0:n, w1:r0); ⇓(r1:r1, r1:n, w0:r1); ⇓(r0)

March d2pf: ⇕(w0:n); ⇑c=0..C−1( ⇑r=0..R−1( w1(r,c):r0(r+1,c), r1(r,c):w1(r−1,c), w0(r,c):r1(r−1,c), r0(r,c):w0(r+1,c) )); ⇑c=0..C−1( ⇑r=0..R−1( w1(r,c):r0(r,c−1), r1(r,c):w1(r,c−1), w0(r,c):r1(r,c+1), r0(r,c):w0(r,c+1) ))

Here C represents the column width, assumed to be 1 when implementing the March d2pf algorithm, and R represents the number of rows. The algorithms are implemented in compiled C code downloaded into the program memory of the AVR.
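To make the nested addressing of March d2pf concrete, the sketch below unrolls its first element for a small array, printing the simultaneous "port A : port B" operation pairs. The loop bounds and printout are purely illustrative; cells at the array boundary (where r−1 or r+1 falls off the array) would need special handling that is omitted here.

#include <stdio.h>

#define ROWS 4
#define COLS 1   /* C is taken as 1, as in the implementation above */

/* Unroll the first nested element of March d2pf, showing how port B
 * addresses the r+1 / r-1 neighbors of the cell port A operates on. */
int main(void)
{
    for (int c = 0; c < COLS; c++)
        for (int r = 0; r < ROWS; r++) {
            printf("w1(%d,%d) : r0(%d,%d)\n", r, c, r + 1, c);
            printf("r1(%d,%d) : w1(%d,%d)\n", r, c, r - 1, c);
            printf("w0(%d,%d) : r1(%d,%d)\n", r, c, r - 1, c);
            printf("r0(%d,%d) : w0(%d,%d)\n", r, c, r + 1, c);
        }
    return 0;
}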
Three registers are built in the FPGA to store the address, data and control signals for testing the data SRAM from the FPGA side. The AVR loads these registers before clocking the data SRAM from the FPGA side. The contents of the three registers are shown in Table 3.7.

Table 3.7: Contents of Registers Used for Testing Data SRAM

  Reg1 (B7-B0)       Reg2 (B7-B0)        Reg3 (B3-B0)
  SRAM Address 7-0   SRAM Address 15-8   Reset, ORA Enable, Data, WEN

Reg1 and Reg2 are used for storing the SRAM address and Reg3 is used for storing the control signals for the RAM and the ORA. All three registers are controlled by the system clock that comes from the AVR. The three registers are enabled by the IOSEL0, IOSEL1 and IOSEL2 lines, respectively. The ORA Enable signal is used to disable the ORAs while writing data into the data SRAM. The Reset signal is used to reset the ORAs before running BIST. WEN is the write enable signal for the data SRAM. The Data bit of Reg3 indicates whether the data to be written into or read from the data SRAM is all 1s or all 0s; this data is compared by the ORA with the data from the data SRAM during read operations. The ORA and the data SRAM are clocked by the FPGAWE signal, and IOSEL3 is used as the clock enable signal.

Because the AVR uses some portion of the data SRAM for stacking data, two configurations are required to test the data SRAM in dual-port mode due to stack relocation. A total of five configurations are thus required for completely testing the data SRAM: three single-port tests and two dual-port tests. These five configurations are reduced to three by combining the two single-port tests with the two dual-port tests. This reduces both the testing time and the memory storage requirements.

3.3 Summary

Two approaches for testing the embedded memory cores were presented in this chapter. The first approach aims at developing an FPGA-independent BIST for embedded memories. This is done by developing parameterized VHDL code which is portable and can be used to test embedded memories in any FPGA with minimal changes. However, for diagnosis, some support for placement control is needed from the tools which synthesize the VHDL code. This approach was applied to test embedded RAMs in Atmel's FPGAs and SoCs. The portability of the approach is demonstrated by applying it to embedded cores in Xilinx Virtex and Spartan series FPGAs; the details and results of applying the approach to Xilinx devices are discussed in Chapter 4.

The second approach aims at reducing the testing time. This approach, applicable to SoCs, requires assistance from an embedded microcontroller. The partial reconfiguration capability available to the microcontroller can be used to combine different BIST configurations. This avoids multiple downloads into the FPGA and reduces the testing time significantly. Though this approach can be applied to other SoCs, some redevelopment is required because of differences in the interfacing between the microcontroller and the FPGA.

A summary of the RAM BIST configurations and the memory requirements for storing them is given in Table 3.8 and Table 3.9, respectively. The single-download method explained in this thesis improves both testing time and the memory requirements for storing BIST configurations by a factor of approximately 2.5, as determined by running the tests on actual devices. As can be seen from Table 3.8, by combining all configurations into one download, the download time decreases from 1500 ms to 600 ms.
However, the time taken for running BIST and retrieving the BIST results increases from 311 µs to 8.8 ms, because concurrent execution inside the FPGA is replaced with sequential execution of AVR program code. This increase in BIST run time (about 8.5 ms) is very small compared to the decrease in download time (900 ms), resulting in a significant improvement in total test time. All the above-mentioned BIST configurations have been downloaded into Atmel AT94K40 and AT94K10 SoCs and have been verified by injecting faults into various resources of the FPGA.

Estimates of the total test time and memory storage requirements when the BIST circuitry is algorithmically generated from the AVR, without downloading into the FPGA, are shown in bold in Table 3.8 and Table 3.9, respectively. Worst-case assumptions for program size and partial reconfiguration time from the AVR yield an improvement in test time by a factor of approximately 7.5 and an improvement in memory requirements for storing BIST configurations by a factor of approximately 8 for a 48×48-PLB device, if a single compiled AVR code image is downloaded. Generating the BIST logic from the AVR requires downloading compiled C code into the program memory of the AVR, eliminating any download into the FPGA. This results in a significant improvement in overall testing time. However, as the device size shrinks, the improvements may not be as significant, because the bit-stream size decreases with the size of the device and the download time approaches the BIST execution time.

Table 3.8: Summary of RAM BIST Configurations for FPSLIC

  Testing resource                    | Config                      | BIST exec. time | Dwld time (ms) | Total test time (ms) | TPG PLBs | ORA PLBs | Max Clk speed | Speed-up
  Free RAM testing from FPGA          | Dual-port                   | 147 µs          | 500            | 500.147              | 66       | 960      | 17.7 MHz      | 1
  Free RAM testing from FPGA          | Single-port sync            | 124 µs          | 500            | 500.124              | 123      | 1152     | 12.3 MHz      | 1
  Free RAM testing from FPGA          | Single-port async           | 40 µs           | 500            | 500.04               | 18       | 1152     | 21.4 MHz      | 1
  Free RAM testing from AVR           | All modes                   | 8.8 ms          | 600            | 608.8                | 14       | 1152     | 20 MHz        | 2.46
  Free RAM testing without
  downloading into FPGA               | All modes                   | 20 ms           | 180            | 200                  | 14       | 1152     | 20 MHz        | 7.5
  Data SRAM                           | Single-port from FPGA       | 32.7 ms         | 375            | 407.7                | 210      | 16       | 18.5 MHz      | 1
  Data SRAM                           | Single-port + dual-port     | 657 ms          | 375            | 1032                 | 30       | 8        | 20 MHz        | 1
  Data SRAM                           | Single-port + dual-port
                                        with stack relocation       | 95 ms           | 375            | 470                  | 30       | 8        | 20 MHz        | 1

Table 3.9: Memory Storage Requirements for BIST Configurations

  Testing resource                    | Config                                         | Bit-stream Size (K bytes) | Memory Reduction Factor
  Free RAM testing from FPGA          | Dual-port                                      | 590                       | 1
  Free RAM testing from FPGA          | Single-port sync                               | 585                       | 1
  Free RAM testing from FPGA          | Single-port async                              | 561                       | 1
  Free RAM testing from AVR           | All modes                                      | 731                       | 2.37
  Free RAM testing without
  downloading into FPGA               | All modes                                      | 200                       | 8.68
  Data SRAM                           | Single-port from FPGA                          | 458                       | 1
  Data SRAM                           | Single-port + dual-port                        | 453                       | 1
  Data SRAM                           | Single-port + dual-port with stack relocation  | 453                       | 1

Chapter 4
Implementation of BIST on Xilinx FPGAs

BIST approaches for testing the embedded block RAMs and distributed LUT RAMs in Virtex and Spartan series FPGAs from Xilinx are discussed in this chapter. The VHDL code originally developed for testing memory components in the FPSLIC is used for testing RAMs in Xilinx FPGAs with minimal changes. The impact of the architectural differences in Xilinx FPGAs on the BIST architecture, and the changes needed in the BIST implementation, are also discussed.

4.1 Motivation

The basic BIST architecture used for PLBs in FPGAs is shown in Figure 2.14. A similar BIST architecture is used for testing routing resources and memory components in various families of FPGAs [19] [45] [60]. Though the BIST architecture is independent of the FPGA, BIST configurations are architecture dependent and have to be developed from scratch for different families of FPGAs.
If BIST development for one family of FPGAs can be reused, development time can be reduced significantly. All FPGAs support logic implementation using a Hardware Description Language (HDL) such as VHDL or Verilog. Since most HDLs are portable, a BIST implementation developed for a given FPGA should be reusable in most other FPGAs. In order to assess the flexibility and versatility of this approach, the VHDL-based BIST developed for testing embedded RAMs in Atmel FPGAs is used for testing the memory components in Xilinx FPGAs. The architecture of Xilinx FPGAs is discussed in the next section so that it can be compared with that of Atmel, along with its impact on various attributes of testing such as total testing time, number of test configurations and the BIST architecture.

4.2 PLB and Routing Architecture

Xilinx FPGAs adopt a coarse-grained architecture, as opposed to the fine-grained architecture adopted by the Atmel FPSLIC [62] [63] [64] [65] [66]; more logic can be accommodated in a Xilinx PLB than in an Atmel PLB. PLBs in Spartan and Virtex series FPGAs are made up of slices. Each slice typically contains two LUTs and two storage elements along with other components. The basic architecture of a slice is shown in Figure 4.1. Each slice in all the Xilinx FPGAs under consideration consists of two 4-input LUTs, two storage elements, a fast carry look-ahead chain and dedicated arithmetic logic gates. Multiplexers are used to handle logic functions with more inputs by implementing Shannon's expansion theorem. The LUTs can also be configured to operate as shift registers or as RAMs, which form the distributed memory in the FPGA. Each slice is capable of implementing a logic function of up to 9 inputs [64]. Each PLB consists of two slices in Virtex I and Spartan II devices and four slices in Spartan III, Virtex II and Virtex II Pro devices. Compared to Atmel PLBs, the PLBs in Xilinx devices are more complicated and accommodate more logic. Table 4.1 summarizes the minimum and maximum PLB array sizes of the Xilinx FPGA families under consideration.

Figure 4.1: Architecture of a Slice in Virtex and Spartan FPGAs [65]

The routing architecture of Xilinx devices is hierarchical and consists of long lines, hex lines, double lines and local direct lines. Long lines span the entire height and width of the device [65] [64]. Hex lines connect to every third and sixth PLB away in all four directions. Double lines connect to every first and second PLB away in all four directions. PLBs access these global routing resources through a switch matrix. Local routing resources enable PLBs to connect to adjacent PLBs: local direct lines in Virtex and Spartan II FPGAs allow connections to horizontally adjacent PLBs, while in Virtex II and Virtex II Pro devices direct lines can connect to all 8 surrounding PLBs. Apart from these lines, there are internal lines to connect the LUTs in different slices of a given PLB [65] [64].

Table 4.1: PLB Array Size Bounds for Xilinx Family FPGAs

  Family         Min Size   Max Size
  Virtex I       16x24      64x96
  Spartan II     8x12       28x42
  Spartan III    16x12      104x80
  Virtex II      8x8        112x104
  Virtex II Pro  16x22      120x94

4.3 Embedded Block RAM Architecture

In addition to the distributed memory of the LUT RAMs in the PLBs, Xilinx FPGAs incorporate multiple large, dedicated RAMs called block RAMs [65] [64]. The size of the block RAMs varies with the device family.
Block RAMs in Virtex I and Spartan II are functionally identical and are 4K bits in size. They are arranged in two columns at the rightmost and leftmost edges of the array and are 4 PLBs in height, as shown in Figure 4.2(a). Each block RAM contains two identical ports which can be operated independently, and it can be configured to operate in single-port mode or in dual-port mode. Block RAMs are true dual-port RAMs, unlike the free RAMs in Atmel FPGAs; as a result, a different test algorithm has to be used. Block RAMs are also large compared to free RAMs, which affects the testing time. Block RAMs in Virtex I and Spartan II can operate in five different sizes (words × bits): 4096×1, 2048×2, 1024×4, 512×8 and 256×16. This affects the number of configurations required to completely test the block RAMs, as will be discussed. Block RAMs can operate only in synchronous modes.

Figure 4.2: Organization of Block RAMs in (a) Virtex I and Spartan II FPGAs (b) Virtex II, Virtex II Pro and Spartan III FPGAs (c) Spartan III FPGAs

Block RAMs in Virtex II, Spartan III and Virtex II Pro devices are functionally identical and are 18K bits in size. However, the number of block RAMs and their arrangement vary with the device within a family, as shown in Figure 4.2(b) and Figure 4.2(c). As a result, device characteristics have to be considered when placing RAMs, as will be discussed. The 18K-bit block RAMs operate in six different sizes (words × bits): 512×36, 1K×18, 2K×9, 4K×4, 8K×2 and 16K×1. For data widths that are integral multiples of a byte, an additional parity bit is optionally provided for each byte (giving the ×9, ×18 and ×36 widths). All of these different modes of operation affect the number of configurations required to completely test the block RAMs, as will be discussed. Three different write modes are provided in dual-port operation to maximize the throughput and efficiency of the block RAMs [67]: WRITE_FIRST, READ_FIRST and NO_CHANGE. In WRITE_FIRST mode, the input data is written into the addressed RAM location and simultaneously stored in the output data latches; if the other port tries to read the same location, the output data on that port is unknown, meaning it can be either the previously stored data or the data currently being written. In READ_FIRST mode, the data previously present in the addressed RAM location is reflected on the output data lines while the input data is being written into the addressed location, and the previously stored data is also reflected on the other port if it reads the same location. In NO_CHANGE mode, the data on the output data lines remains unchanged; if the other port tries to read the same location, its output data is unknown. The basic block diagram of a block RAM is shown in Figure 4.3. The clock enable, set/reset, clock and enable lines of each port can be independently configured to operate with any active level (or edge, in the case of the clock), as shown in Figure 4.3. The Set/Reset signal, when asserted, synchronously initializes the data output latches to all 1s or all 0s. All these features affect the number of BIST configurations and also the BIST architecture, as will be discussed.
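The three write modes differ only in what lands on the writing port's output latch during a write cycle. The C sketch below models that semantics for a single cell; it is an illustration of the behavior described above, not Xilinx primitive code, and it ignores the cross-port "unknown data" cases.

#include <stdio.h>

enum write_mode { WRITE_FIRST, READ_FIRST, NO_CHANGE };

static unsigned cell = 0xA;    /* previously stored data */
static unsigned dout = 0x0;    /* output data latch of the writing port */

/* One write cycle on a single port: the data is always written; the
 * mode selects what the port's output data latch shows afterwards. */
static void write_cycle(enum write_mode mode, unsigned din)
{
    unsigned old = cell;
    cell = din;
    switch (mode) {
    case WRITE_FIRST: dout = din; break;  /* new data on the latch   */
    case READ_FIRST:  dout = old; break;  /* prior contents latched  */
    case NO_CHANGE:   break;              /* latch holds its value   */
    }
}

int main(void)
{
    write_cycle(READ_FIRST, 0x5);
    printf("cell=0x%X dout=0x%X\n", cell, dout);  /* cell=0x5 dout=0xA */
    return 0;
}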
4.4 Block RAM Testing

Block RAMs have to be tested in both single-port and dual-port modes. Initially, the block RAM is configured in single-port mode to test for all cell-related faults. Next, the block RAM is configured in dual-port mode to test for port-related faults. Since a block RAM can be configured to operate in different sizes, it has to be tested in all possible sizes. For instance, since Virtex I and Spartan II block RAMs can operate in 5 different sizes, they are tested in single-port mode in all 5 sizes. BDS are used only with the widest data width, to detect the maximum possible bridging faults among the data lines as well as CFs and NPSFs. BDS could be used in all configurations, but this would increase the total testing time as well as the complexity of the TPG.

Figure 4.3: Block Diagram of a Block RAM

When testing in dual-port mode, the block RAMs are configured to operate with the widest data width. One configuration is sufficient, since all the cell-related faults, and the configuration bits that set the data width of the device, have already been tested in single-port mode. The details of the BIST architectures used and the results of the implementation are presented in the next subsections.

4.4.1 Block RAM Testing in Single-port Mode

The BIST architecture used for testing block RAMs is shown in Figure 4.4. A single TPG provides the test patterns and control signals, and the comparison-based approach is used for the ORAs. The architecture is slightly modified from the one used for testing free RAMs, which had less diagnostic resolution for the RAMs at the edges: an extra column of ORAs is added so that the RAMs at the two edges are also compared with each other. This circular comparison was not possible for the free RAMs due to limited logic and routing resources.

Figure 4.4: BIST Architecture for Block RAM Testing

Each port can be independently controlled to have different active levels for the write enable, set/reset and RAM enable signals and a different active clock edge, as shown in Figure 4.3. Since five different configurations are required to completely test the single-port modes, different active levels for the control signals can be selected in different configurations. The WRITE_FIRST, READ_FIRST and NO_CHANGE write-mode options can also be selected across these five configurations. The reason for not implementing expected-data comparison, as was done for the Atmel free RAMs, is to test all the write-mode features in the different configurations; expected-data generation would require a separate TPG implementation for each write mode.

The Xilinx synthesis tool always selects port A when RAMs are configured in single-port mode. In order to test both ports independently, the block RAM is configured as shown in Figure 4.5: the block RAM is actually configured in dual-port mode and the TPG provides common test pattern signals to both ports, except for the RAM enable and set/reset signals. Both ports are enabled for only one clock cycle after BIST is started, to test the set/reset functionality of the output latches. The TPG, which is implemented as a state machine, enables only port A during the first iteration of the march sequence and only port B during the second iteration. Therefore, except for that one clock cycle before the start of the first iteration, both ports are never enabled at the same time, and thus set/reset is never asserted high, as shown in Figure 4.5.

Figure 4.5: Block RAM Configuration for Testing both Ports in Single-port Mode
The outputs of both ports are compared with the data from the identical ports of two different RAMs by two different ORAs.

4.4.1.1 BIST Implementation

The entire BIST circuitry is designed in VHDL. The TPG implements the March LR algorithm; BDS is used only when testing the RAM configured to operate with the largest possible data width. The TPG VHDL is generated using the RAMBISTGEN tool. The algorithm used, and its input file format for generating the VHDL code, are listed in Appendix C.

The design of a single-bit ORA implemented in VHDL is shown in Figure 4.6. The design is identical to the one used for free RAMs in dual-port mode; one slice is required to implement the single-bit ORA. A different ORA design can be used in which the data from port A and port B of the same RAM are compared, as shown in Figure 4.8(a). This reduces the total number of ORAs required by a factor of 2 and can be used in case of limited logic resources; however, the diagnostic resolution changes from a single port of a block RAM to a single block RAM.

Figure 4.6: Design of a Single-bit ORA for Block RAM Testing

The slice counts for implementing the TPG and the ORAs for Virtex I and Spartan II devices are shown in Table 4.2. PLB counts can be obtained by dividing the values given in Table 4.2 by the number of slices per PLB in the device. The total number of slices required for implementing the BIST is greater than the sum of the TPG slices and ORA slices, because extra slices are required to buffer heavily loaded signals; the number of extra slices required depends on the number of RAMs being tested.

Table 4.2: BIST Slice Count for Virtex I and Spartan II

  BIST Algorithm               TPG Slices   ORA Slices
  March LR w/o BDS             62           N x D x 2
  March LR with BDS (16-bit)   110          N x D x 2
  March LR with BDS (36-bit)   174          N x D x 2
  (N = # of block RAMs, D = # of data bits)

The same four BIST function I/O pins used for testing free RAMs are used for testing block RAMs, as shown in Table 4.3. In devices which have a JTAG interface with access to the FPGA core, the boundary-scan interface can be used for downloading into the FPGA configuration memory and also for running the BIST. The function of the Xilinx boundary-scan pins used as BIST I/O pins is shown in Table 4.3. The JTAG interface allows the I/O interface for running BIST to be defined independent of the device and package.

Table 4.3: Function of Xilinx JTAG Pins

  JTAG Pin   Function
  DRCK1      Clk
  SEL2       Reset
  TDI        Shift
  TDO1       Scanout

The Xilinx synthesis tool (ISE) allows the placement of logic and RAMs to be controlled via a constraint file; hence, the VHDL-only approach was used for implementing the BIST. The format for specifying the placement of a RAM is LOC = RAMBn_X#Y#, where X and Y represent the column and row coordinates of the RAM and the value of n indicates the size of the memory, which is device specific. For example, the construct INST "RAM0" LOC = "RAMB16_X0Y0" used in Virtex II and Virtex II Pro FPGAs specifies that the placement tool place instance RAM0 of a 16K-bit block RAM at the bottom left corner of the FPGA. The RAMB4_R#C# construct is used in Virtex I and Spartan II FPGAs, as the block RAMs are 4K bits in these devices; block RAM row and column designations are used instead of X and Y coordinates. The number of block RAMs and their arrangement vary with the device, as shown in Figure 4.2. In order to facilitate generation of the placement file for different devices, a program to generate the constraint file was implemented in the C language.
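A minimal sketch of such a constraint-file generator is shown below. The device dimensions and the instance naming (RAM0, RAM1, ...) are illustrative assumptions; the output follows the LOC syntax quoted above for Virtex II style devices.

#include <stdio.h>

/* Emit one LOC constraint per block RAM instance. The column count and
 * the number of block RAMs per column are device-specific parameters
 * that the real generator would look up for the target device. */
int main(void)
{
    const int bram_cols = 4, brams_per_col = 10;   /* illustrative */
    int inst = 0;

    for (int x = 0; x < bram_cols; x++)
        for (int y = 0; y < brams_per_col; y++)
            printf("INST \"RAM%d\" LOC = \"RAMB16_X%dY%d\";\n",
                   inst++, x, y);
    return 0;
}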
4.4.1.2 Diagnosis

A modified version of the MULTICELLO algorithm, as explained in [68], is used for performing diagnostics. This modified algorithm takes the circular comparison of the RAMs into account. Worst-case scenarios in which the modified MULTICELLO algorithm is not able to find a unique diagnosis are described in [69]. In order to obtain a unique diagnosis in such cases, the pair-wise comparison of RAMs by the ORAs needs to be changed by changing the locations of the RAMs in the constraint file. The code must be synthesized again to download and execute the new BIST configuration, and the diagnosis is then reapplied, taking the results of the previous diagnosis into account.

4.4.2 Block RAM Testing in Dual-port Mode

The BIST architecture used is identical to the one used for single-port mode testing of free RAMs, shown in Figure 3.6. The TPG generates expected data assuming that the RAMs operate in WRITE_FIRST mode, which is the default. Since the different write modes are tested in single-port mode, expected-data comparison is feasible here, and diagnosis also becomes simpler. The block RAMs are configured to operate with the maximum data width and no BDS are used in this mode of testing. The March s2pf and March d2pf algorithms [61] used for testing the data SRAM in the FPSLIC are used for testing block RAMs in dual-port mode. The two algorithms could be combined to form a single configuration, but the resulting TPG becomes too large to fit in some smaller devices. VHDL is used to implement the BIST and the placement of the RAMs is controlled through a constraint file. The TPG and ORA slice counts are shown in Table 4.4. The march algorithms are applied to 16-bit wide RAMs in Virtex I and Spartan II devices and to 36-bit wide RAMs in Spartan III, Virtex II and Virtex II Pro devices.

Table 4.4: TPG and ORA Counts for Testing Block RAMs in Dual-port Mode

  Algorithm    Data Width   TPG Slices   ORA Slices
  March s2pf   D=16         49           N x 2 x D
  March d2pf   D=16         76           N x 2 x D
  March s2pf   D=36         64           N x 2 x D
  March d2pf   D=36         113          N x 2 x D

4.5 Summary of Block RAM Testing

As can be seen from Table 4.2 and Table 4.4, the March LR with BDS implementation requires more slices than any other march sequence. A comparison of the maximum number of PLBs required for implementing the BIST in different devices was determined through synthesis. The number of PLBs required for BIST is compared with the number available in different devices in Figure 4.7. There are 4 devices that cannot accommodate the BIST circuitry completely; as a result, these devices require testing the block RAMs in two phases, with half the block RAMs tested in each phase. Another approach is to use the ORA shown in Figure 4.8(b), at the cost of decreased diagnostic resolution. As can be seen from Figure 4.7, the number of RAMs, and hence the number of PLBs required for implementing the BIST, increases tremendously in some of the Virtex II and Virtex II Pro FPGAs. This increases the download time considerably and hence the testing time. Improvements that can decrease the testing time in these devices are discussed in Chapter 5. All BIST configurations have been downloaded into Spartan II 2S50, Spartan II 2S200 and Virtex II Pro 2VP30 devices, as indicated in Figure 4.7, and were verified using fault injection.
Figure 4.7: Programmable Logic Resources in Xilinx FPGAs (slices available in each FPGA versus slices needed for BIST, in thousands, across the Spartan II, Virtex, Spartan III, Virtex II and Virtex II Pro device families; devices with insufficient slices for BIST implementation and the devices used in this thesis are marked)

4.6 LUT RAM Testing

LUTs form the distributed memory in Xilinx FPGAs. Each slice consists of two 4-input LUTs (the F-LUT and G-LUT), each of which can also function as a 16×1 single-port synchronous RAM. The two LUTs in a slice can be combined to function as a 16×2 single-port synchronous RAM, a 32×1 single-port synchronous RAM or a 16×1 dual-port synchronous RAM. Theoretically, the maximum amount of distributed memory is 2 × nslice × nplb × 16 bits, where nslice indicates the number of slices per PLB, nplb indicates the number of PLBs in the device, and the factor of 2 reflects the two LUTs per slice. For example, the smallest Virtex I array in Table 4.1 (16×24 PLBs with two slices per PLB) gives 2 × 2 × 384 × 16 = 24,576 bits of distributed memory.

Three configuration modes are required to completely test the LUT RAMs: 16×2 single-port mode, 32×1 single-port mode and 16×1 dual-port mode. All LUT RAMs cannot be tested in parallel, as some LUTs are required for the BIST logic (TPGs and ORAs). Therefore, each of the three testing configurations requires two phases, where the roles of the RUTs and the TPGs/ORAs are reversed in each phase.

4.6.1 BIST Implementation

The BIST architecture used in all three modes is identical to the one used for PLBs, shown in Figure 2.14, with the BUTs replaced by RUTs and the two TPGs replaced with a single TPG. The March Y algorithm used for testing asynchronous free RAMs is used for testing the single-port modes, and the DPR algorithm used for testing free RAMs in dual-port mode is used for testing the LUTs in dual-port mode, since the LUT dual-port RAM mode is not a true dual-port RAM. In fact, the DPR algorithm was originally developed for the LUT dual-port RAM mode in Xilinx FPGAs [19]. No BDS are used in any of the modes. Comparison-based ORAs, as shown in Figure 4.8(a), are used for all three BIST configurations. Diagnostic resolution in all the configurations is limited to a slice instead of an individual LUT RAM. The ORA design shown in Figure 4.8(b) can also be used in any of the configurations, since the F and G LUTs are tested in parallel and the data read from these two LUTs is always identical. VHDL is used for implementing the BIST and the details of the implementation are shown in Table 4.5. All 3 LUT RAM BIST configurations have been downloaded into 2S50, 2S200 and 2VP30 devices and verified using fault injection.

Figure 4.8: ORA Designs Used for LUT RAM Testing

Table 4.5: TPG and ORA Counts for Testing LUT RAMs

  Algorithm   Test Mode   TPG Slices   ORA Slices
  March Y     16x2        9            N
  March Y     32x1        10           N/2
  March DPR   16x1        40           N
[Figure 4.9: Multiplier Modes. (a) Asynchronous mode, with 18-bit inputs A[17:0] and B[17:0] and 36-bit product P[35:0]; (b) registered mode, adding CLK, CE and RST inputs [65].]

4.7 Multiplier BIST

Spartan III, Virtex II and Virtex II Pro FPGAs contain 18×18 multiplier blocks. Their organization is similar to that of the block RAMs, as each multiplier block is associated with a block RAM. These multipliers perform 2's complement multiplication of two 18-bit wide inputs to produce a 36-bit wide result, using the modified Booth algorithm as explained in [70]. The multiplier blocks can be configured to operate in combinational mode or registered mode. Clock, clock enable and synchronous reset inputs are added in the registered version, and the active level (or, for the clock, the active edge) of these inputs can be programmed, as shown in Figure 4.9 [65].

The approach described in [71] is used for testing the multipliers; a sketch of a counter-based TPG of this general kind closes this section. A total of three configurations are required to completely test the multipliers. VHDL is used for implementing the BIST, and details of the synthesized implementation are given in Table 4.6.

Table 4.6: Multiplier BIST Slice Count

  Algorithm        Mode            TPG Slices   ORA Slices
  Count [10]       combinational   8            N × 36
  Modified count   registered      10           N × 36

  N = number of multiplier cores

The multiplier BIST approach demonstrates that the VHDL-based BIST approach can be applied to any regular-structure core other than RAMs in any FPGA.
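For orientation only, here is a minimal VHDL sketch of what a counter-based multiplier TPG can look like. This is an assumption about the general structure, not the count-based algorithm cited in Table 4.6 nor the thesis implementation; the port names, and the choice of driving B with the complemented count, are illustrative.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Illustrative counter-based TPG: one counter drives the A and B
-- operands of all multipliers under test in parallel, while
-- comparison-based ORAs (not shown) observe the 36-bit products.
entity mult_tpg is
  port ( Clk, Reset : in  std_logic;
         A, B       : out std_logic_vector(17 downto 0);
         Done       : out std_logic );
end mult_tpg;

architecture rtl of mult_tpg is
  signal count : unsigned(17 downto 0) := (others => '0');
begin
  process (Clk, Reset)
  begin
    if Reset = '1' then
      count <= (others => '0');
    elsif rising_edge(Clk) then
      count <= count + 1;           -- step through the counting pattern
    end if;
  end process;
  A    <= std_logic_vector(count);
  B    <= std_logic_vector(not count);  -- complementary operand (assumed)
  Done <= '1' when count = 2**18 - 1 else '0';
end rtl;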
Chapter 5

Summary and Conclusions

BIST configurations for testing the memory components in commercially available FPGAs and SoCs are presented in this thesis. Two different approaches were followed in developing the BIST configurations, each addressing one of two important concerns: portability of BIST development and testing time. The BIST configurations developed were used to test memory components in AT40K series FPGAs and AT94K series SoCs from Atmel, and in Spartan II, Spartan III, Virtex I and Virtex II series FPGAs and Virtex II Pro SoCs from Xilinx. A summary of the thesis, observations made during BIST development, and suggestions for future research are discussed in this chapter.

5.1 Summary

The goal was to develop BIST configurations for testing the free RAMs in AT40K series FPGAs and AT94K series SoCs, since the latter have embedded AT40K FPGA cores. Initially, VHDL was used to design the BIST circuitry. This approach was useful only for a pass/fail indication and not for diagnosis of faulty RAMs, because the synthesis tool provided no control over the placement of RAMs relative to their associated ORAs. As a result, a combined VHDL-MGL approach was used to design the BIST circuitry. Three BIST configurations were developed to completely test the free RAMs.

The embedded microcontroller (AVR) in AT94K series SoCs can access the embedded FPGA core and write into its configuration memory. This feature gave rise to an alternate BIST approach for SoCs. The AVR was used to control the BIST: to start the BIST, retrieve the results after the BIST completed, and present the results to a higher-level controlling device (a PC), which performed diagnosis based on the BIST results. The same three BIST configurations were developed to test the free RAMs from the AVR. The BIST circuitry implemented inside the FPGA can be made regular by moving the irregular TPG function into the AVR, leaving only the ORAs and RAMs in the FPGA. This opened the possibility of combining the three BIST configurations into one, because the regular BIST structure inside the FPGA is similar for all three configurations and can easily be reconfigured by the AVR for the next mode of testing. Diagnosis was also moved from the PC to the AVR, so that a single configuration was developed which tests the free RAMs completely and also performs diagnosis.

A similar approach was used to test the embedded data SRAM shared by the AVR and the FPGA. Due to limitations imposed by the AVR architecture, three configurations were required to completely test the data SRAM.

The VHDL-only approach did not yield any benefits for Atmel FPGAs. However, due to better synthesis tool support, the VHDL approach seemed worth attempting on Xilinx FPGAs. There it yielded good results, by controlling the placement of the RAMs with respect to their associated ORAs. Portable VHDL code was thus created to test embedded block RAMs and LUT RAMs in all families of FPGAs from Xilinx. A total of 9 BIST configurations were developed for completely testing the block RAMs, and another 3 configurations for testing the LUT RAMs, in all Xilinx FPGA families. A similar approach was used for testing the embedded multipliers in some Xilinx FPGAs, and a total of 3 configurations were developed for testing them completely.

5.2 Observations

It was observed that the architecture of an FPGA has a significant impact on BIST development. FPGAs with two different architectures were considered in this thesis: Atmel FPGAs use a fine-grained architecture, whereas Xilinx FPGAs use a coarse-grained architecture. In fine-grained FPGAs, it may not always be possible to fit the entire BIST circuitry if synthesis tools are used for placement and routing of the entire design, since the heuristic algorithms used by FPGA synthesis tools may not always produce an optimized placement and routing for the regular BIST structure. This was noticed while developing BIST configurations for testing the free RAMs in single-port mode: Atmel's design tool, called Figaro, could not fit the entire design. This resulted in two configurations for completely testing the free RAMs in single-port synchronous mode, with half the RAMs tested in each configuration. To avoid the extra download, the placement and routing of the design was controlled using MGL. Such a problem can occur with coarse-grained FPGAs as well when the logic or routing resources are almost completely used. Placement and routing problems did not occur with Xilinx FPGAs when testing block RAMs. However, LUT RAM testing did cause placement and routing issues, as almost 100% of the logic resources were used. The routing issues were solved once the placement of the RUTs and ORAs was defined with a constraint file.

TPG signals become heavily loaded, particularly when testing all the memory components in a large FPGA with a single BIST configuration. The default fan-out limit in the Xilinx synthesis tool is 15, and the tool buffers signals using additional logic resources once this limit is exceeded. This prevented fitting the BIST circuitry in some of the smaller Xilinx FPGAs. The problem was solved by increasing the user-controlled fan-out limit (see the sketch later in this section), trading off the speed of testing against the number of test configurations, and thus the total testing time. Such a problem did not occur with Atmel devices, because the TPG signals are buffered as they pass through the repeaters.

All Xilinx FPGAs support boundary-scan with facilities for access to the FPGA core logic, and this enabled the use of boundary-scan signals for downloading, running and controlling the BIST. This provides a common interface for BIST, independent of the package being tested. Due to the lack of boundary-scan access to the FPGA core in Atmel devices, different I/O pins had to be used in different packages for running the BIST.
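As an illustration of the fan-out control mentioned above, a minimal sketch follows, assuming the per-signal MAX_FANOUT synthesis attribute of the Xilinx XST tool; the entity and signal names are hypothetical, and the value 100 is arbitrary.

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical wrapper showing a heavily loaded TPG control signal.
entity fanout_demo is
  port ( WEN  : in  std_logic;
         WENo : out std_logic_vector(63 downto 0) );
end fanout_demo;

architecture rtl of fanout_demo is
  signal tpg_wen : std_logic;
  -- XST synthesis attribute: allow this signal to drive up to 100 loads
  -- before the tool replicates logic to buffer it (default limit: 15).
  attribute max_fanout : string;
  attribute max_fanout of tpg_wen : signal is "100";
begin
  tpg_wen <= WEN;
  WENo    <= (others => tpg_wen);  -- one control signal, many loads
end rtl;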
Atmel SoCs support writing into the FPGA configuration memory, but do not support reading the configuration memory or the contents of the storage elements in the device. As a result, the ORAs had to be configured as a scan chain to shift out the results after running the BIST. Read-back capability would save some testing time and would also avoid the need for a scan chain. While the configuration memory in Atmel devices is segmented into bytes, the configuration memory in Xilinx FPGAs is segmented into frames. The length of a frame varies with the device and is typically a few hundred bits. Although Xilinx FPGAs have read-back capability, the frame-level segmentation makes read-back complicated: post-processing of the read-back data is required to extract the exact ORA contents, so read-back does not reduce the testing time significantly.

5.3 Future Research

To conclude the thesis, a few suggestions for improving the current BIST approach, along with some areas that can be explored, are discussed.

Two kinds of approaches were used for output response analysis in this thesis: the comparison-based approach and the expected-data comparison approach. The expected-data comparison approach is preferable, as it is more reliable and also makes diagnosis simpler. Comparison with adjacent elements detects all possible faults in the RAMs except the case where all elements have equivalent faults, but it fails to uniquely diagnose the results when three or more adjacent elements being compared have equivalent faults. Comparison with adjacent elements was nonetheless preferred over expected-data comparison in some cases in this thesis, because the latter approach consumes more logic and routing resources and did not fit in some devices.

Virtex II Pro SoCs have embedded PowerPC microprocessors, similar to the AVR in FPSLIC. The approach in which the TPG was moved into the AVR and the BIST was controlled by the AVR could be explored with the PowerPC in Virtex II Pro SoCs. This approach might yield greater speed-up and memory storage improvements in these devices: the download time for a Virtex II Pro SoC is much larger than that of the FPSLIC because of its larger configuration memory, and the number of configurations for testing block RAMs is 9, as opposed to 3 for the free RAMs in FPSLIC. These factors can result in a better speed-up, provided all block RAM test configurations are combined into a single configuration executed by the PowerPC. The difficulty, however, is that the block RAMs form the program memory for the PowerPC.

With proper support from FPGA synthesis tools, the portable VHDL BIST approach could also be applied to the logic blocks and routing in Xilinx FPGAs. If a slice can be modeled in VHDL in such a way that the tool recognizes the model as a slice, BIST development can be reduced significantly by following the approach used for LUT RAM testing: logic BIST could be designed using VHDL alone, with the physical placement of the logic blocks and ORAs controlled through a constraint file. A sketch of this style of primitive-level modeling is given below.
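To illustrate the level of primitive control that already exists for the LUT RAMs, the following minimal sketch (not from the thesis code) instantiates a 16×1 LUT RAM primitive from the Xilinx UNISIM library; its placement can then be pinned through a constraint file entry such as INST "rut0" LOC = "SLICE_X0Y0"; (Virtex II style naming; the instance name and location are illustrative). Extending such primitive-level models to whole slices is the open question raised above.

library ieee;
use ieee.std_logic_1164.all;
library unisim;
use unisim.vcomponents.all;

-- One RUT cell built from a UNISIM 16x1 single-port LUT RAM primitive.
entity rut_cell is
  port ( WClk, WE, D : in  std_logic;
         A           : in  std_logic_vector(3 downto 0);
         O           : out std_logic );
end rut_cell;

architecture structural of rut_cell is
begin
  rut0 : RAM16X1S
    generic map ( INIT => X"0000" )   -- initial LUT RAM contents
    port map ( O => O, D => D, WCLK => WClk, WE => WE,
               A0 => A(0), A1 => A(1), A2 => A(2), A3 => A(3) );
end structural;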
Bibliography

[1] B. Arnaldo, "Systems on Chip: Evolutionary and Revolutionary Trends", 3rd International Conference on Computer Architecture (ICCA'02), pp. 121-128, 2002.
[2] J. Becker, "Configurable Systems-on-Chip (CSoC)", Proc. IEEE Integrated Circuits and Systems Design Symposium, pp. 379-384, 2002.
[3] M. Rabaey, "Experiences and Challenges in System Design", Proc. IEEE Computer Society Workshop, pp. 2-4, 1998.
[4] J. Becker and M. Vorbach, "Architecture, Memory and Interface Technology Integration of an Industrial/Academic Configurable System-on-Chip (CSoC)", Proc. IEEE Computer Society Annual Symposium, pp. 107-112, 2003.
[5] S. Knapp and D. Tavana, "Field Configurable System-On-Chip Device Architecture", Proc. IEEE Custom Integrated Circuits Conference, pp. 155-158, 2000.
[6] K. Kawana, H. Keida, M. Sakamoto, K. Shibata and I. Moriyama, "An Efficient Logic Block Interconnect Architecture for User-Reprogrammable Gate Array", Proc. IEEE Custom Integrated Circuits Conference, pp. 31.3/1-31.3/4, 1990.
[7] H. Verma, "Field Programmable Gate Arrays", IEEE Potentials, Vol. 18, No. 4, pp. 34-36, Oct.-Nov. 1999.
[8] S.J.E. Wilton, "Embedded Memory in FPGAs: Recent Research Results", Proc. IEEE Pacific Rim Conference, pp. 292-296, 1999.
[9] S.J.E. Wilton, "Implementing Logic in FPGA Memory Arrays: Heterogeneous Memory Architectures", Proc. IEEE Field-Programmable Technology, pp. 142-147, 2002.
[10] "International Technology Roadmap for Semiconductors (ITRS) 2000 Update", Technical Report, ITRS, 2000.
[11] V. Ratford, "Self-Repair Boosts Memory SoC Yields", Integrated System Design, Sept. 2001.
[12] A. Benso, S. Carlo, G. Natale, P. Prinetto, and M. Bodoni, "Programmable Built-in Self-Testing of Embedded RAM Clusters in System-on-Chip Architectures", IEEE Communications Magazine, Vol. 41, No. 9, pp. 90-97, Sept. 2003.
[13] B.G. Oomman, "A New Technology for System-on-Chip", Electronics Engineer, April 2000.
[14] R. Chandramouli and S. Pateras, "Testing Systems on a Chip", IEEE Spectrum, Vol. 33, No. 11, pp. 42-47, Nov. 1996.
[15] V.D. Agrawal, R. Charles and K. Saluja, "A Tutorial on Built-in Self Test, Part 1: Principles", IEEE Design & Test of Computers, Vol. 10, No. 1, pp. 73-82, March 1993.
[16] H.J. Wunderlich, "Non-intrusive BIST for Systems-On-a-Chip", Proc. IEEE International Test Conference, pp. 644-651, 2000.
[17] M. Abramovici, C.E. Stroud and M. Emmert, "Using Embedded FPGAs for SoC Yield Improvement", Proc. Design Automation Conference, pp. 713-724, 2002.
[18] C.E. Stroud, S. Konala, C. Ping and M. Abramovici, "Built-in Self-Test of Logic Blocks in FPGAs (Finally, a Free Lunch: BIST Without Overhead!)", Proc. VLSI Test Symposium, pp. 387-392, 1996.
[19] C.E. Stroud, K.N. Leach, and T.A. Slaughter, "BIST for Xilinx 4000 and Spartan Series FPGAs: a Case Study", Proc. IEEE International Test Conference, 2003.
[20] S.M. Trimberger, "Field-Programmable Gate Array Technology", Kluwer Publishers, Norwell, MA, 1994.
[21] G. Brebner, "Eccentric SoC Architectures as the Future Norm", Proc. Digital System Design, Euromicro Symposium, pp. 2-9, 2003.
[22] S. Hauck, "The Roles of FPGAs in Reprogrammable Systems", Proc. IEEE, Vol. 86, No. 4, pp. 615-638, April 1998.
[23] Y. Khalilollahi, "Switching Elements, the Key to FPGA Architecture", WESCON Conference Record, pp. 682-687, 1994.
[24] J. Rose, A. El Gamal and A. Sangiovanni-Vincentelli, "Architecture of Field-Programmable Gate Arrays", Proc. IEEE, Vol. 81, No. 7, pp. 1013-1029, July 1993.
[25] D.S. Brown, J.R. Francis, J. Rose and G.Z. Vranesic, "Field-Programmable Gate Arrays", Kluwer Publishers, Norwell, MA, 1992.
[26] J.V. Oldfield and R.C. Dorf, "Field-Programmable Gate Arrays: Reconfigurable Logic for Rapid Prototyping and Implementation of Digital Systems", John Wiley & Sons, New York, 1995.
[27] J. Rose, R.J. Francis, D. Lewis, and P. Chow, "Architecture of Field Programmable Gate Arrays: The Effect of Logic Block Functionality on Area Efficiency", IEEE Journal of Solid-State Circuits, Vol. 25, No. 5, pp. 1217-1225, Oct. 1990.
[28] "AT40K Series Field Programmable Gate Array", Data Sheet, Atmel Corporation, 2003.
[29] "AT94K Series Field Programmable System Level Integrated Circuit", Data Sheet, Atmel Corporation, 2003.
[30] R. Camarota and J. Rosenberg, "Cache Logic FPGAs for Building Adaptive Hardware", FPGAs Technology and Applications, IEE Colloquium, pp. 1-3, 1993.
[31] R. Rajsuman, "System-on-a-Chip: Design and Test", Artech House, London, 2000.
[32] S.J.E. Wilton, "Implementing Logic in FPGA Embedded Memory Arrays: Architectural Implications", Proc. IEEE Custom Integrated Circuits Conference, pp. 269-272, 1998.
[33] Xilinx Corp., www.xilinx.com/products.
[34] S. Singh, S. Azmi, N. Agrawal, P. Phani, and A. Rout, "Architecture and Design of a High Performance SRAM for SOC Design", Proc. Design Automation Conference, pp. 447-451, 2002.
[35] C.T. Huang, J.R. Huang, C.F. Wu, C.W. Wu, and T.Y. Chang, "A Programmable BIST Core for Embedded DRAM", IEEE Design & Test of Computers, Vol. 16, No. 1, pp. 59-70, Jan.-March 1999.
[36] T. Seceleanu, J. Plosila, and P. Lijeberg, "On-Chip Segmented Bus: a Self-Timed Approach", Proc. IEEE ASIC/SOC Conference, pp. 216-220, 2002.
[37] D. Bhatia, "Field Programmable Gate Arrays", IEEE Potentials, Vol. 13, No. 1, pp. 16-19, Feb. 1994.
[38] E. Hall and G. Costakis, "Developing a Design Methodology for Embedded Memories", Integrated System Design, January 2000.
[39] A.J. Van de Goor, "Testing Semiconductor Memories: Theory and Practice", John Wiley & Sons, New York, 1991.
[40] A.J. Van de Goor, "An Overview of Deterministic Functional RAM Chip Testing", ACM Computing Surveys, Vol. 22, No. 1, pp. 5-33, March 1990.
[41] A.J. Van de Goor, I. Tlili and S. Hamdioui, "Converting March Tests for Bit-Oriented Memories into Tests for Word-Oriented Memories", Proc. IEEE International Workshop on Memory Technology Design and Testing, pp. 46-52, 1998.
[42] M. Renovell and Y. Zorian, "Different Experiments in Test Generation for Xilinx FPGAs", Proc. International Test Conference, pp. 854-862, 2000.
[43] S.K. Lu, J.S. Shih and C.W. Wu, "Built-In Self-Test and Fault Diagnosis for Lookup Table FPGAs", Proc. IEEE Circuits and Systems, pp. 80-83, 2000.
[44] W.K. Huang and F. Lombardi, "An Approach for Testing Programmable/Configurable Field Programmable Gate Arrays", Proc. VLSI Test Symposium, pp. 450-455, 1996.
[45] C.E. Stroud, E. Lee, and M. Abramovici, "BIST-Based Diagnostics of FPGA Logic Blocks", Proc. IEEE International Test Conference, pp. 539-547, 1997.
[46] W.K. Huang, F.J. Meyer, N. Park, and F. Lombardi, "Testing Memory Modules in SRAM-Based Configurable FPGAs", Proc. International Workshop on Memory Technology, Design and Testing, pp. 79-86, 1997.
[47] D. Das and N.A. Touba, "A Low Cost Approach for Detecting, Locating, and Avoiding Interconnect Faults in FPGA-Based Reconfigurable Systems", Proc. International Conference on VLSI Design, pp. 266-269, 1999.
[48] M.B. Tahoori, "Application-Dependent Testing of FPGA Interconnects", Proc. IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems, pp. 409-416, 2003.
[49] C.E. Stroud, J. Nall, A. Taylor, M. Ford and L. Charnley, "A System for Automated Generation of Built-In Self-Test for FPGAs", Proc. International Conference on System Engineering, pp. 437-443, 2002.
[50] Y. Zorian, "System-Chip Test Strategies", Proc. Design Automation Conference, pp. 752-757, 1998.
[51] M.H. Tehranipour, S.M. Fakhraie, Z. Navabi and M.R. Movahedin, "A Low-Cost At-Speed BIST Architecture for Embedded Processor and SRAM Cores", Journal of Electronic Testing: Theory and Applications, Vol. 20, No. 2, pp. 155-168, April 2004.
[52] C.H. Tsai and C. Wu, "Processor-Programmable Memory BIST for Bus-Connected Embedded Memories", Proc. Design Automation Conference, pp. 325-330, 2001.
[53] A. Benso, S. Di Carlo, G. Di Natale, P. Prinetto and M. Lobetti Bodoni, "A Programmable BIST Architecture for Clusters of Multiple-Port SRAMs", Proc. IEEE International Test Conference, pp. 557-566, 2000.
[54] R. Rajsuman, "Testing a System-on-a-Chip with Embedded Microprocessor", Proc. IEEE International Test Conference, pp. 499-508, 1999.
[55] F. Gharsalli, S. Meftali, F. Rousseau, and A.A. Jerraya, "Automatic Generation of Embedded Memory Wrapper for Multiprocessor SoC", Proc. Design Automation Conference, pp. 596-601, 2002.
[56] J.M. Harris, "Built-In Self Test Configurations for Field Programmable Gate Arrays Cores in Systems-on-Chip", Masters Thesis, Auburn University, 2004.
[57] A. Van de Goor, G. Gaydadjiev, V.N. Jarmolik and V.G. Mikitjuk, "March LR: A Test for Realistic Linked Faults", Proc. IEEE VLSI Test Symposium, pp. 272-280, 1996.
[58] C.E. Stroud, "AUSIM: Auburn University Simulator - Version L2.2", Dept. of Electrical & Computer Engineering, Auburn University, 2004.
[59] "Integrated Development System AT40K Macro Library Version 6.0", Atmel Corporation, Oct. 1998.
[60] C.E. Stroud, S. Garimella and J. Sunwoo, "On-Chip BIST-Based Diagnosis of Embedded Programmable Logic Cores in System-on-Chip Devices", Proc. International Conference on Computers and Their Applications, pp. pending, 2005.
[61] S. Hamdioui and A. Van de Goor, "Efficient Tests for Realistic Faults in Dual-Port SRAMs", IEEE Transactions on Computers, Vol. 51, No. 5, pp. 460-473, May 2002.
[62] "Virtex FPGAs", Datasheet DS003, Xilinx Inc., 2004.
[63] "Virtex II Platform FPGAs", Datasheet DS031, Xilinx Inc., 2004.
[64] "Spartan II Family FPGAs", Datasheet DS001, Xilinx Inc., 2003.
[65] "Virtex II Pro and Virtex II Pro X Platform FPGAs", Datasheet DS083, Xilinx Inc., 2004.
[66] "Spartan-3 FPGA Family", Datasheet DS099, Xilinx Inc., 2004.
[67] "Using Block RAM in Spartan-3 FPGAs", Application Note XAPP463, Xilinx Inc., 2003.
[68] M. Abramovici and C. Stroud, "BIST-Based Test and Diagnosis of FPGA Logic Blocks", IEEE Trans. on VLSI Systems, Vol. 9, No. 1, pp. 159-172, Jan. 2001.
[69] C. Stroud and S. Garimella, "Built-In Self-Test and Diagnosis of Multiple Embedded Cores in Generic SoCs", to be published, Proc. International Conference on Embedded Systems and Applications, 2005.
[70] O.L. MacSorley, "High Speed Arithmetic in Binary Computers", Proc. IRE, Vol. 49, No. 1, pp. 67-91, Jan. 1961.
[71] D. Gizopoulos, A. Paschalis and Y. Zorian, "Effective Built-In Self-Test for Booth Multipliers", IEEE Design & Test of Computers, Vol. 15, No. 3, pp. 105-111, Sept. 1998.
Appendices

Appendix A: ASL Code for Free RAM

# mux2 ;
subckt: mux2 in: a b s out: z ;
not: sn in: s out: sn ;
and: a1 in: a sn out: a1 ;
and: a2 in: b s out: a2 ;
or: z in: a1 a2 out: z ;

# RAM word ;
subckt: word in: en d[3:0] out: q[3:0] ;
lat: q3 in: en d3 out: q3 ;
lat: q2 in: en d2 out: q2 ;
lat: q1 in: en d1 out: q1 ;
lat: q0 in: en d0 out: q0 ;

# write address decoder ;
subckt: dec in: a[4:0] en[1:0] out: ld[31:0] ;
not: a4n in: a4 out: a4n ;
not: a3n in: a3 out: a3n ;
not: a2n in: a2 out: a2n ;
not: a1n in: a1 out: a1n ;
not: a0n in: a0 out: a0n ;
and: ld31 in: a4 a3 a2 a1 a0 en[1:0] out: ld31 ;
and: ld30 in: a4 a3 a2 a1 a0n en[1:0] out: ld30 ;
and: ld29 in: a4 a3 a2 a1n a0 en[1:0] out: ld29 ;
and: ld28 in: a4 a3 a2 a1n a0n en[1:0] out: ld28 ;
and: ld27 in: a4 a3 a2n a1 a0 en[1:0] out: ld27 ;
and: ld26 in: a4 a3 a2n a1 a0n en[1:0] out: ld26 ;
and: ld25 in: a4 a3 a2n a1n a0 en[1:0] out: ld25 ;
and: ld24 in: a4 a3 a2n a1n a0n en[1:0] out: ld24 ;
and: ld23 in: a4 a3n a2 a1 a0 en[1:0] out: ld23 ;
and: ld22 in: a4 a3n a2 a1 a0n en[1:0] out: ld22 ;
and: ld21 in: a4 a3n a2 a1n a0 en[1:0] out: ld21 ;
and: ld20 in: a4 a3n a2 a1n a0n en[1:0] out: ld20 ;
and: ld19 in: a4 a3n a2n a1 a0 en[1:0] out: ld19 ;
and: ld18 in: a4 a3n a2n a1 a0n en[1:0] out: ld18 ;
and: ld17 in: a4 a3n a2n a1n a0 en[1:0] out: ld17 ;
and: ld16 in: a4 a3n a2n a1n a0n en[1:0] out: ld16 ;
and: ld15 in: a4n a3 a2 a1 a0 en[1:0] out: ld15 ;
and: ld14 in: a4n a3 a2 a1 a0n en[1:0] out: ld14 ;
and: ld13 in: a4n a3 a2 a1n a0 en[1:0] out: ld13 ;
and: ld12 in: a4n a3 a2 a1n a0n en[1:0] out: ld12 ;
and: ld11 in: a4n a3 a2n a1 a0 en[1:0] out: ld11 ;
and: ld10 in: a4n a3 a2n a1 a0n en[1:0] out: ld10 ;
and: ld9 in: a4n a3 a2n a1n a0 en[1:0] out: ld9 ;
and: ld8 in: a4n a3 a2n a1n a0n en[1:0] out: ld8 ;
and: ld7 in: a4n a3n a2 a1 a0 en[1:0] out: ld7 ;
and: ld6 in: a4n a3n a2 a1 a0n en[1:0] out: ld6 ;
and: ld5 in: a4n a3n a2 a1n a0 en[1:0] out: ld5 ;
and: ld4 in: a4n a3n a2 a1n a0n en[1:0] out: ld4 ;
and: ld3 in: a4n a3n a2n a1 a0 en[1:0] out: ld3 ;
and: ld2 in: a4n a3n a2n a1 a0n en[1:0] out: ld2 ;
and: ld1 in: a4n a3n a2n a1n a0 en[1:0] out: ld1 ;
and: ld0 in: a4n a3n a2n a1n a0n en[1:0] out: ld0 ;

# read mux ;
subckt: rmux in: a[4:0] d[31:0] out: z ;
not: a4n in: a4 out: a4n ;
not: a3n in: a3 out: a3n ;
not: a2n in: a2 out: a2n ;
not: a1n in: a1 out: a1n ;
not: a0n in: a0 out: a0n ;
and: ld31 in: a4 a3 a2 a1 a0 d31 out: ld31 ;
and: ld30 in: a4 a3 a2 a1 a0n d30 out: ld30 ;
and: ld29 in: a4 a3 a2 a1n a0 d29 out: ld29 ;
and: ld28 in: a4 a3 a2 a1n a0n d28 out: ld28 ;
and: ld27 in: a4 a3 a2n a1 a0 d27 out: ld27 ;
and: ld26 in: a4 a3 a2n a1 a0n d26 out: ld26 ;
and: ld25 in: a4 a3 a2n a1n a0 d25 out: ld25 ;
and: ld24 in: a4 a3 a2n a1n a0n d24 out: ld24 ;
and: ld23 in: a4 a3n a2 a1 a0 d23 out: ld23 ;
and: ld22 in: a4 a3n a2 a1 a0n d22 out: ld22 ;
and: ld21 in: a4 a3n a2 a1n a0 d21 out: ld21 ;
and: ld20 in: a4 a3n a2 a1n a0n d20 out: ld20 ;
and: ld19 in: a4 a3n a2n a1 a0 d19 out: ld19 ;
and: ld18 in: a4 a3n a2n a1 a0n d18 out: ld18 ;
and: ld17 in: a4 a3n a2n a1n a0 d17 out: ld17 ;
and: ld16 in: a4 a3n a2n a1n a0n d16 out: ld16 ;
and: ld15 in: a4n a3 a2 a1 a0 d15 out: ld15 ;
and: ld14 in: a4n a3 a2 a1 a0n d14 out: ld14 ;
and: ld13 in: a4n a3 a2 a1n a0 d13 out: ld13 ;
and: ld12 in: a4n a3 a2 a1n a0n d12 out: ld12 ;
and: ld11 in: a4n a3 a2n a1 a0 d11 out: ld11 ;
and: ld10 in: a4n a3 a2n a1 a0n d10 out: ld10 ;
and: ld9 in: a4n a3 a2n a1n a0 d9 out: ld9 ;
and: ld8 in: a4n a3 a2n a1n a0n d8 out: ld8 ;
and: ld7 in: a4n a3n a2 a1 a0 d7 out: ld7 ;
and: ld6 in: a4n a3n a2 a1 a0n d6 out: ld6 ;
and: ld5 in: a4n a3n a2 a1n a0 d5 out: ld5 ;
and: ld4 in: a4n a3n a2 a1n a0n d4 out: ld4 ;
and: ld3 in: a4n a3n a2n a1 a0 d3 out: ld3 ;
and: ld2 in: a4n a3n a2n a1 a0n d2 out: ld2 ;
and: ld1 in: a4n a3n a2n a1n a0 d1 out: ld1 ;
and: ld0 in: a4n a3n a2n a1n a0n d0 out: ld0 ;
or: z in: ld[31:0] out: z ;

# complete RAM ;
ckt: fRAM in: clk radd[4:0] wadd[4:0] wen di[3:0] oen con: async dpr out: dout[3:0] ;
# config bits (async=1 => asynchronous) (dpr=1 => dualport) ;
# RAM core ;
word: w0 in: ld0 din[3:0] out: w0d[3:0] ;
word: w1 in: ld1 din[3:0] out: w1d[3:0] ;
word: w2 in: ld2 din[3:0] out: w2d[3:0] ;
word: w3 in: ld3 din[3:0] out: w3d[3:0] ;
word: w4 in: ld4 din[3:0] out: w4d[3:0] ;
word: w5 in: ld5 din[3:0] out: w5d[3:0] ;
word: w6 in: ld6 din[3:0] out: w6d[3:0] ;
word: w7 in: ld7 din[3:0] out: w7d[3:0] ;
word: w8 in: ld8 din[3:0] out: w8d[3:0] ;
word: w9 in: ld9 din[3:0] out: w9d[3:0] ;
word: w10 in: ld10 din[3:0] out: w10d[3:0] ;
word: w11 in: ld11 din[3:0] out: w11d[3:0] ;
word: w12 in: ld12 din[3:0] out: w12d[3:0] ;
word: w13 in: ld13 din[3:0] out: w13d[3:0] ;
word: w14 in: ld14 din[3:0] out: w14d[3:0] ;
word: w15 in: ld15 din[3:0] out: w15d[3:0] ;
word: w16 in: ld16 din[3:0] out: w16d[3:0] ;
word: w17 in: ld17 din[3:0] out: w17d[3:0] ;
word: w18 in: ld18 din[3:0] out: w18d[3:0] ;
word: w19 in: ld19 din[3:0] out: w19d[3:0] ;
word: w20 in: ld20 din[3:0] out: w20d[3:0] ;
word: w21 in: ld21 din[3:0] out: w21d[3:0] ;
word: w22 in: ld22 din[3:0] out: w22d[3:0] ;
word: w23 in: ld23 din[3:0] out: w23d[3:0] ;
word: w24 in: ld24 din[3:0] out: w24d[3:0] ;
word: w25 in: ld25 din[3:0] out: w25d[3:0] ;
word: w26 in: ld26 din[3:0] out: w26d[3:0] ;
word: w27 in: ld27 din[3:0] out: w27d[3:0] ;
word: w28 in: ld28 din[3:0] out: w28d[3:0] ;
word: w29 in: ld29 din[3:0] out: w29d[3:0] ;
word: w30 in: ld30 din[3:0] out: w30d[3:0] ;
word: w31 in: ld31 din[3:0] out: w31d[3:0] ;

# input latches ;
not: we in: wen out: we ;
lat: wr in: men we out: wr ;
lat: wa0 in: men wadd0 out: wa0 ;
lat: wa1 in: men wadd1 out: wa1 ;
lat: wa2 in: men wadd2 out: wa2 ;
lat: wa3 in: men wadd3 out: wa3 ;
lat: wa4 in: men wadd4 out: wa4 ;
lat: din0 in: men di0 out: din0 ;
lat: din1 in: men di1 out: din1 ;
lat: din2 in: men di2 out: din2 ;
lat: din3 in: men di3 out: din3 ;
not: ckn in: clk out: ckn ;
or: men in: async ckn out: men ;
or: sen in: async clk out: sen ;
dec: wdec in: wa[4:0] sen wr out: ld[31:0] ;
mux2: ra0 in: wadd0 radd0 dpr out: ra0 ;
mux2: ra1 in: wadd1 radd1 dpr out: ra1 ;
mux2: ra2 in: wadd2 radd2 dpr out: ra2 ;
mux2: ra3 in: wadd3 radd3 dpr out: ra3 ;
mux2: ra4 in: wadd4 radd4 dpr out: ra4 ;
rmux: do0 in: ra[4:0] w[31:0]d0 out: do0 ;
rmux: do1 in: ra[4:0] w[31:0]d1 out: do1 ;
rmux: do2 in: ra[4:0] w[31:0]d2 out: do2 ;
rmux: do3 in: ra[4:0] w[31:0]d3 out: do3 ;
or: dout0 in: oen do0 out: dout0 ;
or: dout1 in: oen do1 out: dout1 ;
or: dout2 in: oen do2 out: dout2 ;
or: dout3 in: oen do3 out: dout3 ;

Appendix B: VHDL Code for the March Y Algorithm

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity fsm is
  Generic ( CLKEDGE      : std_logic := '1';
            DONE_LEVEL   : std_logic := '1';
            WEN_ACTIVE   : std_logic := '1';
            OEN_ACTIVE   : std_logic := '0';
            ADDRESSWIDTH : Integer   := 5;
            DATAWIDTH    : Integer   := 1 );
  port ( Reset     : in  std_logic;
         Clk       : in  std_logic;
         WEN       : out std_logic;
         OEN       : out std_logic;
         Data      : out std_logic_vector(DATAWIDTH-1 downto 0);
         RWAddress : out std_logic_vector(ADDRESSWIDTH-1 downto 0);
         DONE      : out std_logic );
end fsm;
architecture fsm of fsm is
  type phases is (Init, Phase1, Phase2, Phase3, Phase4);
  type elements is (ele1, ele2, ele3);
  signal Phase   : phases := Init;
  signal Element : elements := ele1;
  signal Address : std_logic_vector(ADDRESSWIDTH-1 downto 0);
  constant MAXADDRESS : std_logic_vector(ADDRESSWIDTH-1 downto 0) := (others => '1');
  constant MINADDRESS : std_logic_vector(ADDRESSWIDTH-1 downto 0) := (others => '0');
begin
  p0: Process ( Reset, Clk )
  begin
    if ( Reset = '1' ) then
      Address <= MAXADDRESS;
      WEN     <= WEN_ACTIVE;
      OEN     <= not(OEN_ACTIVE);
      Data    <= (others => '0');
      Element <= ele1;
      Phase   <= Init;
      DONE    <= not(DONE_LEVEL);
    elsif (Clk = CLKEDGE and Clk'Event) then
      case Phase is
        when Init =>
          Address <= MAXADDRESS;
          WEN     <= WEN_ACTIVE;
          OEN     <= not(OEN_ACTIVE);
          Data    <= (others => '0');
          Element <= ele1;
          Phase   <= Phase1;
        when Phase1 =>                       -- D w 0
          if ( Address /= MINADDRESS ) then
            Address <= Address - '1';
            WEN     <= WEN_ACTIVE;
            OEN     <= not(OEN_ACTIVE);
            Element <= ele1;
            Data    <= (others => '0');
          else                               -- U r 0
            Address <= MINADDRESS;
            WEN     <= not(WEN_ACTIVE);
            OEN     <= OEN_ACTIVE;
            Data    <= (others => '0');
            Phase   <= Phase2;
            Element <= ele2;
          end if;
        when Phase2 =>                       -- U r 0 w 1 r 1
          case Element is
            when ele2 =>
              WEN     <= WEN_ACTIVE;
              OEN     <= not(OEN_ACTIVE);
              Data    <= (others => '1');
              Element <= ele3;
            when ele3 =>
              WEN     <= not(WEN_ACTIVE);
              OEN     <= OEN_ACTIVE;
              Data    <= (others => '1');
              Element <= ele1;
            when ele1 =>
              if ( Address /= MAXADDRESS ) then
                Address <= Address + '1';
                WEN     <= not(WEN_ACTIVE);
                OEN     <= OEN_ACTIVE;
                Element <= ele2;
                Data    <= (others => '0');
              else                           -- D r 1
                Address <= MAXADDRESS;
                WEN     <= not(WEN_ACTIVE);
                OEN     <= OEN_ACTIVE;
                Data    <= (others => '1');
                Phase   <= Phase3;
                Element <= ele2;
              end if;
            when others =>
          end case;
        when Phase3 =>                       -- D r 1 w 0 r 0
          case Element is
            when ele2 =>
              WEN     <= WEN_ACTIVE;
              OEN     <= not(OEN_ACTIVE);
              Data    <= (others => '0');
              Element <= ele3;
            when ele3 =>
              WEN     <= not(WEN_ACTIVE);
              OEN     <= OEN_ACTIVE;
              Data    <= (others => '0');
              Element <= ele1;
            when ele1 =>
              if ( Address /= MINADDRESS ) then
                Address <= Address - '1';
                WEN     <= not(WEN_ACTIVE);
                OEN     <= OEN_ACTIVE;
                Element <= ele2;
                Data    <= (others => '1');
              else                           -- U r 0
                Address <= MINADDRESS;
                WEN     <= not(WEN_ACTIVE);
                OEN     <= OEN_ACTIVE;
                Data    <= (others => '0');
                Phase   <= Phase4;
                Element <= ele1;
              end if;
            when others =>
          end case;
        when Phase4 =>                       -- U r 0
          if ( Address /= MAXADDRESS ) then
            Address <= Address + '1';
            WEN     <= not(WEN_ACTIVE);
            OEN     <= OEN_ACTIVE;
            Element <= ele1;
            Data    <= (others => '0');
          else                               -- D w 0
            Address <= MAXADDRESS;
            WEN     <= WEN_ACTIVE;
            OEN     <= not(OEN_ACTIVE);
            Data    <= (others => '0');
            Phase   <= Phase1;
            Done    <= DONE_LEVEL;
            Element <= ele1;
          end if;
      end case;
    end if;
  end process;

  RWAddress <= Address;
end fsm;
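For reference, the following is a minimal sketch of how the fsm TPG above can be instantiated. The wrapper entity and internal signal names are illustrative assumptions; the RAMs under test and the ORAs that the TPG outputs would drive are not shown.

library ieee;
use ieee.std_logic_1164.all;

entity marchy_top is
  port ( Clk, Reset : in  std_logic;
         TestDone   : out std_logic );
end marchy_top;

architecture structural of marchy_top is
  signal wen, oen : std_logic;
  signal data     : std_logic_vector(0 downto 0);
  signal addr     : std_logic_vector(4 downto 0);
begin
  -- One TPG drives all RAMs under test in parallel; the generics select
  -- the active levels and the address/data widths of the target RAMs.
  tpg : entity work.fsm
    generic map ( CLKEDGE => '1', DONE_LEVEL => '1',
                  WEN_ACTIVE => '1', OEN_ACTIVE => '0',
                  ADDRESSWIDTH => 5, DATAWIDTH => 1 )
    port map ( Reset => Reset, Clk => Clk, WEN => wen, OEN => oen,
               Data => data, RWAddress => addr, DONE => TestDone );
  -- wen, oen, data and addr would fan out to the RAMs under test and to
  -- the expected-data inputs of the ORAs (not shown).
end structural;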
Appendix C: March LR Algorithm and its Input File Format for Testing 16-bit Wide Block RAMs

C.1 March LR Algorithm with BDS for 16-bit Wide RAMs

mw0000000000000000
+r0000000000000000;w1111111111111111
*r1111111111111111;w0000000000000000;r0000000000000000;r0000000000000000;w1111111111111111
*r1111111111111111;w0000000000000000
*r0000000000000000;w1111111111111111;r1111111111111111;r1111111111111111;w0000000000000000
+r0000000000000000;w0101010101010101;w1010101010101010;r1010101010101010
*r1010101010101010;w0101010101010101;r0101010101010101
+r0101010101010101;w1100110011001100;w0011001100110011;r0011001100110011
*r0011001100110011;w1100110011001100;r1100110011001100
+r1100110011001100;w0000111100001111;w1111000011110000;r1111000011110000
*r1111000011110000;w0000111100001111;r0000111100001111
+r0000111100001111;w0000000011111111;w1111111100000000;r1111111100000000
*r1111111100000000;w0000000011111111;r0000000011111111
+r0000000011111111

C.2 RAMBISTGEN Input File Format for Generating VHDL Code

u w 0000000000000000
d r 0000000000000000, w 1111111111111111
u r 1111111111111111, w 0000000000000000, r 0000000000000000, r 0000000000000000, w 1111111111111111
u r 1111111111111111, w 0000000000000000
u r 0000000000000000, w 1111111111111111, r 1111111111111111, r 1111111111111111, w 0000000000000000
d r 0000000000000000, w 0101010101010101, w 1010101010101010, r 1010101010101010
u r 1010101010101010, w 0101010101010101, r 0101010101010101
d r 0101010101010101, w 1100110011001100, w 0011001100110011, r 0011001100110011
u r 0011001100110011, w 1100110011001100, r 1100110011001100
d r 1100110011001100, w 0000111100001111, w 1111000011110000, r 1111000011110000
u r 1111000011110000, w 0000111100001111, r 0000111100001111
d r 0000111100001111, w 0000000011111111, w 1111111100000000, r 1111111100000000
u r 1111111100000000, w 0000000011111111, r 0000000011111111
d r 0000000011111111