Alternative Techniques for Built-In Self-Test of Field Programmable
Gate Arrays
Except where reference is made to the work of others, the work described in this thesis
is my own or was done in collaboration with my advisory committee. This thesis does
not include proprietary or classified information.
Aditya Newalkar
Certificate of Approval:
Victor P. Nelson
Professor
Electrical and Computer Engineering
Charles E. Stroud, Chair,
Professor
Electrical and Computer Engineering
Foster Dai
Associate Professor
Electrical and Computer Engineering
Stephen L. McFarland
Acting Dean, Graduate School
Alternative Techniques for Built-In Self-Test of Field Programmable
Gate Arrays
Aditya Newalkar
A Thesis
Submitted to
the Graduate Faculty of
Auburn University
in Partial Fulfillment of the
Requirements for the
Degree of
Master of Science
Auburn, Alabama
August 8, 2005
Alternative Techniques for Built-In Self-Test of Field Programmable
Gate Arrays
Aditya Newalkar
Permission is granted to Auburn University to make copies of this thesis at its discretion,
upon the request of individuals or institutions and at their expense. The author reserves
all publication rights.
Signature of Author
Date
Copy sent to:
Name Date
Vita
Aditya Newalkar, son of Anil and Sugandha Newalkar, was born on April 17, 1979,
in Mumbai, India. He graduated with a Bachelor of Engineering degree in Electronics
Engineering from Mumbai University in December 2000. What started as a student
project at the Indian Institute of Technology (IIT), Powai, Mumbai in 1999 grew into
a valuable two-year research experience for him after graduating from Mumbai University.
While in pursuit of his Master of Science degree at Auburn University, he received
the guidance of Dr. Charles Stroud in the Electrical and Computer Engineering department.
He worked as an intern at Medtronic Navigation in Louisville, CO.
Thesis Abstract
Alternative Techniques for Built-In Self-Test of Field Programmable
Gate Arrays
Aditya Newalkar
Master of Science, August 8, 2005
(B.E., Mumbai University, 2000)
174 Typed Pages
Directed by Charles E. Stroud
In the Built-In Self-Test method of testing the logic and interconnect resources of
Field Programmable Gate Arrays (FPGAs), the configuration time and the time to retrieve
the test results dominate the duration of the test. The techniques presented in this thesis
offer a reduction in the configuration time and the result retrieval time for Built-In
Self-Test using partial reconfiguration and partial configuration memory readback. Though
the work has been done targeting Xilinx Virtex-I and Spartan-II FPGAs, the method is
general enough to be applied to any FPGA featuring Partial Run Time Reconfiguration
(PRTR). We also evaluate the Computer Aided Design (CAD) tools that are mainly used
for partial reconfiguration for their usefulness in generating test configurations for the
programmable interconnect and logic resources of an FPGA using the Built-In Self-Test
method.
Acknowledgments
The author would like to thank Dr. Charles Stroud for giving him insight into the
subject of Built-In Self-Test for FPGAs. The author admires his relentless pursuit of quality
and is thankful for his patience. Special thanks to the author's family, Anil, Sugandha, Bhau
Newalkar and Siddharth Tambe, for their unconditional love, support and continuous
encouragement. The author thanks all friends from Auburn for making his time enjoyable.
Finally, the principles by which Mohandas Gandhi lived his life give the author strength and
inspiration.
Style manual or journal used: Journal of Approximation Theory (together with the
style known as "aums"). Bibliography follows van Leunen's A Handbook for Scholars.
Computer software used: The document preparation package TeX (specifically
LaTeX) together with the departmental style-file aums.sty.
Table of Contents
List of Tables
List of Figures
1 Introduction
1.1 FPGA Architecture
1.1.1 Programmable Logic Blocks
1.1.2 Programmable Interconnection Network
1.1.3 Programmable I/O Cells
1.2 Flow of Design with FPGAs
1.3 Advantages of FPGAs
1.4 Reconfigurable Computing
1.4.1 Dynamic Reconfiguration
1.4.2 Static Reconfiguration
1.4.3 Partial Reconfiguration
1.5 Testing of FPGAs
1.6 Built-In Self Test
1.6.1 BIST for FPGAs
1.7 Thesis Statement
2 Review of Partial Reconfiguration and BIST
2.1 Architecture of Virtex-I and Spartan-II FPGAs
2.1.1 PLB Architecture
2.1.2 Interconnect Architecture
2.1.3 Block RAMs
2.2 Configuration of the FPGA
2.2.1 SelectMAP Mode
2.2.2 Boundary Scan Mode
2.2.3 Start-up Sequence
2.3 Configuration Memory Architecture of Virtex-I and Spartan-II FPGAs
2.3.1 Addressing
2.3.2 Frame Organization
2.3.3 Configuration Registers
2.3.4 Full Reconfiguration Bitstream
2.4 Readback
2.4.1 Readback Verification
2.4.2 Readback Capture
2.4.3 Readback Operations
2.5 Partial Reconfiguration
2.5.1 Partial Reconfiguration without Shutdown Sequence
2.5.2 Partial Reconfiguration with Shutdown Sequence
2.5.3 BitGen
2.5.4 JBits
2.6 BIST for FPGAs
2.6.1 Logic BIST
2.6.2 Interconnect BIST
2.6.3 BIST for Xilinx FPGAs
2.6.4 Using JBits API to Generate Interconnect BIST Configurations
2.7 Thesis Statement
3 Partial Reconfiguration and Readback for Logic BIST
3.1 Floorplan of Logic BIST to Aid Partial Reconfiguration
3.2 Generating Partial Reconfiguration Files
3.2.1 Using BitGen
3.3 Generating a Test Plan for Logic BIST
3.4 Experimental Results for Logic BIST
3.5 Partial Configuration Memory Readback to Retrieve the BIST Results
3.5.1 Commands for Partial Configuration Memory Readback
3.6 Summary
4 Generating Routing BIST Configurations using JBits
4.1 Overview of Routing BIST Architecture
4.1.1 Testing the Interconnects in Parallel
4.2 The Routing BIST RTPCores
4.2.1 Configuring the TPG
4.2.2 Configuring the ORA
4.2.3 Routing the WUTs
4.2.4 Populating the PLB Array
4.2.5 Generating the XDL File
4.3 Experimental Results of Routing BIST
4.3.1 Partial Reconfiguration and Routing BIST
4.3.2 Test Phase Sequence
4.4 Calculation of the Total Number of Interconnect BIST Configurations Required
4.4.1 Hex Interconnects
4.4.2 Single Interconnects
4.4.3 Switch Box CIPs
4.4.4 MUX CIPs
4.5 Generating Configurations for Switch-Box CIPs
4.6 Conclusion
5 Summary and Future Work
Bibliography
Appendices
A Steps in Writing Parent RTPCore
B Steps in Writing Child RTPCores
C Complete Program Source
D Complete List of Connections Between the Mux CIPs
List of Tables
2.1 Virtex TAP Controller Pins
2.2 Constants Used in the Address Calculation [Xil03d]
2.3 Variables Used for Address Calculation [Xil03d]
2.4 Calculating the Location of the LUT RAM Bit in Virtex-I Bitstream [Xil03d]
2.5 Equations for Calculating PLB FF Location in the Bitstream [Xil03d]
2.6 PLB Column Frame Organization
2.7 Configuration Registers [Xil03d]
2.8 Command Header Format [Xil02d]
2.9 Configuration Commands and their Usage [Xil03d] [Xil04]
2.10 Readback Commands Required to Perform Readback on PLB Configuration
2.11 Classes Used for Bit Level Manipulation of PLB Elements [Xil01d]
2.12 Classes Used for Bit Level Manipulation of Switch Box CIPs [Xil01d]
2.13 Classes Used for Bit Level Manipulation of Output MUX CIPs [Xil01d]
2.14 Classes Used for Bit Level Manipulation of Input MUX CIPs [Xil01d]
2.15 Model of Interconnect Resources in the Package com.xilinx.JRoute2.Virtex.ResourceDB [Xil01d]
3.1 Command Listing for Scenario 1
3.2 Command Listing for Scenario 2
3.3 Command Listing for Scenario 3
3.4 Command Listing for Scenario 4
3.5 Partial Frames
3.6 Partial Frames
3.7 Sizes of Partial Bitstreams vs. Full Bitstreams
3.8 Comparison of Boundary Scan Access Method and Partial Configuration Memory Readback
3.9 Bitstream for Partial Configuration Memory Readback
4.1 Connectivity between Output Multiplexers and Interconnects
4.2 Input and Output Ports of TPGCounterCore
4.3 Input and Output Ports of ORACore
4.4 Command Line Arguments Available for the JBits Program
4.5 Possible Values of the Command Line Arguments
4.6 Command Listing for Generating Partial Bitstreams for Routing BIST
4.7 Sizes of Partial Bitstreams vs. Full Bitstreams
4.8 Routing BIST and Test Phase Sequence
4.9 Mapping of the CIPs in Various JBits Classes [Xil01d]
4.10 MUX CIPs in Virtex-I Architecture and their Functions [Xil01d]
4.11 MUX CIP Groups Tested in Parallel
4.12 Testing MUX CIPs for Stuck-On and Stuck-Off Faults [SWHA98]
C.1 Input and Output Ports of LUT5
D.1 Mux CIPs Mux28to1 and Connecting Single Interconnects
D.2 MUX CIPs Mux16to1 and Connecting Interconnects
List of Figures
1.1 General Architecture of FPGA
1.2 Typical Architecture of PLB [Str02]
1.3 Typical CIP Structure
1.4 Spatial Vs. Temporal Computing [DeH00]
1.5 BIST for FPGA [AS01]
2.1 Internal Architecture of Virtex-I Slice [Xil01b]
2.2 Different Types of CIPs Found in FPGA [SWHA98] [FH03]
2.3 Switch Box CIP and Xilinx Interconnect Architecture
2.4 Block RAM in Virtex-I and Spartan-II FPGAs
2.5 Xilinx Virtex-I and Spartan-II Addressing Scheme [Xil03d]
2.6 Design Flow of the Application with JBits
2.7 JBits Program for Manual Routing
2.8 BIST for FPGA Interconnect Resources [SWHA98]
2.9 FPGA Floorplan for Online Interconnect Testing [AES01]
2.10 FPGA Floorplan with "Galaxy" BIST [SNLA02]
2.11 Complete Testing of Switch Boxes [SWHA98]
2.12 Scan Cell Interfacing with IEEE 1149.1 [HGWS99]
3.1 Floorplan for BIST Test Session
3.2 Four Different Test Plans for Testing Two Slices
4.1 Horizontal and Vertical Interconnect Resources Tested for Shorts and Opens
4.2 Hex and Single Wires Tested for Shorts and Opens
4.3 Configuration of Comparator-Based ORA
4.4 JBits Program for Routing the WUTs
4.5 Boundary Condition for Populating PLB Array in Vertical Direction
4.6 Switch Box CIP and Xilinx Interconnect Architecture
4.7 Sections of Switch Box CIP
4.8 Test Configurations Needed to Completely Test Switch Box CIPs
4.9 Test Configurations Continued...
4.10 Routing BIST Configuration for Testing Mux CIPs
4.11 Testing MUX CIPs in Parallel [RPFZ99]
4.12 Problem of Undetected Faults Due to Invisible Logic in MUX [AS01]
A.1 JBits Program for Instantiating a Counter Core
A.2 JBits Program Continued...
C.1 Configuration of LUT5 RTPCore
C.2 JBits Program for User Interaction and Populating the PLB Array
C.3 JBits Program Continued...
C.4 JBits Program Continued...
C.5 JBits Program Continued...
C.6 JBits Program Continued...
C.7 JBits Program Continued...
C.8 JBits Program Continued...
C.9 JBits Program Continued...
C.10 JBits Program Continued...
C.11 JBits Program Continued...
C.12 JBits Program Continued...
C.13 JBits Program Continued...
C.14 JBits Program Continued...
C.15 JBits Program Continued...
C.16 JBits Program Continued...
Chapter 1
Introduction
Field programmable gate arrays (FPGAs) have evolved from simple programmable
logic devices (PLDs) like programmable array logic (PALs) and programmable logic arrays
(PLAs). These early devices were used in digital design as glue logic. As the needs of
digital system designers grew from simple decoders to more complicated designs
like protocol resolvers, multiple PLDs were connected through a programmable routing
architecture to form the FPGA [BR96]. This architecture gives the user the ability to
program the interconnects to realize various types of complex digital designs. Due to
their size and programmability, testing modern FPGAs has become a complex and
time-consuming task.
1.1 FPGA Architecture
Figure 1.1 shows the general architecture of a typical FPGA. The FPGA consists
of uncommitted resources: an N x M array of programmable logic blocks (PLBs),
programmable input and output (I/O) cells, a programmable interconnection network,
and a configuration memory to program the device.
Figure 1.1: General Architecture of FPGA (programmable I/O blocks, programmable logic
blocks, and programmable interconnect network)
1.1.1 Programmable Logic Blocks
The PLBs of most FPGAs contain multiplexers, look-up tables (LUTs) and flip-flops.
An important characteristic of a PLB is its functionality, defined as the number
of different Boolean functions that it can implement [BFRV92]. The elements in the
PLB architecture can be programmed to function in different modes of operation. The
LUT can be used either in a LUT mode of operation or a random access memory (RAM)
mode of operation. In the LUT mode, the element can implement combinational logic
functions of multiple inputs (typically 3 to 4). In the RAM mode of operation, the PLB
can logically be configured to behave as a synchronous or asynchronous, single-port
or dual-port RAM. The flip-flops can be configured in latch mode or edge-triggered
mode, with asynchronous or synchronous preset/clear, and programmable clock enable.
The multiplexers can be selected to connect the LUT outputs to the flip-flops or to
bypass the flip-flops [Str02]. Thus the PLB contains the functionality to implement any
combinational or sequential logic function using the logic resources in the architecture
shown in Figure 1.2. The mode of operation for each element is selected when
the device is programmed or configured.
Figure 1.2: Typical Architecture of PLB [Str02] (LUT/RAMs, flip-flops, output MUXs, and
PLB outputs)
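The LUT mode of operation described above can be illustrated with a short sketch. It is only a behavioral model written for this discussion, not vendor code: the names make_lut and count_functions are hypothetical, and the choice of the first input as the most significant address bit is an assumption. The configuration bits act as a truth table addressed by the inputs, and a k-input LUT can therefore realize 2^(2^k) different Boolean functions.

```python
def make_lut(truth_table_bits):
    """Return a function that reads its output from the given configuration bits.

    truth_table_bits holds one bit per input combination, so its length
    must be a power of two (16 entries for a 4-input LUT).
    """
    n_entries = len(truth_table_bits)
    n_inputs = n_entries.bit_length() - 1
    if 2 ** n_inputs != n_entries:
        raise ValueError("truth table length must be a power of two")

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError("expected %d inputs" % n_inputs)
        address = 0
        for bit in inputs:
            # Assumption: the first input is the most significant address bit.
            address = address * 2 + (bit & 1)
        return truth_table_bits[address]

    return lut


def count_functions(k):
    """Number of distinct Boolean functions of k inputs: 2^(2^k)."""
    return 2 ** (2 ** k)


# Configure a 4-input LUT as a parity (XOR) function.
parity_bits = [bin(address).count("1") % 2 for address in range(16)]
parity = make_lut(parity_bits)
```

Loading a different list of 16 bits turns the same structure into any other 4-input function, which is the sense in which the mode of operation is selected at configuration time.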
1.1.2 Programmable Interconnection Network
The programmable interconnect network in an FPGA, also called its routing
architecture [BFRV92], consists of segments of wires of various lengths and programmable
switches. There are global routing resources and local routing resources. Global routing
resources facilitate the routing of signals between PLBs that are separated by
other PLBs. Local routing resources facilitate the routing between PLBs that are next
to each other in the array [SNLA02]. The connections are made via configurable
interconnect points (CIPs), also referred to as programmable interconnect points (PIPs). A
CIP consists of a transmission gate controlled by a configuration memory bit. As shown
in Figure 1.3, the connection between the wire segments A and B is made or broken
depending upon the logic value of the configuration memory bit [Str02].
Figure 1.3: Typical CIP Structure (wire A and wire B joined through a switch controlled
by a configuration memory bit)
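As a rough illustration of how configuration bits determine connectivity, the following sketch treats each CIP as a switch between two named wire segments and finds every segment reachable from a starting segment. The function name, the dictionary layout, and the convention that a logic 1 closes the switch are assumptions made for this example only.

```python
from collections import deque


def reachable(cips, config, start):
    """Return the set of wire segments electrically connected to start.

    cips   maps a CIP name to the pair of wire segments it can join.
    config maps a CIP name to its configuration memory bit.
    Assumption: a bit value of 1 closes the switch and 0 opens it.
    """
    adjacency = {}
    for name, (wire_a, wire_b) in cips.items():
        if config.get(name, 0):
            adjacency.setdefault(wire_a, set()).add(wire_b)
            adjacency.setdefault(wire_b, set()).add(wire_a)

    seen = {start}
    queue = deque([start])
    while queue:
        wire = queue.popleft()
        for neighbor in adjacency.get(wire, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Two CIPs in series: A -- c1 -- B -- c2 -- C
example_cips = {"c1": ("A", "B"), "c2": ("B", "C")}
```

With only c1 closed, segment A reaches B but not C. Closing c2 as well extends the route to C, which is how a signal is steered across several wire segments.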
1.1.3 Programmable I/O Cells
Most of the I/O pins of the FPGA can be configured in input, output or
bidirectional mode of operation. The I/O cells can also be programmed as registered
or latched I/Os depending on the design. The I/O cells support TTL as well as CMOS
I/O standards, thereby eliminating the need for voltage shifters for interfacing [Lat02]
[Xil01b].
1.2 Flow of Design with FPGAs
The circuit designer typically implements the design in a hardware description
language (HDL) and synthesizes the circuit description with the help of one or more
computer aided design (CAD) tools. The CAD tools generate a bitstream file that contains
programming instructions and data to establish the application-specific system
functionality of the various programmable resources of the device, such as PLBs, routing
architecture and I/O blocks. This bitstream file is then loaded into the FPGA chip using one of
the configuration interfaces provided for the FPGA [Jay01]. The process of loading a
design-specific bitstream into one or more FPGAs to define the functional operation of
the PLBs, the interconnect resources and I/O blocks is known as configuring or
downloading the bitstream to the device [Xil99]. The significance of the FPGA design lies in
the fact that static random access memory (SRAM) based FPGAs can be reconfigured
an unlimited number of times, implementing a different design each time. Implementing
a different design simply requires overwriting the previous configuration loaded
into the SRAM with a new bitstream through the configuration interfaces provided by
the FPGA manufacturer [Xil99].
1.3 Advantages of FPGAs
FPGAs provide a low cost solution for low volume products where user
programmability is needed at deployment time. Application specific integrated circuits (ASICs)
edge out FPGAs in high volume products in terms of unit cost and performance
parameters, which are significantly better than those of FPGAs. The reason for the significant
difference in performance parameters is that the flexibility provided by the programmability in
FPGAs comes at the cost of substantial signal delays and area overhead introduced by the
programming circuitry [AR94]. However, FPGAs offer some advantages over ASICs,
including:
• Low cost solution for low volume applications,
• Low non-recurring engineering (NRE) costs, and
• Rapid prototyping [Mil94].
1.4 Reconfigurable Computing
Traditionally, software is considered to be a component that is flexible but relatively
slow and inefficient compared to hardware. Hardware is perceived to be customized
to the problem and faster than software [DW99]. Hardware can be designed
to execute functions concurrently. Therefore, at any given time there are multiple
computing elements actively performing their functions. This is referred to as spatial
computing. In a conventional processor, instructions are executed serially using memory
or registers to store the program variables. This is called temporal computing. Figure 1.4
shows examples of spatial and temporal computing. A conventional digital signal
processor (DSP) would take multiple instruction cycles to execute a filter algorithm (Figure
1.4(b)), while the spatial implementation of the same filter in an FPGA gives a new
result every cycle in a pipelined fashion and thus achieves higher throughput (Figure
1.4(a)) [DeH00]. The idea of reconfigurable computing tries to bring together the best of
both worlds. Reconfigurable computing relies on devices, like FPGAs, that are user
programmable an unlimited number of times. The more generalized resources or structures
like LUTs, flip-flops, and SRAMs that are provided in an FPGA can be configured to
execute the functions spatially. The process of reconfiguration gives the ability to load
different functions in the FPGA serially in time, taking advantage of temporal
computing [DeH00] [DW99]. Because of this inherent parallelism in the FPGA architecture,
these devices frequently show an order of magnitude higher performance than a general
purpose processor [DeH00] [CHW00] [GSB+00] [DW99].
Figure 1.4: Spatial Vs. Temporal Computing [DeH00]: (a) Spatial Computation, a four-tap
filter with weights W1 to W4 built from parallel multipliers and adders; (b) Temporal
Computation, the same filter as a serial sequence of multiply and add instructions
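The contrast between the two styles can be sketched with a four-tap filter. This is a simplified software model written for illustration: the function names and the step counting are assumptions, and the spatial version ignores pipeline latency and simply charges one clock cycle per output sample. Both functions compute the same results, but the temporal version spends one step per multiply-accumulate.

```python
def fir_temporal(samples, weights):
    """Serial evaluation: one multiply-accumulate per step, as on a processor."""
    outputs = []
    steps = 0
    history = [0] * len(weights)
    for x in samples:
        history = [x] + history[:-1]      # shift the new sample into the delay line
        accumulator = 0
        for w, h in zip(weights, history):
            accumulator += w * h
            steps += 1                    # each tap costs one sequential step
        outputs.append(accumulator)
    return outputs, steps


def fir_spatial(samples, weights):
    """Parallel evaluation: every tap has its own multiplier, one result per cycle."""
    outputs = []
    cycles = 0
    history = [0] * len(weights)
    for x in samples:
        history = [x] + history[:-1]
        outputs.append(sum(w * h for w, h in zip(weights, history)))
        cycles += 1                       # all taps are evaluated in the same cycle
    return outputs, cycles


example_samples = [1, 2, 3, 4, 5]
example_weights = [1, 2, 3, 4]
temporal_out, temporal_steps = fir_temporal(example_samples, example_weights)
spatial_out, spatial_cycles = fir_spatial(example_samples, example_weights)
```

For five samples and four taps the serial model needs 20 steps where the parallel model needs 5 cycles, which is the throughput difference the text attributes to spatial computing.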
1.4.1 Dynamic Reconfiguration
Dynamic or runtime reconfiguration is a process where the reconfigurable unit is
configured without interrupting the configured system function [ESSA00]. The reason
for such an arrangement might be that the design is partitioned into many small parts,
either too large or too many to fit in the FPGA simultaneously. These partitions of the
original design can be loaded into the FPGA without interrupting the function of other
partitions already loaded into the FPGA. An external agent like a microprocessor may
be used to control which partitions are loaded and in what order [ESSA00].
1.4.2 Static Reconfiguration
Static or compile time reconfiguration can be defined as the inverse
of dynamic reconfiguration: the reconfigurable unit is configured while it is idle
or inactive. Most FPGAs are capable of reading the configuration data from an
electrically erasable programmable read-only memory (EEPROM) when the power is turned
on [Xil99]. This is referred to as power-on configuration and is an example of static
reconfiguration.
1.4.3 Partial Reconfiguration
Complete reconfiguration of an FPGA chip can be an onerous process [HLS98].
The configuration time varies depending on the size of the bitstream to be loaded into
the FPGA. This delay could be unacceptable in high performance systems, which are
expected to be reconfigured many times to execute the system function. If the circuit
implemented in one configuration is not significantly different from the one implemented
in the next configuration, the configuration time can be reduced if the next
configuration bitstream contains only the programming instructions and data for the
programmable resources of the FPGA that are configured differently from the previous
configuration. The size of the partial reconfiguration bitstream is reduced because it
contains only the difference between the prior full configuration and the subsequent
reconfiguration [HLS98]. The problem in implementing such a scheme is that the FPGA must
have architectural support for configuring only part of its programmable resources,
referred to as partial reconfiguration. For applications that need to reconfigure only
part of their logic depending on the circumstances, this raises an exciting possibility of
gaining a significant time advantage. Fault-tolerant applications are one example of
applications that benefit from partially reconfigurable architectures [ESSA00].
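The idea of sending only the difference between two configurations can be sketched as follows. The model is deliberately simplified and hypothetical: a configuration is treated as a plain list of equally sized frames indexed by address, and the command headers that a real bitstream carries are ignored. The names changed_frames and apply_partial are illustrative only.

```python
def changed_frames(previous, target):
    """Return {frame_address: new_frame} for every frame that differs."""
    if len(previous) != len(target):
        raise ValueError("both configurations must have the same number of frames")
    return {
        address: new_frame
        for address, (old_frame, new_frame) in enumerate(zip(previous, target))
        if old_frame != new_frame
    }


def apply_partial(previous, partial):
    """Overwrite only the listed frames, leaving all other frames untouched."""
    result = list(previous)
    for address, frame in partial.items():
        result[address] = frame
    return result


# Hypothetical device with six 4-byte frames, two of which change.
old_config = [bytes(4) for _ in range(6)]
new_config = list(old_config)
new_config[2] = b"\x01\x02\x03\x04"
new_config[5] = b"\xff\x00\xff\x00"
partial = changed_frames(old_config, new_config)
```

In this example the partial configuration carries two frames instead of six, and applying it to the old configuration reproduces the new one, so the download shrinks in proportion to how little actually changed.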
1.5 Testing of FPGAs
Commercially available FPGAs have reached gate counts of 8 million, feature banks
of RAMs and hundreds of user I/Os, and are capable of running at clock speeds of 400 MHz
[Xil02e]. Such high performance FPGA-based systems, when subjected to aging and
the environment (temperature, humidity, vibration, cosmic rays and α-rays), are vulnerable
to faults [ESSA00]. Therefore, having a good FPGA testing method is ever more essential.
Testing of FPGAs poses a different set of challenges than testing of ASICs. The challenge is
to test PLBs as well as interconnect resources in all possible modes of operation. This is an
important consideration for safety-critical applications: if the test methodology
tests only the normal mode of operation for a given system function, then when the FPGA
is reconfigured to implement a different system function, latent faults may surface
and hamper the system function [AS01] [SS99]. Testing the PLBs and interconnect
in all possible modes of operation is also advantageous for fault-tolerant applications,
because it identifies whether any particular mode of PLB operation is faulty so that the
PLB can be used in one of its other fault-free modes. The testing method is required to
detect single and multiple faults in PLBs and interconnects. Meeting this requirement
entails selecting a method that is capable of in-system testing. Ideally, the testing method
should not introduce any area overhead or delay penalties [AS01]. Diagnostics provided by
the test method should enable the user to identify and locate the defective module for
fault-tolerant applications [SNLA02].
1.6 Built-In Self Test
Built-In Self Test (BIST) is a design-for-testability (DFT) approach in which testing
(test pattern generation, application and output response analysis) is accomplished
through built-in hardware features [AKS93]. In other words, the BIST circuitry is part
of the hardware that it tests. One of the advantages of BIST is the capability of
in-system testing without the need for external test equipment.
For ASIC testing, BIST has area overhead and delay penalties. However, when
viewed in the context of FPGAs, BIST offers a unique advantage over external
or internal dedicated BIST circuitry: SRAM-based FPGAs are reconfigurable and are
capable of implementing any given design. For testing, the FPGA is configured
as a BIST circuit, the tests are run and the results are obtained. If the device passes the
test, it can be reconfigured to implement the desired system function. If one or more
faults are detected and identified, the system function can be reconfigured to avoid the
faults for fault-tolerant applications. Thus, the BIST circuit "disappears" in the
reconfiguration process after testing of the device is complete. Therefore, it can be said
that testability is achieved without any area or performance penalties [AS01].
1.6.1 BIST for FPGAs
An example of the structure of BIST for FPGAs is as follows: a group of PLBs
is configured as test pattern generators (TPGs), blocks under test (BUTs) and output
response analyzers (ORAs), as illustrated in Figure 1.5. The TPGs generate test
patterns that are applied as inputs to the BUTs. All BUTs are configured and tested in
identical modes of operation. The outputs of the identically programmed BUTs for a
set of test patterns generated by the TPGs are compared by the ORAs, giving a single
Pass/Fail indication at the end of the BIST sequence depending
on whether a mismatch in the BUT outputs was observed [SCKA96] [SKCA96]
[AES01] [SNLA02] [AS01]. The structure of a TPG can be as simple as an N-bit counter
or a linear feedback shift register (LFSR) [Str02]. ORAs consist of comparators with a
latch to retain any mismatch observed by the comparison.
Figure 1.5: BIST for FPGA [AS01] (TPGs driving rows of BUTs, whose outputs are compared
by ORAs; BIST Start, BIST Done and Pass/Fail signals)
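The interplay of TPG, BUTs and ORAs described above can be sketched behaviorally. The model below is an illustration under stated assumptions rather than the thesis's implementation: the TPG is a simple binary counter, each BUT is represented by a Python function, and each ORA compares the outputs of two adjacent BUTs, which is one plausible wiring among several. All names are hypothetical.

```python
def counter_tpg(n_bits):
    """Yield every n-bit pattern in counting order, most significant bit first."""
    for value in range(2 ** n_bits):
        yield tuple(int(ch) for ch in format(value, "0%db" % n_bits))


class ORA:
    """Comparator followed by a latch that remembers any mismatch."""

    def __init__(self):
        self.fail = 0

    def compare(self, out_a, out_b):
        if out_a != out_b:
            self.fail = 1          # once set, the latch stays set


def run_bist(buts, n_inputs):
    """Apply all TPG patterns to identically configured BUTs and report Pass/Fail."""
    oras = [ORA() for _ in range(len(buts) - 1)]
    for pattern in counter_tpg(n_inputs):
        outputs = [but(*pattern) for but in buts]
        # Assumption: each ORA observes a pair of neighboring BUTs.
        for ora, out_a, out_b in zip(oras, outputs, outputs[1:]):
            ora.compare(out_a, out_b)
    return "Fail" if any(ora.fail for ora in oras) else "Pass"


def good_but(a, b, c):
    """Fault-free block: an arbitrary 3-input function chosen for the example."""
    return (a & b) ^ c


def faulty_but(a, b, c):
    """Same block with its output stuck at 0."""
    return 0
```

Calling run_bist([good_but, good_but, good_but], 3) reports a pass, while replacing one block with faulty_but produces a fail, because the latched comparators retain the first mismatch until the end of the sequence.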
The problem with BIST with regard to PLB and interconnect testing is that the
FPGA needs to be reconfigured many times, each time testing a different mode of
operation. For example, completely testing the programmable logic in Xilinx 4000 and Spartan
series FPGAs requires 24 BIST configurations, while completely testing the interconnects
takes 206 configurations [SLS03]. Therefore, with BIST for FPGAs, testing all modes
of operation of the FPGA requires a large number of time-consuming reconfigurations
and thus poses a problem for high performance systems, which cannot afford to spend
system down time in lengthy BIST configuration.
1.7 Thesis Statement
Since some FPGA architectures are capable of partial reconfiguration to speed up
the reconfiguration process, the work presented in this thesis focuses on optimizing the
BIST method for FPGA testing using partial reconfiguration. Chapter 2 presents more
details about the full reconfiguration and partial reconfiguration facilities offered by the
new Xilinx FPGA families like Spartan-II, Spartan-III, Virtex and Virtex-II. Chapter 2
also reviews the CAD tools used for partial reconfiguration, presents details about the
boundary scan interface used to partially reconfigure these FPGAs, and provides details
about the current state-of-the-art BIST method for testing PLBs as well as interconnect
resources in FPGAs. Chapter 3 describes how partial reconfiguration can be used for
logic BIST of Xilinx Spartan-II, Spartan-III, Virtex and Virtex-II FPGAs and presents
results from actual BIST of PLBs in Spartan-II and Virtex devices to illustrate the
improvements obtained with partial reconfiguration. Chapter 4 explores a technique to
generate interconnect BIST configurations using the Java application programmer's
interface library, JBits. The chapter also describes the experiments performed to generate
partial interconnect BIST configurations and the effects of partial reconfiguration on the
size of the routing BIST configurations. Chapter 4 also includes estimates of the number
of BIST configurations required to test the routing resources in Virtex-I and Spartan-II
FPGAs and considers partial reconfiguration for routing BIST. Finally, Chapter 5
presents the summary and conclusions as well as suggestions for future research and
development. While this thesis focuses on Xilinx FPGAs, it is important to emphasize
that these techniques can be applied to any FPGA that supports partial reconfiguration.
Chapter 2
Review of Partial Reconfiguration and BIST
This chapter begins with a review of the operation and architecture of Xilinx FPGAs.
The configuration memory architectures of the Virtex-I and Spartan-II families are then
reviewed. These two FPGAs have nearly identical architectures and will be the target of
this research. The differences between partial reconfiguration and full reconfiguration are
discussed along with the CAD tools used to generate partial reconfiguration bitstreams,
viz. BitGen and JBits. Finally, a description of how BIST methods are applied to test
programmable logic and interconnect of FPGAs is given. In this section, we review the
prior work done using JBits to automatically generate interconnect BIST configurations.
2.1 Architecture of Virtex-I and Spartan-II FPGAs
The PLB and interconnect architectures of Xilinx Virtex-I and Spartan-II are similar.
Therefore, unless otherwise specified, whenever reference is made to the Virtex-I
architecture it is assumed that it also applies to Spartan-II.
2.1.1 PLB Architecture
The unit logic cell in the PLB consists of a 4-input LUT, a flip-flop, and additional
dedicated logic. A slice consists of two of these unit logic cells. There are two identical
slices in a PLB [Xil01b]. The internals of a single Virtex-I slice are shown in Figure 2.1.
Look-Up Table
Each logic cell in the PLB features a 4-input LUT. The LUT can be used in the
LUT mode, in a RAM mode or in a shift register mode of operation. In the LUT mode of
operation, it acts as a 4-input combinational logic function generator. In the RAM mode
of operation, Virtex-I and Spartan-II contain support for implementing 16x1 synchronous
RAM or combining two LUTs in a slice to implement 32x1 or 16x2 synchronous RAM.
The LUT can also implement up to a 16-bit shift register in the shift register mode of
operation [Xil01b].
Flip-Flops
The flip-flops can be configured as edge-triggered flip-flops or level-sensitive latches,
can be set or reset synchronously or asynchronously, and contain a clock enable signal.
The signals used to set or reset the flip-flops are shared by both the flip-flops within
the slice. A global reset signal initializes the storage elements [Xil01b]. The value
with which the flip-flops are initialized is specified by a specific bit in the configuration
bitstream. The input signals can be applied at the input of the flip-flop through the
LUT or directly, bypassing the LUT.
Additional Logic
The dedicated carry logic is used to implement the carry chains found in wide adders
and counters. In order to realize the wide adders and counters, the carry logic takes
the carry input from the previous stages [Xil01b]. A dedicated multiplexer CY, illustrated
in Figure 2.1, is utilized to implement wide arithmetic logic functions [Xil03c].
Figure 2.1: Internal Architecture of Virtex-I Slice [Xil01b]
Using multiplexer F5 in Figure 2.1, either of the outputs of the LUTs can be selected,
thus implementing a 5-input function generator, a 4:1 multiplexer, or selected
functions of up to nine inputs [Xil01b]. The F6 multiplexer, on the other hand, facilitates
implementation of any 6-input function, an 8:1 multiplexer, or selected functions
of up to 19 inputs [Xil01b].
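The way two 4-input LUTs and the F5 multiplexer combine into a 5-input function
generator can be sketched behaviorally. In the Python model below the helper names
are hypothetical, and the choice that the fifth input selects the second LUT when it is
'1' is an assumption made for illustration rather than a statement about the actual
slice wiring.

```python
def make_lut4(func):
    """Return the 16 truth-table bits a 4-input LUT would store for func."""
    return [func(a & 1, (a >> 1) & 1, (a >> 2) & 1, (a >> 3) & 1) & 1
            for a in range(16)]


def read_lut4(table, i0, i1, i2, i3):
    """Look up the LUT output for the four input bits."""
    return table[i0 | (i1 << 1) | (i2 << 2) | (i3 << 3)]


def five_input_function(func5):
    """Split func5 by its fifth input into two LUTs joined by an F5 mux."""
    f_lut = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 0))
    g_lut = make_lut4(lambda a, b, c, d: func5(a, b, c, d, 1))

    def slice_output(a, b, c, d, e):
        f_out = read_lut4(f_lut, a, b, c, d)
        g_out = read_lut4(g_lut, a, b, c, d)
        return g_out if e else f_out      # F5 multiplexer (assumed polarity)

    return slice_output


def majority5(a, b, c, d, e):
    """Example 5-input function: 1 when at least three inputs are 1."""
    return 1 if (a + b + c + d + e) >= 3 else 0


slice_majority = five_input_function(majority5)
print(slice_majority(1, 1, 0, 0, 1))
```

The same decomposition applied once more, with a second select input, is the idea
behind building 6-input functions with the F6 multiplexer.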
2.1.2 Interconnect Architecture
CIPs are the programmable switches used to make connections in the global and
local routing resources and the PLBs. There are three basic types of CIPs that can
be found in the FPGA interconnect network: the cross-point CIP, the break-point CIP and
the multiplexer CIP, as illustrated in Figure 2.2. The cross-point CIP connects or
disconnects a wire segment in the horizontal plane and a wire segment in
the vertical plane, depending on the value loaded in the configuration memory bit. The
state of the memory bit controlling the break-point CIP determines whether the two
segments in the same plane are connected [SWHA98]. The Xilinx FPGAs feature a switch
box CIP, referred to as the global routing matrix in the literature from Xilinx [Xil01b].
The switch box CIP comprises an array of break-point CIPs that can be programmed to
provide a variety of connections in horizontal and vertical routing resources as well as
the PLB inputs and outputs as shown in Figure 2.2(d) [Xil01b]. The multiplexer or
MUX CIP controls connection to the common interconnect from one of the k possible
connections. There are k configuration memory bits associated with a MUX CIP. The
complete set of switch box CIPs has 24 Single wires emerging from each of its four sides
that allow connection to the four neighboring switch box CIPs. The Single lines or
x1 lines are part of the local interconnect and provide signal connectivity between
adjacent PLBs. A total of 12 buffered Hex wires at each of the four sides drive signals
to the switch box CIP that is six PLBs away. The Hex lines or x6 lines span between
PLBs separated by five PLBs and are part of the global interconnect resources. A total
of 12 Long wire segments provide connectivity across the horizontal width and vertical
length of the chip [Xil01b]. The Long wires often carry signals to multiple PLBs and
span all the PLBs in the horizontal or vertical direction. The switch box featured in the
Virtex-I and Spartan-II architecture, along with the Hex and Single interconnects, is
shown in Figure 2.3.
Figure 2.2: Different Types of CIPs Found in FPGA [SWHA98] [FH03]: (a) Break-Point
CIP, (b) Cross-Point CIP, (c) Multiplexer CIP, (d) Switch Box CIP
Figure 2.3: Switch Box CIP and Xilinx Interconnect Architecture (switch matrix with 24
Single lines (0:23) on each of the north, east, south and west sides, Hex lines on each
side, 8 outputs from the PLB (0:7), and inputs to the PLB)
Figure 2.4: Block RAM in Virtex-I and Spartan-II FPGAs (port A: WEA, ENA, RSTA,
CLKA, ADDRA[#:0], DIA[#:0], DOA[#:0]; port B: WEB, ENB, RSTB, CLKB, ADDRB[#:0],
DIB[#:0], DOB[#:0])
2.1.3 Block RAMs
Large memory blocks are provided in the architecture and are referred to as block
RAMs. These block RAMs are located along the two outside columns of the PLB array
and each block RAM occupies the same height as 4 PLBs. Thus a PLB
array 64 PLBs high will have 16 block RAMs in each outer column and thus 32
block RAMs in total. The block RAM is, as illustrated in Figure 2.4, a synchronous,
dual-port memory and has a total capacity of 4096 bits. The widths of the data and
address buses are configurable and can be set as per the design requirement.
2.2 Configuration of the FPGA
Typically the bitstream can be downloaded bit-serially or byte-wide, i.e., one bit or
one byte of configuration data is written into the configuration memory each clock cycle.
How the bitstream is loaded into the FPGA depends upon the configuration mode. The
configuration mode is selected by setting particular logic levels at the mode-select
pins of the FPGA. Different configuration modes offer different capabilities, e.g., speed
of configuration, partial reconfiguration, etc. Different sequences of events may also
take place in different configuration modes. Thus selecting the mode of configuration is
an important design decision. Virtex-I and Spartan-II FPGAs support eight different
modes of configuration. Partial reconfiguration support is available in the selectMAP
and boundary scan modes of configuration [Xil02c].
2.2.1 SelectMAP Mode
In the selectMAP mode, one byte of the bitstream is written every clock cycle
into the configuration data bus interface (pins D[0:7]). To load a given configuration,
selectMAP takes the least time among all the configuration modes available [Xil01b].
First the configuration memory is cleared. The configuration control circuitry senses the
mode pins and the mode of configuration is determined to be selectMAP. The bitstream
is then loaded byte-by-byte on every rising edge of the configuration clock. To ensure
the integrity of the bitstream, a cyclic redundancy check (CRC) is performed at the
end. If the CRC checksum loaded is different from the internally calculated CRC, the
configuration sequence is aborted. Otherwise the normal Start-up Sequence is commenced,
as will be discussed in subsection 2.2.3 [Xil02d].
2.2.2 Boundary Scan Mode
In the boundary scan mode, one bit of the bitstream is written into the test access
port (TAP) of the FPGA each clock cycle. The IEEE 1149.1 test access port and
boundary scan architecture is an IEEE standard for in-system testing [IEE90]. The
boundary scan has a four-wire interface as shown in Table 2.1. All FPGAs from Xilinx
support the boundary scan mode of configuration and contain all the mandatory elements
in the IEEE 1149.1 standard: the TAP controller, the instruction register, the instruction
decoder, the boundary scan register, and the bypass register [Xil01b] [Xil02e] [Xil03b]
[Xil03a].
Table 2.1: Virtex TAP Controller Pins
TDI Test Data In
TDO Test Data Out
TMS Test Mode Select
TCK Test Clock
The TAP controller is a 16-state finite state machine. The logic value of the TMS
pin at the rising edge of TCK determines the next state of the TAP controller. Data
can be shifted into the data registers by selecting the data register scan sequence or
into the instruction register by selecting the instruction scan sequence.
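The 16-state TAP controller can be written down as a simple transition table. The
Python sketch below follows the state diagram of the IEEE 1149.1 standard; the
table and function names are illustrative, and the model covers only the state
transitions, not the registers that the states control.

```python
# Next state for (TMS = 0, TMS = 1), sampled at the rising edge of TCK.
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_walk(state, tms_bits):
    """Return the TAP state reached after clocking in the given TMS values."""
    for tms in tms_bits:
        state = TAP_NEXT[state][tms]
    return state


# Data register scan: from reset, TMS = 0, 1, 0, 0 enters Shift-DR.
print(tap_walk("Test-Logic-Reset", [0, 1, 0, 0]))
# Instruction register scan: TMS = 0, 1, 1, 0, 0 enters Shift-IR.
print(tap_walk("Test-Logic-Reset", [0, 1, 1, 0, 0]))
```

Holding TMS at '1' for five TCK cycles returns the controller to Test-Logic-Reset
from any state, which is how a known starting point is usually established.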
The Virtex-I and Spartan-II devices implement all the mandatory commands of the
IEEE 1149.1 standard as well as additional commands. These additional commands
allow read and write access to the configuration memory. The boundary scan interface
provides two user-defined serial interfaces to the core of the FPGA. In order to use them,
the interfaces must be incorporated in the design. The user-defined serial interfaces are
active after the configuration is completed and may be accessed using special instructions
to the TAP controller [Xil02b] [Xil02a] [Xil].
The configuration control circuitry senses the mode pins and the mode of configuration is
determined to be boundary scan. The CFG_IN instruction is loaded into the instruction
register to allow write access to the configuration memory. The bitstream is then loaded
bit-serially using the boundary scan interface. If the CRC is determined to be correct,
the JSTART instruction is loaded into the instruction register, which initiates the
Start-up Sequence [Xil02b] [Xil02a].
2.2.3 Start-up Sequence
After the bitstream is completely and successfully written into the configuration
memory, the Start-up Sequencer state machine in the FPGA initiates the Start-up
Sequence. Start-up is the transition from the configuration mode to the normal operational
mode of the FPGA [Xil02d]. The Start-up Sequence includes activation of the global reset
for initialization of the device. Xilinx provides a CAD tool, called BitGen, to control the
Start-up Sequence according to the options set by the user. Subsection 2.5.3 gives
an overview of this tool.
2.3 Configuration Memory Architecture of Virtex-I and Spartan-II FPGAs
The configuration memory of Xilinx Virtex-I FPGAs is divided into sections called
frames. Each frame contains the configuration data for one section of the device, extending
vertically from the top to the bottom of the device. Multiple configuration frames grouped
together form a column [Xil02d]. The columns can belong to one of the following types:
Center: The center column contains the configuration for the four global clock pins and
routing in the center of the device.
Configurable Logic Blocks: The PLBs are sometimes also referred to as configurable
logic blocks (CLBs). This type of column contains the configuration for all the
PLBs and routing in that column, along with two I/O blocks (IOBs) at the top
and bottom of the column.
IOB: The IOB columns contain the configuration for all the IOBs on the left and right
edges of the device.
Block RAM Interconnect: These columns contain the configuration for all interconnect
of the block RAMs of the device.
Block RAM Content: These columns contain the initial data contents with which the
block RAMs will be pre-loaded during configuration [Xil03d].
A frame is the smallest unit of reconfiguration. The smallest amount of data that needs
to be written into the configuration memory, in order to configure a portion of the FPGA,
is one frame. The length of the frame increases with the dimensions of the PLB array to
account for the increase in programmable logic and routing resources in the array. The
length of the frame is written into a dedicated internal register in the full configuration
process. As the FPGA is fully configured at least once before the partial reconfiguration,
it is not necessary to write the frame length for the partial reconfiguration [Xil02d].
2.3.1 Addressing
The configuration memory address space is divided into RAM blocks and PLB
blocks. The RAM block contains the block RAM content columns. The PLB blocks
include the Center, PLB, IOB and block RAM interconnect columns. These blocks are
then further divided into major and minor addresses, where each configuration column has
a unique major address and each frame has a unique minor address within its column
[Xil03d]. For the Virtex-I family, the following addressing scheme is in place for the
configuration memory as shown in Figure 2.5 (which also includes the number of frames
in each column):
• the address '0' is assigned to the center column,
• the even major addresses of PLB columns are on the left side of the device,
• the higher even major addresses are assigned to the left IOB columns,
• the left block RAM interconnect columns mark the end of the even major addresses,
• the address '1' is assigned to the PLB column at the right side of the center column,
• the odd major addresses of PLB columns are on the right side of the device,
• the higher odd major addresses are assigned to the right IOB columns, and
• the right block RAM interconnect columns mark the end of the PLB address block.
Figure 2.5: Xilinx Virtex-I and Spartan-II Addressing Scheme [Xil03d] (frames per
column: center 8, CLB 48, IOB 54, block RAM interconnect 27, block RAM content 64)
The major and minor addresses for any slice or LUT bit in any row/column can be
calculated by inserting the values of the terms defined in Table 2.2 and Table 2.3
into the formulae given in Table 2.4 and Table 2.5.
Table 2.2: Constants Used in the Address Calculation [Xil03d]
Term        Definition
Chip_Cols   Number of PLB columns on the Virtex device.
Chip_Rows   Number of PLB rows on the Virtex-I device.
Chip_Rams   Number of block RAM columns on the Virtex-I device.
RAM_Space   Spacing of block RAM columns (in terms of PLB columns).
FL          Number of 32-bit words in the frame.
RW          1 for Read, 0 for Write.
CLB_Col     Column number of the desired PLB.
CLB_Row     Row number of the desired PLB.
Slice       0 or 1.
FG          0 for the F-LUT, 1 for the G-LUT.
lut_bit     The desired bit from the given LUT. Bits in the LUT are indexed from 0 to 15.
XY          0 for the X Flip-Flop, 1 for the Y Flip-Flop.
RAM_Col     Column number of the desired block RAM.
RAM_Row     Row number of the desired block RAM.
ram_bit     The desired bit from the given block RAM. Bits are indexed from 0 to 4095.
Table 2.3: Variables Used for Address Calculation [Xil03d]
MJA             Frame Major Address.
MNA             Frame Minor Address.
fm_st_wd        The index of the word within a full configuration segment that corresponds
                to the starting word of the desired frame. A full configuration
                segment is defined as the following: 1) for PLB/IOB, all PLB, IOB,
                and RAM interconnect frames beginning at MJA=0, MNA=0 and 2)
                for block RAM, all RAM content frames for the given RAM column.
                Words are numbered starting at 0.
fm_wd           The index of the 32-bit word within a frame that contains the desired
                bit. Words in a frame are numbered starting at 0.
fm_wd_bit_idx   The bit index of the desired bit within frame word fm_wd. Words are
                indexed in big-endian style, with bit 31 on the left and bit 0 on the
                right.
fm_bit_idx      Bit index within a frame of the desired bit. Numbered starting with
                0 as the left-most (first) bit. Bit numbering within a frame continues
                across all the words in the frame.
Table 2.4: Calculating the Location of the LUT RAM Bit in Virtex-I Bitstream [Xil03d]
MJA             if (CLB_Col ≤ Chip_Cols / 2)
                then Chip_Cols - CLB_Col × 2 + 2
                else 2 × CLB_Col - Chip_Cols - 1
MNA             lut_bit + 32 - Slice × (2 × lut_bit + 17)
fm_bit_idx      3 + 18 × CLB_Row - FG + RW × 32
fm_st_wd        FL × (8 + (MJA - 1) × 48 + MNA) + RW × FL
fm_wd           floor(fm_bit_idx / 32)
fm_wd_bit_idx   31 + 32 × fm_wd - fm_bit_idx
Table 2.5: Equations for Calculating PLB FF Location in the Bitstream [Xil03d]
MJA             if (CLB_Col ≤ Chip_Cols / 2)
                then Chip_Cols - CLB_Col × 2 + 2
                else 2 × CLB_Col - Chip_Cols - 1
MNA             Slice × (12 × XY - 43) - 6 × XY + 45
fm_bit_idx      (18 × CLB_Row) + 1 + (32 × RW)
fm_st_wd        FL × (8 + (MJA - 1) × 48 + MNA) + RW × FL
fm_wd           floor(fm_bit_idx / 32)
fm_wd_bit_idx   31 + 32 × fm_wd - fm_bit_idx
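The formulae of Table 2.4 and Table 2.5 translate directly into code. The Python
sketch below is a transcription of those tables under two assumptions: the comparison
and arithmetic operators are read as ≤, × and integer division, and PLB rows and
columns are numbered starting at 1. The function names and the device dimensions
used in the example call are hypothetical.

```python
def major_address(chip_cols, clb_col):
    """MJA of a PLB column: even on the left half, odd on the right half."""
    if 2 * clb_col <= chip_cols:              # CLB_Col <= Chip_Cols / 2
        return chip_cols - 2 * clb_col + 2
    return 2 * clb_col - chip_cols - 1


def _frame_position(fl, mja, mna, fm_bit_idx, rw):
    """Shared tail of both tables: word and bit position of the frame bit."""
    fm_st_wd = fl * (8 + (mja - 1) * 48 + mna) + rw * fl
    fm_wd = fm_bit_idx // 32
    return {
        "MJA": mja,
        "MNA": mna,
        "fm_bit_idx": fm_bit_idx,
        "fm_st_wd": fm_st_wd,
        "fm_wd": fm_wd,
        "fm_wd_bit_idx": 31 + 32 * fm_wd - fm_bit_idx,
    }


def lut_bit_location(chip_cols, fl, clb_col, clb_row, slice_, fg, lut_bit, rw=0):
    """Table 2.4: location of one LUT RAM bit."""
    mja = major_address(chip_cols, clb_col)
    mna = lut_bit + 32 - slice_ * (2 * lut_bit + 17)
    fm_bit_idx = 3 + 18 * clb_row - fg + rw * 32
    return _frame_position(fl, mja, mna, fm_bit_idx, rw)


def ff_location(chip_cols, fl, clb_col, clb_row, slice_, xy, rw=0):
    """Table 2.5: location of one PLB flip-flop bit."""
    mja = major_address(chip_cols, clb_col)
    mna = slice_ * (12 * xy - 43) - 6 * xy + 45
    fm_bit_idx = 18 * clb_row + 1 + 32 * rw
    return _frame_position(fl, mja, mna, fm_bit_idx, rw)


# Hypothetical device with 8 PLB columns and a frame length of 10 words.
print(lut_bit_location(chip_cols=8, fl=10, clb_col=5, clb_row=1,
                       slice_=0, fg=0, lut_bit=0))
```

For the hypothetical 8-column device, the left columns 4, 3, 2, 1 map to major
addresses 2, 4, 6, 8 and the right columns 5, 6, 7, 8 map to 1, 3, 5, 7, which agrees
with the even/odd addressing scheme described for Figure 2.5.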
2.3.2 Frame Organization
The frame can be viewed as being vertically superimposed on the device, with the
beginning of the frame at the top of the device. As shown in Table 2.6, the first 18 bits
control the two IOBs at the top of the column. The subsequent groups of 18 bits are
allocated to each PLB row. Finally the last 18 bits control the two IOBs at the bottom
of the PLB column. The frame data is then padded with '0's to make it an integral
multiple of 32-bit words [Xil03d].
Table 2.6: PLB Column Frame Organization
Top 2 IOBs   PLB R1   PLB R2   ...   PLB Rn   Bottom 2 IOBs
18           18       18       ...   18       18
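The frame organization of Table 2.6 implies a simple relation between the number
of PLB rows and the padded frame size. The Python sketch below works through that
arithmetic; the function name and the row count in the example are hypothetical, and
any additional pad word appended by a particular device is not modeled.

```python
import math


def plb_frame_layout(plb_rows):
    """Return (payload bits, 32-bit words, zero pad bits) for a PLB column frame."""
    payload_bits = 18 * (plb_rows + 2)        # n PLB rows plus top and bottom IOB groups
    words = math.ceil(payload_bits / 32)      # round up to whole 32-bit words
    pad_bits = 32 * words - payload_bits      # '0' bits appended as padding
    return payload_bits, words, pad_bits


# Example: an array with 20 PLB rows (an illustrative value).
print(plb_frame_layout(20))
```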
2.3.3 Configuration Registers
The Virtex-I FPGAs provide configuration registers to control the configuration
process. The configuration architecture defines eleven 32-bit configuration registers,
summarized in Table 2.7. To configure the FPGA, commands are written into these
configuration registers followed by the data frames containing the configuration data,
which are then loaded into the configuration memory of the FPGA [Xil02d].
The configuration register where the commands are to be written is selected by a
32-bit word called the command header format (Table 2.8) or Type-I header. The word
count field in the command header gives the number of words to be written in the
subsequent write sequence. With the command header alone, 2048 32-bit words can be
written. The configuration architecture also defines the large block count header
extension format, also known as the Type-II header format, that supports larger write
sequences [Xil02d].
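The field layout of the Type-I header in Table 2.8 can be exercised with a few lines
of code. The Python sketch below packs and unpacks the fields using the bit positions
from that table. The function names are hypothetical, and the register addresses used
in the example (FAR = 1, FDRO = 3, CMD = 4) are not stated in the text; they are
inferred here from the packet headers listed in Table 2.10.

```python
WRITE, READ = 0b10, 0b01                      # bits 28:27 of the header


def type1_header(op, reg_addr, word_count, byte_addr=0):
    """Pack a Type-I command header using the field positions of Table 2.8."""
    if not 0 <= reg_addr < (1 << 14):
        raise ValueError("register address must fit in bits 26:13")
    if not 0 <= byte_addr < (1 << 2):
        raise ValueError("byte address must fit in bits 12:11")
    if not 0 <= word_count < (1 << 11):
        raise ValueError("word count must fit in bits 10:0")
    return (0b001 << 29) | (op << 27) | (reg_addr << 13) | (byte_addr << 11) | word_count


def decode_type1(word):
    """Unpack a Type-I command header into its fields."""
    return {
        "type": (word >> 29) & 0x7,
        "op": (word >> 27) & 0x3,
        "reg_addr": (word >> 13) & 0x3FFF,
        "byte_addr": (word >> 11) & 0x3,
        "word_count": word & 0x7FF,
    }


# Register addresses below are inferred from Table 2.10, not given in the text.
FAR, FDRO, CMD = 1, 3, 4
print(hex(type1_header(WRITE, FAR, 1)))       # write one word to FAR
print(hex(type1_header(WRITE, CMD, 1)))       # write one word to CMD
print(hex(type1_header(READ, FDRO, 0)))       # read from FDRO, count in a Type-II word
```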
Command Register (CMD)
The state of the configuration state machine, the Frame Data Register (FDR), and
some of the global signals are determined by the command loaded in the command
register. The commands are executed each time a new value is loaded into the Frame
Address Register (FAR) [Xil03d]. The commands and their functions are summarized
in Table 2.9.
Configuration Option Register (COR)
The bits in the Configuration Option Register (COR) determine the behavior of
specific signals used during configuration and the Start-up Sequence. The fifteenth bit
of this register is reset by the "SHUTDOWN" command, used for shutting down the
FPGA for partial reconfiguration. The "START" command sets this bit to value '1',
initiating the Start-up Sequence [Xil03d]. The BitGen tool, which is used to generate
the configuration file for the device, provides options to set/reset bits in this register.
Table 2.7: Configuration Registers [Xil03d]
Register Name                   R/W   Function
Command (CMD)                   R/W   Controls the operation of the configuration state machine.
Configuration Option (COR)      R/W   Sets various options for events that take place in the
                                      configuration and behavior of the device after configuration.
Control (CTL)                   R/W   Sets the preferences for the behavior of the device after
                                      the configuration.
Cyclic Redundancy Check (CRC)   R/W   Used while configuring the device to load the CRC checksum
                                      that is verified against the internally calculated one.
Frame Address (FAR)             R/W   Used to load the frame address of the ensuing frame data.
                                      For PLB data, this is automatically incremented after a
                                      complete frame is loaded. For block RAM data the frame
                                      address has to be incremented manually.
Frame Data Input (FDRI)         W     Writing the configuration data into the configuration memory.
Frame Data Output (FDRO)        R     Reading the configuration data and states of registers,
                                      flip-flops and LUTs from the configuration memory.
Frame Length (FLR)              R/W   Determines the size of the frame in 32-bit words.
Legacy Output (LOUT)            W     For daisy chaining the bitstream of legacy devices.
Mask (MASK)                     R/W   Mask register for writes to the CTL register.
Status (STAT)                   R     Loaded with current values of various control and status
                                      signals.
Table 2.8: Command Header Format [Xil02d]
Field   Type    Write/Read   Destination Register Address   Byte Address   Word Count (32-bit Words)
Bits    31:29   28:27        26:13                          12:11          10:0
Value   001     10/01        xxxxxxxxxxxxxx                 xx             xxxxxxxxxxx
Table 2.9: Configuration Commands and their Usage [Xil03d] [Xil04]
Command   Code   Description
WCFG      1      Write Configuration Data: Used prior to writing configuration data to the
                 FDRI. It takes the internal configuration state machine through a sequence
                 of states that control the shifting of the FDR and the writing of the
                 configuration memory.
LFRM      3      Last Frame: This command is loaded prior to writing the last (pad) data
                 frame if the GHIGH_B signal was asserted. This command is not necessary if
                 the GHIGH_B signal was not asserted. This allows overlap of the last frame
                 write with the release of the GHIGH_B signal.
RCFG      4      Read Configuration Data: Used prior to reading frame data from the Frame
                 Data Output (FDRO). Similar to the WCFG command in its effect on the Frame
                 Data Register (FDR).
START     5      Begin Start-up Sequence: Starts the Start-up Sequence. This command is also
                 used to start a shutdown sequence prior to partial reconfiguration. The
                 Start-up Sequence begins with the next successful CRC check.
RCAP      6      Reset Capture: Used when performing capture in single-shot mode. This
                 command must be used to reset the capture signal if single-shot capture has
                 been selected.
RCRC      7      Reset CRC: Used to reset the CRC register.
AGHIGH    8      Assert GHIGH_B Signal: Used prior to reconfiguration to prevent contention
                 while writing new configuration data. All PLB outputs and signals are
                 forced to a one.
SWITCH    9      Switch CCLK Frequency: Used to change the frequency of the Master CCLK.
Cyclic Redundancy Check (CRC)
The Cyclic Redundancy Check (CRC) register provides a means of checking for
transmission errors in the bitstream. A 16-bit CRC checksum is calculated every time
data is written into specific registers using the following polynomial:
CRC-16 = X^{16} + X^{15} + X^{2} + 1 [Xil02d]
The CRC register is used to store the checksum. In complete reconfiguration, the CRC
check is performed twice by loading a pre-calculated CRC block-check value. The second
CRC checksum is calculated with the data of the last frame. A non-zero resulting value
indicates an error in transmission, in which case configuration is aborted.
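The zero-residue behavior of the CRC-16 polynomial can be demonstrated with a
generic bitwise implementation. The Python sketch below is not the Virtex
configuration logic: the bit ordering, the zero initial value and the choice of which
words enter the calculation are assumptions made for illustration, since the text only
specifies the polynomial.

```python
CRC16_POLY = 0x8005            # X^16 + X^15 + X^2 + 1 (the X^16 term is implicit)


def crc16(data, crc=0):
    """Bitwise CRC-16, most significant bit first, assumed initial value 0."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ CRC16_POLY) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


payload = bytes(range(32))                     # stand-in for configuration data
checksum = crc16(payload).to_bytes(2, "big")   # pre-calculated block-check value

# Running the CRC over the data followed by its checksum leaves a zero residue.
print(crc16(payload + checksum))

# A single flipped bit leaves a non-zero residue, signalling a transmission error.
corrupted = bytearray(payload)
corrupted[5] ^= 0x10
print(crc16(bytes(corrupted) + checksum) != 0)
```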
Frame Address Register (FAR)
The Frame Address Register contains the address of the frame being loaded. The
address is partitioned into block type (PLB or RAM block), major address, and minor
address. The minor address is auto-incremented each time a complete data frame is
loaded and major address is auto-incremented if the last frame for the PLB column is
completely loaded. For RAM blocks the major address needs to be loaded separately
[Xil03d].
Frame Data Input Register (FDRI)
The Frame Data Input Register is used to specify the size of the configuration data
in words that would be written to the configuration memory. Type-I or Type-II headers
are used depending on how large the data is. The FDRI is used to hold this header
information [Xil03d].
Frame Length Register (FLR)
The length of the frame without the pad word is set in terms of 32-bit words in this
register. As the devices grow in the array size, the frame length increases to incorporate
the configuration data of increased routing and logic resources [Xil03d].
2.3.4 Full Reconfiguration Bitstream
The commands in the bitstream for full reconfiguration of a Virtex-I device can be
divided into three command sets. The first command set initializes the internal
configuration logic for loading the data frames. A default value is assigned to the CRC
register. The frame length is set in the Frame Length Register (FLR). The Configuration
Option Register (COR) is loaded with the value that specifies the desired behavior of the
device after the configuration. The SWITCH command is loaded into the CMD register to
change the configuration clock frequency to the clock frequency specified in the COR.
The second command set writes the configuration data frames. The command
WCFG (Write Configuration) is loaded into the CMD register. This, among other
things, activates the circuitry that writes the data loaded into the FDRI into the
configuration memory cells. The data word count is specified in a command word of
Type 1 or, if the data word count is too large, in a command word of Type 2 that follows
the command word of Type 1. Typically three large frame sets are loaded containing the
PLB configuration, the block RAM configuration and the last frame data. At the end of
the third frame set, the CRC checksum is loaded into the CRC register. The Last Frame
command (LFRM) is loaded into the CMD register, indicating to the configuration
circuitry that the following frame set will be the last frame.
The third command set triggers the Start-up Sequence with the START command,
completes the CRC checking and activates the FPGA.
2.4 Readback
Readback is the process of reading data from the configuration memory [Xil02d].
Readback can be utilized to compare the stored configuration against the actual
bitstream, as well as to read the current state of all internal PLB and IOB registers, LUTs
operating in RAM mode and block RAM values. The former is known as readback
verification and the latter is referred to as readback capture. Both verification and capture
can be done in one readback sequence.
2.4.1 Readback Verification
Readback verification can be performed without any changes in the configuration
memory, through the selectMAP and boundary scan modes [Xil02d]. The readback data
can then be verified against a bitmap file generated by the BitGen tool when run with the
"readback" option enabled for each design.
2.4.2 Readback Capture
In order to examine the state of the internal logic resources, the readback capture
capability must be enabled. An additional readback capture option allows a single
capture or multiple captures after the device is configured. When asserted, the register
states are captured in unused space in the configuration memory on the rising edge of
the clock signal [Xil] [Xil02d].
Table 2.10: Readback Commands Required to Perform Readback on PLB Configuration
Synchronization word AA99 5566h
Packet Header: Write to FAR Register 3000 2001h
Packet Data: Starting frame Address 0000 0000h
Packet Header: Write to CMD Register 3000 8001h
Packet Data: RCFG 0000 0004h
Packet Header: Read from FDRO 2800 6000h
Packet Header Type 2: Data Words 48{ |-h
The logic allocation file provided by BitGen indicates the absolute position of the
flip-flop bits in the complete readback file. This information will prove to be important
while retrieving BIST results, as will be discussed in Chapter 3.
2.4.3 Readback Operations
Readback is performed by reading a data packet from the Frame Data Output
Register (FDRO). There are three types of data packets to be read (one for
PLB configuration with capture data and two for block RAMs). The commands that need
to be given in order to accomplish this are summarized in Table 2.10.
The complete configuration memory readback is initiated by writing the starting
frame address of (0000 0000)h in the FAR register. The number of words to be read
for full readback capture is a function of the size of the device, e.g., for an XCV100 the
number of 32-bit readback words would be 22,554 [Xil02d].
The bits in the readback bitstream indicate three types of information: 1) configuration
data, 2) captured data and 3) pad bytes. The pad bytes align the frame data to
a 32-bit word boundary. It can be noted that the readback bitstream does not contain any
CRC check information.
2.5 Partial Reconfiguration
The sequence of events taking place while partially reconfiguring the device is
considerably different from that of full reconfiguration. For partial reconfiguration, it is
required that the device be fully configured once and that the configuration interface be
active after the full configuration is complete. That is to say, the I/O pins used for the
selectMAP mode of reconfiguration must retain their configuration function [Xil02d].
The boundary scan mode is a permanent interface and is always present [Xil03d].
The bitstream performing partial reconfiguration of any logic resource of the FPGA,
henceforth simply referred to as a partial bitstream, contains the major and minor
addresses of the frame containing the configuration data for that resource. The major
and minor addresses of the frame are calculated using the formulae given in Table 2.4 and
Table 2.5 [Xil03d]. The partial bitstream contains instructions to write the address to
the FAR register. The ensuing instructions load the FDRI register with the number of
words to be written into the configuration memory. After these instructions, the frame
data follows. There are two ways in which the FPGA can be partially reconfigured: with
or without shutdown [Xil02c].
2.5.1 Partial Reconfiguration without Shutdown Sequence
If the device is not shut down, the functions implemented in the parts of the FPGA
not affected by partial reconfiguration may continue to work without interruption. The
logic changes in the PLB or routing take place once the corresponding frame has been
completely written into the device. This mode would be used for operations such as
on-line testing and fault tolerance [AES01].
2.5.2 Partial Reconfiguration with Shutdown Sequence
If the device is shut down, the parts of the FPGA not affected by partial
reconfiguration stop executing the configured function. At the start of the Shutdown
Sequence, the dummy word (FFFFFFFF)h and the synchronization word (AA995566)h
are written. The dummy word provides the clock cycles necessary to initialize the
configuration logic. The synchronization word is used to align the bitstream on a
32-bit word boundary [Xil04]. The Shutdown bit is then set in the COR register, and the
START command is loaded into the CMD register to start the Shutdown Sequence. The
CRC value is reset. As the Shutdown Sequence requires all the other logic in the device
to be disabled, the clock to all sequential logic is disabled. The AGHIGH command is
then loaded into the CMD register to prevent contention on the internal signals while
the new data is written. Once the GHIGH_B signal has been asserted by the AGHIGH
command, the LFRM command is written into the CMD register. This allows the Last
Frame packet to be written as the GHIGH_B signal is released [Xil03d].
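The ordering of the Shutdown Sequence described above can be summarized in a short
Java sketch. Only the two 32-bit words are taken from the text; the class name
ShutdownSequenceSketch and the step labels are hypothetical, and no packet headers or
register encodings are modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: lists the Shutdown Sequence steps in the order given in the text. */
final class ShutdownSequenceSketch {
    static final int DUMMY_WORD = 0xFFFFFFFF; // supplies clock cycles to initialize the configuration logic
    static final int SYNC_WORD  = 0xAA995566; // aligns the bitstream on a 32-bit word boundary

    /** Returns descriptive step labels; these are not real packet encodings. */
    static List<String> steps() {
        List<String> s = new ArrayList<>();
        s.add(String.format("write dummy word %08X", DUMMY_WORD));
        s.add(String.format("write synchronization word %08X", SYNC_WORD));
        s.add("set Shutdown bit in COR");
        s.add("load START into CMD");
        s.add("reset CRC");
        s.add("load AGHIGH into CMD");
        s.add("load LFRM into CMD");
        return s;
    }

    public static void main(String[] args) {
        for (String step : steps()) {
            System.out.println(step);
        }
    }
}
```

Keeping the sequence as an ordered list makes it easy to compare against the command
order found in a generated partial bitstream.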
2.5.3 BitGen
BitGen is a command line tool that converts the netlist file in Xilinx native format
(.ncd file) into a configuration bitstream file. The FPGA can then be configured with
this bitstream. The tool provides a number of options to control the tasks performed
during and after the Start-up Sequence for the implemented design. These tasks include
the timing of the start-up signals, the clock rate to be used for the configuration, and
the signal assignment, once configuration is over, of some of the I/O pins used during
configuration. The BitGen options are set according to the design being implemented.
BitGen also gives options for generating a partial reconfiguration bitstream that
contains only the difference between the .ncd file and the old bitstream. The use of
BitGen for partial reconfiguration during BIST will be discussed in more detail in
Chapter 3.
2.5.4 JBits
JBits is a set of Java classes which provide an Application Program Interface (API)
into the Xilinx XC4000, Virtex-I and Virtex-II series FPGA bitstreams [GLS99]. The JBits
API facilitates writing applications that modify the bitstream on-the-fly, and that
configure and read back the FPGA configuration memories [Xil01a]. The API gives the
programmer gate level access to the FPGA, and the Java programming language allows
a programmer to create many layers of abstraction. JBits can therefore be used to write
custom CAD tools featuring dynamic partial reconfiguration, or traditional CAD tools
that perform place-and-route for the supported FPGA families [Xil01a].
The simplest of the applications that can be built with the JBits API would take the
bitstream generated by BitGen as input and configure an FPGA board with it. More
advanced applications contain circuit designs specified with JBits API calls and
generate the output in Xilinx Design Language (XDL), which defines the netlist in
symbolic format, along with the bitstream. The design flow with the JBits API is
illustrated in Figure 2.6.
In order to use the JBits API, the programmer writes a Java program that utilizes JBits
API calls (henceforth simply referred to as a JBits program) containing calls to configure
the PLB and routing resources of the FPGA. Upon execution, the JBits program generates
the desired configuration of the logic as well as the interconnect resources of the FPGA.
Thus, the programmer specifies the design of the circuit using JBits API calls embedded
in a regular Java language program [Xil01a]. The JBits program can then be compiled
with the Java compiler (javac). The JBits program runs under the Java Virtual
Machine (JVM) environment. The output of the program can be a regular bitstream,
or core files (.ctf files) that may run from the hardware simulator described in [Xil01c],
or an XDL file containing the design description, which can then be viewed and
processed using conventional Xilinx tools like FPGA Editor or BitGen [Xil01a]. Therefore,
the application program does not rely on conventional place-and-route (PAR) tools for
automatic routing. The JBits API thus combines the flexibility of an HDL with
the functionality of a PAR tool. The JBits API also provides a configuration memory
readback API, which can read back the state of the logic elements in a PLB [Xil01a].
Therefore, tools can be written to interface with the FPGAs. Graphical tools, such as
BoardScope, demonstrate the use of this API for tracing the logic values of flip-flops,
LUTs and internal signals [LG98].
Figure 2.6: Design Flow of the Application with JBits
(Diagram labels: Application, JBits API, XHWIF, Bitstream from Xilinx Tools, Design in
XDL, xdl2ncd.exe, BitGen, Bitstream for Virtex-I, Bitstream for Spartan-II, Virtex-I
Board, Spartan-II Board.)
For the bitstream produced by the JBits program to work correctly when an FPGA
is configured with it, a number of structures internal to the JBits API need to
be initialized in a specific sequence. For the simplest applications that can be developed
with the JBits API, these internal structures can be initialized with relative ease, as
their number is small and the process of initialization is well documented in the
documentation included with the JBits software [Xil01a]. However, for advanced
applications a large number of structures internal to JBits need to be initialized. The
sequence in which these internal structures need to be initialized is not as well
documented in [Xil01a] [Xil01d]. Therefore, the documentation provided with the JBits
software is insufficient for a test engineer to write advanced applications. In this
thesis, we have developed a sequence of steps that correctly initializes these internal
structures. A JBits program that implements this sequence is observed to generate a
bitstream that correctly configures the FPGA to produce the desired results, together
with a textual representation of the design in XDL.
Definitions Used
Since the JBits API is written entirely in Java [Xil01d], we find it convenient
to follow the terminology of object-oriented languages wherever possible.
class "A class is a blueprint or prototype that defines the variables and the methods
common to all objects of a certain kind" [Jav04].
object "An object is a software bundle of related variables and methods" [Jav04]. The
object is an instantiation of a class. For example, in the JBits API, a physical pin of
an FPGA is modeled by the class "Pin". When a reference to a particular pin is to be
made, an instance of the class "Pin" is created.
method "A function defined in a class" [Jav04]. Methods are invoked from the
main program to change or to retrieve the state of the object(s).
package "A package is a collection of related classes and interfaces providing access
protection and namespace management" [Jav04]. Packages exist so that the names of
identifiers declared in a class in one package do not collide with those of a class
declared in a separate package and confuse the compiler. The fully qualified name of
a class is the class name prefixed by the name of the package that it belongs to. In
this thesis, whenever referring to a class in the JBits API, we will identify the class
by its fully qualified name.
Device Model
The JBits API provides two models for accessing and manipulating the FPGA
resources. In the first model, each physical FPGA resource (such as a logic element in
the PLB, an IOB or a block RAM) is mapped into one of the classes in the package
com.xilinx.JBits.Virtex.Bits [Xil01d]. Each class has static 2-dimensional array(s) of
integers representing the configuration memory bit(s) associated with the resource(s).
Applications that set and reset configuration memory bits to modify the bitstream
on-the-fly may use this model [Xil01a]. This model is not suitable for designing circuits
and specifying routing between the PLBs.
There exists another model which is more suitable for the advanced applications
that specify the design in the JBits program. This model provides a logical abstraction
of the underlying FPGA by modeling every logic element in the PLB, IOB or block RAM
as a runtime parameterized core (RTPCore) [GL99]. The inputs and outputs of a PLB,
IOB or block RAM, and the Single, Hex and Long wires, are modeled as pins. The RTPCores
contain ports and signals, which are the HDL-like features of the JBits API. The logical
connections between RTPCores are specified through ports and signals.
If the programmer simply specifies the logic elements within the PLB that are used
in the design and the logical connections between them, then in this model the synthesis,
placement and routing of these cores is completely automatic and is done by another
Java program called JRoute [Xil01d]. Automatic synthesis, placement or routing
can be a serious impediment when the result differs from what the designer intended.
The JBits API therefore gives the programmer the ability to specify the placement and
routing of the RTPCores. The programmer assigns pins to the ports of an RTPCore that
needs to be placed. Note that a port may still have unrouted nets and buses attached to
it, and the router will continue to place them.
Model of PLBs
As stated above, in the first model there exists a Java class for every element in the
PLB, such as the flip-flop, LUT, multiplexer, and combinational logic element. The class
has a static 2-dimensional integer array as a data member [Xil01d]. The bits and their
positions in the array correspond to the state and location of the configuration memory
bit associated with that element. Table 2.11 provides the mapping of classes for the
elements contained in a Virtex-I PLB.
There exists another model for the PLB where the flip-flops, LUTs, multiplexers
and logic gates are viewed as RTPCores and the inputs and outputs of these elements
are viewed as pins. In order to configure the PLB, these RTPCores are instantiated in
the program and pins are connected to the ports of the RTPCores. It is sufficient to
specify the logical connections between the RTPCores. The router appropriately places
the partially routed cores.
Table 2.11: Classes Used for Bit Level Manipulation of PLB Elements [Xil01d]
CIPs related to slice 0        com.xilinx.JBits.Virtex.Bits.S0Control
CIPs related to slice 1        com.xilinx.JBits.Virtex.Bits.S1Control
Logic Value of slice 0 FF      com.xilinx.JBits.Virtex.Bits.CLB
Logic Value of slice 1 FF      com.xilinx.JBits.Virtex.Bits.CLB
Logic Values of slice 0 LUT    com.xilinx.JBits.Virtex.Bits.LUT
Logic Values of slice 1 LUT    com.xilinx.JBits.Virtex.Bits.LUT
Table 2.12: Classes Used for Bit Level Manipulation of Switch Box CIPs [Xil01d]
com.xilinx.JBits.Virtex.Bits.BiHexToSingle
com.xilinx.JBits.Virtex.Bits.UniHexToSingle
com.xilinx.JBits.Virtex.Bits.SingleToSingle
com.xilinx.JBits.Virtex.Bits.OutMuxToSingle
Routing Model
There are two models for the routing resources. The configuration memory bits
corresponding to the switch box CIPs are modeled by static 2-dimensional integer arrays
split into the four classes shown in Table 2.12 [Xil01d]. The configuration memory bits
corresponding to the output MUX CIPs are modeled by static 2-dimensional integer
arrays in the eight classes shown in Table 2.13 [Xil01d]. The configuration memory bits
corresponding to the input MUX CIPs are modeled by static 2-dimensional integer
arrays split into the twenty-eight classes shown in Table 2.14 [Xil01d]. These classes
are useful when explicitly specifying whether a CIP is to be turned on or off, or when
explicitly specifying the connection between the routing resources.
A separate model exists for applications in which the routing between two PLBs,
between PLBs and IOBs, or between PLBs and block RAMs is specified. In this model,
interconnect resources are modeled as static one-dimensional integer arrays, as listed
in Table 2.15. These integer arrays do not represent the CIPs; instead they model the
actual interconnect resource. Therefore, in order to use the interconnect resources in
conjunction with the HDL-like features in the JBits program, an object is created with
the appropriate static integer array passed as an argument.
Table 2.13: Classes Used for Bit Level Manipulation of Output MUX CIPs [Xil01d]
com.xilinx.JBits.Virtex.Bits.OUT0 1st Output of the Out Mux
com.xilinx.JBits.Virtex.Bits.OUT1 2nd Output of the Out Mux
com.xilinx.JBits.Virtex.Bits.OUT2 3rd Output of the Out Mux
com.xilinx.JBits.Virtex.Bits.OUT3 4th Output of the Out Mux
com.xilinx.JBits.Virtex.Bits.OUT4 5th Output of the Out Mux
com.xilinx.JBits.Virtex.Bits.OUT5 6th Output of the Out Mux
com.xilinx.JBits.Virtex.Bits.OUT6 7th Output of the Out Mux
com.xilinx.JBits.Virtex.Bits.OUT7 8th Output of the Out Mux
Table 2.14: Classes Used for Bit Level Manipulation of Input MUX CIPs [Xil01d]
com.xilinx.JBits.Virtex.Bits.S0BX    BX input of the slice 0
com.xilinx.JBits.Virtex.Bits.S0BY    BY input of the slice 0
com.xilinx.JBits.Virtex.Bits.S0CE    CE input of the slice 0
com.xilinx.JBits.Virtex.Bits.S0Clk   Clk input of the slice 0
com.xilinx.JBits.Virtex.Bits.S0SR    SR input of the slice 0
com.xilinx.JBits.Virtex.Bits.S1BX    BX input of the slice 1
com.xilinx.JBits.Virtex.Bits.S1BY    BY input of the slice 1
com.xilinx.JBits.Virtex.Bits.S1CE    CE input of the slice 1
com.xilinx.JBits.Virtex.Bits.S1Clk   Clk input of the slice 1
com.xilinx.JBits.Virtex.Bits.S1SR    SR input of the slice 1
com.xilinx.JBits.Virtex.Bits.TS0     TS0 input of the slice 0
com.xilinx.JBits.Virtex.Bits.TS1     TS1 input of the slice 0
com.xilinx.JBits.Virtex.Bits.S0F1    1st input of F LUT in the slice 0
com.xilinx.JBits.Virtex.Bits.S0F2    2nd input of F LUT in the slice 0
com.xilinx.JBits.Virtex.Bits.S0F3    3rd input of F LUT in the slice 0
com.xilinx.JBits.Virtex.Bits.S0F4    4th input of F LUT in the slice 0
com.xilinx.JBits.Virtex.Bits.S0G1    1st input of G LUT in the slice 0
com.xilinx.JBits.Virtex.Bits.S0G2    2nd input of G LUT in the slice 0
com.xilinx.JBits.Virtex.Bits.S0G3    3rd input of G LUT in the slice 0
com.xilinx.JBits.Virtex.Bits.S0G4    4th input of G LUT in the slice 0
com.xilinx.JBits.Virtex.Bits.S1F1    1st input of F LUT in the slice 1
com.xilinx.JBits.Virtex.Bits.S1F2    2nd input of F LUT in the slice 1
com.xilinx.JBits.Virtex.Bits.S1F3    3rd input of F LUT in the slice 1
com.xilinx.JBits.Virtex.Bits.S1F4    4th input of F LUT in the slice 1
com.xilinx.JBits.Virtex.Bits.S1G1    1st input of G LUT in the slice 1
com.xilinx.JBits.Virtex.Bits.S1G2    2nd input of G LUT in the slice 1
com.xilinx.JBits.Virtex.Bits.S1G3    3rd input of G LUT in the slice 1
com.xilinx.JBits.Virtex.Bits.S1G4    4th input of G LUT in the slice 1
Table 2.15: Model of Interconnect Resources in the Package
com.xilinx.JRoute2.Virtex.ResourceDB [Xil01d]
Interconnect Type    Class
Horizontal Long      Long Horiz
Vertical Long        Vert Horiz
Horizontal Hex       Hex Horiz East, Hex Horiz West
Vertical Hex         Hex Vert North, Hex Vert South
Single               Single East, Single West, Single North, Single South
Routing
To route the Virtex-I and Virtex-II families of FPGAs, the JBits API provides a Java
program known as JRoute. The capabilities of JRoute include routing between two PLB
pins, a PLB pin and an IOB pin, or a PLB pin and a block RAM pin. JRoute also features
the capability of connecting a single source pin to multiple sink pins. The programmer
has a choice between automatic routing, template based routing and manual routing of
the FPGA [Kel00].
Automatic Routing
The programmer defines the source pin and the sink pin and leaves the task of
routing to the automatic router. The call is as follows [Kel00] [Xil01d]:
route(source, sink);
Template Based Routing
A template specifies the direction and the type of a wire; it does not identify a
particular wire. The static integers in the class
com.xilinx.JRoute2.Virtex.ResourceDB.CenterWires denote the direction and type. For
example, SINGLE_EAST represents any Single wire in the east direction. Similarly,
HEX_SOUTH is any Hex wire in the south direction. A template therefore gives the
router general guidelines rather than an exact route.
route (source, sink, Template t);
The source and the sink are objects of class com.xilinx.JBits.CoreTemplate.Pin, and thus
model the physical resources of the FPGA. The template is an array of paths existing
between the source and the sink. Consider the following example, which instructs the
router to take the specified path while routing from the slice 0 X flip-flop output
(S0_XQ) in (row,col)=(11,12) to the input of the slice 0 F LUT (S0_F1) in
(row,col)=(18,5) [Xil01a]:
int[] template = { TemplateRouter.OUTMUX,
TemplateRouter.HEX_WEST,
TemplateRouter.HEX_NORTH,
TemplateRouter.SINGLE_NORTH,
TemplateRouter.SINGLE_WEST,
TemplateRouter.INPUT };
Pin source = new Pin(Pin.CLB, row, col, CenterWires.S0_XQ);
jroute.route(source, CenterWires.S0_F1, template);
The template router has the limitation that it can only be used to route between
two PLBs [Kel00] [Xil01d].
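Reading the template of the example above as a sequence of moves shows why it leads
from (row,col)=(11,12) to (18,5). The following sketch is not JBits code: the constants
are hypothetical stand-ins for the template values, and it assumes that a Hex wire
spans six PLBs, a Single wire spans one PLB, row numbers increase toward the north and
column numbers increase toward the east.

```java
/** Hypothetical sketch: follows a routing template as a sequence of moves on the PLB array. */
final class TemplateReachSketch {
    // Stand-ins for the template constants used in the example; not the JBits definitions.
    static final int OUTMUX = 0, HEX_WEST = 1, HEX_NORTH = 2,
                     SINGLE_NORTH = 3, SINGLE_WEST = 4, INPUT = 5;

    static final int HEX_SPAN = 6;    // assumption: a Hex wire spans six PLBs
    static final int SINGLE_SPAN = 1; // assumption: a Single wire spans one PLB

    /** Returns {row, col} reached from (row, col); rows grow northward, columns eastward (assumed). */
    static int[] follow(int row, int col, int[] template) {
        for (int step : template) {
            switch (step) {
                case HEX_WEST:     col -= HEX_SPAN;    break;
                case HEX_NORTH:    row += HEX_SPAN;    break;
                case SINGLE_NORTH: row += SINGLE_SPAN; break;
                case SINGLE_WEST:  col -= SINGLE_SPAN; break;
                default:           break; // OUTMUX and INPUT stay within the same PLB
            }
        }
        return new int[] { row, col };
    }

    public static void main(String[] args) {
        int[] template = { OUTMUX, HEX_WEST, HEX_NORTH, SINGLE_NORTH, SINGLE_WEST, INPUT };
        int[] end = follow(11, 12, template);
        System.out.println("(row,col)=(" + end[0] + "," + end[1] + ")"); // prints (row,col)=(18,5)
    }
}
```

Under these assumptions the template fixes only the kind and direction of each hop;
which particular Hex or Single wire is used at each hop is still left to the router.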
JBits jbits = super.getJBits();
ResourceFactory rf = ResourceFactory.getResourceFactory(jbits);
row = 1;
col = 3;
//The sink resource is the output of the second multiplexer 
Pin outmux = new Pin(Pin.CLB, row, col, CenterWires.OUT[2]);
Segment seg = rf.getSegment(outmux);
//First step: Mark the resource as saved
seg.save();
Pin src7  = new Pin(Pin.CLB, row, col, CenterWires.S0_X);
Pin sink7 = new Pin(Pin.CLB, row, col, CenterWires.OUT[2]);
     
//Second step: Connect the input of the slice 0 'X' flip-flop to the second output multiplexer
JBitsConnector.makeConnection(jbits, src7, sink7, ps);
Figure 2.7: JBits Program for Manual Routing
Manual Routing
The programmer can specify each interconnect resource to be used for the routing.
It is a two-step process, as shown in Figure 2.7. First, the required resource is reserved
using calls provided in the class com.xilinx.JBits.JRoute2.Virtex.ResourceFactory. This
class keeps track of the utilization of the interconnect resources. The resource can be
either saved for immediate use or simply marked as "used" for later use. In the second
step, a call to an appropriate method in com.xilinx.JBits.JRoute2.Virtex.JBitsConnector
is performed to complete the manual routing [Kel00] [Xil01d]. The fourth argument to
the makeConnection method is given to display the output of the routing on the
console.
2.6 BIST for FPGAs
In this section, we explore BIST for FPGAs in more detail. The key points to
consider in understanding BIST for FPGAs are: how BIST performs in-system testing of
the FPGAs, how complete fault coverage is achieved, the diagnostic resolution that the
method provides, and the scalability of the approach. Some of the notable attempts to
implement BIST for FPGAs are [RZ00] [RPFZ99] [HGWS99] [SWHA98] [SXCT00]
[AES01] [AS01] [SKCA97] [SNLA02] [SLS03] [TM03].
BIST for FPGAs is typically divided into logic BIST and interconnect BIST,
according to the FPGA resources that it tests. BIST can also be classified according to
the state of operation of the FPGA at the time of testing. If part of the FPGA is tested
without affecting the normal operation of the rest of the FPGA, it is referred to as
on-line BIST; if the FPGA device needs to be shut down prior to testing, it is referred
to as off-line BIST [AES01] [SNLA02].
2.6.1 Logic BIST
The general idea of off-line logic BIST, as proposed in [SKCA96] [SCKA96], is
to configure groups of PLBs as TPGs and ORAs that test the BUTs. As the PLBs
can be configured in different modes of operation, the logic BIST must test all of these
modes. The process of testing the PLB in a given mode of operation is referred to as
a test phase. Completely testing the PLB in all of its modes of operation requires a set
of test phases referred to as a test session. Each test phase consists of reconfiguring
the FPGA with the BIST circuitry, initiating the BIST sequence, and reading the BIST
results from the ORAs [HGWS99]. For in-system testing, BIST configurations would
be stored in a system memory and a system controller would be responsible for loading
each configuration into the configuration memory of the FPGA, initializing the TPGs,
BUTs, and ORAs via a Global Reset, and reading the Pass/Fail results in the ORAs
at the end of the BIST sequence. Generating the test patterns and analyzing the output
responses are performed concurrently by the BIST circuitry in the device, as shown in
Figure 1.5 [AS01]. All the BUTs are configured identically. As the same test stimulus is
applied to these functionally and architecturally equivalent BUTs by two or more TPGs,
the output responses received from fault-free BUTs should be the same. Therefore a
comparator-based ORA compares results from different BUTs and produces a Pass/Fail
result by latching any mismatch observed during the BIST sequence.
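The behavior of a comparator-based ORA during one test phase can be illustrated with a
minimal software model. This is only a sketch under stated assumptions: the BUT is
modeled as a 4-input LUT given by a 16-bit truth table, the injected fault is an output
stuck-at-0, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: a comparison-based ORA that latches any mismatch between two BUTs. */
final class ComparisonOraSketch {
    private boolean fail = false; // latched mismatch flag; cleared only by reset()

    void reset() { fail = false; }

    /** Compares one output sample from two identically configured BUTs. */
    void compare(int butA, int butB) {
        if (butA != butB) {
            fail = true; // once set, stays set for the rest of the BIST sequence
        }
    }

    boolean passed() { return !fail; }

    /** Example BUT: a 4-input LUT with an optional output stuck-at-0 fault (assumed fault). */
    static int lut(int truthTable, int input, boolean stuckAt0) {
        return stuckAt0 ? 0 : (truthTable >> (input & 0xF)) & 1;
    }

    /** Applies all 16 input patterns to two BUTs and returns the latched Pass/Fail result. */
    static boolean runPhase(int truthTable, boolean faultInSecondBut) {
        ComparisonOraSketch ora = new ComparisonOraSketch();
        for (int pattern = 0; pattern < 16; pattern++) { // the TPG modeled as a 4-bit counter
            ora.compare(lut(truthTable, pattern, false),
                        lut(truthTable, pattern, faultInSecondBut));
        }
        return ora.passed();
    }

    public static void main(String[] args) {
        System.out.println(runPhase(0x8000, false)); // fault-free BUTs agree: true
        System.out.println(runPhase(0x8000, true));  // mismatch at pattern 15: false
        System.out.println(runPhase(0x0000, true));  // fault not exercised in this mode: true
    }
}
```

The last call shows why several test phases are needed: a fault that is not exercised
by the function configured in one phase produces no mismatch in that phase.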
In the next test session the roles of the PLBs are changed from BUTs to ORAs
or TPGs and vice versa. This approach requires only two test sessions to completely
test the PLBs in the FPGA, as long as at least half of the PLBs are BUTs in each test
session [AS01]. The combinations of faults that cannot be detected by this method
have a negligible chance of occurrence.
The requirement of two logic BIST test sessions to completely test the logic
resources is independent of the device size, provided that the number of PLBs required to
implement the TPGs (2*N_TPG) for the logic resources does not exceed the number of PLBs
in the array (N). If N_TPG > N/2, then more than two BIST test sessions are required to
test the logic resources of the FPGA [SLS03]. Therefore, there is a potential penalty
of additional test sessions for smaller FPGAs. For FPGAs featuring larger PLB arrays,
excessive loading of the TPGs might pose a problem, depending on the architecture. Here
the solution is to divide the PLB array into quadrants, with each quadrant configured
with an independent BIST architecture executing in parallel. There is no delay associated
with this and, therefore, this BIST architecture scales linearly for large PLB arrays
[SLS03] [AS01].
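The size condition stated above reduces to a single comparison. The sketch below
restates it in Java; the method name and the array sizes in the example are
hypothetical and are not taken from any device data sheet.

```java
/** Hypothetical sketch of the size condition for two logic BIST test sessions. */
final class TestSessionSketch {
    /** True when the PLBs needed for the TPGs (2*N_TPG) do not exceed the array size N. */
    static boolean twoSessionsSuffice(int nTpg, int n) {
        return 2L * nTpg <= n; // equivalent to N_TPG <= N/2, without integer-division rounding
    }

    public static void main(String[] args) {
        System.out.println(twoSessionsSuffice(8, 400)); // large array (illustrative numbers)
        System.out.println(twoSessionsSuffice(8, 12));  // small array: more sessions needed
    }
}
```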
This BIST approach has been implemented on commercially available FPGAs such
as the XC4000 and Spartan from Xilinx [SLS03], the ORCA 2C from Lucent Technologies
[AS01] and the Altera Flex8000 [SCKA96] for programmable logic resources. In order to
extend this approach to any FPGA architecture, 1) the PLB should be capable of
implementing the functionality of an LFSR-based or counter-based TPG and an ORA/scan
cell, 2) the routing architecture should feature global and local routing elements, and
3) the FPGA should be capable of in-system reconfiguration, as the PLBs need to be
reconfigured several times for complete testing. As all these capabilities can either
be implemented or are available on most commercially available FPGAs, this approach
is architecture independent.
2.6.2 Interconnect BIST
The interconnect BIST tests a group of wire segments controlled by CIPs, known
as wires under test (WUTs). The WUTs are tested for the fault models described
below. Counter-based TPGs are suitable for generating the test patterns. All 2^n
possible combinations of test vectors are applied to the WUTs, provided n is not large
[SWHA98]. The mismatch between a good and a faulty WUT is latched by a
comparison-based ORA. The general architecture of interconnect BIST is shown in
Figure 2.8.
The comparison-based ORA has the limitation that a fault will go undetected if the
WUTs being compared have equivalent faults, thereby giving equivalent responses for
the test patterns applied. Another problem associated with this arrangement is that
the diagnostic resolution obtained may be insufficient for fault-tolerant applications.
[SNLA02] addresses the fault masking problem associated with the comparison-based
ORA through two-testing analysis. This ensures that every WUT is tested at least
twice, each time with a different group of wires and with a different TPG and ORA. The
fault and diagnostic concerns are alleviated by employing a combination of strategies
such as replacing the comparison-based ORA with a scan-based ORA, comparing the fault
signature against a fault dictionary, isolating the faulty wire using progressive
deletion, interchanging the roles of TPG and ORA, and divide-and-conquer for locating
the faults in the various segments of the wire. The approach described in [SXCT00]
implements a parity-based ORA that checks the parity generated by the TPG along with
the response of the WUTs.
Figure 2.8: BIST for FPGA Interconnect Resources [SWHA98]
(Diagram labels: TPG, PLB, ORA, Start, BIST Done, Pass/Fail.)
Fault Models
The following fault models are considered in BIST for interconnect resources:
• Bridging faults between the wires,
• Opens in the wires,
• CIPs stuck-on (stuck-closed),
• CIPs stuck-off (stuck-open),
• Wires stuck-at-0,
• Wires stuck-at-1.
Bridging faults are observed between wires that run in parallel, where there
is a likelihood of a short. Parallel wires affected by a bridging fault fail to
transmit the correct logic value to the other end only when they are driven with
opposite logic values. Therefore, applying opposite logic values sensitizes the bridging
fault between the parallel wires [Str02]. As the counter-based TPG applies exhaustive
test patterns to the WUTs, opposite logic values are present on each pair of wire
segments, with both wire segments monitored by an ORA.
A stuck-off CIP would prevent the transmission of the signal between the wire
segments that it connects. If the test patterns applied along the set of WUTs are received
correctly by the ORA, the CIPs of the set of WUTs cannot be stuck-off [SNLA02].
A stuck-on CIP would be unable to break the connection between the wire segments
that it connects, similar to a bridging fault. In order to detect the CIP stuck-on fault,
opposite logic values are applied at the two ends of the CIP, and each end of the wire
segment controlled by the CIP is monitored by ORAs. The fault is detected when an
incorrect value is read from one of the ends of the CIP [SNLA02].
Figure 2.9: FPGA Floorplan for Online Interconnect Testing [AES01]
(Diagram labels: T, O, V-STAR, H-STAR.)
Interconnects shorted to VDD or GND cause wire stuck-at-1 or stuck-at-0 faults,
respectively. Broken continuity in the metal wires is referred to as an open in the wire.
The test to detect wire stuck-at faults and opens is to determine the
ability of the wire to transmit both 0 and 1 between its ends.
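The detection conditions for these fault models can be checked with a small
simulation. The sketch below applies all 2^n patterns to two sets of WUTs and reports
whether a comparison-based ORA would latch a mismatch. It is an illustrative model
only: the class is hypothetical, and the bridging fault is assumed to behave as a
wired-AND of the two shorted wires.

```java
/** Hypothetical sketch: exhaustive patterns on two WUT sets, compared by an ORA. */
final class WutFaultSketch {
    /** Maps the values driven onto the wires to the values observed at the far end. */
    interface Fault { int apply(int driven); }

    static Fault none() { return v -> v; }
    static Fault stuckAt0(int wire) { return v -> v & ~(1 << wire); }
    static Fault stuckAt1(int wire) { return v -> v | (1 << wire); }

    /** Assumed bridging behavior: both shorted wires carry the AND of the driven values. */
    static Fault bridge(int a, int b) {
        return v -> {
            int and = (v >> a) & (v >> b) & 1;
            int cleared = v & ~((1 << a) | (1 << b));
            return cleared | (and << a) | (and << b);
        };
    }

    /** Returns true if any of the 2^n patterns produces a mismatch between the two sets. */
    static boolean detected(int n, Fault setA, Fault setB) {
        boolean mismatch = false;
        for (int pattern = 0; pattern < (1 << n); pattern++) {
            if (setA.apply(pattern) != setB.apply(pattern)) {
                mismatch = true; // latched, as in a comparison-based ORA
            }
        }
        return mismatch;
    }

    public static void main(String[] args) {
        System.out.println(detected(4, none(), stuckAt0(2)));        // true
        System.out.println(detected(4, none(), stuckAt1(0)));        // true
        System.out.println(detected(4, none(), bridge(0, 3)));       // true
        System.out.println(detected(4, bridge(1, 2), bridge(1, 2))); // false: equivalent faults mask
    }
}
```

The last case reproduces the masking limitation of the comparison-based ORA: when both
sets of WUTs carry equivalent faults, their responses agree for every pattern.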
Previous Implementations of Interconnect BIST
For on-line BIST, horizontal and vertical self-test areas (STARs) are configured
in the FPGA without disturbing the system function configured in the other parts of the
FPGA. The STARs then rove across the FPGA such that different routing resources of the
FPGA are brought under test. Once a fault is located, the faulty routing resources can
be excluded from further use, or can continue to be used as long as they do not cause
contention with other resources. The TPG, WUT and ORA configuration for roving STARs is
shown in Figure 2.9.
Figure 2.10: FPGA Floorplan with "Galaxy" BIST [SNLA02]
(Panels: (a) H-STARs for Testing Horizontal Interconnects; (b) Global-to-global Cross
Point CIP Testing. Diagram labels: T, O.)
In off-line BIST, the system function is not configured while the FPGA is being
tested. Therefore, the FPGA interconnects are configured with parallel horizontal STARs
and vertical STARs, also referred to as galaxy BIST phases (Figure 2.10). Thus the
test patterns are applied to the global and local interconnect resources under test in
different test phases. While the number of test phases remains the same, off-line
BIST reduces the number of times the FPGA needs to be reconfigured in order to test
the routing resources, as compared to the on-line BIST approach [SNLA02].
2.6.3 BIST for Xilinx FPGAs
The work related to BIST for Xilinx FPGAs can be found in [SCKA96] [RFZ97]
[RPFZ99] [SXCT00] [SLS03] [RZ00]. All of them target the Xilinx 4000 series FPGA
family. The Xilinx 4000 and Spartan-I families of FPGAs do not support partial
reconfiguration.
[RFZ97] and [SXCT00] target Single interconnects that span adjacent
PLBs. The methodology described in [RFZ97] is one of the first attempts to target
the switch-box CIPs. The paper discusses testing of switch-box CIPs for stuck-on and
stuck-off faults in three test phases, as elaborated in Figure 2.11. The approach
presented in [SXCT00] generates a parity bit for the WUTs consisting of Single lines.
It seeks to eliminate the fault masking inherent in comparison-based ORAs, where both
sets of WUTs have equivalent faults. Another advantage of this approach is that the
TPG has to control only a single set of WUTs instead of two, as in the case shown in
Figure 2.8. The Single lines tested by this approach comprise only about 10% of the
total routing resources of the Xilinx 4000 series FPGA. These are simpler to test than
MUX CIPs and global routing resources that crisscross the complete PLB array [SLS03]. The
design of the TPG is also more complicated because of the overhead of calculating parity,
as opposed to the simple counter-based TPG used in [SLS03].
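The idea behind a parity-based ORA can be sketched in a few lines. The model below is
not the circuit of [SXCT00]; it is a simplified illustration which assumes that the TPG
sends one even-parity bit per pattern over a fault-free line and that the ORA
recomputes the parity of the values received on the WUTs. All names are hypothetical.

```java
import java.util.function.IntUnaryOperator;

/** Hypothetical sketch: a parity check on a single set of WUTs. */
final class ParityOraSketch {
    static int parity(int value) { return Integer.bitCount(value) & 1; }

    /** Returns true if a parity mismatch is observed for any of the 2^n patterns. */
    static boolean detected(int n, IntUnaryOperator wutFault) {
        for (int pattern = 0; pattern < (1 << n); pattern++) {
            int sentParity = parity(pattern);            // generated by the TPG
            int received = wutFault.applyAsInt(pattern); // values arriving over the WUTs
            if (parity(received) != sentParity) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(detected(4, v -> v));        // fault-free: false
        System.out.println(detected(4, v -> v & ~2));   // wire 1 stuck-at-0: true
        System.out.println(detected(4, v -> v ^ 0b11)); // two wires inverted together: false
    }
}
```

Only one set of WUTs is driven, so no second set with an equivalent fault can mask the
error; the third case shows that, in this simplified model, an even number of
simultaneously flipped wires leaves the parity unchanged.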
[SLS03] uses comparison-based ORAs for testing all the logic as well as interconnect
resources available in the Xilinx 4000 and Spartan-I. The BIST strategy implemented is
derived from earlier works such as [AS01] [HGWS99] [SNLA02]. For logic BIST,
two identical TPGs drive the test patterns to the BUTs. The structure of the TPG
differs depending on the mode of operation being tested. The output of each BUT is
compared by two comparison-based ORAs, one on each side. The BIST test sessions are
column oriented due to the presence of more vertical routing resources than horizontal
ones and the presence of dedicated carry chain routing that is vertically oriented.
Figure 2.11: Complete Testing of Switch Boxes [SWHA98]
(Panels: a) Test Phase #1, b) Test Phase #2, c) Test Phase #3, d) CIP Faults Detected,
tabulating stuck-on (S-On) and stuck-off (S-Off) faults for CIPs A to F in test phases
1 to 3. Legend: Open CIP, Closed CIP, Logic 1, Logic 0.)
For interconnect BIST, comparison-based ORAs and 2-bit counter-based TPGs are
used. The 2-bit counter-based TPG is implemented within a single PLB of the Xilinx 4000
family and generates 4-bit test patterns using the current and next state of the counter.
A 0,1 and a 1,0 combination exists in any pair of the four bits of the test pattern at
least once in the sequence of four test patterns [SLS03].
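This property of the 2-bit counter can be verified by enumeration. The sketch below
assumes one particular bit layout, with the current state in the upper two bits and the
next state in the lower two bits of the 4-bit pattern; the layout and the class name
are assumptions made for illustration.

```java
/** Hypothetical sketch: 4-bit patterns formed from the current and next state of a 2-bit counter. */
final class TwoBitCounterTpgSketch {
    /** Pattern for a counter state: current state in bits 3..2, next state in bits 1..0 (assumed). */
    static int pattern(int state) {
        int current = state & 3;
        int next = (current + 1) & 3;
        return (current << 2) | next;
    }

    /** True if bits i and j see both the (0,1) and the (1,0) combination over the four patterns. */
    static boolean pairSeesOppositeValues(int i, int j) {
        boolean zeroOne = false;
        boolean oneZero = false;
        for (int state = 0; state < 4; state++) {
            int p = pattern(state);
            int bi = (p >> i) & 1;
            int bj = (p >> j) & 1;
            if (bi == 0 && bj == 1) zeroOne = true;
            if (bi == 1 && bj == 0) oneZero = true;
        }
        return zeroOne && oneZero;
    }

    /** Checks the property for all six pairs of the four pattern bits. */
    static boolean allPairsCovered() {
        for (int i = 0; i < 4; i++) {
            for (int j = i + 1; j < 4; j++) {
                if (!pairSeesOppositeValues(i, j)) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        for (int state = 0; state < 4; state++) {
            System.out.println(Integer.toBinaryString(0x10 | pattern(state)).substring(1));
        }
        System.out.println(allPairsCovered()); // prints true
    }
}
```

With this layout the four patterns are 0001, 0110, 1011 and 1100, and every pair of
bits receives both opposite-value combinations, which is the condition needed to
sensitize bridging faults between any two of the four WUTs.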
A minimum of 12 test phases is required to completely test the PLBs, including
the dedicated carry logic, and 206 configurations are required to test the programmable
interconnect resources [SLS03]. A third of the test configurations of the programmable
interconnect are attributed to the presence of wider MUX CIPs that require one
configuration for each of their inputs [SLS03] [SNLA02]. Other reasons cited for the
higher number of test configurations than reported in the earlier works are: reduced
observability of the dedicated carry logic for testing, sharing of routing resources with
the adjacent PLBs, and local routing resources that do not allow inputs to and outputs
from the PLBs to come from and go to buses on any side of the PLB [SLS03].
Usually a scan chain mechanism is used to retrieve the ORA results,
as shown in Figure 2.12. As illustrated in [HGWS99], a scan chain interfaced with the
IEEE 1149.1 boundary scan interface improves diagnostic resolution. However, due to the
lack of architectural features in the Xilinx 4000 and Spartan series PLBs, retrieving
BIST results through a scan chain connected to an IEEE 1149.1 boundary scan interface
requires additional test phases. Thus the Pass/Fail results from the ORAs are obtained
through complete configuration memory readback.
Figure 2.12: Scan Cell Interfacing with IEEE 1149.1 [HGWS99]
2.6.4 Using JBits API to Generate Interconnect BIST Configurations
Testing logic resources of an FPGA using the JBits API is claimed in [SMG01]. The
approach described does not configure the FPGA with a logic or routing BIST architecture.
The JBits program described in this approach performs functional-level testing of the
configuration memory of the FPGA and of the LUTs. The JBits program writes an FPGA
configuration into the configuration memory and then reads the configuration data back.
While testing the LUTs, the JBits program writes a '0' value into the LUTs and reads
back the data from the LUTs. Any mismatch between the two is signaled on the user
input/output terminal. The method described for interconnect testing configures two
LUTs as 16-bit shift registers. The first shift register drives logic patterns on the
WUT. The second shift register records the outputs. The contents of the second LUT
are then read back using configuration memory readback. The approach does not consider
any of the fault models for the interconnect test. Therefore, fault coverage is an
issue. The approach considers the MUX in the PLB that drives the wire as a part of
the wire. Therefore, either the MUX or the wire may be faulty if there is a mismatch
between the data pattern being driven and the data pattern recorded.
[FH03] explores testing interconnect resources by implementing routing BIST using
JBits. The counter-based TPG used is 11 PLBs tall and 2 PLBs wide. The comparison-based
ORA used has a height of 20 PLBs. Of the Single, Hex and Long wires, only Single
wires are tested. The approach tests only switch box CIPs. As the Hex wires are not
tested, the CIPs at either end of the Hex wires also remain untested. The CIPs are
not tested for stuck-on (stuck-closed) faults. Out of 960 switch boxes in a Virtex XCV150
chip that features a 24×36 array of PLBs, 776 switch boxes are tested. In this approach,
it is not possible to determine the location of the fault. For the approach to work, an
FPGA should feature a PLB array with at least 20 rows. This raises concerns about the
scalability of the approach. Finally, the memory required to store the configuration is
not optimized through partial reconfiguration.
2.7 Thesis Statement
Improvements in the performance of BIST for FPGAs can be obtained by minimizing
the size of the configuration data required for each test phase and minimizing the amount
of readback data that needs to be read in order to extract the BIST results. With partial
reconfiguration, regularity across the test phases can be exploited to generate partial
bitstreams that are much smaller than the full configuration bitstreams. With
partial configuration memory readback, it is possible to read only the contents of the
ORA columns instead of the complete configuration memory. This thesis examines the
improvements that can be obtained with partial reconfiguration and readback. It also
examines the implications of partial reconfiguration and readback for the design and
generation of logic and routing BIST configurations. While this thesis focuses on
Xilinx Virtex-I and Spartan-II FPGAs, the proposed methods can be used in any FPGA
that supports partial reconfiguration and readback.
The JBits API contains complete functionality to generate logic BIST configurations
for Virtex-I and Virtex-II FPGAs. In order to generate logic BIST configurations for
the Virtex-I and Virtex-II families of FPGAs using JBits, RTPCores can be designed for
counter-based TPGs, BUTs configured in different modes of operation, and comparator-based
ORAs. The RTPCores can be routed by a parent core. In this thesis, we identify a
semi-automatic routing method using JRoute that is suitable for generating routing BIST
configurations. The response of the ORAs can be retrieved using the partial configuration
memory readback facility offered by the Virtex architecture. Strictly speaking, the JBits API
is a collection of user abstractions built over the configuration memory map of the FPGA.
Therefore, porting the JBits API to other FPGAs requires knowledge of the configuration
memory map of the FPGA, which FPGA manufacturers are reluctant to share with their
customers. However, methods exist to identify the correspondence between the bitstream
offsets and the resources controlled. Therefore, this method of generating interconnect
configurations using JBits may be ported to other FPGAs as long as the contract with
JBits is fully implemented for the target FPGA.
Chapter 3
Partial Reconfiguration and Readback for Logic BIST
The performance of BIST depends on the number of test phases and test sessions,
the total time to download the test phases, the size of the memory required to store
all BIST configurations needed to guarantee 100% stuck-at fault coverage, and the time
required to retrieve the ORA Pass/Fail results. Partial reconfiguration can be effectively
used as a strategy to reduce download time and to minimize the size of the memory
required to store all logic and routing BIST configurations. In this chapter we present the
experiments performed to quantify the reduction in time obtained by loading the logic
BIST configurations through partial reconfiguration and by reading the results through
partial configuration memory readback. In the next chapter we discuss minimizing the
configuration time required to load the routing BIST configurations using partial
reconfiguration.
In each logic BIST test phase, PLBs configured as BUTs are tested in one of their
modes of operation. In the next test phase, the mode of operation of the BUTs is changed.
The difference between the configuration bitstreams for the two test phases is expected
to be small, as only the BUTs are configured in a different mode of operation, while
the TPGs, ORAs and routing usually maintain the same functionality and configuration.
Partial readback of the configuration memory would reduce the time required to retrieve
the ORA results compared to reading the entire configuration memory.
3.1 Floorplan of Logic BIST to Aid Partial Reconfiguration
The frame organization of the Virtex-I and Spartan-II architecture is column based.
If the BUTs are also aligned in columns, then all the changes in the configuration
bitstream are limited to the frames associated with the columns of BUTs. This
reduces the amount of configuration data that needs to be stored and downloaded: after the
first full configuration, the only data that needs to be saved in memory and downloaded
for the next BIST configuration are the frames that have changed from the configuration
already in place. However, if the floorplan is row oriented, the BUTs whose configuration
changes are arranged in rows. This results in a change in the
configuration of every configuration memory column that holds PLB configuration.
Hence, there is a change in the frame data corresponding to all of those columns. Thus
the number of frames that need to be loaded is twice the number that need to be
loaded if the floorplan of the BIST test phase were column oriented, since the
column-oriented floorplan places only half as many PLB columns under test in a given
test session.
Another important effect of having a column-oriented test session is observed while
performing partial configuration memory readback to retrieve the ORA Pass/Fail results
at the end of each BIST sequence. In the column-oriented BIST test phases, the ORA
results would be confined to (N/2 - 1) × 2 frames, as there are two ORAs per column. In
order to retrieve the ORA Pass/Fail results, only the frames that map the flip-flops in the
PLBs configured as ORAs need to be read. The flip-flops containing the Pass/Fail results
of the test phase lie in different frames and therefore have different minor addresses. As
a result, the number of frames holding ORA Pass/Fail results is the number of ORA columns
multiplied by two. However, in the row-oriented BIST test phase, it is necessary to read
back N × 2 frames to retrieve all the ORA Pass/Fail results, as there would be
(N/2 - 1) × 2 ORA Pass/Fail results in every PLB column.
Therefore the floorplan of a BIST test session, i.e. the TPGs, BUTs and ORAs, should be
column-based to aid partial reconfiguration and readback, as shown in Figure 3.1.
Figure 3.1: Floorplan for BIST Test Session
(a) Test Session 1, (b) Test Session 2 (columns of TPGs, BUTs and ORAs)
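As a rough illustration of the readback argument, the two frame-count expressions quoted in this section can be evaluated side by side. This is only a sketch: the function name is hypothetical, and N is taken to be the same array dimension in both expressions, exactly as the text uses it.

```python
def ora_readback_frames(n: int, column_oriented: bool) -> int:
    """Frames to read back for the ORA results, using the expressions in the text.

    n is the array dimension N as used in this section (assumed even).
    """
    if column_oriented:
        # (N/2 - 1) ORA columns, two frames per ORA column
        return (n // 2 - 1) * 2
    # Row-oriented: every PLB column holds ORA results, two frames each
    return n * 2


if __name__ == "__main__":
    for n in (16, 24, 30):
        col = ora_readback_frames(n, column_oriented=True)
        row = ora_readback_frames(n, column_oriented=False)
        print(f"N={n}: column-oriented {col} frames, row-oriented {row} frames")
```

For any N the row-oriented count is a little more than twice the column-oriented one, which is the motivation for the column-based floorplan.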
3.2 Generating Partial Reconfiguration Files
In this section we present the process of generating the partial reconfiguration files
for logic and interconnect BIST. A JBits program or a C program may be written to
produce the XDL files containing the netlist of logic or interconnect BIST circuits for
the Virtex-I and Spartan-II families of FPGAs. We used a C program, developed by
Dr. Charles Stroud, that automatically generates XDL files for the logic BIST test phases.
This program is run with specific command line arguments to directly generate the XDL
files for the target chip. The XDL files are then converted into NCD files using the
program "xdl.exe" as follows:
xdl.exe -xdl2ncd outfile.xdl
The logic BIST configuration bitstreams for the target chip are generated from the
NCD files using the BitGen program. The partial bitstreams can be generated following
the process described in the next subsection.
3.2.1 Using BitGen
In order to generate the partial reconfiguration files, BitGen is run from the command
line with the design netlist file (.ncd) passed as an argument. The command
line option "ActiveReconfig" is disabled by default (ActiveReconfig:No). This means
that the generated bitstream contains the shutdown instructions (Shutdown and
AGHIGH commands) and that the GSR signal is activated after the partial
reconfiguration is done [Xil02c] [Xil02d]. After the partial reconfiguration is complete, the
GSR signal clears the ORA flip-flops and initializes the TPGs and BUTs for the next
test phase.
The following is an example usage of BitGen for generating partial BIST configuration
files:
bitgen -g ActiveReconfig:No -r \
bist_phase00.bit bist_phase01.ncd bist_phase01.bit
Here the -r option is used to create the partial bitstream (bist_phase01.bit). This file
contains the frames that differ between the full configuration bitstream of the new
design (bist_phase01.ncd) and the currently loaded full configuration bitstream
(bist_phase00.bit). The "\" simply indicates the continuation of the command on the next line.
BitGen Command Line Options
The following command line options are useful when generating partial reconfiguration
bitstreams, readback bitstreams, and interconnect or logic BIST configurations:
-b When this option is specified on the command line, BitGen produces an ASCII version
of the bitstream (extension .rbt). The ASCII file represents the bitstream in
human-readable format. The ASCII file can be used for debugging when the bitstream is
modified for fault-injection emulation.
-g Persist When this option is specified on the command line, the readback circuit
is configured in the output bitstream and the contents of the configuration memory
can be retrieved. This option should be enabled for retrieving the ORA Pass/Fail
results using configuration memory readback.
-g Readback When this option is specified on the command line, BitGen produces an
ASCII file that contains readback commands (extension .rba) and its binary version
(extension .rbb), as well as an ASCII file with readback data (extension .rbd) that
can be used to verify that the configuration was loaded properly.
-l When this option is specified on the command line, BitGen produces an ASCII report
file (extension .ll) that enumerates all the components in the design that can be
read back or captured. The file contains information about the location of the bits
in the readback bitstream, the frame address, the frame offset, the type of logic
resource and the name of the component. After the ORA frames are retrieved, the
location of the ORA Pass/Fail results in the frame is indicated by the logic
allocation file.
3.3 Generating a Test Plan for Logic BIST
As the partial bitstream records only the changes in the configuration memory
between two configurations, a partial reconfiguration bitstream can be downloaded
only after its reference configuration has been loaded. Therefore, in general, the
sequence in which the full and partial bitstreams are downloaded is important. The sequence
of the partial bitstreams, and consequently of the logic or interconnect BIST test phases,
should be such that a minimum number of resources change their configuration from one
phase to the next. If only a small number of logic resources change their configuration, the
number of frames that change from one BIST test phase to the next is small. This results
in a smaller partial reconfiguration bitstream for the BIST test phase
and in faster download times.
Limitations in the Virtex-I architecture prevent testing two slices in a PLB at the
same time. For example, there are only eight output muxes to drive the signals from
the two slices to the routing resources. If the implemented design contains any feedback
loops or drives multiple outputs of the multiplexer, the number of signal outputs from
the BUT that can be compared in a single BIST test phase is further reduced. The BUT
has twelve output signals that need to be compared by two different ORAs. The ORA
implemented in a single slice is capable of comparing three pairs of output signals from
two identically configured BUTs. Due to the limited number of output multiplexers to
carry the signals from BUT to ORA, only a single slice can be configured as a BUT.
Hence, in a single test phase, one can test a single slice. This doubles the number of
test configurations and also has an impact on generating a test plan for partial
configurations.
In order to study the effect of partial reconfiguration on logic BIST, we generated
different logic BIST configurations to bring individual slices under test and configured
them in different modes of operation. There are four options available for configuring
and testing the two slices.
• The slice not under test maintains the same operation mode from the previous test
phase while the slice under test changes its mode of operation (Scenario 1),
• Both slices change the operation mode simultaneously (Scenario 2),
• The slice under test alternates between slice 0 and slice 1 in successive test phases
and the slice not under test maintains its mode of operation from the previous
configuration (Scenario 3),
• The mode of operation is changed keeping the same slice under test and the slice not
under test maintains its mode of operation from the first configuration (Scenario 4).
The C program is capable of generating all four scenarios described above for two
consecutive test phases. This program demonstrates the improvement in the time required
to load test configurations and is sufficient to draw conclusions about the test plan
that would benefit the most from partial reconfiguration. The size of the bitstream
containing the full configuration of a test phase, and consequently the configuration time
required to load the bitstream, does not vary over the test phases. Therefore, we are
able to generalize our results for all logic BIST test phases.
Initially, the FPGA is fully configured with the bitstream containing both slices
configured in test phase 1, and slice 0 is tested. The subsequent configurations are partial
configurations. The slice under test is indicated by the shaded region in Figure 3.2. Scenario
1 (Figure 3.2(a)) illustrates the case when one of the slices is partially reconfigured
with the second test configuration (from test phase 1 to test phase 2), while the other
slice maintains its previous configuration. In Scenario 2 (Figure 3.2(b)), the case under
investigation is when both slices change their test configurations (from test phase 1 to
test phase 2) at the same time. As noted before, only one slice can be tested at a time.
In Scenario 3 (Figure 3.2(c)), slice 1 is brought under test while both slice 0 and slice 1
remain in test phase 1. Finally, in Scenario 4 (Figure 3.2(d)), the slice under test is first
tested in test phase 1, then tested in test phase 2, while the other slice maintains its
configuration.
Figure 3.2: Four Different Test Plans for Testing Two Slices
(a) Scenario 1, (b) Scenario 2, (c) Scenario 3, (d) Scenario 4
Table 3.1 shows the command list executed to prepare the full and partial bitstream
files for executing the test plan in Scenario 1. The first four lines of the
command listing shown in Table 3.1 generate the corresponding NCD files from
the XDL files. On line 5, the full configuration bitstream is generated from the initial
configuration, i.e. where slices 0 and 1 are both configured with test phase 1.
Line 6 lists the command for generating a partial bitstream containing only
the frames that differ between the full configuration (LogicBIST_WestP1S0O0.bit) and
the new configuration (LogicBIST_WestP2S0O1.ncd). The result is a partial bitstream
(LogicBIST_Part_West_P12S00O01.bit). We compile a full configuration using the
command on line 7. This is the configuration that the FPGA holds after
loading the full configuration bitstream followed by the partial bitstream generated by
line 6. The full configuration is needed while generating the next partial configuration.
Lines 7 and 9 are the same as line 5 except that the bitstreams are generated
for different test phases. Lines 8 and 10 are the same as line 6 except that the
partial bitstreams contain the difference between LogicBIST_WestP2S0O1.bit and
LogicBIST_WestP1S1O0.ncd (line 8) and between LogicBIST_WestP1S1O0.bit and
LogicBIST_WestP2S1O1.ncd (line 10).
Table 3.1: Command Listing for Scenario 1
Line Command
1. xdl.exe -xdl2ncd LogicBIST_WestP1S0O0.xdl
2. xdl.exe -xdl2ncd LogicBIST_WestP2S0O1.xdl
3. xdl.exe -xdl2ncd LogicBIST_WestP1S1O0.xdl
4. xdl.exe -xdl2ncd LogicBIST_WestP2S1O1.xdl
5. bitgen -d -l -b LogicBIST_WestP1S0O0.ncd
6. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP1S0O0.bit LogicBIST_WestP2S0O1.ncd LogicBIST_Part_West_P12S00O01.bit
7. bitgen -d -l -b -w -g Persist:Yes LogicBIST_WestP2S0O1.ncd
8. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -g Readback -r LogicBIST_WestP2S0O1.bit LogicBIST_WestP1S1O0.ncd LogicBIST_Part_WestP21S01O01.bit
9. bitgen -d -l -b -w -g Persist:Yes LogicBIST_WestP1S1O0.ncd
10. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -g Readback -r LogicBIST_WestP1S1O0.bit LogicBIST_WestP2S1O1.ncd LogicBIST_Part_WestP12S11O01.bit
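The ordering constraint behind such a listing (each partial bitstream is built against the full bitstream of the configuration that precedes it) can be sketched as a small generator of command lines. This is an illustrative sketch, not the script used in this work: the function name and the "_partial" output naming are hypothetical, only BitGen options that appear in Table 3.1 are used, and it is assumed that BitGen names its output after the input design when no output file is given.

```python
def bitgen_plan(designs):
    """Build BitGen command lines for an ordered list of design base names.

    The first design gets a full bitstream. Every later design gets a partial
    bitstream relative to its predecessor, and (unless it is the last one) a
    full bitstream of its own to serve as the next reference.
    """
    common = ["bitgen", "-d", "-l", "-b"]
    commands = [common + [f"{designs[0]}.ncd"]]
    for index in range(1, len(designs)):
        prev, cur = designs[index - 1], designs[index]
        commands.append(
            common
            + ["-w", "-g", "Persist:Yes", "-g", "ActiveReconfig:No",
               "-r", f"{prev}.bit", f"{cur}.ncd", f"{cur}_partial.bit"]
        )
        if index < len(designs) - 1:
            # Reference bitstream needed to generate the next partial bitstream
            commands.append(common + ["-w", "-g", "Persist:Yes", f"{cur}.ncd"])
    return commands


if __name__ == "__main__":
    phases = ["LogicBIST_WestP1S0O0", "LogicBIST_WestP2S0O1",
              "LogicBIST_WestP1S1O0", "LogicBIST_WestP2S1O1"]
    for cmd in bitgen_plan(phases):
        print(" ".join(cmd))
```

For the four designs of Scenario 1 this yields six BitGen invocations, in the same full/partial order as lines 5 to 10 of Table 3.1.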
In order to generate the logic BIST configurations for Scenario 2, we use the sequence
of commands given in Table 3.2. The flow of commands is essentially the same as that
of Scenario 1.
Table 3.2: Command Listing for Scenario 2
Line Command
1. xdl.exe -xdl2ncd LogicBIST_WestP2S0O0.xdl
2. xdl.exe -xdl2ncd LogicBIST_WestP2S1O0.xdl
3. bitgen -d -l -b LogicBIST_WestP1S0O0.ncd
4. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP1S0O0.bit LogicBIST_WestP2S0O0.ncd LogicBIST_Part_WestP12S00O00.bit
5. bitgen -d -l -b -w -g Persist:Yes LogicBIST_WestP2S0O0.ncd
6. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP2S0O0.bit LogicBIST_WestP1S1O0.ncd LogicBIST_Part_WestP21S01O00.bit
7. bitgen -d -l -b -w -g Persist:Yes LogicBIST_WestP1S1O0.ncd
8. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP1S1O0.bit LogicBIST_WestP2S1O0.ncd LogicBIST_Part_WestP12S11O00.bit
In order to generate the logic BIST configurations for Scenario 3, we use the sequence
of commands given in Table 3.3. The flow of commands is essentially the same as that
of Scenarios 1 and 2. We do not repeat the process of converting XDL to NCD files, as
they are already available from Scenarios 1 and 2.
Table 3.3: Command Listing for Scenario 3
Line Command
1. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP1S0O0.bit LogicBIST_WestP1S1O0.ncd LogicBIST_Part_WestP11S01O00.bit
2. bitgen -d -l -b -w -g Persist:Yes LogicBIST_WestP1S1O0.ncd
3. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP1S1O0.bit LogicBIST_WestP2S0O0.ncd LogicBIST_Part_WestP12S10O00.bit
4. bitgen -d -l -b -w -g Persist:Yes LogicBIST_WestP2S0O0.ncd
5. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP2S0O0.bit LogicBIST_WestP2S1O0.ncd LogicBIST_Part_WestP22S01O00.bit
In order to generate the logic BIST configurations for Scenario 4, we use the sequence
of commands given in Table 3.4. The flow of commands is essentially the same as that of
Scenarios 1, 2 and 3. The partial bitstream (LogicBIST_Part_WestP11S01O00.bit) and
the full configuration that results from it (LogicBIST_WestP1S1O0.bit) are common
to Scenario 3 and Scenario 4, as the sequence of the first two configurations is the
same.
Table 3.4: Command Listing for Scenario 4
Line Command
1. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP1S1O0.bit LogicBIST_WestP2S1O0.ncd LogicBIST_Part_WestP12S11O00.bit
2. bitgen -d -l -b -w -g Persist:Yes LogicBIST_WestP2S1O0.ncd
3. bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r LogicBIST_WestP2S1O0.bit LogicBIST_WestP2S0O0.ncd LogicBIST_Part_WestP22S10O00.bit
Table 3.5: Partial Frames
             Scenario 1                           Scenario 2
Config    Slice 0   Slice 1   Frames   Config    Slice 0   Slice 1   Frames
          Phase     Phase                        Phase     Phase
Full      1         1                  Full      1         1
Partial   2         1         138      Partial   2         2         264
Partial   1         1         195      Partial   1         1         274
Partial   1         2         138      Partial   2         2         264
Table 3.6: Partial Frames
             Scenario 3                           Scenario 4
Config    Slice 0   Slice 1   Frames   Config    Slice 0   Slice 1   Frames
          Phase     Phase                        Phase     Phase
Full      1         1                  Full      1         1
Partial   1         1         92       Partial   1         1         92
Partial   2         1         274      Partial   1         2         264
Partial   2         2         92       Partial   2         2         92
3.4 Experimental Results for Logic BIST
The partial bitstream is composed of the frames that differ between the
configuration with which the FPGA is currently configured and the new configuration.
Table 3.5 and Table 3.6 give the mode of operation for each slice and the number
of configuration data frames of the current configuration that differ from the previous
configuration. The first column in the tables indicates that the first configuration is a full
configuration while the subsequent configurations are partial. In Scenario 4, the total
number of frames of the bitstreams required to execute test phases 1 and 2 is the
least. However, in Scenario 4, the configuration of both slices is changed at the same
time. As more test phases are considered, as opposed to the two considered here, the number
of frames changed while switching the phase becomes the dominant factor. Scenario 1
shows a lower number of frames differing when changing the test phase. In this scenario,
the slice under test is configured in one test phase while the other slice maintains its
test phase 1 configuration. As a result, when all the test phases are considered,
Scenario 1 may be expected to produce fewer changed frames while switching the test
phases. Therefore, Scenario 1 depicts the test plan that benefits the most from partial
reconfiguration.
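The totals behind this comparison can be reproduced directly from the frame counts in Tables 3.5 and 3.6. The sketch below only sums the tabulated values; the dictionary layout and function name are illustrative.

```python
# Partial-configuration frame counts copied from Tables 3.5 and 3.6
PARTIAL_FRAMES = {
    "Scenario 1": [138, 195, 138],
    "Scenario 2": [264, 274, 264],
    "Scenario 3": [92, 274, 92],
    "Scenario 4": [92, 264, 92],
}


def total_frames(frames_by_scenario):
    """Sum the partial-configuration frames of each scenario."""
    return {name: sum(frames) for name, frames in frames_by_scenario.items()}


if __name__ == "__main__":
    totals = total_frames(PARTIAL_FRAMES)
    for name, total in totals.items():
        print(f"{name}: {total} frames")
    print("fewest frames overall:", min(totals, key=totals.get))
```

Scenario 4 has the smallest total for the two phases considered, while the middle entry of each list is the cost of the step in which the test phase of a slice changes.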
The configuration time is the time to load the FPGA with the specific design. The
configuration time is directly proportional to the size of the bitstream. If all the other
factors, such as the rate at which the configuration bitstream is loaded and the mode in
which the FPGA is configured, remain the same, the gain in configuration time is a
function of the ratio of the size of the full bitstream in bytes to the size of the partial
bitstream in bytes. The "Ratio" field in Table 3.7 gives this ratio: the size of
the full bitstream in bytes divided by the size of the corresponding partial bitstream in bytes.
From Table 3.7 it is clear that as the devices get bigger (the XC2S15 has a PLB array
of 8x12 while the XCV50 and XC2S50 have a PLB array of 16x24), the ratio of the size
of the full bitstream to the size of the partial bitstream increases. Moreover, the
ratio is highest when the two logic BIST phases spanned by the partial bitstream have only
a few differences in configuration. Thus the benefits of partial reconfiguration are more
pronounced for the logic BIST test configurations of larger devices. The reason is that
full reconfiguration becomes more and more expensive as the PLB array grows,
and that high architectural regularity in the logic BIST test phases translates to a lower
number of frames changing their configuration with the next logic BIST phase.
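A minimal sketch of how the "Ratio" column relates to configuration time, assuming (as stated above) that configuration time is proportional to bitstream size. The function names are illustrative; the byte counts in the example are taken from Table 3.7.

```python
def size_ratio(full_bytes: int, partial_bytes: int) -> float:
    """Ratio of full to partial bitstream size, as in the Ratio column."""
    return full_bytes / partial_bytes


def time_saved_fraction(full_bytes: int, partial_bytes: int) -> float:
    """Fraction of download time saved, assuming time is proportional to size."""
    return 1.0 - partial_bytes / full_bytes


if __name__ == "__main__":
    # (device, full size, partial size) for the slice 0 -> slice 1 step, West
    rows = [("XC2S15", 24797, 6301), ("XC2S50", 69985, 6829)]
    for device, full, partial in rows:
        print(f"{device}: ratio {size_ratio(full, partial):.2f}, "
              f"time saved {time_saved_fraction(full, partial):.1%}")
```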
Table 3.7: Sizes of Partial Bitstreams vs. Full Bitstreams
XC2S15
Direction  Phase  Slice  Full Bitstream  Partial Bitstream  Ratio  Previous  Previous
                         Size (bytes)    Size (bytes)              Phase     Slice
West       1      0      24797           NA                 NA     NA        NA
West       1      1      24797           6301               3.94   1         0
West       2      1      24797           8597               2.88   1         1
West       2      0      24797           6301               3.94   2         1
West       2      0      24797           8717               2.84   1         0
West       1      1      24797           8597               2.88   2         0
West       2      1      24797           8717               2.84   1         1
XC2S50
Direction  Phase  Slice  Full Bitstream  Partial Bitstream  Ratio  Previous  Previous
                         Size (bytes)    Size (bytes)              Phase     Slice
West       1      0      69985           NA                 NA     NA        NA
West       1      1      69985           6829               10.25  1         0
West       2      1      69985           17179              4.07   1         1
West       2      0      69985           6829               10.25  2         1
West       2      0      69985           17805              3.93   1         0
West       1      1      69985           18561              3.77   2         0
West       2      1      69985           17717              3.95   1         1
XCV50
Direction  Phase  Slice  Full Bitstream  Partial Bitstream  Ratio  Previous  Previous
                         Size (bytes)    Size (bytes)              Phase     Slice
West       1      0      69984           NA                 NA     NA        NA
West       1      1      69984           6828               10.25  1         0
West       2      1      69984           17716              4.07   1         1
West       2      0      69984           6828               10.25  2         1
West       2      0      69984           17804              3.93   1         0
West       1      1      69984           18472              3.77   2         0
West       2      1      69984           17716              3.95   1         1
XC2S50
Direction  Phase  Slice  Full Bitstream  Partial Bitstream  Ratio  Previous  Previous
                         Size (bytes)    Size (bytes)              Phase     Slice
East       1      0      69985           NA                 NA     NA        NA
East       1      1      69985           6389               10.95  1         0
East       2      1      69985           16489              4.24   1         1
East       2      0      69985           6389               10.95  2         1
East       2      0      69985           16577              3.93   1         0
East       1      1      69985           17245              4.22   2         0
East       2      1      69985           16489              4.24   1         1
3.5 Partial Configuration Memory Readback to Retrieve the BIST Results
After the completion of a BIST test phase, the flip-flops in each of the ORAs contain
the Pass/Fail result for that test phase. Using the boundary scan interface, the results
can then be shifted to the TDO pin and subsequently read by the system controller
to determine the faulty or fault-free state of the FPGA. A diagnostic algorithm can be
applied to the BIST results to determine the location of the faulty PLBs [AS01]. Using
the boundary scan interface, the results can be retrieved using four methods: 1) existing
boundary scan registers, 2) user-defined internal scan registers, 3) configuration
memory readback, and 4) an integrated ORA and scan register [HGWS99]. The third and
fourth methods have proved to be the most valuable in actual BIST implementations
[SLS03] [SNLA02] [AS01].
The Virtex series devices allow partial readback of the configuration memory.
The sequence of commands used to accomplish partial configuration
memory readback is given in Section 2.4. The FAR is set to the major and minor
address of the frame containing the ORA flip-flop that holds the Pass/Fail result. As
all the logic resources in a single column share a common major address, all the flip-flops
in one column have a common major address. The minor addresses of the flip-flops,
however, depend on the slice that the flip-flops belong to. The flip-flops in the same slice
have a common minor address irrespective of the row or the column containing the
slice. Therefore, all the flip-flops lying in a single column of the PLB array and in a single
slice have common major and minor addresses. As there are two slices in each PLB, each
having two flip-flops, the flip-flops in a PLB column are mapped into four different
configuration frames regardless of the number of PLBs in the column. Thus, the Pass/Fail
results from all the ORAs in a single column can be retrieved by reading only four frames. The
column-based logic BIST floorplan for the XCV100 contains M/2 - 1 = 14 ORA columns,
where M equals the number of columns of PLBs. Therefore, the Pass/Fail results
from all ORAs in the FPGA are contained in 56 frames. In SelectMAP mode,
the readback data frame of the XCV100 contains thirteen 32-bit words [Xil02d].
In order to perform partial configuration memory readback on 56 frames, we
must first write 56 × 6 + 1 = 337 command words into the FPGA, as partial configuration
memory readback is iterative in nature [MG00] [Xil02d]. Therefore, the total number
of command bytes written for the partial configuration memory readback of all the ORAs
in a BIST test phase for an XCV100 is 337 × 4 = 1348. The total number of clock cycles
required depends on the configuration mode in which the FPGA is set. In the SelectMAP mode
of configuration, one byte can be read from or written to the configuration memory in each
clock cycle. The total number of bytes in the readback needed to retrieve the Pass/Fail
results from all the ORAs in a BIST test phase on the XCV100 is 56 × 13 × 4 = 2912.
The total number of clock cycles to perform partial configuration memory readback for
the ORA Pass/Fail results is therefore 2912 + 1348 = 4260. The number of clock cycles
to perform partial configuration memory readback sees an eight-fold increase in
the boundary scan mode, which is a serial mode. Therefore, the total number of clock
cycles required to perform partial configuration memory readback in boundary scan
mode is 4260 × 8 = 34080.
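The arithmetic above can be collected into a short sketch. The function and parameter names are illustrative; the constants (six command words per frame plus one, four bytes per word, one byte per clock in SelectMAP and one bit per clock in boundary scan) are the ones used in the text.

```python
def partial_readback_cycles(ora_columns: int, ffs_per_plb: int, frame_words: int):
    """Clock cycles for partial readback of the ORA frames.

    Returns a dict with the frame count, command bytes, data bytes and the
    cycle counts for SelectMAP (byte-wide) and boundary scan (serial) modes.
    """
    frames = ora_columns * ffs_per_plb          # one frame per flip-flop position
    command_bytes = (frames * 6 + 1) * 4        # 6 command words per frame, plus 1
    data_bytes = frames * frame_words * 4       # 32-bit words per frame
    selectmap = command_bytes + data_bytes      # one byte per clock cycle
    return {
        "frames": frames,
        "command_bytes": command_bytes,
        "data_bytes": data_bytes,
        "selectmap_cycles": selectmap,
        "boundary_scan_cycles": selectmap * 8,  # one bit per clock cycle
    }


if __name__ == "__main__":
    # XCV100: 14 ORA columns, 4 flip-flops per PLB, 13 words per frame
    for key, value in partial_readback_cycles(14, 4, 13).items():
        print(f"{key}: {value}")
```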
The full configuration memory readback requires 90216 bytes of readback bitstream
to be read and 28 bytes of command words to be written. The number of clock cycles
required for full configuration memory readback is 21.2 times the number of clock cycles
required for partial configuration memory readback.
For the integrated ORA and scan register, the results of a test phase are latched in
each ORA. There are (N × M/2 - N) × 4 ORA Pass/Fail results that need to be scanned out.
For the XCV100, the number of ORA Pass/Fail results that need to be scanned out is
1120; scanning them out is 20 times faster than partial memory readback through the
boundary scan interface. The Virtex architecture does not contain a dedicated multiplexer
for utilizing the internal scan chains. Therefore, implementing this approach on Virtex-I or
Spartan-II FPGAs presents a logic overhead of one multiplexer per ORA. Also, the flip-flops
in the scan chain and the boundary scan signals need to be routed. The lack of dedicated
routing and logic resources to implement user-defined scan chains manifests itself in an
increase in the number of test configurations. The number of BUT outputs that can be
compared in an ORA is reduced from six to four, as two ORA inputs need to be reserved for
scan-in from the previous stage and for the shift control input. Therefore, testing twelve
BUT outputs takes three configurations instead of two.
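The scan-out count and the effect of the reserved ORA inputs can be sketched as follows. The function names are illustrative; the expression is the one given above, and the array sizes in the example (20×30, 24×36 and 64×96 PLBs) are the values consistent with the per-device counts in Table 3.8.

```python
import math


def ora_scan_results(rows: int, cols: int) -> int:
    """Number of ORA Pass/Fail results to scan out: (N * M/2 - N) * 4."""
    return (rows * cols // 2 - rows) * 4


def configurations_needed(but_outputs: int, compared_per_ora: int) -> int:
    """Test configurations needed to observe every BUT output."""
    return math.ceil(but_outputs / compared_per_ora)


if __name__ == "__main__":
    for device, rows, cols in [("XCV100", 20, 30), ("XCV150", 24, 36),
                               ("XCV1000", 64, 96)]:
        print(device, ora_scan_results(rows, cols))
    print("configurations without scan inputs:", configurations_needed(12, 6))
    print("configurations with scan inputs:", configurations_needed(12, 4))
```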
From the above calculations, it can be deduced that the number of clock cycles
required for partial configuration memory readback using the SelectMAP mode is
$4 \times (M/2 - 1) \times ffn \times [6 + FLR] + 4$, where M is the number of columns
in the PLB array, ffn is the number of flip-flops in a PLB and FLR is the length of
the configuration memory frame. Therefore, the total number of clock cycles required
for partial configuration memory readback is proportional to the product of M, ffn
and FLR, whereas the number of clock cycles required for the integrated ORA and scan
register is proportional to the product of M and the number of rows in the PLB
array (N). As N is always smaller than the product of FLR and ffn, the integrated
ORA scan chain fares better than partial configuration memory readback despite
requiring one extra partial configuration.
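The two expressions above can be evaluated with a short sketch. The PLB array sizes
used below (20 x 30, 24 x 36 and 64 x 96 for the XCV100, XCV150 and XCV1000) are
taken from the Virtex data sheet rather than from this chapter. The grouping of the
readback expression as 4(M/2 - 1) ffn (6 + FLR) + 4 and the frame length of 13 used
for the XCV100 are assumptions, and the class and method names are hypothetical.

```java
// Sketch of the two retrieval-cost expressions discussed above.
final class ReadbackCost {

    /** ORA Pass/Fail bits to scan out: (N * M/2 - N) * 4. */
    static int oraResults(int rows, int cols) {
        return (rows * cols / 2 - rows) * 4;
    }

    /**
     * Clock cycles for partial readback in SelectMAP mode, read as
     * 4 * (M/2 - 1) * ffn * (6 + FLR) + 4 (the grouping is an assumption).
     */
    static int partialReadbackCycles(int cols, int ffPerPlb, int frameLength) {
        return 4 * (cols / 2 - 1) * ffPerPlb * (6 + frameLength) + 4;
    }

    public static void main(String[] args) {
        // PLB array sizes (rows x columns) assumed from the Virtex data sheet.
        System.out.println("XCV100  ORA results: " + oraResults(20, 30));
        System.out.println("XCV150  ORA results: " + oraResults(24, 36));
        System.out.println("XCV1000 ORA results: " + oraResults(64, 96));
        // frameLength = 13 is an assumed value, not stated in the text.
        System.out.println("XCV100 partial readback cycles: "
                + partialReadbackCycles(30, 4, 13));
    }
}
```

With these inputs the scan-out counts agree with the Integrated ORA and Scan Chain
entries of Table 3.8 for all three devices.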
The three approaches are compared in Table 3.8. The table lists the number of clock
cycles needed to retrieve the results with each of the methods as the PLB array size
increases from XCV100 to XCV150 to XCV1000. Full configuration memory readback for
the XCV150 requires 121536 bytes to be read and 28 command bytes to be written.
Similarly, full configuration memory readback for the XCV1000 requires 745524 bytes
to be read and 28 command bytes to be written. For partial configuration memory
readback, the XCV150 requires 72 frames, each of 60 bytes, to be retrieved.
Therefore, it requires 4320 bytes of data to be read and 1732 command bytes to be
written. For partial configuration memory readback, the XCV1000 requires 192 frames,
each of 152 bytes, to be retrieved. Therefore, it requires 29184 bytes of data to be
read and 4612 command bytes to be written.
Table 3.8: Comparison of Boundary Scan Access Method and Partial Configuration
Memory Readback

Device    Method                                     Additional Logic      Number of clock cycles
XCV100    Full Memory Readback (Boundary Scan)       None                  721728
          Partial Memory Readback (SelectMAP)        None                  4260
          Partial Memory Readback (Boundary Scan)    None                  34080
          Integrated ORA and Scan Chain              1 MUX/PLB + routing   1120
XCV150    Full Memory Readback (Boundary Scan)       None                  972512
          Partial Memory Readback (SelectMAP)        None                  6052
          Partial Memory Readback (Boundary Scan)    None                  48416
          Integrated ORA and Scan Chain              1 MUX/PLB + routing   1632
XCV1000   Full Memory Readback (Boundary Scan)       None                  5964416
          Partial Memory Readback (SelectMAP)        None                  33796
          Partial Memory Readback (Boundary Scan)    None                  270368
          Integrated ORA and Scan Chain              1 MUX/PLB + routing   12032
3.5.1 Commands for Partial Configuration Memory Readback
The command set for retrieving the BIST results from Virtex FPGAs using partial
configuration memory readback is given in Table 3.9. The FAR register is loaded
with the starting frame address, which comprises the major address and the minor
address of the first frame of data to be read back. The number of 32-bit words to be
read back is the frame length of the device multiplied by the number of frames to be
read back. This value is carried in bits (10:0) of the packet header that reads the
FDRO register. In this example, column number 11 of the XCV50 is to be read back.
The major address of this column is 2. The packet data to be loaded into the FAR
register is formed as follows: bits (26:25) are set to binary 00 to signify that the
address lies in one of the PLB columns, bits (24:17) indicate the major address and
bits (16:9) indicate the minor address. Therefore, the FAR register is written with
the address (00040000)h. The column consists of 48 frames, each of 44 bytes. The
readback data is preceded by a 32-bit pad data word, and one pad frame is also
included when calculating the number of words to be read back. Therefore, the total
number of 32-bit words to be read back is 49 × 12 = 588, counting twelve 32-bit
words (44 data bytes plus the pad word) per frame.
Table 3.9: Bitstream for Partial Configuration Memory Readback

Word        Description
FFFF FFFF
AA99 5566   Synchronization Word
3000 2001   Packet Header: Write to FAR register
0004 0000   Packet Data: Starting frame address
3000 8001   Packet Header: Write to CMD register
0000 0004   Packet Data: RCFG
2800 624C   Packet Header: Read from FDRO
0000 0000   Flush the pipeline
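The words in Table 3.9 can be reproduced with a few shifts, as in the sketch below.
The FAR field positions follow the description above. The packet header layout
(header type in bits 31:29, opcode in bits 28:27, register address starting at
bit 13, word count in bits 10:0) and the register addresses are assumptions inferred
from the table entries and should be checked against the Virtex configuration
documentation. The class and method names are hypothetical.

```java
// Sketch: building the readback command words of Table 3.9.
final class ReadbackPackets {
    // Opcodes and register addresses are assumptions inferred from Table 3.9.
    static final int OP_READ = 1;
    static final int OP_WRITE = 2;
    static final int REG_FAR = 1;
    static final int REG_FDRO = 3;
    static final int REG_CMD = 4;

    /** Packet header: type 1 | opcode | register address | word count. */
    static int header(int op, int reg, int wordCount) {
        return (1 << 29) | (op << 27) | (reg << 13) | (wordCount & 0x7FF);
    }

    /** FAR contents: block type (26:25), major (24:17), minor (16:9). */
    static int frameAddress(int blockType, int major, int minor) {
        return (blockType << 25) | (major << 17) | (minor << 9);
    }

    public static void main(String[] args) {
        // 48 frames plus one pad frame, twelve 32-bit words per frame.
        int words = (48 + 1) * 12;
        System.out.printf("%08X%n", header(OP_WRITE, REG_FAR, 1));     // 30002001
        System.out.printf("%08X%n", frameAddress(0, 2, 0));            // 00040000
        System.out.printf("%08X%n", header(OP_WRITE, REG_CMD, 1));     // 30008001
        System.out.printf("%08X%n", header(OP_READ, REG_FDRO, words)); // 2800624C
    }
}
```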
3.6 Summary
The floorplan of the logic BIST should be column-based to aid partial
reconfiguration and partial configuration memory readback. The plan that benefits
the most from partial reconfiguration is identified as Scenario 4, in which, except
for the first configuration, the slice under test changes its mode of operation
while the other slice maintains its configuration. The configuration time for
loading the logic BIST test phases is significantly reduced by using partial
configurations instead of full configurations. As the devices get bigger, the ratio
of the size of the equivalent full configuration bitstreams to the size of the
partial bitstreams increases. Thus, proportionally less time is required to load the
test phases. The regularity of the logic BIST architecture across the test phases is
advantageous for partial reconfiguration. Partial configuration memory readback is
preferred over full configuration memory readback for retrieving the ORA Pass/Fail
results; however, partial configuration memory readback still lags behind the
integrated ORA and scan chain method of retrieving the ORA Pass/Fail results.
Chapter 4
Generating Routing BIST Configurations using JBits
In this chapter we explore how the JBits API can be used to generate test
configurations for testing interconnect resources. Though it is possible to generate
logic as well as routing BIST configurations using JBits, this thesis implements a
method for generating routing BIST configurations for Virtex-I and Spartan-II FPGAs.
One approach to implementing routing BIST with the JBits API is to use RTPCores to
configure the PLBs as counter-based TPGs or comparator-based ORAs. These RTPCores
are referred to as BIST RTPCores. The routing BIST architecture implemented by the
BIST RTPCores and the fault models targeted by that architecture are presented in
Section 4.1.
The JBits program developed for this thesis automatically generates bitstreams and
XDL files containing routing BIST configurations for Virtex-I. The BIST RTPCores
used as TPGs and ORAs, and the RTPCores responsible for routing the WUTs between the
TPGs and ORAs and for populating the PLB array, are described in Section 4.2. The
JBits program outputs a bitstream for Virtex-I FPGAs and an XDL file that enumerates
the design specified in the program. The header of the XDL file denotes the chip,
package and speed grade information [Xil00]. As the architecture of the Virtex-I and
Spartan-II FPGAs is the same, the header of the XDL file is modified to reflect the
target Spartan-II chip, package and speed grade information. The modified XDL file
is then processed to generate a bitstream that tests the interconnects of Spartan-II
FPGAs. The command line options and the results obtained by running the program are
given in Section 4.3, which also presents an estimate of the number of routing BIST
configurations required to test all routing resources of Virtex-I and Spartan-II
FPGAs.
In Subsection 4.4.3, we present a set of four BIST configurations that completely
test the switch box CIPs found in Virtex-I and Spartan-II for CIP stuck-on and CIP
stuck-off faults. Section 4.5 proposes the modifications to the current
implementation of the JBits program that are needed to generate the configurations
for testing the switch box CIPs.
4.1 Overview of Routing BIST Architecture
The BIST RTPCores instantiate the architecture for routing BIST: two counter-based
TPGs generate two-bit exhaustive test patterns over eight parallel WUTs, and two
comparison-based ORAs compare the WUTs from different TPGs for a mismatch. The TPGs,
WUTs and ORAs are shown in Figure 4.1. The BIST RTPCores configure two slices with
TPGCounterCores and two slices with ORACores. The BIST RTPCores inherit the
functionality of an RTPCore. This enables them to take advantage of HDL-like
features and let the router do most of the routing. When a specific resource under
test, such as a Hex wire or Single wire, is to be specified, the BIST RTPCores refer
directly to the physical resources of the FPGA. To establish the fault detection
offered by this architecture, we consider the fault models described in Chapter 2.
The routing BIST configurations generated by the program test for bridging faults,
wire stuck-at faults, CIP stuck-off faults and CIP stuck-on faults affecting Single
as well as Hex wires.

Figure 4.1: Horizontal and Vertical Interconnect Resources Tested for Shorts and
Opens. (a) H-STAR and V-STAR while Testing Horizontal Hex Interconnects going E-W or
Vertical Hex Interconnects going S-N; (b) H-STAR and V-STAR while Testing Horizontal
Hex Interconnects going W-E or Vertical Hex Interconnects going N-S.

To sensitize bridging faults between adjacent wires, both logic levels, 0 and 1,
need to be presented at least once in the test phase on the Hex and Single wires
that run in parallel. Identifying the wires that run in parallel, and are thus
susceptible to bridging faults, requires the physical layout of the FPGA. In its
absence, we rely on the graphical layout presented by the FPGA Editor. As the
exhaustive test patterns contain both combinations of opposite logic values (0,1 and
1,0) at least once, bridging faults between the parallel wires should be detected.
If any of the CIPs along the WUTs is affected by a stuck-open fault, the
comparison-based ORA records a mismatch between the outputs of the two identically
configured TPGs driving the WUTs. To detect a CIP stuck-closed fault, the TPG
controls both segments associated with the open CIP and applies opposite logic
values to each segment. Thus both combinations, 01 and 10, need to be tested.
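The detection argument can be illustrated with a small behavioral model. The sketch
below is not part of the JBits program. It assumes a wired-AND bridge between two
adjacent WUTs driven by one TPG, which is only one possible bridging fault model,
and compares them against the fault-free WUTs driven by the second TPG. All names
are hypothetical.

```java
// Behavioral sketch: a 2-bit exhaustive pattern exposes a bridging fault.
final class BridgingFaultDemo {

    /** Wired-AND bridge between two adjacent wires (an assumed fault model). */
    static int bridge(int pattern) {
        int a = pattern & 1;
        int b = (pattern >> 1) & 1;
        int v = a & b;
        return v | (v << 1);
    }

    /** Applies the four counter states; returns true if the ORA saw a mismatch. */
    static boolean oraFails(boolean faulty) {
        boolean fail = false;
        for (int count = 0; count < 4; count++) {
            int wutsA = faulty ? bridge(count) : count; // WUT pair driven by TPG 0
            int wutsB = count;                          // WUT pair driven by TPG 1
            fail |= (wutsA != wutsB);                   // Pass/Fail result is sticky
        }
        return fail;
    }

    public static void main(String[] args) {
        System.out.println("fault-free: " + oraFails(false));
        System.out.println("bridged:    " + oraFails(true));
    }
}
```

In this model the bridge is exposed by the patterns 01 and 10, where the two wires
carry opposite values; the patterns 00 and 11 leave it unobserved.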
4.1.1 Testing the Interconnects in Parallel
Multiple Hex and Single interconnects can be driven by an output multiplexer. The
TPG drives exhaustive test patterns to the fan-outs of the output multiplexer and
tests them in parallel. The number of Hex and Single resources that can be reached
from each output multiplexer is listed in Table 4.1. The set of interconnects
routable from one output multiplexer does not overlap with the set reachable from
any other output multiplexer. While identifying the Hex and Single interconnects
that may be tested in parallel, we notice that only the following types of Hex
interconnects may be grouped together: Hex East and Hex South, or Hex East and Hex
West, or Hex North and Hex South, or Hex North and Hex West. Testing East and South
interconnects together and West and North interconnects together is simpler than
testing East and West together and North and South together. When testing the Hex
wires, in the first interconnect BIST phase, two identically configured TPGs drive
two-bit exhaustive test patterns on four Hex interconnects going South-North and
four Hex interconnects going East-West. Two comparison-based ORAs detect any
mismatch that results from a fault. In the second interconnect BIST phase, four Hex
interconnects going North-South and four Hex interconnects going West-East are
tested. The TPGs and the ORAs are arranged in the PLB array as shown in
Figure 4.2(a). When testing the Single interconnects, the TPGs and the ORAs are
arranged in the PLB array as shown in Figure 4.2(b).
Table 4.1: Connectivity between Output Multiplexers and Interconnects

                  Output Multiplexer
Interconnect      0  1  2  3  4  5  6  7
Hex Horiz East    1  1  1  1  1  1  1  1
Hex Horiz West    1  1  1  1  1  1  1  1
Hex Vert South    1  1  1  1  1  1  1  1
Hex Vert North    1  1  1  1  1  1  1  1
Single East       1  2  1  2  1  2  1  2
Single West       1  2  1  2  1  2  1  2
Single South      2  1  2  1  2  1  2  1
Single North      2  1  2  1  2  1  2  1
4.2 The Routing BIST RTPCores
The components of a JBits program observe a certain hierarchy, and the components at
each level of the hierarchy are designed to perform certain tasks. This hierarchy
should be taken into consideration when designing the BIST RTPCores, as it enables
the RTPCores to make optimum use of the functionality offered by the JBits API.
Observing these restrictions also results in maintainable code. Some of the
important design decisions are:
• The parameters that define and modify the behavior of the RTPCore,
• Which RTPCores in the hierarchy should configure PLBs,
• Which RTPCores in the hierarchy should configure the routing,
• How the bitstream and XDL file are generated,
• How the programmer controls routing, and
• Which logical nets and buses are reflected in the XDL file and which are ignored.
Figure 4.2: Hex and Single Wires Tested for Shorts and Opens. (a) Testing Hex Lines
for Bridging Faults and Opens; (b) Testing Single Lines for Bridging Faults and
Opens.
As these considerations are generic and pertain to every JBits application, a
discussion of all of them is beyond the scope of this thesis. We restrict ourselves
to how these considerations come into play while developing BIST RTPCores. The steps
described in Appendix A and Appendix B adhere to the hierarchy in the JBits API and
are flexible enough to let the programmer control the routing. The program developed
for this thesis is given in Appendix C and contains four classes:
SimpleRouteBISTApp, SimpleRouteBIST, TPGCounterCore and ORACore.
SimpleRouteBISTApp is the entry point of the application. Therefore, it takes care
of user interaction and command line arguments. Many useful functions pertaining to
user interaction are contained in the class com.xilinx.util.JBitsCommandLineApp, so
SimpleRouteBISTApp derives its functionality from this class. SimpleRouteBISTApp
performs the following functions:
1. Parses the command line arguments.
2. Populates the entire PLB array with the routing BIST RTPCore generated by
SimpleRouteBIST. To accomplish this, the parameters to the RTPCore SimpleRouteBIST
are varied, as explained in Subsection 4.2.4.
3. Generates the XDL file, CTF file and the bitstream for the target FPGA.
The class SimpleRouteBIST is a parent RTPCore. It derives its functionality from the
class com.xilinx.JBits.CoreTemplate.RTPCore, which contains useful functions to
define logical nets and ports. The behavior of SimpleRouteBIST depends on two
parameters, i and j, which identify the row and column numbers, respectively, in the
PLB array where the TPGs should be placed. When testing Hex lines, the ORA is always
six PLBs away from the PLB configured as a TPG, so SimpleRouteBIST configures a
slice of the PLB six blocks away as an ORA. When testing Single lines,
SimpleRouteBIST configures the ORA in a slice of the PLB adjacent to the one
configured as a TPG. SimpleRouteBIST then calls the method internalRoute with
parameters width, row and col. The parameter width gives the size of the WUT group;
it depends on the number of flip-flops contained in a slice, which is two in the
case of Virtex-I. The parameters row and col specify the row and column number of
the TPG. When generating a routing BIST configuration that tests Hex wires running
East-West, the method routes the WUT group of size width × 2 originating from
PLB(row,col) to PLB(row,col+6). When the WUTs are Hex wires running West-East, the
coordinates of the PLB configured as the ORA change to (row,col-6). Similarly, when
testing vertical Hex wires running North-South, the ORA is placed in PLB(row-6,col).
Finally, when the WUTs are vertical Hex wires running South-North, the coordinates
of the PLB configured as an ORA are (row+6,col). SimpleRouteBIST performs the
following functions:
1. Assigns placement constraints to the child cores depending on the parameters,
2. Configures two slices of a PLB as identical counter-based TPGs,
3. Configures a slice of another PLB as an ORA comparing the outputs of the
identical TPGs, and
4. Configures routing between the two PLBs with the specified WUTs.
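The placement rule for the ORA when testing Hex wires can be summarized as a fixed
offset per direction. The sketch below only restates the offsets given above; the
enum and method names are hypothetical and do not appear in the program of
Appendix C.

```java
// Sketch: ORA location relative to the TPG for each Hex wire direction.
final class OraPlacement {

    /** Direction names follow the text; offsets are (row, column) in PLBs. */
    enum HexDirection {
        EAST_WEST(0, 6),
        WEST_EAST(0, -6),
        NORTH_SOUTH(-6, 0),
        SOUTH_NORTH(6, 0);

        final int dRow;
        final int dCol;

        HexDirection(int dRow, int dCol) {
            this.dRow = dRow;
            this.dCol = dCol;
        }
    }

    /** Returns {row, col} of the PLB to be configured as the ORA. */
    static int[] oraLocation(HexDirection d, int row, int col) {
        return new int[] {row + d.dRow, col + d.dCol};
    }

    public static void main(String[] args) {
        int[] ora = oraLocation(HexDirection.EAST_WEST, 3, 4);
        System.out.println("ORA at PLB(" + ora[0] + "," + ora[1] + ")");
    }
}
```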
The classes TPGCounterCore and ORACore are child RTPCores spawned by the parent
RTPCore SimpleRouteBIST. Therefore, their placement constraints are defined by the
parent RTPCore. The behavior of these RTPCores depends on the parameter width, which
determines how many slices the RTPCore occupies in the PLB array. The routing
between the child RTPCores is also specified by SimpleRouteBIST. The architecture of
each of these RTPCores is elaborated in Subsections 4.2.1 and 4.2.2. Bitstream
manipulation is the responsibility of the top-level class, SimpleRouteBISTApp.
4.2.1 Configuring the TPG
The program uses a counter-based TPG to generate exhaustive test patterns for the
WUTs. Virtex-I and Spartan-II PLBs contain four flip-flops, two in each slice.
Therefore, a Virtex-I or Spartan-II slice can be configured as a 2-bit counter,
generating four test vectors. The logical buses and ports defined in the class
TPGCounterCore are shown in Table 4.2. The RTPCore SimpleRouteBIST defines the
upper-level logical buses, and the ports connect them to the internally defined
ones. The TPG flip-flops must be configured to be reset during the start-up sequence
to ensure that the two TPGs being compared by the ORA are synchronized and produce
identical test patterns during the BIST sequence.

Table 4.2: Input and Output Ports of TPGCounterCore

Port   Width (in Bits)   Direction   Function
clk    1                 IN          Counter Clock
dout   2                 OUT         Counter Output
ce     1                 IN          Counter Enable/Disable
sr     1                 IN          Counter Set/Reset
4.2.2 Configuring the ORA
The functionality of the comparator-based ORA is implemented in the class ORACore.
The logical input and output ports of the ORACore are shown in Table 4.3. A Virtex-I
slice configured as an ORA is shown in Figure 4.3. As stated earlier, each of the
two identical TPGs drives two WUTs. The four WUTs are connected to the addr input
port of the ORACore. When the design is placed and routed, this port corresponds to
the A1, A2, A3 and A4 inputs of the G LUT, so the inputs of the G LUT are connected
to the WUTs. The G LUT needs to output a logic 1 if there is any mismatch between
the logic values of the WUTs carrying identical test patterns. Therefore, the LUT is
configured with the expression (A1 XOR A2) OR (A3 XOR A4).
The Pass/Fail result is latched by a flip-flop. Since the four inputs of the G LUT
have been exhausted, we cannot use the Y flip-flop to form a feedback path to the
input of the G LUT. Therefore, the F LUT and the X flip-flop are used. The F LUT ORs
the output of the G LUT with the output of the X flip-flop, which holds the
Pass/Fail result of the test phase. Thus, the output of the G LUT needs to be
connected to the Y output of the PLB to form a path to the A1 input of the F LUT.
Here we utilize the facility given by the JBits API to allocate FPGA physical
resources, such as the Y output pin of the PLB, to any of the logical ports of the
RTPCore. Using this facility, port lutOut is assigned to the Y output pin. The F LUT
is configured to implement the logic expression (A1 OR A2). Finally, the output of
the X flip-flop (XQ) is connected to the A2 input of the F LUT. The X flip-flop must
be configured to be reset during the start-up sequence prior to execution of the
BIST sequence.
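The behavior of the ORA slice can be checked with a small model of the two LUT
expressions and the X flip-flop. The sketch below is illustrative only. It assumes
that A1 is bit 0 of the LUT address and that each pair (A1, A2) and (A3, A4) carries
the same pattern bit from the two TPGs. It produces plain truth tables and does not
address how JBits expects LUT contents to be encoded. All names are hypothetical.

```java
// Behavioral sketch of the comparator-based ORA slice.
final class OraModel {

    /** G LUT: (A1 XOR A2) OR (A3 XOR A4); A1 is assumed to be address bit 0. */
    static boolean gLut(int addr) {
        int a1 = addr & 1;
        int a2 = (addr >> 1) & 1;
        int a3 = (addr >> 2) & 1;
        int a4 = (addr >> 3) & 1;
        return ((a1 ^ a2) | (a3 ^ a4)) != 0;
    }

    /** F LUT: A1 OR A2, with A1 = G LUT output (via Y) and A2 = XQ feedback. */
    static boolean fLut(boolean a1, boolean a2) {
        return a1 | a2;
    }

    /** 16-entry truth table of the G LUT, entry k stored in bit k. */
    static int gLutTruthTable() {
        int table = 0;
        for (int addr = 0; addr < 16; addr++) {
            if (gLut(addr)) {
                table |= 1 << addr;
            }
        }
        return table;
    }

    /** Clocks the X flip-flop once per sample, starting from reset. */
    static boolean run(int[] wutSamples) {
        boolean xq = false; // reset during the start-up sequence
        for (int sample : wutSamples) {
            xq = fLut(gLut(sample), xq);
        }
        return xq; // true means Fail
    }

    public static void main(String[] args) {
        System.out.printf("G LUT truth table: %04X%n", gLutTruthTable());
        System.out.println("mismatch latched: " + run(new int[] {0b0000, 0b0001, 0b0000}));
    }
}
```

Once a mismatch has set the flip-flop, the OR in the F LUT holds the Fail
indication for the rest of the test phase.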
Table 4.3: Input and Output Ports of ORACore

Port     Width (in Bits)   Direction   Function
clk      1                 IN          Clock Input
addr     4                 IN          Input to the G LUT
lutOut   1                 IN-OUT      Feedback to the Latch
oraOut   1                 IN-OUT      Pass/Fail Output of the ORA
sr       1                 IN          Set/Reset Input
4.2.3 Routing the WUTs
The JBits API gives the programmer the flexibility to specify the physical routing
after the RTPCores have been placed. The programmer can skip this step and let
JRoute do the logical-to-physical mapping completely automatically, at the expense
of control over the routing. To generate routing BIST configurations, we must
maintain control over the routing of the interconnect resources to be tested. This
is the rationale behind the method internalRoute(int w, int row, int col) shown in
Figure 4.4. This method may be called anywhere in the JBits program before the call
to the static method connect(Bus) in the class
com.xilinx.JBits.CoreTemplate.RTPCore.Bitstream is made. The method first defines
the physical source and sink pins along with the WUTs. The output pins of the
flip-flops of the two TPGs are defined as source pins. The input pins of the G LUT
of the ORA are the sink pins. The static integers corresponding to the WUTs are
selected from the class com.xilinx.JRoute2.Virtex.ResourceDB.CenterWires.
These physical resources are then routed using the automatic routing method. Routing
RTPCores using this semi-automatic method has several advantages:
1. The routing resources to be tested can be specified under user control,
2. It is not necessary to specify the routing of the other elements in the path that
are not under test,
3. The complexity of routing the switch box CIPs is avoided except while testing CIP
stuck-off faults.

Figure 4.3: Configuration of Comparator-Based ORA
4.2.4 Populating the PLB Array
The class SimpleRouteBISTApp takes care of populating the PLB array with instances
of the RTPCore SimpleRouteBIST. The command line arguments parsed in the main
routine specify the Virtex-I chip and the package being used. The number of columns
and rows in the PLB array of the chip is obtained with the static methods defined in
the class com.xilinx.JBits.Virtex.Devices: getClbCols(device name) and
getClbRows(device name), respectively. It is important to keep in mind that the
JBits API defines the lower left corner of the PLB array as the origin and
calculates the X and Y coordinates of the PLB accordingly. This origin differs from
the one used by the FPGA Editor and the Xilinx data manuals, where the PLB located at
the upper left corner is regarded as the origin. This is important when determining
the location of the ORA relative to the location of the TPG.

private void internalRoute(int w, int row, int col)
        throws RouteException, ConfigurationException {
    // Source pins (TPG flip-flop outputs), sink pins (ORA G LUT inputs)
    // and the wires under test. The fixed indices below require w >= 4.
    CoutSrcPin = new Pin[w];
    OraPin = new Pin[w];
    WutPin = new Pin[w];
    JRoute jroute = Bitstream.getVirtexRouter();

    // XQ and YQ outputs of the two TPG slices in PLB(row, col)
    CoutSrcPin[0] = new Pin(Pin.CLB, row, col, CenterWires.Slice_XQ[slice]);
    CoutSrcPin[1] = new Pin(Pin.CLB, row, col, CenterWires.Slice_XQ[slice+1]);
    CoutSrcPin[2] = new Pin(Pin.CLB, row, col, CenterWires.Slice_YQ[slice]);
    CoutSrcPin[3] = new Pin(Pin.CLB, row, col, CenterWires.Slice_YQ[slice+1]);

    // G LUT inputs of the ORA slice in PLB(row, col+6)
    OraPin[0] = new Pin(Pin.CLB, row, col+6, CenterWires.SliceG1[slice]);
    OraPin[1] = new Pin(Pin.CLB, row, col+6, CenterWires.SliceG2[slice]);
    OraPin[2] = new Pin(Pin.CLB, row, col+6, CenterWires.SliceG3[slice]);
    OraPin[3] = new Pin(Pin.CLB, row, col+6, CenterWires.SliceG4[slice]);

    // Hex wires under test, leaving PLB(row, col)
    WutPin[0] = new Pin(Pin.CLB, row, col, CenterWires.Hex_Horiz_East[0]);
    WutPin[1] = new Pin(Pin.CLB, row, col, CenterWires.Hex_Horiz_East[1]);
    WutPin[2] = new Pin(Pin.CLB, row, col, CenterWires.Hex_Horiz_East[2]);
    WutPin[3] = new Pin(Pin.CLB, row, col, CenterWires.Hex_Horiz_East[3]);

    // Route each source through its WUT to the corresponding ORA input
    for (int i = 0; i < w; i++) {
        jroute.route(CoutSrcPin[i], WutPin[i]);
        jroute.route(WutPin[i], OraPin[i]);
    }
}

Figure 4.4: JBits Program for Routing the WUTs
Suppose the device contains MAX_COL columns and MAX_ROW rows in the PLB array. The
program that tests the horizontal Hex interconnects varies the parameter i from 0 to
MAX_ROW in the outer loop and the parameter j from 0 to MAX_COL-6 in the inner loop
to populate the entire PLB array with the routing BIST test phase. Once j reaches
MAX_COL-6, it can no longer be incremented, as this would place the ORA beyond the
last column, which triggers an exception in the JBits program. All Virtex-I devices
have a MAX_COL value that is a multiple of six. Therefore, the edges of the
horizontal BIST RTPCore align with the PLB columns in all the Virtex-I devices.
However, MAX_ROW is not always a multiple of six. In that case, when populating the
PLB array with the vertical BIST RTPCore, the TPG and ORA need to be routed through
the top IOB cell. This condition is shown in Figure 4.5. Care should be taken that
the corresponding IO pin is set in the tristate mode.
A special condition exists because the Hex lines end at the PLB six blocks over, so
an ORA is configured every six blocks. Therefore, once the inner loop has
incremented j five times, the next PLB is already configured as an ORA. Any further
increment in j causes the program to throw an exception as it tries to overwrite the
PLB configured as an ORA. This condition is detected and j is incremented by six
instead of one.
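The column iteration for one row of the horizontal Hex test can be sketched as
follows. This is a simplified model and not the code of Appendix C: it tracks only
which columns hold ORAs, treats the upper bound on j as exclusive so that the ORA at
column j+6 stays inside the array, and uses hypothetical names.

```java
// Sketch: choosing TPG columns in one row when the ORA sits six columns away.
final class PopulateSketch {

    /** Returns the columns configured as TPGs for an array with maxCol columns. */
    static int[] tpgColumns(int maxCol) {
        boolean[] isOra = new boolean[maxCol];
        int[] tpg = new int[maxCol];
        int count = 0;
        int j = 0;
        while (j < maxCol - 6) {
            if (isOra[j]) {
                j += 6;          // skip the block of PLBs already used as ORAs
                continue;
            }
            tpg[count++] = j;    // TPG in column j
            isOra[j + 6] = true; // ORA six columns over
            j++;
        }
        return java.util.Arrays.copyOf(tpg, count);
    }

    public static void main(String[] args) {
        // 24 columns is the XCV50 array width, assumed from the Virtex data sheet.
        System.out.println(java.util.Arrays.toString(tpgColumns(24)));
    }
}
```

For a 24-column array this places TPGs in columns 0 to 5 and 12 to 17, with the
matching ORAs in columns 6 to 11 and 18 to 23.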
Figure 4.5: Boundary Condition for Populating PLB Array in Vertical Direction
4.2.5 Generating the XDL File
The JBits API allows the programmer to specify the output file format of the
application, which can be XDL and/or bitstream. The choice is made by calling the
static methods generateXDL() and/or generateBitstream() from class CoreOutput in the
package com.xilinx.JBits.CoreTemplate.RTPCore. These method calls are placed in the
run() method of class SimpleRouteBISTApp. The XDL file can later be converted into
an NCD file using the "xdl.exe" program. The default XDL file generated by the JBits
program does not contain the package and speed grade of the device and therefore
does not conform to the XDL version 1.6 specifications [Xil00]. Consequently, it
cannot be converted to an NCD file without modification. The class
SimpleRouteBISTApp takes the Virtex-I device name, package and speed grade as
command line parameters and modifies the generated XDL file to reflect the package
and speed grade information in the header, so that the XDL file can be converted
into an NCD file and viewed in the Xilinx ISE tool suite.
4.3 Experimental Results of Routing BIST
The program takes the command line arguments listed in Table 4.4 to generate the
routing BIST configurations for the target FPGA. Table 4.5 lists the possible values
of some of the command line arguments.

Table 4.4: Command Line Arguments Available for the JBits Program

Command Line Arg   Function
Chip               Name of the Target Chip
infile             One of the null bitstreams provided by Xilinx
outfile            Name of the Output bitstream
Package            Package Identifier for the Target Chip
SpeedGrade         Speedgrade of the Target Chip
Rec Under Test     WUT Groups To Test
Table 4.5: Possible Values of the Command Line Arguments

Chip
xcv50, xcv300, xcv800, xcv1000, xc2s15, xc2s30, xc2s50, xc2s100, xc2s150, xc2s200

Rec Under Test
Hex_Horiz_East_0_1_2_3
Hex_Horiz_West_4_6_8_10
Hex_Horiz_East_5_7_9_11
Hex_Vert_North_0_1_2_3
Hex_Vert_South_4_6_8_10
Hex_Vert_North_5_7_9_11
Single_East_0_1_2_3
Single_East_4_5_6_7
Single_East_5_9_10_11
Single_East_12_13_14_15
Single_East_16_17_18_19
Single_East_20_21_22_23
Single_West_0_1_2_3
Single_West_4_5_6_7
Single_West_8_9_10_11
Single_West_12_13_14_15
Single_West_16_17_18_19
Single_West_20_21_22_23
Single_North_0_1_2_3
Single_North_4_5_6_7
Single_North_8_9_10_11
Single_North_12_13_14_15
Single_North_16_17_18_19
Single_North_20_21_22_23
Single_South_0_1_2_3
Single_South_4_5_6_7
Single_South_8_9_10_11
Single_South_12_13_14_15
Single_South_16_17_18_19
Single_South_20_21_22_23
The Virtex FPGAs can be directly configured with the bitstream generated by the
program developed for this thesis. The bitstream of a Virtex-I FPGA is not
compatible with Spartan-II. Therefore, the bitstream generated by the program cannot
be used on Spartan-II FPGAs. An XDL file contains symbolic names of the nets used in
the design and the configuration of the routing and PLB resources. As Virtex-I and
Spartan-II have equivalent routing and PLB architectures, the XDL file generated for
one architecture can be used for the other with a little modification. Thus, XDL
files are more portable across the architectures than bitstreams: an XDL file
describing a routing BIST configuration for a Virtex-I FPGA may be used, with a
little modification, to generate the same routing BIST configuration for a
Spartan-II FPGA. To generate the routing BIST configuration for the Spartan-II, the
command line arguments supplied to the program indicate the name and the package of
the target Spartan-II chip. The name of the target chip defines the total number of
rows and columns in the PLB array. This information is used to set the maximum limit
on the values of parameters i and j.
The program for generating the BIST configuration that tests four horizontal Hex
resources of a Spartan-II FPGA is run as follows:
java %PATH_TO_PROGRAM%/RouteBIST.SimpleRouteBISTApp \
-xc2s50 %JBits%/data/Bitstream/XCV50/null50GCLK1.bit \
%OUTPUT_DIR%/SpartanRBIST.bit Hex_Horiz_West_4_6_8_10
%PATH_TO_PROGRAM% is an operating system variable set to the directory where the
Java package resides. %JBits% is an operating system variable that indicates the
installation directory of JBits 2.8. The operating system variable %OUTPUT_DIR% is
the destination directory for the resultant bitstream and XDL files. The "\"
indicates the continuation of the command line. Finally, the command line argument
Hex_Horiz_West_4_6_8_10 instructs the program to generate the routing BIST
configuration for the Hex horizontal resources indexed 4, 6, 8 and 10.
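An argument of this form can be split into the resource type and the wire indices
as sketched below. The parser is hypothetical and is not taken from
SimpleRouteBISTApp. It assumes that the underscore-separated name is followed only
by numeric indices.

```java
// Sketch: splitting a WUT group argument such as Hex_Horiz_West_4_6_8_10.
final class WutGroupArg {
    final String resource; // e.g. "Hex_Horiz_West"
    final int[] indices;   // e.g. {4, 6, 8, 10}

    private WutGroupArg(String resource, int[] indices) {
        this.resource = resource;
        this.indices = indices;
    }

    static WutGroupArg parse(String arg) {
        String[] parts = arg.split("_");
        // The resource name ends at the first purely numeric token.
        int first = 0;
        while (first < parts.length && !parts[first].matches("\\d+")) {
            first++;
        }
        String resource = String.join("_", java.util.Arrays.copyOfRange(parts, 0, first));
        int[] indices = new int[parts.length - first];
        for (int k = 0; k < indices.length; k++) {
            indices[k] = Integer.parseInt(parts[first + k]);
        }
        return new WutGroupArg(resource, indices);
    }

    public static void main(String[] args) {
        WutGroupArg g = parse("Hex_Horiz_West_4_6_8_10");
        System.out.println(g.resource + " " + java.util.Arrays.toString(g.indices));
    }
}
```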
4.3.1 Partial Reconfiguration and Routing BIST
Partial reconfigurations can be generated from the full configurations for routing
BIST using BitGen. To generate a partial bitstream containing the frames that differ
between two full configurations, BitGen needs the design netlist (NCD) file. The XDL
files generated by the JBits program are converted into NCD files using the
"xdl.exe" program. The partial bitstreams are generated using the command list shown
in Table 4.6. The commands on lines 1, 2, 3 and 4 generate the bitstream and the XDL
file containing the full configuration for testing horizontal Hex interconnects 0,
1, 2, 3 (line 1), horizontal Hex interconnects 4, 6, 8, 10 (line 2), horizontal Hex
interconnects 5, 7, 9, 11 (line 3) and vertical Hex interconnects 0, 1, 2, 3
(line 4). The XDL files generated by lines 1, 2, 3 and 4 are simpleRBIST_P0.xdl,
simpleRBIST_P1.xdl, simpleRBIST_P2.xdl and simpleRBIST_P3.xdl, respectively. Lines
5, 6, 7 and 8 convert the XDL files into NCD files. Line 9 generates the full
configuration bitstream with which the FPGA is first configured. Line 10 generates
the partial reconfiguration bitstream (simpleRBIST_Part_P01.bit) from the bitstream
with which the FPGA is currently configured (simpleRBIST_P0.bit) and the design
netlist file (simpleRBIST_P1.ncd). Lines 11 and 13 are the same as line 9 except
that the full bitstream is generated for a different test phase. After loading the
full configuration (simpleRBIST_P0.bit) followed by the partial configuration
(simpleRBIST_Part_P01.bit), the effective configuration is simpleRBIST_P1.bit
(line 11). Therefore, the full configuration for Phase 1 (simpleRBIST_P1.bit) acts
as the reference configuration for the subsequent partial reconfiguration
(simpleRBIST_Part_P12.bit) generated on line 12.
Table 4.6: Command Listing for Generating Partial Bitstreams for Routing BIST
Line  Command
1.    java RouteBIST.SimpleRouteBISTApp -xcv50
      "%JBits%\data\Bitstream\XCV50\null50GCLK0.bit" simpleRBIST_P0.bit
      Hex Horiz East 0 1 2 3
2.    java RouteBIST.SimpleRouteBISTApp -xcv50
      "%JBits%\data\Bitstream\XCV50\null50GCLK0.bit" simpleRBIST_P1.bit
      Hex Horiz West 4 6 8 10
3.    java RouteBIST.SimpleRouteBISTApp -xcv50
      "%JBits%\data\Bitstream\XCV50\null50GCLK0.bit" simpleRBIST_P2.bit
      Hex Horiz East 5 7 9 11
4.    java RouteBIST.SimpleRouteBISTApp -xcv50
      "%JBits%\data\Bitstream\XCV50\null50GCLK0.bit" simpleRBIST_P3.bit
      Hex Vert North 0 1 2 3
5.    xdl -xdl2ncd simpleRBIST_P0.xdl
6.    xdl -xdl2ncd simpleRBIST_P1.xdl
7.    xdl -xdl2ncd simpleRBIST_P2.xdl
8.    xdl -xdl2ncd simpleRBIST_P3.xdl
9.    bitgen -d -l -b -w -g Persist:Yes simpleRBIST_P0.ncd
10.   bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r simpleRBIST_P0.bit
      simpleRBIST_P1.ncd simpleRBIST_Part_P01.bit
11.   bitgen -d -l -b -w -g Persist:Yes simpleRBIST_P1.ncd
12.   bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r simpleRBIST_P1.bit
      simpleRBIST_P2.ncd simpleRBIST_Part_P12.bit
13.   bitgen -d -l -b -w -g Persist:Yes simpleRBIST_P2.ncd
14.   bitgen -d -l -b -w -g Persist:Yes -g ActiveReconfig:No -r simpleRBIST_P2.bit
      simpleRBIST_P3.ncd simpleRBIST_Part_P23.bit
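The BitGen portion of the listing (lines 9 to 14) follows a regular pattern: a full
bitstream for phase p, then a partial bitstream that takes phase p as the reference
and phase p+1 as the target. The following Java sketch shows how such a command list
might be produced for any number of phases. The class and method names are
hypothetical, the underscore-separated file names are an assumption about the naming
pattern, and the BitGen options are copied from Table 4.6 rather than checked against
a particular tool release.

```java
import java.util.ArrayList;
import java.util.List;

public class PartialBitstreamCommands {
    // Options common to every BitGen call in Table 4.6.
    static final String COMMON = "bitgen -d -l -b -w -g Persist:Yes";

    /** Command for the full bitstream of one test phase. */
    static String fullCommand(int phase) {
        return COMMON + " simpleRBIST_P" + phase + ".ncd";
    }

    /** Command for the partial bitstream that moves the device from prev to next. */
    static String partialCommand(int prev, int next) {
        return COMMON + " -g ActiveReconfig:No"
                + " -r simpleRBIST_P" + prev + ".bit"          // reference configuration
                + " simpleRBIST_P" + next + ".ncd"             // target design netlist
                + " simpleRBIST_Part_P" + prev + next + ".bit"; // partial bitstream
    }

    /** Full/partial command pairs for phases 0 .. phases-1. */
    static List<String> commands(int phases) {
        List<String> out = new ArrayList<>();
        for (int p = 0; p + 1 < phases; p++) {
            out.add(fullCommand(p));
            out.add(partialCommand(p, p + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // Four phases reproduce the pattern of lines 9 to 14.
        for (String c : commands(4)) {
            System.out.println(c);
        }
    }
}
```

Each partial bitstream is only valid on top of the reference named by -r, which is
why the full bitstream of every intermediate phase is generated even though only the
first one is ever downloaded in full.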
Table 4.7 lists the size of the full bitstream in bytes, the size of the partial bitstream
in bytes and the ratio of the size of the full bitstream to the size of the partial
bitstream. As the size of the partial reconfiguration depends on the reference full
configuration that precedes it, the phase of the reference configuration is listed in the
Previous Phase column.
Table 4.7: Sizes of Partial Bitstreams vs. Full Bitstreams

XCV50
Direction  Phase  Full Bitstream  Previous  Partial Bitstream  Ratio
                  Size (bytes)    Phase     Size (bytes)
West       1      69968           0         33043              2.11
West       2      69968           1         29823              2.34
West       3      69968           2         30123              2.32

XC2S50
Direction  Phase  Full Bitstream  Previous  Partial Bitstream  Ratio
                  Size (bytes)    Phase     Size (bytes)
West       1      69978           0         33043              2.11
West       2      69978           1         29823              2.34
West       3      69978           2         30123              2.32

XCV300
Direction  Phase  Full Bitstream  Previous  Partial Bitstream  Ratio
                  Size (bytes)    Phase     Size (bytes)
West       1      219055          0         59283              3.7
West       2      219055          1         55407              4.0
West       3      219055          2         57155              3.8
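The Ratio column is the full bitstream size divided by the partial bitstream size. The
short Java sketch below recomputes it for two rows of the table and also reports the
fraction of download data saved. The class and method names are hypothetical, and the
saved fraction is an additional illustration that does not appear in the table.

```java
public class BitstreamRatio {
    /** Full-to-partial size ratio, as in the Ratio column. */
    static double ratio(int fullBytes, int partialBytes) {
        return (double) fullBytes / partialBytes;
    }

    /** Fraction of the full bitstream that need not be downloaded. */
    static double savedFraction(int fullBytes, int partialBytes) {
        return 1.0 - (double) partialBytes / fullBytes;
    }

    public static void main(String[] args) {
        // XCV50, phase 1 after phase 0.
        System.out.printf("XCV50  ratio %.3f, saved %.1f%%%n",
                ratio(69968, 33043), 100 * savedFraction(69968, 33043));
        // XCV300, phase 2 after phase 1.
        System.out.printf("XCV300 ratio %.3f, saved %.1f%%%n",
                ratio(219055, 55407), 100 * savedFraction(219055, 55407));
    }
}
```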
4.3.2 Test Phase Sequence
To determine the order in which the routing resources should be tested, we experimented
with the routing BIST configurations for Single and Hex resources. A smaller number of
frames is expected to differ between two test phases testing resources of the same type
(Single or Hex) than between two test phases testing resources of different types.
Table 4.8 gives the size of the full configuration bitstream in bytes, the size of the
partial configuration bitstream in bytes and the ratio of the former to the latter. Each
test phase is identified by the type of resources tested.
The speedup due to partial reconfiguration is more pronounced for the FPGA featuring a
large PLB array (XCV300) than for the FPGA featuring a small PLB array (XCV50). When
two consecutive test phases both test either the Hex or the Single interconnects, a
small number of frames differ between the two. The partial bitstream is larger when the
two phases involved test different routing resources.
Therefore, to reap the benefits of partial reconfiguration for larger FPGAs, all Single
interconnects should be tested in consecutive test phases and all Hex interconnects
should be tested in consecutive test phases. It is observed that when a test phase
testing Hex interconnects follows a test phase testing Single interconnects, a smaller
number of frames differ between the two consecutive test phases. There would be only
one occurrence of this event in the test sequence if all Single interconnects are
tested in consecutive test phases and all the Hex resources are tested in consecutive
test phases. This enables us to choose the best-case scenario for the partial
configuration. Therefore, the Single interconnects should be tested first.
Table 4.8: Routing BIST and Test Phase Sequence

XCV50
Direction  Phase   Full Bitstream  Previous  Partial Bitstream  Ratio
                   Size (bytes)    Phase     Size (bytes)
West       Hex     69978           Hex       32238              2.17
West       Single  69978           Single    19414              3.6
West       Single  69978           Hex       25122              2.7
West       Single  69978           Hex       28586              2.44
West       Hex     69978           Single    25854              2.7
West       Hex     69978           Single    27298              2.6

XCV300
Direction  Phase   Full Bitstream  Previous  Partial Bitstream  Ratio
                   Size (bytes)    Phase     Size (bytes)
West       Hex     219055          Hex       59283              3.69
West       Single  219055          Single    46099              4.75
West       Single  219055          Hex       63603              3.44
West       Single  219055          Hex       63419              3.45
West       Hex     219055          Single    71667              3.06
West       Hex     219055          Single    49767              4.4
4.4 Calculation of the Total Number of Interconnect BIST Configurations
Required
We refer to the Virtex-I switch box shown in Figure 4.6. We envision four BIST test
sessions for obtaining 100% stuck-at fault coverage for interconnect testing in Virtex-I
and Spartan-II FPGAs. In the first test session, half of the Hex, Single and Long
interconnect resources and the associated CIPs are tested for wire stuck-at, bridging and
CIP stuck-open faults. In the second test session, the placement of the TPGs and ORAs is
interchanged and the other half of the Hex, Single and Long interconnects are tested. In
the third test session, the switch-box CIPs are tested for CIP stuck-on and stuck-off
faults. In the fourth and final test session, the MUX CIPs are tested for stuck-on and
stuck-off faults.
Figure 4.6: Switch Box CIP and Xilinx Interconnect Architecture
(The switch matrix connects 24 Single lines (0:23) in each of the North, South, East and
West directions; Hex lines in each direction, grouped as (0:3), (4, 6, 8, 10) and
(5, 7, 9, 11); the eight outputs from the PLB (0:7); and the inputs to the PLB.)
4.4.1 Hex Interconnects
Each TPG implemented in a slice is capable of driving four Hex wires in a routing
BIST test phase. As there are 48 Hex wires emerging from the switch box, the first six
interconnect test configurations are required to detect wire stuck-at and bridging faults
in half of the Hex lines. Another six interconnect test configurations, generated by
interchanging the positions of the TPGs and ORAs, bring the other half of the Hex lines
under test. These test phases also test the CIPs associated with the wires for CIP
stuck-open faults. These CIPs are tested for stuck-open and stuck-closed faults in the
second test session. Therefore, a total of twelve configurations are generated.
4.4.2 Single Interconnects
Each TPG implemented in a slice is capable of driving four Single wires in a routing
BIST test phase. A switch box provides connections to 96 Single wires. Therefore, to test
the Single wires for wire stuck-at and bridging faults and the associated CIPs for
stuck-open faults, we need a total of 24 interconnect BIST configurations. The first
twelve routing BIST configurations test half of the Single wires in the PLB array. The
other twelve configurations test the remaining half of the Single wires by interchanging
the positions of the TPGs and ORAs. The CIPs associated with the Single interconnects are
tested for stuck-open and stuck-closed faults in the second test session by applying
opposite logic values at both ends of the CIP while observing the two wire segments with
two different ORAs.
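The counts in Sections 4.4.1 and 4.4.2 can be checked with a few lines of Java. The
sketch assumes that the number of configurations is simply the number of wires at the
switch box divided by the four wires a TPG drives per test phase, which is one reading
that reproduces the totals of twelve and 24; the class and method names are
hypothetical.

```java
public class WireTestConfigCount {
    /** Configurations needed when each test phase covers wiresPerPhase wires. */
    static int configurations(int wires, int wiresPerPhase) {
        return (wires + wiresPerPhase - 1) / wiresPerPhase; // ceiling division
    }

    public static void main(String[] args) {
        int hex = configurations(48, 4);     // 48 Hex wires, four per TPG
        int single = configurations(96, 4);  // 96 Single wires, four per TPG
        // Each total splits into two halves: one before and one after the
        // TPG and ORA positions are interchanged.
        System.out.println("Hex:    " + hex + " = " + hex / 2 + " + " + hex / 2);
        System.out.println("Single: " + single + " = " + single / 2 + " + " + single / 2);
        System.out.println("Total:  " + (hex + single));
    }
}
```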
4.4.3 Switch Box CIPs
To test the switch-box CIPs for stuck-closed faults, the TPG applies opposite logic
patterns on the two wire segments connected by the CIP. In the approach described in
[RFZ97], three test configurations were sufficient to completely test the switch box
CIPs for CIP stuck-open as well as CIP stuck-closed faults. These three configurations
were suitable for testing the switch box CIPs in Xilinx XC4000 FPGAs. The architecture
of the switch box CIP in a Virtex-I FPGA is different from that of its predecessors. We
deduced this architecture from the JBits 2.8 documentation provided by Xilinx and
verified it against report files generated by the XDL program that list all CIPs in the
Virtex-I architecture. The structure of the switch box CIP found in Virtex-I devices is
depicted in Figure 4.7(a). All Single North interconnects can be connected to the Single
South interconnects with the same index, and all Single East interconnects can be
connected to the Single West interconnects bearing the same index, e.g. interconnect
Single_South23 can be connected to Single_North23 by the CIP labeled B. Any Single West
interconnect is connected to exactly one Single North interconnect and exactly one
Single South interconnect, e.g. interconnect Single_West0 can be connected to
Single_North23 by the CIP labeled F and to Single_South2 by the CIP labeled I. Similarly,
any Single East interconnect is connected to exactly one Single North interconnect and
exactly one Single South interconnect. The possible connections for Single_South2 and
Single_East0 are shown in Figures 4.7(b) and 4.7(c), respectively. The effect of this
interconnect architecture is that we may connect Single_West0 to Single_North23, but
there is no way to connect Single_North23 to Single_East0 because Single_East0 provides
only one connection to a Single North interconnect: Single_North4. In previous
architectures, like XC4000, we could have bypassed the connection from Single_West0 to
Single_East0 by connecting Single_West0 to Single_North23 and Single_North23 to
Single_East0. The disadvantage of the switch box CIP shown in Figure 4.7 is that it
requires more test configurations to test for stuck-off and stuck-on faults than those
required to test the switch box CIP in XC4000.
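The missing detour can be illustrated with a small connectivity model. The Java sketch
below records only the CIPs named in this section (the Single_East0 to Single_South6
connection is taken from the discussion of Figure 4.8) and checks whether any wire is
adjacent to both Single_West0 and Single_East0. The class is hypothetical and is not
part of the JBits API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SwitchBoxConnectivity {
    // Wire name -> wires reachable through one switch-box CIP.
    private final Map<String, Set<String>> cips = new HashMap<>();

    void connect(String a, String b) {
        cips.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        cips.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    private Set<String> neighbours(String wire) {
        return cips.getOrDefault(wire, Collections.<String>emptySet());
    }

    /** True if a single CIP joins the two wires. */
    boolean direct(String a, String b) {
        return neighbours(a).contains(b);
    }

    /** True if some third wire is joined to both a and b (a two-CIP detour). */
    boolean detourExists(String a, String b) {
        Set<String> common = new HashSet<>(neighbours(a));
        common.retainAll(neighbours(b));
        return !common.isEmpty();
    }

    /** The connections named in the text for index-0 West/East wires. */
    static SwitchBoxConnectivity example() {
        SwitchBoxConnectivity sb = new SwitchBoxConnectivity();
        sb.connect("Single_West0", "Single_East0");     // same-index East-West CIP
        sb.connect("Single_North23", "Single_South23"); // same-index North-South CIP (B)
        sb.connect("Single_West0", "Single_North23");   // CIP F
        sb.connect("Single_West0", "Single_South2");    // CIP I
        sb.connect("Single_East0", "Single_North4");
        sb.connect("Single_East0", "Single_South6");
        return sb;
    }

    public static void main(String[] args) {
        SwitchBoxConnectivity sb = example();
        System.out.println("direct: " + sb.direct("Single_West0", "Single_East0"));
        System.out.println("detour: " + sb.detourExists("Single_West0", "Single_East0"));
    }
}
```

With these connections the direct CIP exists but no detour does, so the West-East CIP
cannot be bypassed inside the switch box.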
Figure 4.7: Sections of Switch Box CIP
(a) Architecture of Virtex-I Switch Box CIP
(b) Interconnects and CIPs Accessible from Single_South2
(c) CIPs and Interconnects Accessible from Single_East0
The classes that model the CIPs and the number of unique CIPs modeled per PLB
(third column) are tabulated in Table 4.9.
Table 4.9: Mapping of the CIPs in Various JBits Classes [Xil01d]
Interconnect Resources Connected             Class            #
Bi-Directional Hex Lines and Single Lines    BiHexToSingle    24
Uni-Directional Hex Lines and Single Lines   UniHexToSingle   32
Single Lines and Single Lines                SingleToSingle   144
Output Muxes and Single Lines                OutMuxToSingle   48
Among the four categories of CIPs, the SingleToSingle CIPs are the switch box CIPs. The
rest are either break-point CIPs or cross-point CIPs. In order to test the
SingleToSingle switch box CIPs, we propose a set of four test configurations, shown in
Figures 4.8 and 4.9, that would completely test the CIPs for CIP stuck-open and CIP
stuck-closed faults. The CIPs tested in each test configuration are noted in
Figures 4.8(c) and 4.9(c). Thus, after four test configurations, the CIPs A, F, H, E, I
and J are completely tested for stuck-on and stuck-off faults.
Thus, after four configurations six unique CIPs are completely tested for stuck-at
faults. As there is no overlap between the CIPs tested in, say, configuration 1 and
configuration 5, the total number of configurations required to test all 144 switch box
CIPs would be
\[
\frac{144}{\text{Number of Unique CIPs Tested Per Configuration}} \times \text{Number of Configurations}
\]
As can be seen, this number reduces as CIPs are tested in parallel.
To show that six unique CIPs are brought under test by a set of four configurations,
consider the configuration shown in Figure 4.8(a) for detecting stuck-open faults in the
CIP that connects the Single_West0 and Single_East0 interconnect resources.
Single_North23 and Single_South6 cannot be accessed from any other Single West or
Single East interconnect resource.
Now consider the configuration shown in Figure 4.8(a) for detecting stuck-open faults in
the CIP between the Single_West1 and Single_East1 interconnects. The West-North and
East-South Single interconnects connected to Single_West1 and Single_East1 are different
from Single_North23 and Single_South6 and cannot be accessed from any other Single West
or Single East interconnect resource. Configuration 5 is exactly the same as
configuration 1 except that the CIPs now under test are: the CIP between Single_West1
and Single_East1 for stuck-off, the CIP between Single_West1 and Single_North20 for
stuck-on, and the CIP between Single_East1 and Single_South3 for stuck-on. Similarly,
configuration 6 is exactly the same as configuration 2 except that the CIPs now under
test are: the CIP between Single_West1 and Single_East1 for stuck-on, the CIP between
Single_West1 and Single_North20 for stuck-off, and the CIP between Single_East1 and
Single_South3 for stuck-off. We observe that these CIPs are different from the ones
tested in any of the previous configurations, or are tested for different stuck-at
faults.
The TPGs and the ORAs, along with the routing required to set up test configuration 4
(Figure 4.9(d)), are shown in Figure 4.10. For the sake of clarity, consider the net
WUT0. The net WUT0 is routed through the switch boxes in a zigzag pattern. It comprises
the Single North-Single South CIPs and the Single South wire segments in every switch
box in Row 1, Row 2 and Row 0. WUT0 passes through PLBs configured with an identity
function (not shown in the figure) to cross the rows. The output multiplexers of the
PLBs configured as identity functions connect WUT0 to Single wire segments across the
rows (shown by the broken lines in Row 2 and Row 0). WUT0 makes use of the PLB input
mux CIPs connecting Single West wire segments to the input of the ORA. Therefore, none
of the CIPs under test for stuck-off faults are used for routing.
The number of CIPs that can be tested in a configuration is also affected by the
Virtex-I architecture. Consider that there are four outputs from the two counter-based
TPGs implemented in a PLB. As the PLB provides eight output muxes, each output of a TPG
can connect to a maximum of two output muxes. According to Table 4.1, each TPG output
connected to two output muxes can drive only three East-West, North-South, East-South,
West-North, West-South or East-North lines. Thus, six structures of the kind shown in
Figure 4.8(b) and Figure 4.9(c) can be tested in parallel by two TPGs. Two
comparison-based ORAs are capable of comparing six pairs of inputs and there are a
considerable number of input muxes, so the ORAs do not pose a problem. However, suppose
each TPG output is driving two Single West lines when testing the group of CIPs in
parallel as shown in Figure 4.8(a). It then has to drive four Single North lines in
parallel. As two output muxes may drive only three Single North lines (Table 4.1), four
TPG outputs may test only two structures of the kind shown in Figure 4.8(a) in parallel,
as shown in Figure 4.10. This has an impact on the total number of configurations
required to completely test the switch-box CIPs.
Figure 4.8: Test Configurations Needed to Completely Test Switch Box CIPs
(a) Test Configuration 1
(b) Test Configuration 2
(c) CIP Faults Detected (CIPs A, F and H in test phases 1 and 2)
Figure 4.9: Test Configurations Continued...
(c) Test Configuration 3
(d) Test Configuration 4
(c) CIP Faults Detected (CIPs E, I and J in test phases 3 and 4)
Figure 4.10: Routing BIST Configuration for Testing Mux CIPs
(TPGs, ORAs and the net WUT0 routed over Single South and Single West segments and
input mux lines in Row 2, Row 1 and Row 0.)
There are 24 CIPs of each type: A, F, H, E, I and J. As each TPG output may drive
three interconnects, eight configurations of type 2 and type 3 are sufficient to test
all 24 CIPs of type A and E for stuck-on faults and F, I, H, J for stuck-off faults.
However, we would need twelve configurations of type 1 and 4 to test all 24 CIPs of
type A and E for stuck-off faults and F, I, H, J for stuck-on faults because of the
aforementioned limitation on the routability of the TPG outputs. Therefore, a total of
20 configurations is required to completely test the switch-box CIPs for stuck-at
faults. If it were not for the peculiar output multiplexers, a Virtex-I PLB containing
two slices could generate eight-bit test patterns and two CIPs providing connections
between the horizontal Single interconnects could be tested in a configuration. Under
these circumstances only half the number of test configurations would be needed to fully
test the switch box CIP.
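The total of 20 can be reproduced under one reading of the paragraph above, namely that
the 24 CIPs of a type are divided by the number of CIPs that can be exercised per
configuration: three where each TPG output drives three interconnects, and two where
the output-mux limitation applies. This reading is an assumption made for illustration,
and the class and method names are hypothetical.

```java
public class SwitchBoxCipConfigCount {
    /** Configurations needed to cover cipsPerType CIPs, testedInParallel at a time. */
    static int configurations(int cipsPerType, int testedInParallel) {
        return (cipsPerType + testedInParallel - 1) / testedInParallel; // ceiling
    }

    public static void main(String[] args) {
        int types2and3 = configurations(24, 3); // three interconnects per TPG output
        int types1and4 = configurations(24, 2); // limited to two structures in parallel
        System.out.println("Type 2 and 3 configurations: " + types2and3);
        System.out.println("Type 1 and 4 configurations: " + types1and4);
        System.out.println("Total: " + (types2and3 + types1and4));
    }
}
```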
4.4.4 MUX CIPs
There are 70 non-decoded MUX CIPs per switch box in the Virtex-I architecture.
The MUX CIPs are grouped by the resources they control or by the number of inputs, as
indicated in Table 4.10. UniHexMux1 and UniHexMux2 consist of sixteen 6:1 muxes per PLB
that control the connections to the uni-directional Hex lines. BiHexMux1 and BiHexMux2
consist of sixteen 4:1 muxes that control the connections to the bi-directional Hex
lines. Mux13to1 consists of the eight 13:1 output muxes available to connect outputs of
the PLB to the routing resources. Mux16to1 consists of the 16:1 muxes that control the
connectivity of the BX, BY, CE, Clk and SR inputs of each slice (therefore, ten per
PLB) and the two tristate buffers in the PLB, thus totaling twelve. Mux28to1 consists of
the muxes that control the inputs of the LUTs. There are sixteen LUT inputs per PLB. The
MUX CIPs of type Mux28to1 and Mux16to1 provide connectivity to a total of twenty-four
and sixteen Single interconnects, respectively. The architecture of Mux28to1 and
Mux16to1 can be represented by Figure 4.11. As Mux28to1 and Mux16to1 share the Single
interconnects, the MUX CIPs belonging to these types can be tested in parallel as shown
in Figure 4.11. The groups of MUX CIPs that can be tested in parallel are given in
Table 4.11 and the complete connectivity is listed in Appendix D.
Testing a group of MUX CIPs in parallel requires a total number of configurations
equal to the width of the largest MUX CIP [RPFZ99]. However, this approach relies on
applying test patterns only through the selected MUX CIP inputs. This method ignores
the problem of what is referred to as "invisible logic" in [SKCA96], and thus complete
stuck-at fault coverage is not achieved. This condition is depicted in Figure 4.12.
Figure 4.11: Testing MUX CIPs in Parallel [RPFZ99]
Figure 4.12: Problem of Undetected Faults Due to Invisible Logic in MUX [AS01]
Table 4.10: MUX CIPs in Virtex-I Architecture and their Functions [Xil01d]
Name         Inputs  MUX CIPs/PLB  Function
UniHexMux1,  6       16            Used to drive the uni-directional Hex lines
UniHexMux2
BiHexMux1,   4       16            Used to drive the bi-directional Hex lines
BiHexMux2
Mux13to1     13      8             Used to produce the output bus in the CLB
Mux16to1     16      12            Used as the inputs to various control circuits
                                   in the CLB
Mux28to1     24      16            Used as the inputs to the LUTs
UniMux8to1   8       2             Used to select the input to the tristate buffers
Table 4.11: MUX CIP Groups Tested in Parallel
Type       MUX CIPs
Mux16to1   S0BX, S0BY
           S1BX, S1BY
           S0CE, S1CE
           S0SR, S1SR
           S0Clk, S1Clk
           TS0, TS1
Mux28to1   S0F1, S0G1, S1F4, S1G4
           S0F2, S0G2, S1F3, S1G3
           S0F3, S0G3, S1F2, S1G2
           S0F4, S0G4, S1F1, S1G1
The approach described in [SKCA96] and [AS01] obtains complete stuck-at fault
coverage for the MUX CIPs. The invisible logic due to unselected inputs of the MUX
CIP is not ignored. Stuck-off faults in the selected input are detected by applying
both 0 and 1 to the selected input. At the same time, the unselected logic is configured
to apply the opposite logic value to the unselected inputs. In the case of non-decoded
multiplexers, at least one unselected input needs to be configured to apply the opposite
logic value, as shown in Table 4.12.
Table 4.12: Testing MUX CIPs for Stuck-On and Stuck-Off Faults [SWHA98]
Inputs           Configuration Bits
I0 I1 I2 I3      C3 C2 C1 C0
0  1  0  0       0  0  0  1
1  0  0  0
1  0  0  0       0  0  1  0
0  1  0  0
0  0  0  1       0  1  0  0
0  0  1  0
0  0  1  0       1  0  0  0
0  0  0  1
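The pattern in Table 4.12 can be generated mechanically. In the Java sketch below, each
selected input receives 0 and then 1 while one neighbouring unselected input carries the
opposite value. Choosing the neighbour as the input whose index differs in the lowest
bit is inferred from the four-input table and is an assumption, as are the class and
method names; the sketch expects an even number of inputs.

```java
import java.util.Arrays;

public class MuxCipVectors {
    /** One-hot configuration bits, listed C(n-1) ... C0, that select one input. */
    static int[] configBits(int selected, int n) {
        int[] bits = new int[n];
        bits[n - 1 - selected] = 1;
        return bits;
    }

    /**
     * Two input vectors (I0 ... I(n-1)) for one selected input: first the
     * selected input at 0 with its partner at 1, then the reverse.
     */
    static int[][] inputVectors(int selected, int n) {
        int partner = selected ^ 1; // neighbouring unselected input (assumed choice)
        int[][] v = new int[2][n];
        v[0][partner] = 1;   // selected = 0, partner = 1
        v[1][selected] = 1;  // selected = 1, partner = 0
        return v;
    }

    public static void main(String[] args) {
        int n = 4;
        for (int s = 0; s < n; s++) {
            int[][] v = inputVectors(s, n);
            System.out.println(Arrays.toString(v[0]) + " then " + Arrays.toString(v[1])
                    + " with configuration bits " + Arrays.toString(configBits(s, n)));
        }
    }
}
```

For a four-input MUX CIP this yields the eight input rows and four configuration
settings of Table 4.12, one configuration per input.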
If tested in parallel, the four groups of Mux28to1 with twenty-four inputs would take
96 configurations. Complete testing of the five groups of Mux16to1 with sixteen inputs
would require 80 configurations. The 16:1 MUX CIPs controlling the tristate buffers,
UniMux8to1, would take 16 configurations. Mux13to1 contains the eight output MUXes,
which would require 13 configurations to cover all the stuck-at faults as they may be
tested in parallel. The MUX CIPs belonging to types BiHexMux1, BiHexMux2, UniHexMux1
and UniHexMux2 do not share routing resources and therefore cannot be tested together.
Testing BiHexMux1 and BiHexMux2 would require a total of 8 configurations, and testing
UniHexMux1 and UniHexMux2 would require a total of 12 configurations. Thus, a total of
96 + 80 + 16 + 12 + 13 + 8 = 225 configurations is required.
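The tally above can be written out as a short Java sketch. The per-group figures are
taken directly from the paragraph; the products for Mux28to1 and Mux16to1 use the group
counts and input widths stated there, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MuxCipConfigTally {
    /** Configurations contributed by each group of MUX CIPs. */
    static Map<String, Integer> contributions() {
        Map<String, Integer> m = new LinkedHashMap<>();
        m.put("Mux28to1 (4 groups x 24 inputs)", 4 * 24);
        m.put("Mux16to1 (5 groups x 16 inputs)", 5 * 16);
        m.put("Tristate buffer MUX CIPs", 16);
        m.put("UniHexMux1 and UniHexMux2", 12);
        m.put("Mux13to1", 13);
        m.put("BiHexMux1 and BiHexMux2", 8);
        return m;
    }

    static int total() {
        int sum = 0;
        for (int c : contributions().values()) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Integer> e : contributions().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
        System.out.println("Total: " + total());
    }
}
```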
4.5 Generating Configurations for Switch-Box CIPs
The internalRoute method is modified to generate the configurations to test the
switch box CIPs as described above. Nets such as WUT0 in Figure 4.10 have a source and
a destination. Specific connections, such as Single_West[0] to Single_East[0], are
specified by declaring the corresponding routing resources as the intermediate pins in
the internalRoute method. The intermediate pins are then routed using automatic routing.
Creating single-point nets with JBits is a non-trivial task. The set() call in the
class com.xilinx.JBits.Virtex.JBits is utilized. This allows us to create a net without
specifying the source pin or the destination pin.
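/* Header files */
import com.xilinx.JBits.Virtex.Bits.*;

/* Turns on the switch-box CIPs of the structure under test in the PLB at
   (plbrow, plbcol). jbits is the com.xilinx.JBits.Virtex.JBits object that
   holds the configuration being built. */
void internalRoute(int plbrow, int plbcol, int width)
        throws ConfigurationException, RouteException {
    // CIP between Single_West0 and Single_East0
    jbits.set(plbrow, plbcol,
              SingleToSingle.SINGLE_WEST0_TO_SINGLE_EAST0,
              SingleToSingle.ON);
    // CIP between Single_North23 and Single_South23
    jbits.set(plbrow, plbcol,
              SingleToSingle.SINGLE_NORTH23_TO_SINGLE_SOUTH23,
              SingleToSingle.ON);
    // CIP between Single_South6 and Single_North6
    jbits.set(plbrow, plbcol,
              SingleToSingle.SINGLE_SOUTH6_TO_SINGLE_NORTH6,
              SingleToSingle.ON);
}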
When the TPG and ORA cores are defined in the JBits program and the router places the
design, it automatically includes the CIPs set or reset in the internalRoute method in
the appropriate partially placed nets. In this case, these CIPs would become part of the
output net of the TPG.
4.6 Conclusion
Generating the logic BIST and interconnect BIST configurations for Virtex-I and
Spartan-II requires different tools from those used for testing their predecessors, such
as the XC4000 and Spartan series FPGAs. Previous attempts at targeting the interconnect
resources of Virtex-I FPGAs using JBits have either ignored the problem of testing
all the interconnect resources and switch boxes [FH03] or have addressed it
incorrectly [SMG01]. The current implementation of the BIST RTPCores generates the six
BIST configurations required to test all of the Hex resources and the twenty-four BIST
configurations required to test all the Single resources for wire stuck-at faults and
the associated CIPs for CIP stuck-open faults. The RTPCores can be extended for testing
the switch-box CIPs for CIP stuck-closed faults. The Virtex-I FPGA can be directly
configured with the BIST configuration bitstreams. The bitstreams targeting Spartan-II
and Virtex-E FPGAs may be compiled from the XDL files after modifying the headers to
reflect the target chip. The mechanism provided in the Virtex-I routing architecture for
connecting two interconnects or transporting signals in and out of the PLB uses either
the MUX CIPs or the switch box CIPs. Complete testing of the switch box CIPs requires 20
configurations. Testing the MUX CIPs requires one configuration per input to guarantee
that invisible logic is also tested for stuck-on faults. Therefore, the MUX CIPs alone
would require 224 test configurations. Testing the interconnect resources for wire
stuck-at faults, open wire faults and pair-wise shorts requires another 36
configurations.
Chapter 5
Summary and Future Work
The reduction in the size of the memory required to store the partial logic BIST
configurations as well as the routing BIST configurations was discussed. Smaller BIST
configurations lead to faster configuration times. The advantage of partial
configuration memory readback over full configuration memory readback is predicted.
Partial configuration memory readback may still lag behind the integrated ORA scan-chain
method of retrieving the ORA Pass/Fail results. However, no additional configurations
and no additional logic or routing resources are needed for partial configuration memory
readback on Virtex-I and Spartan-II series FPGAs. A program was developed to
automatically generate routing BIST configuration bitstreams using the JBits API for
Virtex-I FPGAs. The program outputs XDL files enumerating the connections between the
logic and interconnect resources for the routing BIST configurations. The XDL files may
be further processed to generate partial routing BIST configurations for Spartan-II
FPGAs. The program implements BIST RTPCores to generate the interconnect BIST
configurations using the JBits API. The interconnect BIST configuration bitstreams test
Single and Hex interconnects for wire stuck-at faults and the associated CIPs for
stuck-open faults. The WUT pairs falling at the boundary of adjacent WUT groups, e.g.
WUT pairs indexed (3,4), (7,8) etc., are not tested for bridging faults. A set of four
test sessions was identified to completely test the switch box CIPs for stuck-at faults
in the Virtex-I and Spartan-II architecture. A method was proposed to define
single-point nets in the JBits program. This method would be useful while
algorithmically generating test configurations that sensitize stuck-on faults in the
multiplexer CIPs, break-point CIPs and cross-point CIPs.
The FPGA architecture has an impact on the number of logic and interconnect
BIST configurations required to achieve 100% stuck-at fault coverage. The Virtex-I
PLB contains eight output muxes to connect twelve PLB signals to the routing resources,
thereby preventing us from routing signals to two comparison-based ORAs from two slices
in the PLB configured as BUTs at the same time in the logic BIST test phase. This
reduces the number of outputs from the BUTs that can be compared by an ORA. As only one
slice can be tested in a test phase, the total number of logic BIST configurations
required increases.
Developing interconnect and logic BIST configurations for FPGAs using CAD tools
poses difficulties because conventional CAD tools are written with the designer in
mind. Therefore, test engineers resort to finding new ways to use the same tools that
are available to designers. The XDL intermediate representation has been successfully
demonstrated in the past as a useful tool to generate interconnect and logic test
configurations for XC4000 and Spartan-I FPGAs. The attractiveness of this approach lies
in its simplicity and efficiency, especially for logic and interconnect BIST for FPGAs,
where the structure shows high regularity. XDL represents the BIST circuitry in
human-readable form, converts easily into Xilinx netlist format so that it may be viewed
in graphical viewers such as FPGA Editor, and reliably generates bitstreams for the
target chip with tools such as BitGen. The disadvantage of this approach, as felt by the
author, is the poor maintainability and rigidity of the code that generates the XDL,
which is inherent to programs manipulating textual files. Even this difficulty is
dwarfed because the logic and interconnect BIST structure is very regular; e.g., in
routing BIST, iterating the XDL circuit description for an H-STAR or V-STAR testing one
row or column over the entire PLB array generates the XDL circuit description for the
H-STAR or V-STAR instantiated over the entire PLB array. The more serious problem is the
increasing difficulty in managing the symbolic names in XDL. The reporting facility
offered by XDL lists all the symbolic names of the CIPs and the logic resources. The
report files typically run to megabytes even for a small Virtex-I FPGA. The symbolic
names of the logic resources and CIPs are manageable once they are put into a database
or a spreadsheet.
Designers and test engineers frequently need to use utilities such as design rule
checks, delay calculations and checks of the veracity of a design specified in higher
level languages, as offered by FPGA Editor. Therefore, FPGA Editor is an essential
tool while developing the logic and interconnect test configurations. FPGA Editor
currently accepts design input only in Xilinx NCD format, which is interconvertible with
XDL. It is felt that XDL support should be a criterion while evaluating CAD tools
for generating logic and interconnect BIST for Xilinx FPGAs. The JBits API can be
effectively utilized if it is viewed as a tool for specifying designs in high level
languages and generating XDL. However, the new version of the JBits API for Virtex-II
does not have built-in support for XDL like its previous versions. This is not to say
that one cannot be built. XDL can be viewed simply as another output format supported
by the JBits application along with the bitstream. FPGA place and route tools like
VPR have been modified to take advantage of the JBits API to generate designs in XDL
format [BR97] [XRT03]. The JBits API for Virtex-I is capable of enumerating a circuit
specified in higher level languages in XDL format. The functionality of the new version
of the JBits API needs to be evaluated as to how it may be used to generate the output
of a design specified in high level languages in XDL format.
The routing capabilities of the CAD tool are another important aspect to consider while
evaluating the tool for generating logic and interconnect BIST. While generating
interconnect and routing BIST configurations for an FPGA, the test engineer must
maintain control over the routing. The primary consideration for a CAD tool is whether
it allows the test engineer to specify the exact routing resources to use while routing
between source and sink. For the test engineer to achieve 100% stuck-at fault coverage
of an FPGA, the CAD tool must also allow the creation of single-point nets. This
facility is hard to find among conventional design entry and place-and-route tools, but
it might be useful for test engineers. Therefore, the routing capabilities of the new
version of the JBits API for Virtex-II should be probed to see whether they satisfy the
above-mentioned requirements.
Bibliography
[AES01] M. Abramovici, J. E. Emmert, and C. Stroud. Roving STARs: An Integrated
Approach to On-Line Testing, Diagnosis and Fault Tolerance for FPGAs in
Adaptive Computing Systems. In Proceedings of Third NASA/DoD Workshop
on Evolvable Hardware, pages 73-92, July 2001.
[AKS93] V. D. Agrawal, C. R. Kime, and K. K. Saluja. A Tutorial on Built-In Self
Test-Part 1 Principles. In IEEE Design and Test of Computers, 10(1):73-82,
March 1993.
[AR94] M. J. Alexander and G. Robins. High-performance Routing for Field
Programmable Gate Arrays. In Proceedings of International ASIC Conference
and Exhibit, pages 138-141, September 1994.
[AS01] M. Abramovici and C. Stroud. BIST-Based Test and Diagnosis of FPGA
Logic Blocks. In IEEE Transactions on Very Large Scale Integration (VLSI)
Systems, 9(1):159-172, February 2001.
[BFRV92] S. Brown, R. Francis, J. Rose, and Z. Vranesic. Field-Programmable Gate
Arrays. Kluwer Academic Publishers, 1992.
[BR96] S. Brown and J. Rose. FPGA and CPLD Architectures: A Tutorial. In
IEEE Design & Test of Computers, 13(2):42-57, Summer 1996.
[BR97] V. Betz and J. Rose. VPR: A New Packing, Placement, and Routing Tool for
FPGA Research. In International Workshop on Field-Programmable Logic
and Applications, pages 213-222, September 1997.
[CHW00] T. J. Callahan, J. R. Hauser, and J. Wawrzynek. The Garp Architecture
and C Compiler. In IEEE Computer, 33(4):62-69, April 2000.
[DeH00] A. DeHon. The Density Advantage of Configurable Computing. In IEEE
Computer, 33(4):41-49, April 2000.
[DW99] A. DeHon and J. Wawrzynek. Reconfigurable Computing: What, Why,
and Implications for Design Automation. In Proceedings of IEEE Design
Automation Conference, pages 610-615, June 1999.
[ESSA00] J. Emmert, C. Stroud, B. Skaggs, and M. Abramovici. Dynamic Fault
Tolerance in FPGAs via Partial Reconfiguration. In Proceedings of IEEE
Symposium on Field-Programmable Custom Computing Machines, pages 165-174,
April 2000.
127
[FH03] D. A. Fernandes and I. G. Harris. Application of Built in Self-Test for
Interconnect Testing of FPGAs. In Proceedings of IEEE International Test
Conference, pages 1248{1257, October 2003.
[GL99] S. Guccione and D. Levi. Run-time parameterizable cores. In Proceedings
of the ACM/SIGDA International Symposium on Field Programmable Gate
Arrays, pages 252{259, February 1999.
[GLS99] S. Guccione, D. Levi, and P. Sundararajan. JBits: Java Based Interface for
Reconflgurable Computing. In Proceedings of Military and Aerospace Appli-
cations of Programmable Logic Devices (MAPLD) International Conference,
September 1999.
[GSB+00] S. C. Goldstein, H. Schmit, M. Budiu, S. Cadambi, M. Moe, and R. R.
Taylor. Piperench: A Reconflgurable Architecture and Compiler. In IEEE
Computer, 33(4):70{77, April 2000.
[HGWS99] C. Hamilton, G. Gibson, S. Wijesuriya, and C. Stroud. Enhanced BIST-
Based Diagnosis of FPGA via Boundary Scan Access. In Proceedings of
IEEE VLSI Test Symposium, pages 413{418, April 1999.
[HLS98] S. Hauck, Z. Li, and E. Schwabe. Conflguration Compression for Xilinx
6200 FPGA. In Proceedings of IEEE Symposium on FPGAs for Custom
Computing Machines, pages 138{146, April 1998.
[IEE90] Test Access Port and Boundary Scan Architecture IEEE Standard P1149.1-
1990, February 1990.
[Jav04] TheJava(tm)Tutorial.URL:http://java.sun.com/docs/books/tutorial/java/con-
cepts/ index.html, 2004.
[Jay01] R. Jayaraman. Physical Design for FPGAs. In Proceedings of ACM Inter-
national Symposium on Physical Design, pages 214{221, April 2001.
[Kel00] E. Keller. JRoute: A Run-Time Routing API for FPGA Hardware. In
Parallel and Distributed Processing, pages 874{881, May 2000.
[Lat02] Lattice Semiconductor. ORCA Series 3 FPGAs Programmable I/O Cell
(PIC): Logic, Clocking, Routing, and External Device Interface, (AP99-
042FPGA), January 2002.
[LG98] D. Levi and S. A. Guccione. BoardScope: A Debug Tool for Reconflgurable
Systems. In Proceedings of SPIE Conflgurable Computing: Technology and
Applications, pages 239{246, November 1998.
[MG00] S. McMillan and S. A. Guccione. Partial Run-Time Reconflguration Us-
ing JRTR. In Proceedings of the 10th International Workshop on Field-
Programmable Logic and Applications, pages 352{360, August 2000.
128
[Mil94] W. Miller. Real World Applications for Field Programmable Gate Array
Devices-An Overview. In Proceedings of IEEE West Coast Electronics Show
and Convention (WESCON), pages 27{29, September 1994.
[RFZ97] M. Renovell, J Figueras, and Y. Zorian. Test of RAM-Based fpga: Method-
ology and Application to the Interconnect. In Proceedings of IEEE VLSI
Test Symposium, pages 230{237, April 1997.
[RPFZ99] M. Renovell, J. M. Portal, J. Figueras, and Y. Zorian. Testing the Conflg-
urable Interconnect/Logic interface of SRAM-based FPGA?s. In Proceedings
of IEEE International Conference on Design Automation and Test in Eu-
rope, pages 618{622, March 1999.
[RZ00] M. Renovell and Y. Zorian. Difierent Experiments in Test Generation for
Xilinx FPGAs. In Proceedings of IEEE International Test Conference, pages
854{862, October 2000.
[SCKA96] C. Stroud, P. Chen, S. Konala, and M. Abramovici. Evaluation of FPGA
Resources for Built-In Self-Test of Programmable Logic Blocks. In Proceed-
ings of ACM International Symposium on Field-Programmable Gate Arrays,
pages 107{113, February 1996.
[SHGS04] C.Stroud,J.Harris,S.Garimella,andJ.Sunwoo. Built-InSelf-TestConflgu-
rations for Atmel FPGAs Using Macro Generation Language. In Proceedings
of IEEE North Atlantic Test Workshop, pages 88{90, May 2004.
[SKCA96] C. Stroud, S. Konala, P. Chen, and M. Abramovici. Built-In Self-Test for
Programmable Logic Blocks in FPGAs(Finally, A Free Lunch: BIST With-
out Overhead!). In Proceedings of IEEE VLSI Test Symposium, pages 387{
392, April 1996.
[SKCA97] C. Stroud, S. Konala, P. Chen, and M. Abramovici. BIST-based Diagnostics
for FPGA Logic Blocks. In Proceedings of IEEE International Test Confer-
ence, pages 539{547, October 1997.
[SLS03] C. Stroud, K. N. Leach, and T. A. Slaughter. BIST for Xilinx 4000 and
Spartan Series FPGAs: A Case Study. In Proceedings of IEEE International
Test Conference, pages 1258{1267, October 2003.
[SMG01] P. Sundararajan, S. McMillan, and S. A. Guccione. Testing FPGA Devices
Using JBits. In Military and Aerospace Applications of Programmable De-
vices and Technologies Conference (MAPLD), September 2001.
[SNLA02] C.Stroud, J.Nall, M.Lashinsky, andM.Abramovici. BIST-BasedDiagnosis
of FPGA Interconnect. In Proceedings of IEEE International Test Confer-
ence, pages 618{627, October 2002.
129
[SS99] A. Steininger and C. Scherrer. On the Necessity of On-line BIST in Safety-
Critical Applications. In Proceedings of Fault-Tolerant Computing Sympo-
sium, pages 208{215, June 1999.
[SSGH04] C. Stroud, J. Sunwoo, S. Garimella, and J. Harris. Built-In Self-Test for
System-on-Chip: A Case Study. In Proceedings of IEEE International Test
Conference, October 2004.
[Str02] C. Stroud. A Designer?s Guide to Built-In Self-Test. Kluwer Academic
Publishers, 2002.
[SWHA98] C. Stroud, S. Wijesuriya, C. Hamilton, and M. Abramovici. Built-In Self-
Test of FPGA Interconnect. In Proceedings of IEEE International Test Con-
ference, pages 404{411, October 1998.
[SXCT00] X. Sun, J. Xu, B. Chan, and P. Trouborst. Novel Technique for Built-In
Self-Test of FPGA Interconnects. In Proceedings of IEEE International Test
Conference, pages 795{803, October 2000.
[TM03] M. B. Tahoori and S. Mitra. Automatic Conflguration Generation of FPGA
Interconnect Testing. In Proceedings of IEEE VLSI Test Symposium, pages
134{139, April 2003.
[Xil] Xilinx Inc. Libraries Guide. ISE 6.1i.
[Xil99] Xilinx Inc. XC4000E and XC4000X Series Field Programmable Gate Arrays
Product Speciflcation, (v1.6), May 1999.
[Xil00] Xilinx Inc. Xilinx Design Language Reference (v1.6), July 2000.
[Xil01a] Xilinx Inc. JBits Tutorial, June 2001.
[Xil01b] Xilinx Inc. Virtex 2.5 V Field Programmable Gate Arrays, (DS003-1 v2.5),
April 2001.
[Xil01c] Xilinx Inc. Virtex Device Simulator (VirtexDS), June 2001.
[Xil01d] Xilinx Inc. Xilinx JBits SDK Version 2.8 for Virtex, June 2001.
[Xil02a] Xilinx Inc. Conflguration and Readback of Spartan-II FPGAs Using Bound-
ary Scan, (XAPP188 v2.1), March 2002.
[Xil02b] Xilinx Inc. Conflguration and Readback of Virtex FPGAs Using (JTAG)
Boundary Scan, (XAPP139 v1.4), April 2002.
[Xil02c] Xilinx Inc. Two Flows for Partial Reconflguration: Module Based or Small
Bit Manipulations, XAPP290 (v1.0), May 2002.
130
[Xil02d] Xilinx Inc. Virtex FPGA Series Conflguration and Readback, (XAPP138
v2.7), July 2002.
[Xil02e] Xilinx Inc. Virtex-II Pro Platform FPGAs: Introduction and Overview,
(DS031-1 v1.9), September 2002.
[Xil03a] XilinxInc. Spartan-3 1.2V FPGA Family:Complete Data Sheet, (DS099 |),
October 2003.
[Xil03b] Xilinx Inc. Spartan-II 2.5V FPGA Family: Complete Data Sheet, (DS001-1
v2.3), September 2003.
[Xil03c] Xilinx Inc. Virtex-II Platform FPGAs: Introduction and Overview, (DS031-
1 v2.0), August 2003.
[Xil03d] XilinxInc. Virtex Series Conflguration Architecture Users Guide, (XAPP151
v1.6), March 2003.
[Xil04] Xilinx Inc. Virtex-II Platform FPGA User Guide, (UG002 v1.7), February
2004.
[XRT03] W. Xu, R. Ramanarayanan, and R. Tessier. Adaptive Fault Recovery for
Networked Reconflgurable Systems. In Field-Programmable Custom Com-
puting Machines, pages 143{152, April 2003.
131
Appendices
Appendix A
Steps in Writing Parent RTPCore
1. Create a class that inherits its functionality from the class RTPCore in the package
com.xilinx.JBits.CoreTemplate.
2. Define the methods that calculate the width and height of the RTPCore in terms of
PLBs and/or slices. The methods are declared static, so an object does not
have to be created in order to determine the width and height of the RTPCore.
The following methods help when placing the RTPCore in the PLB array:
   • calcHeight(): Returns the height of the RTPCore as a multiple of the
     granularity set by the calcHeightGran() method. The granularity can be
     a Logic Element (such as a flip-flop or LUT), a slice, or a PLB.
   • calcWidth(): Returns the width of the RTPCore as a multiple of the
     granularity set by the calcWidthGran() method. The granularity can be
     a Logic Element (such as a flip-flop or LUT), a slice, or a PLB.
   • calcHeightGran(): Defines the granularity, i.e. the unit of height of the RTPCore,
     in terms of a Logic Element (such as a flip-flop or LUT), a slice, or a PLB.
   • calcWidthGran(): Defines the granularity, i.e. the unit of width of the RTPCore,
     in terms of a Logic Element (such as a flip-flop or LUT), a slice, or a PLB.
3. Initialize the data members inherited from the parent class and those of this class. This is
achieved by calling the following methods of the parent class RTPCore:
   • setHeight()
   • setWidth()
   • setHeightGran()
   • setWidthGran()
4. Define the implement() method. The implement() method of the parent RTPCore
has the following steps:
   (a) Define the logical nets and buses required to connect the child RTPCores.
   The methods newNet(NetName) and newBus(BusName, width) provided in
   the class com.xilinx.JBits.CoreTemplate.RTPCore define the logical nets and
   buses, respectively. The nets and buses thus defined are passed as parameters
   to the constructors of the child RTPCore classes. It is the responsibility of
   the implement() method of the child RTPCore classes to define the mapping
   between these logical nets and buses, defined at the upper level of the hierarchy,
   and the physical resources of the FPGA.
   (b) Instantiate the child RTPCore by calling the constructor of its class and passing
   the logical nets and buses as parameters.
   (c) Call the addChild(Child Core) method of the parent class RTPCore in the
   package com.xilinx.JBits.CoreTemplate. This call establishes the parent-child
   relationship.
   (d) Specify the placement of the child RTPCore in the PLB array. This is
   accomplished either by passing a static integer defined in the class Offset from the
   package com.xilinx.JBits.CoreTemplate.RTPCore to the addChild() method or by
   assigning absolute offsets to the child RTPCore before calling the implement()
   method of the child RTPCore class.
   (e) Call the implement() method of the child RTPCore class.
   (f) Specify routing between the child RTPCores if necessary.
   (g) Call the static method connect(Bus) in the class Bitstream belonging to the
   package com.xilinx.JBits.CoreTemplate.RTPCore. The logical buses are mapped
   onto physical wires by the call to this method. This call can only be made
   once the RTPCore is implemented, i.e. once the complete routing and placement of the
   RTPCore have been specified. Unless this call is successful, the logical buses
   specified in the JBits program will not map onto the physical wires and will
   not be reflected in the XDL file generated at the output of the JBits program.
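The ordering of sub-steps (a) through (g) can be illustrated with a minimal, self-contained Java sketch. It does not use the JBits API: the classes Net, Core, ChildCore, and ParentCore below are hypothetical stand-ins that only record the order of the calls, and the final "connect" entry merely marks where a call such as Bitstream.connect() would occur.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only; none of these classes belong to JBits.
public class ParentCoreSketch {
    // Records the order in which the steps are carried out.
    static final List<String> log = new ArrayList<>();

    static class Net {
        final String name;
        Net(String name) { this.name = name; log.add("newNet " + name); }
    }

    static abstract class Core {
        final String name;
        int horOffset, verOffset;
        final List<Core> children = new ArrayList<>();
        Core(String name) { this.name = name; }
        void addChild(Core child) { children.add(child); log.add("addChild " + child.name); }
        void setOffset(int hor, int ver) {
            horOffset = hor;
            verOffset = ver;
            log.add("offset " + name + " " + hor + "," + ver);
        }
        abstract void implement();
    }

    static class ChildCore extends Core {
        final Net clk;
        ChildCore(String name, Net clk) { super(name); this.clk = clk; }
        @Override void implement() { log.add("implement " + name); }
    }

    static class ParentCore extends Core {
        ParentCore(String name) { super(name); }
        @Override void implement() {
            Net clk = new Net("clk");                         // (a) define logical nets
            ChildCore child = new ChildCore("counter", clk);  // (b) instantiate the child
            addChild(child);                                  // (c) establish the hierarchy
            child.setOffset(0, 0);                            // (d) place the child
            child.implement();                                // (e) implement the child
            // (f) routing between children would be specified here if needed
            log.add("connect " + clk.name);                   // (g) map nets to wires last
        }
    }

    // Runs the parent core once and returns the recorded step order.
    static List<String> run() {
        log.clear();
        new ParentCore("parent").implement();
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

The point of the sketch is only the sequence: nets are created first, the child is registered and placed before it is implemented, and the connect step comes last.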
The most important function of the implement() method in the class SimpleRouteBIST
is to configure the routes between the TPGCounterCore and the ORACore. This is
done using the following steps:
1. Create an instance of the class TPGCounterCore by calling its constructor. The
constructor expects an object of class CounterProperties. Therefore, we assume this object is
created and all the logical buses are appropriately assigned before the constructor
is called. Let the new object be referred to as the instance of class TPGCounterCore.
2. Call the addChild(string, instance of class TPGCounterCore) method inherited
from the parent class com.xilinx.JBits.CoreTemplate.RTPCore. This establishes the
parent core-child core hierarchy.
public final void implement() throws CoreException,
ConfigurationException, RouteException {
        /* Set App vars */
        int WIDTH = 4;
        /* Create Top level net and bus */
        Net clk = newNet("clk");
        Bus coutSrc = newBus("coutSrc",WIDTH/2);
        Bus coutSrc1 = newBus("coutSrc1",WIDTH/2);
        Clock clock = new Clock("clock", clk);
        clock.implement(1); // Use GCLK1
        TPGProperties cp = new TPGProperties();
        cp.setIn_clk(clk);
        cp.setIn_ce(Net.NoConnect);
        cp.setIn_rst(Net.NoConnect);
        cp.setOut_dout(coutSrc);
        /* Create Counter Core */
        TPGCounter tpg = new TPGCounter("counter",cp);
        addChild(tpg);
Figure A.1: JBits Program for Instantiating a Counter core
3. Obtain an object of class com.xilinx.JBits.CoreTemplate.RTPCore.Offset using the
getRelativeOffset() method of the instance of class TPGCounterCore. Let's say the
object is identified by offset of TPGCounterCore. Set the horizontal and vertical
offsets according to arguments i and j, using the methods setHorOffset() and
setVerOffset(), respectively, of the object offset of TPGCounterCore.
4. Call the implement() method of the instance of class TPGCounterCore.
This process is depicted in Figures A.1 and A.2.
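The listing in Figure A.2 passes clbCol*2+clbSlice as the horizontal offset at slice granularity. A minimal sketch of that arithmetic follows, assuming two slices per CLB as the factor of 2 in the listing implies. The class and method names are hypothetical and are not part of JBits.

```java
// Hypothetical helper; mirrors the expression clbCol*2+clbSlice from Figure A.2.
public class SliceOffsetSketch {
    // Assumption: each CLB column contributes two slice positions.
    static final int SLICES_PER_CLB = 2;

    // Converts a CLB column and a slice index within that CLB
    // into a horizontal offset counted in slices.
    static int horizontalSliceOffset(int clbCol, int clbSlice) {
        if (clbSlice < 0 || clbSlice >= SLICES_PER_CLB) {
            throw new IllegalArgumentException("slice index out of range: " + clbSlice);
        }
        return clbCol * SLICES_PER_CLB + clbSlice;
    }

    public static void main(String[] args) {
        int clbCol = 3;
        // First counter occupies slice 0 and the second one slice 1 of the same CLB.
        System.out.println("first counter:  " + horizontalSliceOffset(clbCol, 0));
        System.out.println("second counter: " + horizontalSliceOffset(clbCol, 1));
    }
}
```

With clbSlice equal to 0, the two counters of Figure A.2 therefore land in adjacent slices of the same CLB.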
        Offset CntrOffset = tpg.getRelativeOffset();
        CntrOffset.setHorOffset(Gran.SLICE, clbCol*2+clbSlice);
        CntrOffset.setVerOffset(Gran.CLB, clbRow);
        tpg.implement();
        TPGProperties cp1 = new TPGProperties();
        cp1.setIn_clk(clk);
        cp1.setIn_ce(Net.NoConnect);
        cp1.setIn_rst(Net.NoConnect);
        cp1.setOut_dout(coutSrc1);
        /* Create Second Counter Core */
        TPGCounter tpg1 = new TPGCounter("counter1",cp1);
        addChild(tpg1);
        Offset CntrOffset1 = tpg1.getRelativeOffset();
        CntrOffset1.setHorOffset(Gran.SLICE, clbCol*2+clbSlice+1);
        CntrOffset1.setVerOffset(Gran.CLB, clbRow);
        tpg1.implement();
        /* Connect Top level nets */
        Bitstream.connect(clk);
        Bitstream.connect(coutSrc);
        Bitstream.connect(coutSrc1);
}
Figure A.2: JBits Program Continued...
Appendix B
Steps in Writing Child RTPCores
1. Create a class that inherits its functionality from the class RTPCore. Define the input
and output ports of the RTPCore using the methods newInputPort(Port Name,
Signal Type) and newOutputPort(Port Name, Signal Type).
2. Steps 2 and 3 are the same as for the parent RTPCore.
3. The implement() method of a child RTPCore has the following steps:
   (a) Define a logical internal signal for every input and output port.
   (b) Assign the logical internal signal to the corresponding port using the
   setIntSig(Signal Type) method defined in the class
   com.xilinx.JBits.CoreTemplate.RTPCore.Port.
   (c) Define a logical external signal for every input and output port.
   (d) Initialize the logical external signals using the getExtSig(Signal Type) method
   defined in the class com.xilinx.JBits.CoreTemplate.RTPCore.Port. At this point
   the input or output port defined by the child RTPCore is completely initialized.
   All changes made to the internal signal are guaranteed to be reflected
   on the external signal.
   (e) Write the user code using the logical internal signals.
   (f) If necessary, use the setPin(Pin Type) method defined in the class Port of
   the package com.xilinx.JBits.CoreTemplate.RTPCore to explicitly assign a
   particular PLB resource to the port.
   (g) Call the static method Bitstream.connect(Signal Type) for the internal signals. If
   this call is successful, the logical internal signal and the routing are
   reflected in the XDL files generated at the output of the application. If this
   call is unsuccessful, the router throws an exception and exits
   the JBits program, indicating that it is unable to route the particular signal.
Appendix C
Complete Program Source
The TPGCounterCore is identical to the two-bit counter in the class
Counter found in the package com.xilinx.JBits.Virtex.RTPCore.Basic. The source code
is provided with the JBits software. The source code for the ORACore is derived from the
source code of the class LUT5 found in the package com.xilinx.JBits.Virtex.RTPCore.Basic.
We cannot publish the modified source code in this thesis because the original source
code is copyrighted by Xilinx. The primary purpose of the class LUT5 is to implement a
combinational logic function of five variables. The input and output ports defined for
the LUT5 are listed in Table C.1. The four nets carrying the first four variables of
the combinational logic function are connected to both the addr0 and addr1 input ports.
The net representing the fifth variable of the combinational logic function is connected
to the muxsel input net of the core. The two LUTs in a slice are configured with the logic
expression given to the constructor of the class.
Table C.1: Input and Output ports of LUT5
Port      Width (in Bits)  Direction  Function
clk       1                IN         Clock
addr0     4                IN         Input Net to G-LUT
addr1     4                IN         Input Net to F-LUT
muxsel    1                IN         Input Net to Select the Output of F-LUT or G-LUT
dout      1                OUT        Unregistered Output
dout_reg  1                OUT        Registered Output
ce        1                IN         Enable/Disable
[Figure C.1: Configuration of LUT5 RTPCore. The figure shows a Virtex-I slice with the
G LUT and F LUT (inputs A1 to A4), the X and Y flip-flops, and the signals MUX_SEL,
ADDR0[3:0], ADDR1[3:0], DOUT, and DOUT_REG.]
In order to derive the functionality of a comparison-based ORA, we completely
disabled the mux select input, prevented the connection between the output of the F5 mux
and the X flip-flop, and connected the output of the G LUT to output Y of the slice, as shown in
Figure 4.3. This enabled us to feed the output of the G LUT back to the input of the F LUT.
The actual nets connected to the addr0 and addr1 ports of the ORACore are defined by
the parent RTPCore, SimpleRouteBIST.
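The SimpleRouteBIST listing later in this appendix passes two Boolean expressions to the implement() method of the ORA core: (G1 ^ G2) | (G3 ^ G4) for the G LUT and ~( ((~F1)&F2) | (F1&(~F2)) ) for the F LUT. The following self-contained sketch evaluates these two expressions in plain Java so that their truth tables can be inspected. The class name, the method names, and the choice of G1 as the least-significant address bit are assumptions made for illustration; the sketch does not model the flip-flop or the JBits expression parser.

```java
// Illustrative evaluation of the two LUT expressions quoted from the listing.
public class OraLutSketch {
    // G LUT: true when G1 differs from G2 or G3 differs from G4.
    static boolean gLut(boolean g1, boolean g2, boolean g3, boolean g4) {
        return (g1 ^ g2) | (g3 ^ g4);
    }

    // F LUT: ~( ((~F1)&F2) | (F1&(~F2)) ), which is the XNOR of F1 and F2.
    static boolean fLut(boolean f1, boolean f2) {
        return !((!f1 & f2) | (f1 & !f2));
    }

    // Packs the G LUT truth table into 16 bits.
    // Assumption: G1 is address bit 0 and G4 is address bit 3.
    static int gLutContents() {
        int bits = 0;
        for (int addr = 0; addr < 16; addr++) {
            boolean g1 = (addr & 1) != 0;
            boolean g2 = (addr & 2) != 0;
            boolean g3 = (addr & 4) != 0;
            boolean g4 = (addr & 8) != 0;
            if (gLut(g1, g2, g3, g4)) {
                bits |= 1 << addr;
            }
        }
        return bits;
    }

    public static void main(String[] args) {
        System.out.println("G LUT contents: 0x" + Integer.toHexString(gLutContents()));
        for (int addr = 0; addr < 4; addr++) {
            boolean f1 = (addr & 1) != 0;
            boolean f2 = (addr & 2) != 0;
            System.out.println("F1=" + f1 + " F2=" + f2 + " -> " + fLut(f1, f2));
        }
    }
}
```

The G LUT expression is the comparison itself: it evaluates to 0 only when both input pairs agree, which is the case when the signals arriving over the wires under test match.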
/*
 * SimpleRouteBISTApp.java
 *
 * Created on March 5, 2004, 2:16 AM
 */
/* Simple application for experimenting with routing BIST and JBits JRoute2 API.
 * It essentially acts as a wrapper to instantiation of SimpleRouteBIST core.
 */
/**
 *
 * @author  newalaa
 */
package RouteBIST;
import com.xilinx.util.JBitsCommandLineApp;
import java.io.*;
import com.xilinx.JBits.CoreTemplate.*;
import com.xilinx.JRoute2.Virtex.ResourceDB.CenterWires;
import com.xilinx.JBits.Virtex.Devices;
import com.xilinx.JBits.Virtex.JBits;
import com.xilinx.Netlist.SYM.*;
import com.xilinx.Netlist.XDL.*;
import com.xilinx.JRoute2.Virtex.JRoute;
import com.xilinx.JRoute2.Virtex.RouteException;
import com.xilinx.JBits.Virtex.ConfigurationException;
public class SimpleRouteBISTApp extends JBitsCommandLineApp {
    /** Creates a new instance of SimpleRouteBISTApp */
    public SimpleRouteBISTApp() {    }
Figure C.2: JBits Program for User Interaction and Populating the PLB Array
    public void run() {
        try {
            jbits = getJBits();
            jroute = new JRoute(jbits);
            Bitstream.setVirtex(jbits, jroute);
            CoreOutput.generateBitstream(true);
            CoreOutput.generateSYM(true);
            CoreOutput.generateXDL(true);
            /* Generate Outputs */
            FileOutputStream fOut=new FileOutputStream("simpleRBIST.ctf");
            ObjectOutput out = new ObjectOutputStream(fOut);

            /* Array dimensions of the target device */
            int MAX_COL = Devices.getClbColumns(com.xilinx.JBits.Virtex.Devices.XCV50);
            int MAX_ROW = Devices.getClbRows(com.xilinx.JBits.Virtex.Devices.XCV50);
            int row = 0;
            int col = 0;
            int slice = 0;
            System.out.println(" The maximum columns "+MAX_COL);
            System.out.println(" The maximum rows "+MAX_ROW);

            for (int i=0; i<MAX_ROW;i++) {
                for (int j=0; j<MAX_COL;j++) {
                    if (j%6==0) {//j has reached where there is new ORA
                         j+=6;
                    }
                    SimpleRouteBIST simpleBIST =
                        new SimpleRouteBIST("simpleBIST"+i+"_"+j,i,j,slice);
                    /* Define origin */
                    Offset offset = simpleBIST.getRelativeOffset();
                    offset.setVerOffset(Gran.CLB,0);
                    offset.setHorOffset(Gran.SLICE,0);
                    /* Implement this design */
                    simpleBIST.implement();
                    out.writeObject(simpleBIST);
                }
            }
            CoreOutput.writeXDLFile("simpleRBIST.xdl");
            CoreOutput.writeSymFile("simpleRBIST.sym");
            out.flush();
            out.close();
Figure C.3: JBits Program Continued...
        } catch (Exception e) {
            e.printStackTrace();
            System.exit(-1);
        }
    } /* end run() */
    
    protected JRoute jroute;
    protected JBits jbits;
       public String Speed;
   public String Package;
}
Figure C.4: JBits Program Continued...
/*
 * ORAProperties.java
 *
 * Created on September 14, 2004, 7:24 PM
 */
package RouteBIST;
import com.xilinx.JBits.CoreTemplate.*;
/**
 *
 * @author  Administrator
 */
public class ORAProperties extends CoreParameters {
/* Inputs. */
private Net clk = null;
private Bus addr0 = null;
private Bus addr1 = null;
private Net lutOut = null;
private Net ce = null;
/* Outputs */
private Net dout = null;
private Net doutReg = null;
/*
***************************************************
* Accessor Methods
***************************************************
*/
/* Inputs. */
public final Net getIn_clk() {
   return clk;
}
Figure C.5: JBits Program Continued...
public final void setIn_clk(Net net) {
   clk = net;
}
public final Bus getIn_addr0() {
   return addr0;
}
public final void setIn_addr0(Bus bus) {
   addr0 = bus;
}
public final Bus getIn_addr1() {
   return addr1;
}
public final void setIn_addr1(Bus bus) {
   addr1 = bus;
}
public final Net getIn_ce() {
   return ce;
}
public final void setIn_ce(Net net) {
   ce = net;
}
/*************************************************/
/*************************************************/
/* Outputs. */
public final Net getOut_dout() {
   return dout;
}
public final void setOut_dout(Net net) {
   dout = net;
}
public final Net getOut_doutReg() {
   return doutReg;
}
Figure C.6: JBits Program Continued...
public final void setOut_doutReg(Net net) {
   doutReg = net;
}
public final void setOut_lutOut(Net net) {
    lutOut = net;
}
public final Net getOut_lutOut() {
    return lutOut;
}
/**
**  This method checks to make sure that
**  the parameters that were set are valid
**  for the specified core.
**
**  @param core  The RTPCore these parameters will be used with.
**
**  @exception com.xilinx.JBits.CoreTemplate.CoreParameterException
*/
public void checkParameters(RTPCore core) throws CoreParameterException {
   String ret = "";
   if (clk == null) {
      ret = ret.concat(": No clock connection specified");
   }
   if (addr0 == null) {
      ret = ret.concat(": No addr0 connection specified");
   }
   if (addr1 == null) {
      ret = ret.concat(": No addr1 connection specified");
   }
   if (ce == null) {
      ret = ret.concat(": No ce connection specified");
   }
   if (dout == null) {
      ret = ret.concat(": No dout connection specified");
   }
Figure C.7: JBits Program Continued...
   if (doutReg == null) {
      ret = ret.concat(": No doutReg connection specified");
   }
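   /* Guard against null so a missing bus is reported instead of causing a NullPointerException */
   if (addr0 != null && addr0.getWidth() != 4) {
      ret = ret.concat(": addr0 length equal to " + addr0.getWidth() +
         ", should be equal to 4");
   }
   if (addr1 != null && addr1.getWidth() != 2) {
      ret = ret.concat(": addr1 length equal to " + addr1.getWidth() +
         ", should be equal to 2");
   }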
   if (!ret.equals("")) {
      throw new CoreParameterException(core, ret);
   }
} /* end checkParameters() */
}
Figure C.8: JBits Program Continued...
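The checkParameters() method above accumulates one message per problem and reports them together. A minimal stand-alone sketch of the same pattern follows, written so that a missing bus is reported rather than dereferenced. The Bus class here is a hypothetical stand-in, not the JBits class, and the expected widths (4 for addr0, 2 for addr1) are taken from the comparisons in the listing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone illustration of the accumulate-then-report check.
public class ParameterCheckSketch {
    // Stand-in for a bus; only its width matters for this check.
    static class Bus {
        private final int width;
        Bus(int width) { this.width = width; }
        int getWidth() { return width; }
    }

    // Collects every problem found instead of stopping at the first one.
    static List<String> check(Bus addr0, Bus addr1) {
        List<String> problems = new ArrayList<>();
        checkBus("addr0", addr0, 4, problems);
        checkBus("addr1", addr1, 2, problems);
        return problems;
    }

    // Reports a missing bus, or a width mismatch when the bus is present.
    private static void checkBus(String name, Bus bus, int expected, List<String> problems) {
        if (bus == null) {
            problems.add("No " + name + " connection specified");
        } else if (bus.getWidth() != expected) {
            problems.add(name + " length equal to " + bus.getWidth()
                    + ", should be equal to " + expected);
        }
    }

    public static void main(String[] args) {
        // addr0 is missing and addr1 has the wrong width, so two problems are listed.
        System.out.println(check(null, new Bus(4)));
    }
}
```

Checking for null before reading the width keeps the reported messages complete even when several connections are missing at once.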
/*
 * SimpleRouteBIST.java
 *
 * Created on March 14, 2004, 10:30 PM
 */
/**
 *
 * @author  newalaa
 */
/**
 *  2 slices are configured with 2-bit counter cores, wires are connected using manual route,
 *  ORA core for pass/fail results.  For the time being results are observed through BoardScope...
 *
 *  Tests four Hex Lines --
 *  CenterWires.Hex_Horiz_East[0],
 *  CenterWires.Hex_Horiz_East[1],
 *  CenterWires.Hex_Horiz_East[2],
 *  CenterWires.Hex_Horiz_East[3]   Tested with indices 0,5,7,9 also
 *
 *  Special Note:  This version uses TPG that occupies 2 slices.  The XQ, YQ outputs
 *  of the slices are used to drive the exhaustive test patterns on the wires.  The outputs are
 *  then compared with the ORA.
 *
 *  The route specified in this program (method internalRoute() ) has been observed in
 *  FPGA Editor.
 */
package RouteBIST;
Figure C.9: JBits Program Continued...
import com.xilinx.JBits.CoreTemplate.*;
import com.xilinx.JRoute2.Virtex.JRoute;
import com.xilinx.JBits.Virtex.JBits;
import com.xilinx.JRoute2.Virtex.ResourceDB.CenterWires;
import com.xilinx.JRoute2.Virtex.ResourceFactory;
import com.xilinx.JRoute2.Virtex.Segment;
import com.xilinx.JRoute2.Virtex.JBitsConnector;
import com.xilinx.Netlist.XDL.*;
import com.xilinx.Netlist.XDL.XDLexception;
import  com.xilinx.JBits.Virtex.ConfigurationException;
import  com.xilinx.JRoute2.Virtex.RouteException;
import com.xilinx.JBits.Virtex.RTPCore.Basic.Clock;
import com.xilinx.JBits.Virtex.RTPCore.Basic.Counter;
import com.xilinx.JBits.Virtex.RTPCore.Basic.CounterProperties;
import com.xilinx.JBits.Virtex.RTPCore.Basic.LUT5;
import com.xilinx.JBits.Virtex.RTPCore.Basic.LUT5Properties;
import com.xilinx.JBits.Virtex.RTPCore.TestGeneration.TestInputVector;
import com.xilinx.JBits.Virtex.RTPCore.Basic.Register;
import com.xilinx.JBits.Virtex.Expr;
import com.xilinx.JBits.Virtex.Util;
import com.xilinx.JBits.Virtex.Bits.*;
import com.xilinx.DeviceSimulator.Virtex.RouteTracer;
import com.xilinx.DeviceSimulator.Virtex.RouteTree;
import com.xilinx.DeviceSimulator.Virtex.SimulationException;
/* Class declaration restored from context: see Appendix A, step 1, and the end-of-class comment */
public class SimpleRouteBIST extends RTPCore {
Figure C.10: JBits Program Continued...
/*
***************************************************
* Required Methods
***************************************************
*/
/**
 * Return the vertical granularity of this core.
 */
public static int calcHeightGran() {
   return Gran.CLB;
   } /* end getHeightGran() */
/**
 * Return the horizontal granularity of this core.
 */
public static int calcWidthGran() {
   return Gran.SLICE;
   } /* end getWidthGran() */
    /** Creates a new instance of SimpleRouteBIST */
public SimpleRouteBIST(String name, int row, int col, int slice)
   throws CoreException {
    super(name);
       // Set up a print stream object.
    clbRow = row;
    clbCol = col;
    clbSlice = slice;
    setHeight(calcHeight());
    setWidth(calcWidth());
    setHeightGran(calcHeightGran());
    setWidthGran(calcWidthGran());
}/* end constructor */
Figure C.11: JBits Program Continued...
/**
 * Calculates width for this core.  This method is
 * static so an object does not have to be created
 * in order to figure out the width of this object
 * with the given properties.
 *
 */
public static int calcWidth() {
   return 2;
   } /* end calcWidth() */
/*
***************************************************
* Convention Methods
***************************************************
*/
/**
 * Calculates height for this core.  This method is
 * static so an object does not have to be created
 * in order to figure out the height of this object
 * with the given properties.
 *
 *
 */
public static int calcHeight() {
   return 1;
   } /* end calcHeight() */
Figure C.12: JBits Program Continued...
private void internalRoute(int w, int row, int col)
throws RouteException, ConfigurationException{
    int i;
    CoutSrcPin = new Pin[w];
    OraPin = new Pin[2*w];
    WutPin = new Pin[w];
    OutMuxPin = new Pin[w];
    int slice = 0;
    JRoute jroute = Bitstream.getVirtexRouter();
    for (i=0;i+3<w;i++) {
        /* Sources: XQ and YQ outputs of the two TPG slices */
        CoutSrcPin[0] = new Pin(Pin.CLB, row, col, CenterWires.Slice_XQ[slice]);
        CoutSrcPin[1] = new Pin(Pin.CLB, row, col, CenterWires.Slice_XQ[slice+1]);
        CoutSrcPin[2] = new Pin(Pin.CLB, row, col, CenterWires.Slice_YQ[slice]);
        CoutSrcPin[3] = new Pin(Pin.CLB, row, col, CenterWires.Slice_YQ[slice+1]);
        /* Sinks: G LUT inputs of the ORA slice, six CLB columns to the east */
        OraPin[0] = new Pin(Pin.CLB, row, col+6, CenterWires.SliceG1[slice]);
        OraPin[1] = new Pin(Pin.CLB, row, col+6, CenterWires.SliceG2[slice]);
        OraPin[2] = new Pin(Pin.CLB, row, col+6, CenterWires.SliceG3[slice]);
        OraPin[3] = new Pin(Pin.CLB, row, col+6, CenterWires.SliceG4[slice]);
        /* Wires under test: four horizontal east hex lines */
        WutPin[0] = new Pin(Pin.CLB, row, col, CenterWires.Hex_Horiz_East[0]);
        WutPin[1] = new Pin(Pin.CLB, row, col, CenterWires.Hex_Horiz_East[1]);
        WutPin[2] = new Pin(Pin.CLB, row, col, CenterWires.Hex_Horiz_East[2]);
        WutPin[3] = new Pin(Pin.CLB, row, col, CenterWires.Hex_Horiz_East[3]);
    }
    for (i=0;i<w;i++) {
        System.out.println(" Now Routing ...."+CoutSrcPin[i]+"... and "+WutPin[i]);
        jroute.route(CoutSrcPin[i],WutPin[i]);
        jroute.route(WutPin[i], OraPin[i]);
        //jroute.route(WutPin[i], OraPin[i+4]);
    }
Figure C.13: JBits Program Continued...
}
/**
 *  This method instantiates the core.
 */
public final void implement()
throws CoreException, ConfigurationException,
RouteException {
    //java.io.PrintStream ps = System.out;
    //ResourceFactory rf = ResourceFactory.getResourceFactory(Bitstream.getVirtex() );
    /* Set App vars */
    int WIDTH = 4;
    int CE_WIDTH = 1;
    int DEPTH = 16;
    /* Create Top level net and bus */
    Net clk = newNet("clk");
    Bus coutSrc = newBus("coutSrc",WIDTH/2);
    Bus ffd = newBus("ffd", WIDTH/2);
    Bus coutSrc1 = newBus("coutSrc1",WIDTH/2);
    Bus ffd1 = newBus("ffd1", WIDTH/2);
    Bus oraSink = newBus("oraSink",WIDTH);
    Clock clock = new Clock("clock", clk);
    clock.implement(1); // Use GCLK1
    TPGProperties cp = new TPGProperties();
    cp.setIn_clk(clk);
    cp.setIn_ce(Net.NoConnect);
    cp.setIn_rst(Net.NoConnect);
    cp.setOut_dout(coutSrc);
    cp.setOut_ffd(ffd);
Figure C.14: JBits Program Continued...
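    /* Create Counter Core (these lines are missing from the extracted listing and are restored from Figures A.1 and A.2) */
    TPGCounter tpg = new TPGCounter("counter",cp);
    addChild(tpg);
    Offset CntrOffset = tpg.getRelativeOffset();
    CntrOffset.setHorOffset(Gran.SLICE, clbCol*2+clbSlice);
    CntrOffset.setVerOffset(Gran.CLB, clbRow);
    tpg.implement();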
    TPGProperties cp1 = new TPGProperties();
    cp1.setIn_clk(clk);
    cp1.setIn_ce(Net.NoConnect);
    cp1.setIn_rst(Net.NoConnect);
    cp1.setOut_dout(coutSrc1);
    cp1.setOut_ffd(ffd1);
    /* Create Second Counter Core */
    TPGCounter tpg1 = new TPGCounter("counter1",cp1);
    addChild(tpg1);
    Offset CntrOffset1 = tpg1.getRelativeOffset();
    CntrOffset1.setHorOffset(Gran.SLICE, clbCol*2+clbSlice+1);
    CntrOffset1.setVerOffset(Gran.CLB, clbRow);
    tpg1.implement();
    /*
     for (int i=0;i<WIDTH;i++) {
        if (i<WIDTH/2){
            oraSink.setNet(i, ffd.getNet(i));
        }
        if (i>=WIDTH/2){
            oraSink.setNet(i, ffd1.getNet(i-(WIDTH/2)));
        }
    }
    */
Figure C.15: JBits Program Continued...
    //Bus OraSink = newBus("oraSink",WIDTH);
    Bus oraFB = newBus("OraFB",2);
    Net passFailOut = newNet("passFailOut");        //Pass/Fail output from X flip-flop
    Net lutOut = newNet("lutOut");      //Output of the G LUT
    Net passFailIn = newNet("passFailIn");          //Logical input of X flip-flop
    oraFB.setNet(0, passFailOut);
    oraFB.setNet(1, lutOut);
    ORAProperties op = new ORAProperties();
    op.setIn_addr0(oraSink);        //addr port of G LUT
    op.setIn_addr1(oraFB);          //addr port of F LUT
    op.setOut_lutOut(lutOut);      //Y output of Slice
    op.setIn_clk(clk);
    op.setIn_ce(Net.NoConnect);
    op.setOut_dout(passFailIn);
    op.setOut_doutReg(passFailOut);
    ORACore BISTORACore= new ORACore("ORA00",op);
    addChild(BISTORACore);
    Offset ORAOffset = BISTORACore.getRelativeOffset();
    ORAOffset.setHorOffset(Gran.SLICE, (clbCol+6)*2+clbSlice);
    ORAOffset.setVerOffset(Gran.CLB, clbRow);
    BISTORACore.implement(Expr.G_LUT("(G1 ^ G2) | (G3 ^ G4)"),
        Expr.F_LUT("~( ((~F1)&F2) | (F1&(~F2)) )"));
    internalRoute(WIDTH, clbRow, clbCol);
    /* Connect Top level nets */
    Bitstream.connect(clk);
    Bitstream.connect(coutSrc);
    Bitstream.connect(ffd);
    Bitstream.connect(coutSrc1);
    Bitstream.connect(ffd1);
    Bitstream.connect(oraFB);
} /* end implement() */
   private int clbRow, clbCol, clbSlice;
   private Pin [] CoutSrcPin;
   private Pin [] WutPin;
   private Pin [] OraPin;
   private Pin [] OutMuxPin;
};  /* end class SimpleRouteBIST*/
Figure C.16: JBits Program Continued...
Appendix D
Complete List of Connections Between the Mux CIPs
Table D.1: Mux CIPs Mux28to1 and Connecting Single Interconnects
               Number of Distinct Wires Connected
Wires          S0F1, S0G1,  S0F3, S0G3,  S0F4, S0G4,  S0F2, S0G2,
               S1F4, S1G4   S1F2, S1G2   S1F1, S1G1   S1F3, S1G3
SINGLE EAST    4            7            7            6
SINGLE WEST    7            6            5            6
SINGLE SOUTH   6            5            6            7
SINGLE NORTH   7            6            6            5
Table D.2: MUX CIPs Mux16to1 and Connecting Interconnects
                 Number of Distinct Wires Connected
Wires            S0BX,   S0CE,   S1BX,   S0SR,   S0Clk,  TS0,
                 S0BY    S1CE    S1BY    S1SR    S1Clk   TS1
HEX HORIZ EAST   0       0       0       0       0       0
HEX HORIZ WEST   0       0       0       0       0       0
HEX VERT NORTH   0       1       0       1       1       0
HEX VERT SOUTH   0       0       0       0       0       0
SINGLE EAST      4       2       4       2       1       0
SINGLE WEST      4       2       4       2       1       0
SINGLE SOUTH     4       3       4       3       2       0
SINGLE NORTH     4       3       4       3       2       0
HEX VERT A0      0       0       0       0       0       1
HEX VERT B0      0       0       0       0       0       1
HEX VERT C0      0       0       0       0       0       1
HEX VERT D0      0       0       0       0       0       1
HEX VERT M0      0       0       0       0       0       1
HEX VERT A1      0       0       0       1       0       0
HEX VERT B1      0       0       0       1       0       0
HEX VERT C1      0       0       0       1       0       0
HEX VERT D1      0       0       0       1       0       0
HEX VERT M1      0       0       0       1       0       0
HEX VERT A2      0       0       0       0       1       0
HEX VERT B2      0       0       0       0       1       0
HEX VERT C2      0       0       0       0       1       0
HEX VERT D2      0       0       0       0       1       0
HEX VERT M2      0       0       0       0       1       0
HEX VERT A3      0       1       0       0       0       0
HEX VERT B3      0       1       0       0       0       0
HEX VERT C3      0       1       0       0       0       0
HEX VERT D3      0       1       0       0       0       0
HEX VERT M3      0       1       0       0       0       0
GCLK0            0       0       0       0       1       0
GCLK1            0       0       0       0       1       0
GCLK2            0       0       0       0       1       0
GCLK3            0       0       0       0       1       0