Built-In Self Test of Configurable Memory Resources in Field Programmable Gate Arrays

Except where reference is made to the work of others, the work described in this thesis is my own or was done in collaboration with my advisory committee. This thesis does not include proprietary or classified information.

Daniel Milton

Certificate of Approval:

Victor P. Nelson, Professor, Electrical and Computer Engineering
Charles E. Stroud, Chair, Professor, Electrical and Computer Engineering
Thaddeus A. Roppel, Associate Professor, Electrical and Computer Engineering
George T. Flowers, Interim Dean, Graduate School

A Thesis Submitted to the Graduate Faculty of Auburn University in Partial Fulfillment of the Requirements for the Degree of Master of Science

Auburn, Alabama
December 17, 2007

Permission is granted to Auburn University to make copies of this thesis at its discretion, upon the request of individuals or institutions and at their expense. The author reserves all publication rights.

Signature of Author

Date of Graduation

Vita

Daniel Milton, son of Thomas and Diane Milton, was born in Birmingham, Alabama on April 25, 1983. In the Fall of 2005, he graduated summa cum laude with a Bachelor of Electrical Engineering, majoring in Computer Engineering. Upon graduation he immediately began working on his Master of Science degree at Auburn University under the advisement of Dr. Charles E. Stroud.

Thesis Abstract

Daniel Milton
Master of Science, December 17, 2007
(B.E.E., Auburn University, 2005)
151 Typed Pages
Directed by Charles E. Stroud

Testing embedded memory resources in Field Programmable Gate Arrays (FPGAs) is difficult because the collective signal fan-in and fan-out is much greater than the available external I/O. A testing approach is needed that can test all of the memory resources in parallel without being limited by external I/O. Built-in Self Test (BIST) is a testing method that incorporates test circuitry around the devices under test (DUTs). The programmable nature of FPGAs allows the BIST circuitry to have no performance or size overhead because the BIST circuitry can be downloaded to the FPGA while the system is offline. Once offline, resources inside the FPGA can be tested and the results retrieved. If the FPGA is found to be fault-free, the system function can be downloaded again and brought back online.

BIST for embedded memory resources in Virtex 4 FPGAs is developed and test configurations are generated for all Virtex 4 devices. Twenty-five total BIST configurations are developed to test memories operating in RAM, FIFO, ECC, and cascade modes. To test each operating mode, a hardware description language (HDL) based test pattern generator (TPG) is developed and then incorporated into an algorithmically placed BIST template that contains two TPGs, DUTs, and output response analyzers (ORAs) to observe DUT outputs. Partial reconfiguration is used to reduce both configuration bitstream storage and test time. A total speed-up factor of 12 is observed when utilizing partial reconfiguration.

Acknowledgments

I would like to thank Dr. Stroud for his support and advice during my tenure at Auburn University during both my undergraduate and graduate studies. I would also like to thank Dr. Nelson and Dr. Roppel for their contribution to this thesis by serving on my graduate committee. To my research colleagues, Sachin, Sudheer, Bobby, Lee, Mustafa, Brad, David, and Noah, I am grateful for all of your help and assistance throughout my research.
Lastly, I would like to acknowledge my parents, as their ever-present support has always inspired me to fulfill my potential.

Style manual or journal used: Journal of Approximation Theory (together with the style known as "aums"). Bibliography follows IEEE Transactions.

Computer software used: The document preparation package TeX (specifically LaTeX) together with the departmental style-file aums.sty. Plots were generated using Microsoft Excel and figures were drawn in Microsoft Visio.

Table of Contents

List of Figures
List of Tables
1 Introduction
  1.1 Built-In Self Test (BIST)
  1.2 FPGAs
  1.3 Embedded Memory Resources in FPGAs
  1.4 Thesis Statement
2 Background
  2.1 Introduction to FPGAs
  2.2 Virtex 4 Architecture
    2.2.1 Virtex 4 PLBs
    2.2.2 Virtex 4 BRAMs
    2.2.3 Virtex 4 FIFOs
    2.2.4 Virtex 4 CAD Tools
    2.2.5 Virtex 4 Boundary Scan
  2.3 SRAM Testing
    2.3.1 SRAM Fault Models
    2.3.2 March Tests for Single-Port Memories
    2.3.3 March Tests for Dual-Port Memories
  2.4 Overview of BIST for FPGAs
    2.4.1 BIST for BRAMs
  2.5 Thesis Restatement
3 Virtex 4 Block RAM BIST Implementation
  3.1 Virtex 4 BRAM BIST Architecture
  3.2 TPG Development
  3.3 BRAM BIST Configurations
  3.4 Running BIST Configurations
  3.5 ORA Results Retrieval
  3.6 BIST Results
  3.7 BRAM BIST Summary
4 Virtex 4 FIFO BIST Implementation
  4.1 Virtex 4 FIFO BIST Architecture
  4.2 FIFO TPG Development
  4.3 FIFO BIST Configuration Development
  4.4 Running FIFO BIST Configurations
  4.5 FIFO BIST Results
5 Virtex 4 ECC and Cascade BIST Implementation
  5.1 ECC and Cascade BIST Architecture
  5.2 ECC BRAM BIST Development
  5.3 Cascade TPG Development
  5.4 BIST Configurations
  5.5 Running BIST Configurations
  5.6 BIST Results
6 Summary and Conclusion
  6.1 Summary of Virtex 4 BIST Results
  6.2 Application to Virtex 5
Bibliography
Appendices
  A MMTPG VHDL Source code
  B FIFOTPG VHDL Source code
  C ECCTPG VHDL Source code
  D CASTPG VHDL Source code
  E List of Acronyms

List of Figures

1.1 General BIST Architecture
1.2 FPGA Architecture
1.3 Basic PLB Architecture
2.1 Configuration Bits in FPGAs
2.2 A Five Transistor Configuration SRAM Cell [1]
2.3 A Six Transistor Configuration SRAM Cell [1]
2.4 Basic Virtex 4 Architecture
2.5 Virtex 4 PLB [2]
2.6 Simplified Virtex 4 Slice Diagram [2]
2.7 Virtex 4 BRAM [2]
2.8 ECC BRAM Architecture [2]
2.9 BRAM Cascade Operational Diagram [2]
2.10 Virtex 4 FIFO Implementation [2]
2.11 Virtex 4 FIFO [2]
2.12 TAP Controller State Diagram [3]
2.13 SRAM Memory Functional Model
2.14 Structural Model of a two-port SRAM cell
2.15 March LR with 4-bit BDS [4]
2.16 March s2pf [5]
2.17 March d2pf [5]
2.18 A General Comparison Based BIST Architecture
2.19 A Circular Comparison Based BIST Architecture
2.20 Comparison Based ORA with Shift Chain
2.21 Comparison Based ORA without a Shift Chain
3.1 BRAM BIST Architecture
3.2 BRAM ORA Orientation
3.3 ORA Placement and Comparison in SX devices
3.4 ORA Placement and Comparison in FX devices
3.5 MMTPG Control Shift Register
3.6 FX12 BRAM BIST
3.7 LX25 BRAM BIST
3.8 V4BRAMTPG Syntax
3.9 V4BRAMBIST Syntax
3.10 V4BRAMMOD Syntax
3.11 Example BIST program execution
3.12 Partial BRAM BIST in LX60
3.13 LX60 BRAM BIST Speed-up factors
3.14 Timing Analysis (Slowest / Fastest)
3.15 Timing Analysis per BRAM BIST Configuration
4.1 FIFO ORA Placement
4.2 FULL Flag Transition Timing
4.3 FIFOTPG Control Register
4.4 LX60 FIFO BIST Configuration
4.5 FIFO BIST Timing Analysis
4.6 LX60 FIFO Speed-up Factors
4.7 Routed FX12 FIFO BIST Configuration
5.1 ECC and Cascade BIST Architecture
5.2 Expected Cascade ORA Failure Locations
5.3 Parity Tree TPG [6]
5.4 Cascade BRAM Operational Diagram
5.5 FX12 ECC BIST
5.6 FX12 Cascade BIST
5.7 LX60 ECC BIST Speed-up Factors
5.8 LX60 CAS BIST Speed-up Factors
5.9 ECC BIST Timing Analysis
5.10 Cascade BIST Timing Analysis
6.1 BIST Speed-up for LX60

List of Tables

2.1 Overview of Resources in Virtex 4 Family Devices [2]
2.2 Summary of Virtex 2 and 4 BRAM Aspect Ratios
2.3 BRAM Signal Descriptions [2]
2.4 BRAM Configuration Options [2]
2.5 ECC Status Description
2.6 FIFO Configuration Options [2]
2.7 Virtex 4 Status Flag Clock Cycle Latency [2]
2.8 FIFO Port Signal Descriptions [2]
2.9 Summary of Xilinx Design Tools
2.10 Virtex 4 BSCAN Module Access Commands [3]
2.11 Common SRAM Fault Types
2.12 March Test Notation Descriptions
2.13 Common March Tests for Single Port Memories
2.14 Background Data Sequence for 8-bits
2.15 Virtex 2 BRAM Summary
3.1 Virtex 4 TPG March Test Algorithms
3.2 BRAM BIST Configuration Detail
3.3 BRAM Initialization Values
3.4 XDL Argument Summary
3.5 BRAM BIST Execution Detail
3.6 Summary of LX60 BRAM BIST Download Size and Test Times
4.1 Summary of Virtex 4 FIFO Configurations
4.2 Summary of LX60 FIFO BIST Download Size and Test Times
5.1 ECC BRAM BIST Configuration Settings
5.2 Summary of Cascade BIST Configuration Settings
5.3 Summary of LX60 CAS BIST Download Size and Test Times
5.4 Summary of LX60 ECC BIST Download Size and Test Times

Chapter 1
Introduction

Moore's Law states that the complexity of Integrated Circuits (ICs) tends to double every 24 months [7]. While this empirical observation was first made in 1965, the International Technology Roadmap for Semiconductors has more recently predicted, based upon industry data and forecasts, that Moore's Law will persist at least until 2016 [8]. One of the earliest microprocessors, the Intel 4004, had approximately 2300 transistors [9]. For contrast, recent state-of-the-art Field Programmable Gate Arrays (FPGAs) may contain more than one billion transistors [10].
Clearly, one can see the exponential growth in transistor count over the last four decades.

The continuing problem with these higher density ICs is that developing tests for such complex devices is becoming progressively more difficult with each generation of new ICs. Recent IC fabrication technology has led to larger chip sizes with smaller feature sizes, but has also introduced new types of defects and subsequently increased the probability of defects [6]. Developing good tests for ICs is becoming a major factor in the cost of producing working silicon ICs. One of the reasons for high costs is the disparity between the number of Input/Output (I/O) pins in packaged ICs versus the number of transistors in the package. Using external "bed-of-nails" test equipment, the costs of testing are generally attributed to the fixed cost of the test equipment and the speed at which the actual tests can be performed. IC manufacturers generally employ tests that minimize the cost of the test equipment while minimizing the device test time [11]. Improvements in Design for Test (DFT) methodology have produced scan design and Built-in Self Test (BIST) [6][11].

1.1 Built-In Self Test (BIST)

DFT is a common practice in the VLSI design process. Traditionally, DFT for ICs has involved using scan flip-flops. Scan flip-flops reduce the amount of time needed to generate tests for sequential circuits because they can operate as a shift register to shift in a test vector from external Automated Test Equipment (ATE). When a new test vector is shifted into a scan chain, the output response from the previous vector is shifted out and can then be compared to an expected value. Scan design eliminates the possibility of being unable to initialize a flip-flop to a desired value. However, for many large VLSI circuits, the number of test vectors that must be applied to achieve a necessary fault coverage percentage has also increased. With the number of test vectors growing, and with ATE unable to test newer ICs at speed, a different approach for applying test vectors was introduced [11].

BIST is a DFT technique that allows the Device Under Test (DUT), or a portion of the DUT, to tell the tester whether it is fully functional. BIST implementations include a Test Pattern Generator (TPG) which drives a DUT or many DUTs in parallel. The outputs of the DUTs are then analyzed by an Output Response Analyzer (ORA) which determines the correctness of the DUT. The general BIST architecture can be seen in Figure 1.1. BIST solves two of the major issues with ATE based testing. First, the BIST circuitry is implemented in the chip itself, and therefore it can perform at speed. Second, since the BIST circuitry generates the test vectors, the external tester merely needs to tell the device to perform BIST and then report whether the device is faulty or not [6].

Figure 1.1: General BIST Architecture

Traditionally, BIST has been used to test logic and memory resources in VLSI circuits [6]. One caveat of BIST is that it usually implies an overhead in terms of increased chip area, which may in turn reduce the yield of the chip [6]. One may wonder if it is possible to implement BIST with no overhead. BIST implementations proposed in [12][13][14][15][16][17][18][19][20][21][22] suggest that BIST for FPGAs can incur no overhead penalty in terms of speed and area.

1.2 FPGAs

An FPGA can be described as "an array of logic blocks that can be programmably interconnected to realize different designs" [23]. A general FPGA architecture, as seen in Figure 1.2, contains I/O cells that facilitate signals entering and exiting the device. Programmable Logic Blocks (PLBs) perform the necessary digital logic functions, and the programmable routing resources direct signals both between PLBs and along the overall signal path from inputs to outputs.
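As a brief aside on the BIST flow of Section 1.1, the roles of the TPG, DUTs, and ORAs can be sketched in a few lines of Python. This is only an illustrative model, not the VHDL developed in this thesis: the choice of ORAs that compare the outputs of neighboring DUTs, the exhaustive 4-bit pattern set, and the names run_bist, good_dut, and faulty_dut are all assumptions made for the example.

```python
from typing import Callable, Iterable, List


def good_dut(x: int) -> int:
    """Hypothetical fault-free 4-bit DUT (an arbitrary function for the example)."""
    return (3 * x + 1) & 0xF


def faulty_dut(x: int) -> int:
    """The same DUT with output bit 3 stuck at 0 (an assumed fault)."""
    return good_dut(x) & 0b0111


def run_bist(duts: List[Callable[[int], int]], patterns: Iterable[int]) -> List[bool]:
    """Apply every pattern to all DUTs in parallel.

    ORA i compares DUT i with DUT i+1 and latches any mismatch,
    so the result holds one sticky fail flag per ORA.
    """
    fail = [False] * (len(duts) - 1)
    for pattern in patterns:                 # TPG role: one pattern per cycle
        outputs = [dut(pattern) for dut in duts]
        for i in range(len(fail)):           # ORA role: compare neighboring DUTs
            if outputs[i] != outputs[i + 1]:
                fail[i] = True
    return fail


tpg_patterns = range(16)  # exhaustive 4-bit patterns (assumed TPG behavior)
all_good = run_bist([good_dut, good_dut, good_dut], tpg_patterns)
one_bad = run_bist([good_dut, faulty_dut, good_dut], tpg_patterns)
```

In this model the external tester only starts the run and reads back the fail flags. A fault in the middle DUT sets the flags of both ORAs that observe it, while a fault-free set of DUTs leaves every flag clear.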
PLBs vary among manufacturers, but a simple PLB can contain Look-up Tables (LUTs), flip-flops/latches, and multiplexers. A simple PLB architecture is illustrated in Figure 1.3. LUTs can be configured as the truth table for a given logical function and they can implement small distributed random access memories (RAMs). The interconnection or routing between PLBs is realized by configuring Programmable Interconnect Points (PIPs) to create signal paths from wire segments inside the FPGA [23].

Figure 1.2: FPGA Architecture

Figure 1.3: Basic PLB Architecture

FPGA programming techniques vary among manufacturers, and include static RAM (SRAM), fuse/anti-fuse, and floating gate methods. SRAM is the most popular programming technique for advanced FPGAs because SRAM based designs allow high density and fast configuration [23]. SRAM based FPGAs contain a configuration memory that, when written to, specifies the operation of PLBs, I/O cells, and routing resources. Besides logic blocks and routing resources, the configuration memory may configure other embedded resources in the FPGA such as embedded RAMs, multipliers, and digital signal processors (DSPs). The inclusion of additional embedded resources allows for higher PLB utilization because the embedded resources can offload much of the functionality of digital systems.

1.3 Embedded Memory Resources in FPGAs

FPGA manufacturers have incorporated embedded memories for many product generations [10][24]. Previously, storing large quantities of data internally required an appreciable amount of PLB resources to implement a memory resource. While most designers utilize the memory resources as regular RAM modules, several FPGAs now allow system designers to configure memory resources as multi-port RAMs and First-In-First-Out (FIFO) modules.

The advent of dedicated memory resources extends the potential of FPGAs to act as a programmable System-on-Chip (SoC). However, as advantageous as memory resources are, there are testability concerns that must be addressed. Traditionally, testing RAMs requires applying test patterns that read and write data in such an order that any fault within the set of faults being tested will be sensitized if present. These testing algorithms are generally known as march tests. Many march tests have been developed for detecting certain types of faults within memory resources [25][26]. Unfortunately, applying march tests to embedded memories is complicated. ATE is generally used to verify production ICs, but ATE usually provides test patterns only to the external I/O pins of an IC. A better testing approach is needed because most FPGAs contain many memory resources whose collective fan-in and fan-out are much greater than the available I/O pins [10][24].

BIST is an ideal solution for testing embedded resources in FPGAs. A BIST approach offers more flexibility for testing than an external ATE because test pattern generation is not limited by the external I/O availability. Previous implementations have shown that this approach is quite feasible and efficient for testing embedded memories [27][17][28].

1.4 Thesis Statement

The goal of this thesis is to develop a BIST architecture and BIST configurations for testing embedded memories in Xilinx Virtex 4 FPGAs. This BIST architecture will address minimizing both the test time required and the memory required to store the BIST configurations. This minimization is achieved by designing the BIST architecture to efficiently utilize an FPGA configuration technique known as partial reconfiguration.

The remainder of this thesis is organized as follows: In Chapter 2, additional background information on BIST for FPGAs will be given along with details on testing SRAMs in general and a detailed overview of the Virtex 4 FPGA architecture. In Chapter 3, a BIST architecture for Virtex 4 embedded memories will be presented with applications to memory resources configured to operate in a basic RAM mode of operation. Chapters 4 and 5 will apply the presented BIST architecture to memory resources configured to operate as FIFOs and Error-Correcting Code (ECC) RAMs, respectively. Chapter 6 will summarize the work presented in this thesis and include ideas for future research in this field.

Chapter 2
Background

This chapter presents an overview of different BIST techniques for FPGAs found in the literature. The architecture of the Xilinx Virtex 4 FPGA will also be presented with an emphasis on the dedicated memory resources referred to as block RAMs (BRAMs). Background on SRAM testing will be presented that predominantly focuses on march tests and the associated fault models for which they are designed.

2.1 Introduction to FPGAs

FPGAs differ from Application Specific Integrated Circuits (ASICs) because they are made of logic and routing resources capable of implementing most digital systems while not being explicitly fabricated for a specific task. For example, one designer might buy a microprocessor, an ASIC, as part of a digital system while another designer might use an FPGA to implement a microprocessor and other supporting functions all in the same IC. Furthermore, another designer might use the exact same FPGA in a DSP application. Clearly, flexibility is the main advantage of FPGAs. Another advantage is reduced non-recurring engineering cost, since the need to design and fabricate custom ASICs is eliminated. The main disadvantages of FPGAs, however, are larger chip area, higher power consumption, and lower operational speed compared to a custom ASIC, all of which are due to the overhead of the extra programming circuitry. Also, FPGAs typically are more expensive than traditional ASICs, so they are usually relegated to low volume or prototype designs [23].
In order for an FPGA to realize a digital system, it must be programmed [23]. Several programming technologies exist, but SRAM based FPGAs are currently the most common. Older programming technologies include fuse or anti-fuse based and mask-programmable FPGAs. However, these technologies only allow for one-time programmability, either at the factory in the case of mask-programmed FPGAs or in the field in the case of fuse or anti-fuse based FPGAs. While SRAM based FPGAs have the advantage that they can be programmed more than once, they must also be reprogrammed each time the chip is power cycled. This is necessary because SRAM is inherently a volatile storage medium [1].

When an FPGA is configured, the configuration data, also known as a bitstream, is downloaded to the device. The bitstream contains all of the configuration bits that, when downloaded, implement a desired logic function. Configuration bits can control many resources inside the FPGA such as a LUT's contents, routing resources, and the operational mode of embedded intellectual property (IP) cores such as BRAMs and DSP modules. Configuration bits can also determine the initialization values of flip-flops in PLBs and data values stored in BRAMs [29][2].

Figure 2.1 illustrates the use of configuration bits. In Figure 2.1a, a configuration bit is used to block or pass a signal for implementing programmable routing resources. In Figure 2.1b, configuration bits are used to implement a LUT such that a 2-input truth table can be realized for a logical function. Configuration bits can also initialize a BRAM as seen in Figure 2.1c.
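The first two uses of configuration bits described above can be modeled in a few lines of Python. This is a minimal sketch for illustration only: the function names make_lut2 and pass_gate, the bit ordering of the truth table, and the use of None to represent a blocked signal are assumptions of the example and do not reflect the actual Virtex 4 bitstream layout.

```python
from typing import Callable, Optional, Sequence


def make_lut2(config_bits: Sequence[int]) -> Callable[[int, int], int]:
    """Build a 2-input LUT from four configuration bits.

    Assumed ordering: config_bits[2*a + b] is the output for inputs (a, b).
    """
    if len(config_bits) != 4 or any(bit not in (0, 1) for bit in config_bits):
        raise ValueError("a 2-input LUT needs exactly four bits, each 0 or 1")
    table = tuple(config_bits)

    def lut(a: int, b: int) -> int:
        # The inputs simply address the stored truth table.
        return table[(a << 1) | b]

    return lut


def pass_gate(config_bit: int, signal: int) -> Optional[int]:
    """Routing switch: pass the signal when the configuration bit is 1.

    A blocked (undriven) wire is modeled as None in this sketch.
    """
    return signal if config_bit else None


# The same LUT hardware realizes different functions under different bits.
xor_lut = make_lut2([0, 1, 1, 0])
and_lut = make_lut2([0, 0, 0, 1])
xor_table = [xor_lut(a, b) for a in (0, 1) for b in (0, 1)]
and_table = [and_lut(a, b) for a in (0, 1) for b in (0, 1)]
```

Changing only the four stored bits turns the same structure from an XOR into an AND, which is the sense in which downloading a different bitstream changes the logic function without changing the hardware.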
As described in [1], configuration memory SRAM cells do not require fast read and write performance, which allows FPGA designers to use a five transistor SRAM cell, as shown in Figure 2.2, instead of the more standard six transistor SRAM cell, which provides both the complemented and uncomplemented form as seen in Figure 2.3.

Figure 2.1: Configuration Bits in FPGAs

Another important attribute concerning FPGA configuration is the ability of FPGAs to be partially reconfigured. Most modern FPGAs, like the Xilinx Virtex series, support partial reconfiguration [3][29]. Partial reconfiguration allows for a reduction in bitstream size by storing only the difference between a previous full download and the changes needed to implement the modified system function. Partial reconfiguration allows a designer to extract more functionality from a smaller FPGA in usage scenarios where a subset of the implemented system functions can be swapped in and out of the FPGA without compromising critical performance parameters. In terms of testing FPGAs, partial reconfiguration is a valuable tool that allows the test configuration download time to be reduced.

Figure 2.2: A Five Transistor Configuration SRAM Cell [1]

Figure 2.3: A Six Transistor Configuration SRAM Cell [1]

2.2 Virtex 4 Architecture

Xilinx released the Virtex 4 FPGA family in 2004 and, at the time of its introduction, it was one of the most complex FPGAs available on the market. For its production, Xilinx used a 1.2 V, 90 nm, triple-oxide process, and it is only available in a flip-chip BGA package [10]. The Virtex 4 PLB contains four slices and each slice features two 4-input LUTs and two flip-flops. Dedicated memory storage is provided via a programmable 18K-bit SRAM called a block RAM. Previously, in Virtex 2, an 18 x 18 multiplier module was available [29]. In Virtex 4, the multiplier has grown into a complete DSP module that incorporates a 48-bit accumulator attached to the 18 x 18 multiplier. The accumulator can also perform addition and subtraction on two data words without their first being processed by the multiplier.

Virtex 4 is available in three distinct product families: LX, SX, and FX, as summarized in Table 2.1. The LX family contains eight devices that have been tailored toward design implementations that have high PLB usage. The SX family consists of three devices and is targeted toward DSP-oriented designs due to its higher number of DSP modules. The FX family is a blend of the SX and LX families and provides more specialized modules, which include PowerPC (PPC) and high performance I/O Serializer/Deserializer (SERDES) modules.

Virtex 4 employs a column based architecture. Each device has a center column which divides the FPGA into two halves. Beginning from the center column and moving in either direction, there are first columns of PLBs. The width of the first columns of PLBs differs by device size and family. Moving outward again, the next resource will be either a column of DSP modules or a column of BRAM modules, as illustrated in Figure 2.4. In FX family devices, either one or two PPC modules will be positioned to the left of the center column.
12 Table 2.1: Overview of Resources in Virtex 4 Family Devices [2] Device Row x Col Slices DSPs BRAMs PPCs I/O XC4VLX15 64 x 24 6,144 32 48 - 320 XC4VLX25 96 x 28 10,752 48 72 - 448 XC4VLX40 128 x 36 18,432 64 96 - 640 XC4VLX60 128 x 52 26,624 64 160 - 640 XC4VLX80 160 x 56 35,840 80 200 - 768 XC4VLX100 192 x 64 49,152 96 240 - 960 XC4VLX160 192 x 88 67,584 96 288 - 960 XC4VLX200 192 x 116 89,088 96 336 - 960 XC4VSX25 64 x 40 10,240 128 128 - 320 XC4VSX35 96 x 40 15,360 192 192 - 448 XC4VSX55 128 x 48 24,576 512 320 - 640 XC4VFX12 64 x 24 5,472 32 36 1 320 XC4VFX20 64 x 36 8,544 32 68 1 320 XC4VFX40 96 x 52 18,624 48 144 2 448 XC4VFX60 128 x 52 25,280 128 232 2 576 XC4VFX100 160 x 68 42,176 160 376 2 768 XC4VFX140 192 x 84 63,168 192 552 2 896 The smallest addressable unit of con guration memory in Virtex 4 is referred to as a frame, and each frame consists of 41 32-bit words. During the creation of a con guration bitstream, successive frame data along with frame addresses are written to the con gura- tion bitstream. Designs utilizing regular structures, such as multiple identically con gured BRAMs or replicated system logic in fault tolerant designs, can take further advantages of multi-frame write capabilities. Multi-frame write capabilities allow the con guration bit- stream to specify multiple frame addresses to be given the same frame data. This can allow substantial reductions in con guration bitstream sizes and can be bene cial to reducing the time to perform BIST as discussed in Chapter 3. All or parts of the con guration memory can also be read back. Con guration memory readback can be performed for various reasons such as con guration download veri cation, 13 but for BIST, readback is used to retrieve ORA results. In order to capture the contents of ip- ops, the CAPTURE module must be instantiated in the design. The CAPTURE module permits the current contents of ip- ops to overwrite their initial speci ed state stored in the con guration memory. 
Once the current flip-flop values have been captured, the ORA results can be retrieved via configuration memory readback.

Figure 2.4: Basic Virtex 4 Architecture

2.2.1 Virtex 4 PLBs

Virtex 4 PLBs each consist of 2 SLICEMs and 2 SLICELs, as pictured in Figure 2.5. A simplified representation of a slice is shown in Figure 2.6. A SLICEL contains 2 LUTs and 2 flip-flops plus multiplexers and logic gates to allow implementation of complex switching functions that span multiple SLICELs and even multiple PLBs. SLICEMs are a superset of SLICELs and have additional specialized features that enable them to be used as small distributed memories or as fast shift registers. This thesis does not focus on testing PLBs and the slices within them. PLBs are used in the work presented in this thesis; however, their use is only to implement the TPG and ORA components that facilitate testing BRAMs.

Figure 2.5: Virtex 4 PLB [2]

Figure 2.6: Simplified Virtex 4 Slice Diagram [2]

2.2.2 Virtex 4 BRAMs

This thesis concentrates on the development of BIST for BRAMs in Virtex 4 FPGAs. In [14], BIST was implemented for distributed RAMs, BRAMs, and multipliers in Xilinx's Virtex 2 series FPGAs. Distributed RAMs are small memories built from PLBs. The BRAMs available in Virtex 2 and Virtex 4 share much of the same functionality; the BRAM features present in Virtex 2 can be viewed as a subset of those present in Virtex 4. The Virtex 4 BRAM, as seen in Figure 2.7, is a true dual-port RAM, meaning that it has dual address, data input, and data output buses. The memory array can be configured in multiple aspect ratios that utilize up to 18K bits of addressable memory, as outlined in Table 2.2. For word sizes of eight bits and larger there are additional parity bits associated with each word. It is important to note, however, that the user is responsible for writing data to the parity bit locations, since parity over the written data word is not calculated automatically.
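The relationship between the aspect ratios and the 18K figure can be checked with a few lines of arithmetic. The sketch below uses the depth, width, and parity values that the thesis lists in Table 2.2. The `byte_parity` helper is a hypothetical illustration of user logic computing its own parity, and its one-even-parity-bit-per-byte convention is an assumption, not something the device imposes.

```python
# (depth, data width, parity width) per aspect ratio, as listed in Table 2.2.
ASPECT_RATIOS = [
    (512, 32, 4),
    (1024, 16, 2),
    (2048, 8, 1),
    (4096, 4, 0),
    (8192, 2, 0),
    (16384, 1, 0),
]


def bits_used(depth, data_width, parity_width):
    """Return (data bits, data plus parity bits) for one aspect ratio."""
    return depth * data_width, depth * (data_width + parity_width)


def byte_parity(word, data_width):
    """Hypothetical user-side helper: one even-parity bit per data byte."""
    parity = 0
    for index in range(data_width // 8):
        byte = (word >> (8 * index)) & 0xFF
        parity |= (bin(byte).count("1") & 1) << index
    return parity


for depth, data_width, parity_width in ASPECT_RATIOS:
    data_bits, total_bits = bits_used(depth, data_width, parity_width)
    print(f"{depth:>5} x {data_width:>2}: {data_bits} data bits, {total_bits} with parity")
```

Every aspect ratio addresses the same 16K data bits; only the three widest configurations reach the additional parity storage, which is where the 18K total comes from.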
As for BRAM performance, all BRAM read and write operations can be completed in a single clock period unless a registered output mode is used.

Table 2.2: Summary of Virtex 2 and 4 BRAM Aspect Ratios

Word Depth  Word Width  Parity Width
512         32          4
1K          16          2
2K          8           1
4K          4           -
8K          2           -
16K         1           -

Table 2.3 lists all of the input and output signal names and functionality found in a Virtex 4 BRAM. It should be noted that the Virtex 2 BRAM uses the same signal names except for the cascade inputs and outputs. Unlike Virtex 2, Virtex 4 BRAMs support cascading two BRAMs configured as 16K x 1-bit to create a 32K x 1-bit memory without utilizing additional PLBs. All of the inputs are captured at either the rising or falling edge of the input clock, depending on whether the BRAM clock input is programmed to be rising or falling edge triggered. Also, the following signals are programmable in their active levels: WE, EN, SSR, and REGCE. The WE signal is actually a 4-bit bus, which makes it possible to write a single byte to a BRAM that is configured with a data word wider than a byte. This feature is most commonly used when a PPC module is combined with several BRAMs that implement program and data storage for the processor.

Figure 2.7: Virtex 4 BRAM [2]

Table 2.3: BRAM Signal Descriptions [2]

Port Name         Description
DI[A,B]           Data Input Bus
DIP[A,B]          Data Input Parity Bus
ADDR[A,B]         Address Bus
WE[A,B]           Write Enable
EN[A,B]           Port Enable
SSR[A,B]          Set/Reset
CLK[A,B]          Clock Input
DO[A,B]           Data Output Bus
DOP[A,B]          Data Output Parity Bus
REGCE[A,B]        Output Register Clock Enable
CASCADEIN[A,B]    Cascade input pin for 32K x 1 mode
CASCADEOUT[A,B]   Cascade output pin for 32K x 1 mode

Table 2.4 summarizes additional configuration parameters for each BRAM. The SAVEDATA configuration option determines if a partial reconfiguration will overwrite the present contents of a BRAM. This option is useful if the reconfiguration is targeted at changing the system function that operates on data stored in the BRAM.
The RAM_EXTENSION parameter for each port determines which BRAM is the UPPER or LOWER BRAM when configured in the cascade mode of operation. There are no restrictions on whether a particular BRAM can be UPPER or LOWER. This designation is decided either by the designer or by the constraints associated with the available resources during resource placement. It is important to point out that BRAMs located on the bottom row or directly above a PPC module do not have CASCADEIN[A,B] inputs. Likewise, BRAMs located on the top row or directly below a PPC module do not have CASCADEOUT[A,B] ports. The DO_REG parameter determines whether the output data bus is only latched or is also sent through an extra flip-flop. The READ_WIDTH and WRITE_WIDTH parameters determine the selected BRAM aspect ratio per port. The WRITE_MODE parameter selects one of three supported write modes. READ_FIRST brings the current contents of the addressed word to the output during a write operation. Likewise, WRITE_FIRST writes the input data to the addressed location while it also forces the input data to the output of the BRAM. The NO_CHANGE write mode prevents the data output from changing during a write operation; only a read operation will change the output data.

Table 2.4: BRAM Configuration Options [2]

Configuration Attribute    Parameters
SAVEDATA                   TRUE, FALSE
RAM_EXTENSION_[A,B]        LOWER, UPPER, NONE
DO[A,B]_REG                0, 1
INVERT_CLK_DO[A,B]_REG     TRUE, FALSE
READ_WIDTH_[A,B]           36, 18, 9, 4, 2, 1, 0
WRITE_WIDTH_[A,B]          36, 18, 9, 4, 2, 1, 0
WRITE_MODE_[A,B]           READ_FIRST, WRITE_FIRST, NO_CHANGE

BRAM Error Checking Code (ECC) and Cascade Operational Modes

A pair of BRAMs can be configured to implement either a 512 x 72-bit ECC RAM or a 32K x 1-bit RAM. However, there are several restrictions for these additional modes concerning the actual placement during design implementation. An ECC BRAM can be placed in a Virtex 4 device so long as the bottom BRAM is located on an even row.
For clarity, the numbering convention Xilinx uses sets the bottommost row to be row zero. If an FX device is being used, an ECC BRAM cannot use the BRAM directly below or above the PowerPC core. Restrictions on the placement of cascaded BRAMs are somewhat less strict. Any pair of BRAMs in a column can be cascaded so long as the two BRAMs are physically adjacent and the pair does not span a PPC module when using FX family devices [2].

An ECC RAM is commonly used in systems that are designed to be fault tolerant. In Virtex 4, when a data word is written to an ECC RAM, a Hamming code is generated for the written data word and stored alongside the data. A Hamming code is a form of error checking code that inserts Hamming bits throughout a data word. Each Hamming bit contains parity over a subset of the data word. The contents of all the Hamming bits for a given data word are referred to as the Hamming code associated with that data word. The Hamming code is designed such that any single-bit error in the Hamming code or in the data word can be detected and the incorrect bit identified. If an additional Hamming bit is allocated to generate parity over the entire Hamming code and the data, then it is possible to provide single-bit error correction and double-bit error detection [30]. Virtex 4 uses what is termed a (72,64) Hamming code, meaning the Hamming codeword is 72 bits with 64 of those bits being the actual data. Seven of the eight Hamming bits are used to provide single-bit correction, and the eighth bit is used to provide parity over the entire Hamming codeword, which enables double-bit error detection [30]. Figure 2.8 illustrates the architecture of an ECC BRAM. An ECC BRAM also generates status bits that indicate if a single-bit error was corrected or a double-bit error was detected. Table 2.5 gives a description of the three valid ECC status words.
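The single-error-correct, double-error-detect behavior described above can be illustrated with a textbook extended Hamming code. The sketch below is a minimal illustration only: it places check bits at power-of-two positions and the overall parity bit at position 0, which is the classic layout and is not claimed to match the bit assignment used inside the Virtex 4 ECC logic. The function names and status strings are assumptions made for this example.

```python
def encode(data_bits):
    """Encode 0/1 data bits into an extended Hamming codeword (list of bits).

    Position 0 holds overall parity; positions 1..n use the classic layout
    with check bits at power-of-two positions (an assumed, textbook layout).
    """
    k = len(data_bits)
    r = 0
    while (1 << r) < k + r + 1:
        r += 1
    n = k + r
    code = [0] * (n + 1)
    source = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                  # not a power of two: data position
            code[pos] = next(source)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:                      # check bit p covers these positions
                parity ^= code[pos]
        code[p] = parity
    code[0] = sum(code[1:]) & 1              # overall parity over the codeword
    return code


def decode(code):
    """Return (status, codeword after any correction)."""
    code = list(code)
    syndrome = 0
    for pos in range(1, len(code)):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) & 1
    if syndrome == 0 and overall == 0:
        return "no_error", code
    if overall == 1 and syndrome < len(code):
        code[syndrome] ^= 1                  # syndrome 0 means the parity bit itself
        return "single_corrected", code
    if overall == 0:
        return "double_detected", code
    return "uncorrectable", code


data = [(i * 7 + 3) % 2 for i in range(64)]  # arbitrary 64-bit test pattern
word = encode(data)
print(len(word))                             # 64 data bits + 8 check bits
```

With 64 data bits the construction needs seven check bits plus the overall parity bit, which gives the 72-bit codeword length quoted in the text.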
It is important to note, however, that when a single-bit error is corrected in the Virtex 4 ECC implementation, only the data at the output registers of the ECC BRAM are corrected. The contents stored in the BRAM are not corrected automatically [2].

Table 2.5: ECC Status Description

Status[1:0]  Condition
00           No error
01           Single-bit error corrected
10           Double-bit error detected
11           Invalid status condition

Figure 2.8: ECC BRAM Architecture [2]

The cascade mode of operation is implemented by using the MSB of the ADDR[A,B] bus, a single configuration bit, and dedicated routing to send the lower BRAM data output to the upper BRAM in the cascade pair. Figure 2.9 illustrates two BRAMs configured as a cascade pair. Data written to a cascaded pair is routed to either the lower or upper BRAM by the MSB of the address bus. In addition, the MSB of the address bus selects the data from either the lower or upper BRAM during read operations. The lower BRAM outputs its data to the dedicated routing between the cascade pair, and that data is then output via a multiplexer that is selected by the MSB of the address bus. Both the lower and upper BRAMs in a cascade pair output data during every read operation, but the output of the upper BRAM is the only valid output for a cascade-configured BRAM pair [2].

Figure 2.9: BRAM Cascade Operational Diagram [2]

2.2.3 Virtex 4 FIFOs

A First In, First Out (FIFO) memory is commonly used in digital systems to handle data flow control and buffering. A FIFO holds data in a queue such that the first data stored is also the first data able to be retrieved. In Virtex 4, each BRAM can be configured as a FIFO without utilizing any surrounding PLBs. The FIFO implementation, as illustrated in Figures 2.10 and 2.11, generates both the read and write pointers used to retrieve and store data from the BRAM [2]. A description of each FIFO input and output is given in Table 2.8. Also, status flags are generated to indicate the state of the FIFO.
In earlier FPGAs, such as Virtex 2, implementing a FIFO required using a substantial number of PLBs for this supporting logic.

Each Virtex 4 FIFO can be independently configured to four different depths, as summarized in Table 2.6. Like BRAMs, the FIFO control signals RDCLK, WRCLK, RDEN, WREN, and RST also have programmable active levels. The first-word fall-through (FWFT) operational mode extends the depth of a FIFO by one data word. In this mode, the first data item written to the FIFO is not stored in the BRAM, but is immediately available at the registered outputs. The remaining configuration options pertain to the programmable Almost Full/Empty flag limits. These programmable flags are set by the designer to help coordinate the status of the FIFO with surrounding logic.

Table 2.6: FIFO Configuration Options [2]

Configuration  ALMOST_EMPTY_OFFSET (Standard)  ALMOST_EMPTY_OFFSET (FWFT)  ALMOST_FULL_OFFSET
4k x 4         5 to 4092                       6 to 4093                   4 to 4091
2k x 9         5 to 2044                       6 to 2045                   4 to 2043
1k x 18        5 to 1020                       6 to 1021                   4 to 1019
512 x 36       5 to 508                        6 to 509                    4 to 507

The timing characteristics of the various FIFO status flags are given in Table 2.7. The most interesting attribute in this table is the assertion of the FULL flag. The FULL flag is asserted one clock cycle after the last possible data entry is written and deasserts 3 to 4 clock cycles later depending on whether the standard or FWFT operational mode is used. This latency means that one could accidentally write to a FIFO when in actuality it is already full. To remedy this problem, Xilinx recommends using the ALMOST FULL flag to signal that the FIFO is full [2]. In terms of testing, the FIFO's status flag timing creates several problems for creating test algorithms that fully test the device in all modes of operation. This issue will be discussed in detail in Chapter 4.
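The consequence of the one-cycle FULL assertion latency can be seen in a toy behavioral model. The sketch below is a deliberate simplification and not a model of the actual Virtex 4 FIFO: it tracks only the write side, delays both flags by exactly one write clock, and uses a hypothetical `margin` parameter (in words) in place of the device's ALMOST_FULL_OFFSET semantics.

```python
class DelayedFlagFifo:
    """Toy write-side FIFO model whose flags lag the true fill level by one clock."""

    def __init__(self, depth, margin):
        self.depth = depth
        self.margin = margin          # hypothetical almost-full margin, in words
        self.count = 0
        self.full = False
        self.almost_full = False
        self.write_errors = 0

    def clock(self, wren):
        """Advance one write clock; flags update from the pre-edge fill level."""
        was_full = self.count == self.depth
        was_almost_full = self.count >= self.depth - self.margin
        if wren:
            if was_full:
                self.write_errors += 1    # write attempted while already full
            else:
                self.count += 1
        self.full = was_full
        self.almost_full = was_almost_full


def fill(fifo, use_almost_full, cycles):
    """Write on every clock until the chosen flag is seen asserted."""
    for _ in range(cycles):
        stop = fifo.almost_full if use_almost_full else fifo.full
        fifo.clock(wren=not stop)
    return fifo


naive = fill(DelayedFlagFifo(depth=16, margin=4), use_almost_full=False, cycles=24)
guarded = fill(DelayedFlagFifo(depth=16, margin=4), use_almost_full=True, cycles=24)
print(naive.write_errors, guarded.write_errors)
```

In this model, logic that throttles on FULL attempts one write into an already full FIFO, while logic that throttles on the almost-full flag stops with room to spare, which is the motivation behind the recommendation cited above.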
Figure 2.10: Virtex 4 FIFO Implementation [2]

Figure 2.11: Virtex 4 FIFO [2]

Table 2.7: Virtex 4 Status Flag Clock Cycle Latency [2]

Flag          Assertion (Standard)  Assertion (FWFT)  Deassertion (Standard)  Deassertion (FWFT)
EMPTY         0                     0                 3                       4
FULL          1                     1                 3                       3
ALMOST EMPTY  1                     1                 3                       3
ALMOST FULL   1                     1                 3                       3
READ ERROR    0                     0                 0                       0
WRITE ERROR   0                     0                 0                       0

Table 2.8: FIFO Port Signal Descriptions [2]

Port Name    Direction  Description
DI           Input      Data input.
DIP          Input      Parity-bit input.
WREN         Input      Write enable. Active high or low.
WRCLK        Input      Clock for write domain operation. Rising or falling edge triggered.
RDEN         Input      Read enable. Active high or low.
RDCLK        Input      Clock for read domain operation. Rising or falling edge triggered.
RESET        Input      Asynchronous reset of all FIFO functions, flags, and pointers. Active high or low.
DO           Output     Data output.
DOP          Output     Parity-bit output.
FULL         Output     All entries in FIFO memory are filled. No additional write enable is performed.
ALMOSTFULL   Output     Almost all entries in FIFO memory have been filled.
EMPTY        Output     FIFO is empty. No additional read can be performed.
ALMOSTEMPTY  Output     Almost all valid entries in FIFO have been read.
RDCOUNT      Output     The FIFO data read pointer.
WRCOUNT      Output     The FIFO data write pointer.
WRERR        Output     When the FIFO is full, any additional write operation generates an error flag.
RDERR        Output     When the FIFO is empty, any additional read operation generates an error flag.

2.2.4 Virtex 4 CAD Tools

Xilinx provides a complete set of computer-aided design (CAD) tools that enable a designer to implement digital systems using a high-level design methodology that supports schematic entry, hardware description language (HDL) synthesis, and IP core integration. A designer can implement designs using either a graphical user interface (GUI), Project Navigator, or command-line tools. By using command-line tools and batch files, one can automate the BIST configuration generation process.
Table 2.9 summarizes several Xilinx tools that are used to implement the BIST configurations presented in this thesis. BITGEN has several options that can be utilized in BIST configuration generation. BITGEN supports the creation of partial configuration bitstreams from a set of regular full configuration bit-files. During the creation of partial bit-files, BITGEN compares two NCD design files and generates a bit-file containing the difference between the two. The XDL command-line tool allows conversion between an NCD file and an XDL file. An XDL file contains a human-readable netlist description of an FPGA configuration. XDL can also convert an XDL file to an NCD file, which likewise describes a given FPGA configuration but is a binary file that is not human-readable [31].

Table 2.9: Summary of Xilinx Design Tools

Application  Input File Type  Output File Type  Description
XST          VHDL, Verilog    NGC               Synthesis tool: compiles HDL and generates a design netlist compatible with Xilinx devices
NGDbuild     NGC              NGD               Compiles designs to a common format
MAP          NGD              NCD               Translates the post-synthesis design to a device-specific implementation
PAR          NCD              NCD               Places and routes device-specific designs
BITGEN       NCD              .BIT, .RBT        Creates bitstream configuration files for download
XDL          XDL, NCD         NCD, XDL          Converts between XDL and NCD design formats
TRCE         NCD              TWR               TRACE: generates a configuration timing analysis report

2.2.5 Virtex 4 Boundary Scan

Virtex 4 implements boundary scan, also called JTAG, that meets IEEE Standard 1149.1-2001 [3]. Boundary scan was originally intended as a mechanism for testing interconnect between multiple IC chips on a system board. A boundary scan implementation includes the following four signals: Test Clock (TCK), Test Mode Select (TMS), Test Data In (TDI), and Test Data Out (TDO). By driving TMS to either a logic high or low during a rising transition on TCK, one can navigate a state machine termed the test access port (TAP) controller, as illustrated in Figure 2.12 [11].
However, the technology has evolved and now most FPGAs allow for device programming via boundary scan [29][3]. Xilinx's Virtex series of FPGAs also allows boundary scan signals to be connected to internal FPGA logic. This connection is made via what Xilinx calls boundary scan (BSCAN) modules. In Virtex 4, four BSCAN modules are available for use, and each module is selected by shifting a specific data word into the TAP's instruction register, as shown in Table 2.10. The convention used when shifting data into either the data register (in state Shift-DR) or the instruction register (in state Shift-IR) is to assert TMS to a '1' on the last data bit transmitted over TDI.

Figure 2.12: TAP Controller State Diagram [3]

Table 2.10: Virtex 4 BSCAN Module Access Commands [3]

Boundary Scan Command  Binary Code (9:0)  Description
USER1                  1111000010         Access user-defined register 1
USER2                  1111000011         Access user-defined register 2
USER3                  1111100010         Access user-defined register 3
USER4                  1111100011         Access user-defined register 4

2.3 SRAM Testing

Most digital systems incorporate some type of memory element, whether it is an embedded SRAM in a microprocessor's cache or a standalone memory in a digital system. Applications that utilize memories are quite abundant, and thus there is a need to ensure that memory devices such as SRAMs are fault-free. Figure 2.13 illustrates a functional model of a common SRAM. As seen in this figure, an array of memory cells and supporting functions such as decoders, sense amplifiers, and storage registers are all components integral to an SRAM. Figure 2.14 shows how a two-port memory can be made by adding word and bit lines that connect to the cross-coupled inverters.

Figure 2.13: SRAM Memory Functional Model

Figure 2.14: Structural Model of a two-port SRAM cell

2.3.1 SRAM Fault Models

In [26], van de Goor shows that a simple stuck-at fault (SAF) model is not sufficient for modeling faults found in memory devices.
Table 2.11 lists common types of faults for SRAMs; these are sometimes referred to as simple faults. Coupling faults can be further classified into four subtypes: inversion coupling faults (CFin), idempotent coupling faults (CFid), state coupling faults (CFst), and disturb coupling faults (CFdst). A CFin refers to a rising (↑) and/or falling (↓) write transition in a coupling cell that causes an inversion in the coupled cell. For example, if cell A and cell B are both '1' and a '0' is written to cell A, a CFin fault in cell B would invert its value to a '0' as well. In this case cell A is considered the coupling cell and cell B is deemed the coupled cell. A CFid is similar to a CFin except that the coupled cell is forced to either a 1 or a 0 instead of being inverted. A CFst is slightly more complicated, as the coupled cell is only affected if the coupling cell is in a certain state. For example, a '1' in cell A could force cell B to either a '1' or a '0', but if cell A is a '0', no fault effect would be observed. Finally, a CFdst refers to a fault where the coupled cell undergoes a transition due to a read or write operation to the coupling cell [32].

When coupling faults are considered as part of a memory's fault model, the occurrence of multiple faults is often considered as well. These multiple faults are said to be linked and are referred to as linked faults. The term linked is used because, in the presence of multiple faults, each fault can possibly influence the effect of the other faults [11].
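The coupling fault subtypes described above can be made concrete with a small fault-injection model. The sketch below is an illustrative simplification: the class name, the tuple format for faults, and the choice to trigger CFin and CFid on any write transition of the coupling cell are assumptions made for this example, and CFdst and linked faults are not modeled.

```python
class FaultyMemory:
    """Bit-oriented memory with injected coupling faults.

    Each fault is (kind, coupling_cell, coupled_cell, value):
      "CFin": value unused; any transition in the coupling cell inverts the coupled cell
      "CFid": value is the bit forced into the coupled cell on a transition
      "CFst": value is (coupling_state, forced_bit)
    """

    def __init__(self, size, faults=()):
        self.cells = [0] * size
        self.faults = list(faults)

    def write(self, addr, bit):
        old = self.cells[addr]
        self.cells[addr] = bit
        for kind, coupling, coupled, value in self.faults:
            if coupling == addr and old != bit:
                if kind == "CFin":
                    self.cells[coupled] ^= 1
                elif kind == "CFid":
                    self.cells[coupled] = value
        for kind, coupling, coupled, value in self.faults:
            if kind == "CFst" and self.cells[coupling] == value[0]:
                self.cells[coupled] = value[1]   # state of coupling cell dominates

    def read(self, addr):
        return self.cells[addr]


# The CFin example from the text: A and B both hold 1, then 0 is written to A.
mem = FaultyMemory(2, [("CFin", 0, 1, None)])
mem.cells = [1, 1]
mem.write(0, 0)
print(mem.read(1))   # cell B has been inverted by the write to cell A
```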
Table 2.11: Common SRAM Fault Types

Fault Type                      Description
Stuck-At Faults (SAF)           The logic value of a memory cell is always 1 or always 0
Transition Faults (TF)          A memory cell is not able to make a 0 to 1 or a 1 to 0 transition
Coupling Faults (CF)            A change in one memory cell, the coupling cell, causes another cell, the coupled cell, to change its value
Address Decoder Faults (AF)     The expected memory cell is not selected or another cell is selected
Data Retention Faults (DRF)     The expected stored data is corrupted
Pattern Sensitive Faults (PSF)  The contents of a memory cell change the value of another memory cell

2.3.2 March Tests for Single-Port Memories

Memory devices are traditionally tested with march tests. March tests apply defined test patterns consisting of writing and reading varying patterns of 1s and 0s to and from a memory device. Table 2.12 lists common notations used to describe march tests, while Table 2.13 lists several march tests of varying complexity. For example, when running the MATS+ march test on a memory device, a '0' is written to each memory location in either a descending or ascending traversal. Next, starting at the lowest addressed memory location and traversing upward, a '0' is read and a '1' is written to each memory location. The next sequence begins at the highest memory location and traverses downward, reading a '1' and then writing a '0' to each memory location. In terms of fault detection, MATS+ is able to detect all SAFs and all AFs [11]. More complex tests such as March Y can detect additional faults, namely TFs, CFins, and some linked faults. In addition, March C- can detect almost all types of CFs.

Table 2.12: March Test Notation Descriptions

Notation  Description
r         A read operation
w         A write operation
r0        Read a 0
w0        Write a 0
w1        Write a 1
⇑         Traverse upward through memory addresses
⇓         Traverse downward through memory addresses
⇕         Traverse in either direction through memory addresses
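The MATS+ walk-through above maps directly onto a small march test interpreter. The following sketch is a minimal illustration under stated assumptions: the list-of-tuples encoding of march elements, the `StuckAtMemory` class, and the `run_march` function are hypothetical names introduced here, and only stuck-at faults are modeled.

```python
UP, DOWN, ANY = "up", "down", "any"

# MATS+ as described in the text: {any(w0); up(r0,w1); down(r1,w0)}
MATS_PLUS = [
    (ANY, ["w0"]),
    (UP, ["r0", "w1"]),
    (DOWN, ["r1", "w0"]),
]


class StuckAtMemory:
    """Bit-oriented memory; `stuck` maps an address to the value it is stuck at."""

    def __init__(self, size, stuck=None):
        self.cells = [0] * size
        self.stuck = stuck or {}

    def write(self, addr, bit):
        self.cells[addr] = self.stuck.get(addr, bit)

    def read(self, addr):
        return self.stuck.get(addr, self.cells[addr])


def run_march(memory, size, test):
    """Apply a march test; return (passed, number of operations applied)."""
    passed = True
    operations = 0
    for order, element in test:
        addresses = range(size - 1, -1, -1) if order == DOWN else range(size)
        for addr in addresses:
            for op in element:
                operations += 1
                bit = int(op[1])
                if op[0] == "w":
                    memory.write(addr, bit)
                elif memory.read(addr) != bit:
                    passed = False          # read value differs from expectation
    return passed, operations


print(run_march(StuckAtMemory(8), 8, MATS_PLUS))
print(run_march(StuckAtMemory(8, {3: 0}), 8, MATS_PLUS))
```

The operation count returned for an 8-location memory is 40, which agrees with the 5N test time usually quoted for MATS+.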
The general trend for march test fault detection is that longer, more complex tests offer higher fault detection. While high fault detection is obviously desired, some tests, like those used specifically to target pattern sensitive faults, can be impractical, especially if the DUT is of any substantial size. In [32], van de Goor presents March LR as "a test for simple faults and realistic linked faults." The set of realistic linked faults reduces the universe of possible linked faults by removing combinations that are not likely to occur in actual devices. It excludes linked faults containing one or more CFins, as well as linked faults containing two CFids or two CFdsts. Van de Goor shows that March LR is superior to March C- as it can also detect some neighborhood PSFs (NPSFs) [32].

Table 2.13: Common March Tests for Single Port Memories

March Test  Description                                                        Test time
MATS+       {⇕(w0); ⇑(r0,w1); ⇓(r1,w0)}                                        5N
March C-    {⇕(w0); ⇑(r0,w1); ⇑(r1,w0); ⇓(r0,w1); ⇓(r1,w0); ⇕(r0)}             10N
March Y     {⇕(w0); ⇑(r0,w1,r1); ⇓(r1,w0,r0); ⇕(r0)}                           8N
March LR    {⇕(w0); ⇓(r0,w1); ⇑(r1,w0,r0,w1); ⇑(r1,w0); ⇑(r0,w1,r1,w0); ⇕(r0)}  14N

Note: N = number of address locations

All of the march tests listed in Table 2.13 are designed for bit-oriented memories (BOMs), meaning each memory word is a single bit. However, many memories, including the Virtex 4 BRAM, operate as word-oriented memories (WOMs), meaning each word is more than a single bit. Executing BOM-based march tests on a WOM involves extending the bit operation to the entire word. For example, if a w0 operation is to be applied to a WOM with 4-bit words, it could be interpreted as w0000, meaning write a zero to each bit in the data word. Writing and reading either all zeros or all ones in a WOM does not sufficiently detect CFs between cells in a data word. In [4], van de Goor et al. develop methods for converting BOM march tests, specifically March LR, to WOM march tests by using background data sequences (BDS).
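Both the simple word-wide extension and the idea of background patterns can be sketched in a few lines. The code below is a hedged illustration rather than the construction from [4]: it assumes the backgrounds are the all-zeros word followed by words that alternate in blocks of 1, 2, 4, ... bits, each paired with its inverse, which is the pattern family shown for eight bits in Table 2.14. The function names are hypothetical.

```python
def extend_to_word(op, width):
    """Extend a bit-oriented operation such as 'w0' to a word, e.g. 'w0000'."""
    return op[0] + op[1] * width


def background_sequences(width):
    """Return (normal, inverse) bit-string pairs for a word of `width` bits."""
    count = (width - 1).bit_length() + 1      # ceil(log2(width)) + 1
    mask = (1 << width) - 1
    pairs = []
    for k in range(count):
        if k == 0:
            normal = 0                        # all-zeros background
        else:
            block = 1 << (k - 1)              # alternate in blocks of 1, 2, 4, ...
            normal = sum(1 << i for i in range(width) if (i // block) % 2 == 0)
        pairs.append((format(normal, f"0{width}b"),
                      format(normal ^ mask, f"0{width}b")))
    return pairs


print(extend_to_word("w0", 4))
for normal, inverse in background_sequences(8):
    print(normal, inverse)
```

For an 8-bit word this produces four normal backgrounds and their four inverses; for the 36-bit BRAM word it produces seven pairs.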
Instead of writing either all zeros or all ones, BDS involve writing binary patterns consisting of alternating ones and zeros and alternating sets of ones and zeros. Table 2.14 lists all possible BDS for an eight-bit word. The number of BDS for an M-bit word is given by Equation 2.1. Figure 2.15 illustrates converting March LR to March LR with 4-bit BDS. The test length increases from 14N to 30N, where N is the number of words in the memory. In general, Equation 2.2 gives the expected test length for a WOM march test derived from a BOM march test. The variable M in Equation 2.2 refers to the number of bits in a data word. Converting March LR to incorporate BDS requires running the original March LR test, as enclosed in the first set of square brackets in Figure 2.15, and then adding additional marches that incorporate BDS.

{[⇕(w0000); ⇓(r0000,w1111); ⇑(r1111,w0000,r0000,w1111); ⇑(r1111,w0000); ⇑(r0000,w1111,r1111,w0000); ⇕(r0000)]
 [⇑(r0000,w1111,r1111); ⇓(r1111,w0000,r0000); ⇑(r0000,w0101,w1010,r1010); ⇓(r1010,w0101,r0101); ⇑(r0101,w0011,w1100,r1100); ⇓(r1100,w0011,r0011); ⇑(r0011)]}

Figure 2.15: March LR with 4-bit BDS [4]

Table 2.14: Background Data Sequence for 8-bits

#  Normal    #  Inverse
0  00000000  1  11111111
2  01010101  3  10101010
4  00110011  5  11001100
6  00001111  7  11110000

\#\mathrm{BDS} = \lceil \log_2(M) \rceil + 1 \qquad (2.1)

\mathrm{Test\ Length} = (16 + 7\lceil \log_2(M) \rceil)\,N \qquad (2.2)

2.3.3 March Tests for Dual-Port Memories

In [5], Hamdioui and van de Goor describe two march tests for dual-port memories. Specific march tests are required for both ports in order to detect specific faults in dual-port memories. Dual-port memories generally support the following operations by the two ports [5]:

• Simultaneous read and write operations to different addresses.
• Simultaneous read and write operations to the same address. For this case, however, the write operation is assumed to have higher priority over the read operation.
• Two simultaneous reads to either the same or different addresses.
• Two simultaneous write operations to different addresses.

The only operation not allowed is simultaneous write operations to the same address location. The Virtex 4 BRAM supports the same operations and limitations discussed above.

The faults associated with dual-port memories are classified as 2PF1 and 2PF2 [5]. Faults classified as 2PF1 are sensitized by two simultaneous reads or by a simultaneous read and write operation. Two types of faults are associated with two simultaneous reads. In one case, the correct data value is read from the SRAM cell through the sense amplifier, but the actual data stored in the cell flips. In the other case, the sense amplifier reads an incorrect value and the actual data stored is also flipped. Simultaneous read and write operations can also cause the intended write operation to not occur. March s2pf, as seen in Figure 2.16, detects all 2PF1 type faults. The ':' symbol used in Figure 2.16 separates the operations on each port. For example, 'r1:-' would indicate a read operation on one port and any allowed operation on the second port.

Figure 2.16: March s2pf [5]

2PF2 faults are similar to the 2PF1 type faults except that the affected cell is a neighboring cell. For example, a fault may flip a neighboring cell i if two read operations occur on cell j. Another type of fault occurs when a read occurs at cell i and a write at cell j; in this scenario, the fault causes the read to return the wrong value [5]. March d2pf, as seen in Figure 2.17, is able to detect all 2PF2 faults. The variables 'c' and 'r' correspond to the memory cell column and row location, respectively. The variables 'C' and 'R' represent the number of memory cell columns and rows, respectively. The widest word size, 36 bits, is assumed to be the memory array column size. When this march test is applied to BRAMs in Virtex 4, 'C' is considered to be zero and 'R' is 512.
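The port-operation rules listed earlier in this section lend themselves to a small legality check, which can be handy when generating simultaneous two-port test operations. The sketch below simply encodes those rules; the function name and the returned strings are hypothetical choices made for this example.

```python
def classify_dual_port(op_a, addr_a, op_b, addr_b):
    """Classify one simultaneous pair of port operations ('r' or 'w')."""
    same_address = addr_a == addr_b
    operations = {op_a, op_b}
    if operations == {"w"}:
        # Two writes are only permitted when they target different addresses.
        return "not allowed" if same_address else "allowed"
    if operations == {"r", "w"} and same_address:
        return "allowed, write has priority"
    return "allowed"


print(classify_dual_port("w", 5, "w", 5))
print(classify_dual_port("r", 5, "w", 5))
```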
Figure 2.17: March d2pf [5]

2.4 Overview of BIST for FPGAs

Developing BIST for FPGAs consists of designing test configurations that fully test all components within an FPGA. Traditionally, these tests have been grouped into the following types: BIST for PLBs, BIST for I/O buffers, BIST for programmable routing resources, and BIST for specialized embedded cores such as large SRAMs [14][18][33]. The work presented in this thesis concentrates on testing Virtex 4 BRAMs, which falls into the category of testing specialized embedded cores. BIST for PLBs is also called Logic BIST, and this is the most common BIST found in the literature. Logic BIST usually requires repeatedly configuring PLBs in different modes of operation and applying test patterns to the PLBs under test. A general BIST architecture is illustrated in Figure 2.18. Two identical TPGs drive alternating columns of blocks under test (BUTs) whose outputs are observed by two adjacent ORAs. For Logic BIST, all of the BIST components are implemented using PLBs. In certain implementations, such as in [34], the TPGs can be implemented using other available specialized cores. In order to test all PLBs in an FPGA, the entire BIST architecture must be flipped such that the PLBs previously acting as BUTs are now ORAs and the PLBs previously implementing ORAs are now configured to be BUTs [34].

Figure 2.18: A General Comparison Based BIST Architecture

Several different types of ORAs are used in BIST for FPGAs. A comparison-based ORA design is the most common. Figures 2.20 and 2.21 illustrate two types of comparison-based ORAs: one with a shift chain and one without a shift chain. Using an ORA design without shift chains reduces the size of an ORA and can allow more ORAs to be used, which increases diagnostic resolution [21]. In both designs, any mismatch from the BUTs will cause the flip-flop to latch a '1' until the end of the test.
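The sticky behavior of a comparison-based ORA can be captured in a few lines. The sketch below is a behavioral illustration only, with a hypothetical class name; it models the comparison and the latching flip-flop and leaves out the shift chain and any readback mechanism.

```python
class ComparisonORA:
    """Behavioral model of a comparison-based ORA with a sticky fail flip-flop."""

    def __init__(self):
        self.fail = 0

    def clock(self, but_a, but_b):
        """Compare two BUT outputs; once a mismatch is seen the flag stays at 1."""
        self.fail |= int(but_a != but_b)
        return self.fail


ora = ComparisonORA()
for but_a, but_b in [(0, 0), (1, 0), (1, 1), (0, 0)]:
    ora.clock(but_a, but_b)
print(ora.fail)   # the single mismatch on the second clock is retained
```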
The addition of the shift chain allows the results to be shifted out of the device at the end of testing [14]. A comparison-based ORA with no shift chain can be used when the FPGA supports reading back the contents of the flip-flops in each PLB via configuration memory readback. Modern FPGAs such as the Virtex family from Xilinx support this readback operation [29][3].

Figure 2.18 also presents an example of a general comparison-based ORA BIST architecture. This type of comparison has limitations in terms of diagnostic resolution for BUTs located at the outer columns, since they are only observed by a single ORA. In [14], cases where a fault could escape detection are discussed. A circular-comparison based BIST architecture, as seen in Figure 2.19, eliminates the loss of fault detection around the outer columns. Circular-comparison based BIST architectures require a minimum of three BUTs for fault detection and four BUTs for fault diagnosis [28]. The only condition for a fault to escape detection is for all BUTs in a circular comparison chain to have identical equivalent faults. If the comparison chain is sufficiently long, then the probability of multiple equivalent faults occurring is negligible.

Figure 2.19: A Circular Comparison Based BIST Architecture

Figure 2.20: Comparison Based ORA with Shift Chain

Figure 2.21: Comparison Based ORA without a Shift Chain

2.4.1 BIST for BRAMs

In [17], Garimella developed BIST configurations to test BRAMs in Virtex 2 FPGAs. The work presented in this thesis borrows many of the same concepts and applies them to BIST for BRAMs in Virtex 4. BRAMs in Virtex 2 only operate as a dual-port memory. There are no built-in FIFO, ECC, or cascade modes of operation. Garimella's approach was to create a portable BIST architecture that could be adapted to different FPGAs. An HDL was used to model, synthesize, and implement the entire BIST architecture. Garimella used the comparison-based ORA design that includes the shift chain.
The end of the shift chain was connected to the boundary scan port TDO. The boundary scan pin TDI was connected to each ORA such that a '1' on TDI enabled all of the ORAs to become a shift register. Likewise, a '0' on TDI would disable the shift operation and the ORAs would continue to compare BUT responses on each clock cycle. The system clock for the ORAs, BUTs, and TPG was supplied through the TCK pin of the boundary scan port.

Table 2.15: Virtex 2 BRAM Summary

BIST Configuration  Test Algorithm   Address Locations (A)  Data Width (D)  Clock Cycles
1                   March LR w/ BDS  512                    36              58A
2                   March LR         1K                     18              14A
3                   March LR         2K                     9               14A
4                   March LR         4K                     4               14A
5                   March LR         8K                     2               14A
6                   March LR         16K                    1               14A
7                   March s2pf       512                    36              14A
8                   March d2pf       512                    36              9A

TOTAL BIST CLOCKS = 485,888

Table 2.15 summarizes the eight BIST configurations generated by Garimella for Virtex 2 BRAMs. March LR was implemented as a TPG and used for each of the programmable aspect ratios. March LR was chosen primarily because of its relatively low complexity and high fault coverage. The 512 x 36 addressing mode used March LR modified to generate BDS so that fault detection of intra-word coupling faults was maximized. March s2pf and March d2pf were also used to test dual-port functionality.

The advantage of Garimella's approach was that the BIST architecture was described in an HDL, which allowed for very rapid development. There are, however, some important disadvantages. BIST approaches such as in [34] take advantage of partial reconfiguration techniques in order to reduce test time and the amount of configuration data that must be downloaded to a device over multiple BIST configurations. In order for several FPGA configurations to use partial reconfiguration efficiently, there need to be only small, regular changes between each configuration. In terms of BIST for PLBs, the only changes made between configurations are to the operational mode of the BUTs, while the rest of the BIST circuitry is kept static [34].
This approach yielded substantial test time and configuration storage reductions when applied to BIST for PLBs in Virtex 2 and Virtex 4.

Garimella's approach is not compatible with partial reconfiguration. The use of an HDL to develop BIST configurations removes the control over the placement of BIST circuitry. The CAD tools that transform an HDL description into a configuration ready for download are not able to take advantage of the regularity of BIST architectures. The result is that the TPG and ORA portions of the BIST circuitry are intermingled amongst the available logic resources surrounding BRAMs. CAD tools also do not obtain identical results in repeated implementations. This severely limits the use of partial reconfiguration because at no time are subsequent BIST configurations guaranteed to be similar to previous configurations. For this reason, Garimella used full configuration downloads for each BIST configuration.

Another disadvantage with the HDL approach is that developing BIST at a high level reduces the controllability over the resource being tested. For example, in BIST configurations that used active low signals, the synthesis tool inverted the signals connected to the BRAM instead of configuring the BRAM ports to the opposite level. This behavior was observed when determining the portability of Garimella's BIST approach to Virtex 4. As a result, configuration bits and logic inverting BRAM control signals are not tested by Garimella's approach.

2.5 Thesis Restatement

The work by Garimella in developing BIST configurations for Virtex 2 BRAMs is a basis for the work presented in this thesis. The main disadvantage of Garimella's approach is not being able to take advantage of partial reconfiguration techniques between each BIST configuration following the initial full BIST configuration download. Downloading BIST configurations is the most time-consuming portion of the entire time required for BIST.
The time spent applying BIST clock cycles is small compared to the configuration time required. This thesis presents the development of BIST configurations for Virtex 4 BRAMs compatible with partial reconfiguration. Using partial reconfiguration, a large portion of the time needed to perform BIST can be eliminated. Unlike Garimella's work, the BIST architecture presented in this thesis does not rely on an HDL to describe the overall BIST architecture. Instead, this approach uses custom BIST programs to implement BIST configurations, which enables much greater control of each BIST configuration as compared to the HDL approach by Garimella. This improvement allows BRAMs to be tested in all of their configuration options, which gives higher fault coverage.

Chapter 3 will introduce the general BIST architecture for testing BRAMs. Also in Chapter 3, BIST configurations testing BRAMs in single and dual-port modes are presented. BIST configurations for FIFO operational modes are developed in Chapter 4 while ECC and cascade mode BIST configurations are presented in Chapter 5.

Chapter 3 Virtex 4 Block RAM BIST Implementation

A BIST approach developed for BRAMs in Virtex 4 FPGAs is presented in this chapter. The BIST architecture will be discussed as well as a TPG which is able to generate multiple march tests. Finally, results from applying BIST to Virtex 4 devices are given and compared to results obtained by Garimella in [17].

3.1 Virtex 4 BRAM BIST Architecture

In all Virtex 4 devices, BRAMs are located along columns that span the entire device. In between columns of BRAMs there are at least 4 columns of PLBs. In order to achieve high fault coverage and diagnostic resolution, circular comparison-based ORAs are used in conjunction with two identical TPGs which drive alternating rows of BRAMs as seen in Figure 3.1. Each BRAM output is observed by ORAs immediately adjacent and directly above each BRAM.
The topmost BRAM outputs connect to adjacent ORAs as well; however, there are no ORAs above the topmost BRAM. To enable all BRAMs to have each output compared by two ORAs, the topmost BRAM's outputs are compared by the ORAs adjacent to the bottommost BRAM in each column. This arrangement implements a circular comparison chain per column of BRAMs.

Figure 3.1: BRAM BIST Architecture

Each BRAM has 72 outputs, 36 per port. A Virtex 4 slice contains two 4-input LUTs and two flip-flops. With these resources, two comparison-based ORAs can be implemented in each slice. Given that there are four slices per PLB, each PLB holds eight ORAs, so nine PLBs are required to implement all the ORAs needed to compare the 72 outputs of two BRAMs. The Virtex 4 architecture places four rows of PLBs per BRAM, which in turn yields 16 PLBs total since there are four columns between each BRAM. From these 16 PLBs, nine PLBs are used to implement ORAs beside each BRAM. Figure 3.2 illustrates the BRAM to ORA connections. The first column of four PLBs compares DOA[31:0] while the second column of four PLBs compares DOB[31:0]. A single PLB from a third column is used to compare the parity bits DOPA[3:0] and DOPB[3:0]. The fourth column of four PLBs is unused.

Figure 3.2: BRAM ORA Orientation

Locating ORAs in an algorithmic manner also facilitates the use of configuration memory readback to retrieve the ORA results at the end of a test. The ORA results are stored in frame data, and the bit locations of the ORAs within each frame can be obtained by generating a logic allocation file using the '-l' argument in BITGEN.

The FX and SX families of Virtex 4 devices require special ORA placements. In all SX devices there are several columns of BRAMs that do not have four consecutive columns of PLBs. In these devices a row of DSP modules bisects the 4 columns of PLBs, leaving two columns on each side. As illustrated in Figure 3.3, this case requires that the ORAs comparing the parity bit outputs of each BRAM be shifted by one column.
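The circular comparison described above can be illustrated with a small behavioral model. The following Python sketch is not the thesis implementation; the function names and the representation of each BUT response as a single value are assumptions made only for illustration. ORA i compares BUT i with BUT (i+1) modulo n, so the last BUT in a column wraps around to the first.

```python
def ora_results(responses):
    """Return one flag per ORA; True means the two observed BUTs disagreed.

    ORA i compares BUT i with BUT (i + 1) % n, which closes the chain
    so that every BUT is observed by exactly two ORAs.
    """
    n = len(responses)
    return [responses[i] != responses[(i + 1) % n] for i in range(n)]


def suspect_buts(responses):
    """Flag a BUT when both ORAs observing it report a mismatch."""
    fails = ora_results(responses)
    # BUT i is observed by ORA i and ORA i - 1 (index -1 wraps to the last ORA).
    return [i for i in range(len(fails)) if fails[i] and fails[i - 1]]


# Four BUTs in one chain, with BUT 2 producing a faulty response.
chain = ["good", "good", "faulty", "good"]
print(ora_results(chain))
print(suspect_buts(chain))

# The only escape: every BUT in the chain has an identical equivalent fault.
print(ora_results(["faulty"] * 4))
```

In this model a single faulty BUT is flagged by both of its ORAs, while a chain in which all BUTs fail identically produces no mismatches, which is the escape condition stated in Chapter 2.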
In FX family devices there are two exceptions that must be handled. The first is the presence of either one or two PPC modules. BRAMs located in columns with PPC modules must correctly send their outputs to their directly adjacent ORAs and to the ORAs in the same column above the PPC module. The other exception is only applicable to FX40, FX100, and FX140 devices, wherein a column of BRAMs directly to the right of a PPC module has only a single column of PLBs in between the BRAM column and the PPC module. Instead of straddling ORAs across the PPC, the entire set of ORAs is shifted to the left of the PPC module. Figure 3.4 illustrates the ORA placement and circular comparison modification for BRAM BIST.

Figure 3.3: ORA Placement and Comparison in SX devices

Figure 3.4: ORA Placement and Comparison in FX devices

3.2 TPG Development

Unlike the approach used by Garimella in [17], the BIST approach for Virtex 4 can take advantage of partial reconfiguration. By being able to algorithmically place and route all BRAMs and ORAs identically for each BIST configuration, the only change that must be made between BIST configurations is to modify the BRAMs' configuration. In order for each BIST configuration to differ only by changes to the BRAMs, the TPG must be static throughout all BIST configurations. These two requirements, identical placement and routing and a static TPG, are the main differences between the BIST developed for Virtex 2 by Garimella and the BIST presented in this thesis.

Garimella's BIST architecture used different TPGs, with each TPG implementing a single march test in each BIST configuration [17]. This approach was a logical choice since each BIST configuration was developed exclusively using VHDL. The BIST architecture developed for Virtex 4, however, requires a much more complicated TPG. In order for the TPG to remain static between all BIST configurations, it needs to be able to generate all of the required test patterns for all configurations.
Garimella used March LR for all single-port configured BRAMs. March s2pf and d2pf were used to test dual-port functionality. March LR was also converted to incorporate BDS and applied to BRAMs configured in the 512 x 36 mode of operation. Table 3.1 summarizes the march test selection for testing BRAMs in this thesis. March LR with BDS ensures that most faults are detected in the memory array. March LR with BDS is run on a BRAM configured in its widest aspect ratio in order to maximize the detection of intra-word faults with BDS and to minimize the number of BIST clock cycles required to run the test. The regular BOM-based March LR cannot be applied to a BRAM configured in a 16K x 1-bit mode because this mode does not utilize 2K of the parity memory space. March s2pf and d2pf are also used to detect the specific dual-port faults discussed in the previous chapter. MATS+ is used to test the remaining BRAM size configurations to detect faults within the programmable row and column decoders.

Table 3.1: Virtex 4 TPG March Test Algorithms

                     Address         Data    Clock     MMTPG
March Test           Locations (A)   Width   Cycles    Op Mode[3:0]
March LR with BDS    512             36      2*58*A    000
s2pf                 512             36      14*A      011
d2pf                 512             36      9*A       100
MATS+                16K             1       2*5*A     010
MATS+                8K              2       2*5*A     001
MATS+                4K              4       2*5*A     110
MATS+                2K              9       2*5*A     111
MATS+                512             36      2*5*A     101

All four march tests were incorporated into a single multi-march TPG (MMTPG) and implemented in VHDL. The MMTPG VHDL source code is given in Appendix A. Since the Virtex 4 BRAM has programmable active levels for each of its control inputs, the MMTPG must be able to invert the active levels of the march test signals driving the BRAMs. In order to control the march test selection and appropriate active levels, a control shift register was added to the MMTPG. The control shift register was connected to one of four boundary scan modules in a Virtex 4. Figure 3.5 summarizes the control function of each bit in the control register.
For example, if "000000" was shifted into the control register, this control string would direct the MMTPG to execute March LR with BDS assuming the BRAM was configured for active low WE, SSR, REGCE, and EN control signals. Similarly, if "101010" was shifted into the shift register (beginning with the zero), that would select MATS+ with the BRAM configured to have an active high WE, EN, and SSR and an active low REGCE. The single-port march tests, March LR with BDS and MATS+, are each run twice: once through port A and then again through port B, after which the TPG repeats the sequence. As seen in Table 3.1, this doubles the number of clock cycles required to run each of these march tests. By designing the MMTPG to generate several march tests along with programmable active levels, the MMTPG is able to apply appropriate march tests to a Virtex 4 BRAM configured in any size.

Figure 3.5: MMTPG Control Shift Register

3.3 BRAM BIST Configurations

Table 3.2 summarizes all of the BRAM BIST configurations. By the end of the last BIST configuration, all active levels and additional configuration options have been applied and tested. Two March LR configurations are listed, each with different initialization values. March LR (Init A) initializes the BRAM's contents to alternating ones and zeros beginning with a zero in the least significant bit of the BRAM. March LR (Init B) initializes the BRAMs to the opposite values. Each port's SR values are configured to be the opposite of the BRAM's contents such that when the SSR signal is asserted, the output response makes a transition. To test initialization values, a BRAM must be configured as READ_FIRST, which means an addressed memory cell outputs its contents before being overwritten. Table 3.3 gives the initialization values for each BIST configuration. After the March LR configurations, the initialization values remain constant so that the number of configuration bits in the partial configurations is reduced.
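The control string examples and the clock cycle formulas of Table 3.1 can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function name and dictionary layout are hypothetical, the three upper bits are returned undecoded because their bit-to-signal mapping is defined by Figure 3.5 (not reproduced here), and the cycle count is the formula value from Table 3.1 rather than the number of clocks actually applied.

```python
# Op-mode codes and cycle formulas transcribed from Table 3.1.
# value = (march test, address locations A, operations per address, port passes)
OP_MODES = {
    "000": ("March LR with BDS", 512, 58, 2),
    "011": ("March s2pf", 512, 14, 1),
    "100": ("March d2pf", 512, 9, 1),
    "010": ("MATS+ (16K x 1)", 16 * 1024, 5, 2),
    "001": ("MATS+ (8K x 2)", 8 * 1024, 5, 2),
    "110": ("MATS+ (4K x 4)", 4 * 1024, 5, 2),
    "111": ("MATS+ (2K x 9)", 2 * 1024, 5, 2),
    "101": ("MATS+ (512 x 36)", 512, 5, 2),
}


def decode_control_string(control):
    """Decode a 6-bit MMTPG control string written [MSB:LSB]."""
    if len(control) != 6 or set(control) - {"0", "1"}:
        raise ValueError("control string must be six binary digits")
    level_bits, op_bits = control[:3], control[3:]
    name, addresses, per_address, passes = OP_MODES[op_bits]
    return {
        "active_level_bits": level_bits,       # meaning defined by Figure 3.5
        "march_test": name,
        "formula_clock_cycles": passes * per_address * addresses,
        "shift_order": control[::-1],          # the string is shifted in LSB first
    }


print(decode_control_string("101010"))
print(decode_control_string("111000")["formula_clock_cycles"])
```

For "101010" the shift order begins with the zero and the op-mode bits select MATS+, matching the example in the text. The single-port tests carry a factor of two because they are run through port A and then port B.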
Dual-port testing using March s2pf and d2pf is accomplished with one configuration; however, to select between the two march tests, a separate MMTPG control register value is applied. BRAM BIST does not test BRAMs configured in either the 4K x 4-bit or the 2K x 9-bit memory aspect ratios. While the MMTPG has the ability to generate tests for these memory sizes, the FIFO mode of operation can also be configured in these two memory aspect ratios, so they are left to the FIFO BIST configurations.

Table 3.2: BRAM BIST Configuration Detail

Config  March Test          Read Width  WRITE_WIDTH  DO_A_REG  Invert DO      RAM_EXTENSION
Num.                        [A,B]       [A,B]                  Register CLK   [A,B]
1       March LR (Init A)   36          36           0         FALSE          NONE
2       March LR (Init B)   36          36           0         FALSE          NONE
3       s2pf/d2pf           36          36           1         FALSE          NONE
4       MATS+               1           1            1         FALSE          NONE
5       MATS+               8           8            1         TRUE           NONE
6       MATS+               512         512          1         FALSE          NONE

Config  Write Mode    REGCE  SSR   WE    EN    CLK
Num.
1       READ_FIRST    High   High  High  High  High
2       READ_FIRST    High   High  High  High  High
3       READ_FIRST    High   High  High  High  High
4       READ_FIRST    High   High  High  High  High
5       NO_CHANGE     Low    Low   Low   Low   High
6       WRITE_FIRST   Low    Low   Low   Low   Low

Table 3.3: BRAM Initialization Values

Config  March Test          Memory      Srval  Output
Num.                        Init Value         Latch Init
1       March LR (Init A)   1010        0101   1010
2       March LR (Init B)   0101        1010   0101
3       s2pf / d2pf         0101        1010   0101
4       MATS+               0101        1010   0101
5       MATS+               0101        1010   0101
6       MATS+               0101        1010   0101

BIST Configuration Development

Developing BIST configurations that maximize the use of partial reconfiguration requires that only the BUTs' configuration changes between subsequent configurations. BRAM BIST for Virtex 4 achieves this by algorithmically placing ORAs and setting aside dedicated logic to implement the MMTPG. However, implementing the required architecture violates many of the design rules defined by Xilinx. All inputs and outputs of a BRAM must be connected at all times such that all possible BRAM modes can be tested without adding or modifying signals and altering the routing.
Normally, Xilinx's CAD tools tie any unused BRAM input to a global logic '1'. The MMTPG follows this convention by driving any input that is unused in a given configuration to a '1'. For example, if the BRAM is configured in the 16K x 1 mode, then the MMTPG drives both DI ports with a '1' except for the least significant bit.

The design flow of BIST configurations for PLBs presented by Dhingra in [34] can be modified to generate BRAM BIST configurations. Instead of the high-level HDL design methodology used by Garimella in [17], Dhingra uses a low-level vendor-specific design language called Xilinx Design Language (XDL). XDL is a human-readable description of the physical placement and routing of a design in a Xilinx FPGA. XDL allows one to have the utmost control over all aspects of an FPGA design, which is especially valuable when developing BIST configurations. Dhingra used XDL to describe the placement of ORAs, BUTs, and TPGs used in testing PLBs. This design process must be modified to allow for the creation of BRAM BIST configurations.

For a TPG, Dhingra used a pseudo-random pattern generator built from a Virtex 4 DSP module that continuously accumulated a prime number [34]. Dhingra's TPG only required the XDL instantiation of a DSP module and logical routing connections to the PLBs under test. Compared to Dhingra's TPG, the MMTPG is much larger, requiring 531 slices. Dhingra inserted the XDL TPG instantiation into a program that generated BIST configurations. However, inserting a 531-slice MMTPG instantiation into a similar tool for BRAM BIST is impractical because any slight change to the MMTPG during development would require modifying the program source code. To overcome this problem, a TPG parsing tool, V4BRAMTPG, was developed that accepts an XDL version of the HDL-synthesized MMTPG. The parsing program removes all instantiated components except the slices implementing the MMTPG.
In order for the MMTPG to synthesize without failing design rule checks by Xilinx's CAD tools, the TPG was connected to a dummy BRAM, which the parsing tool also removes.

The next program created is called V4BRAMBIST. This program instantiates BRAMs, ORAs, and the parsed XDL MMTPG. V4BRAMBIST also generates all of the logical routing for the entire BIST configuration. When the program processes the MMTPG, it duplicates the TPG logic in order to implement the two required MMTPGs. During synthesis and implementation, the MMTPG is constrained in a Virtex 4 FX12 to fit in the first four columns of PLBs to the left of the center line. This placement allows the XDL version of the MMTPG to be compatible with all Virtex 4 devices. V4BRAMBIST shifts the slice coordinate of each MMTPG slice such that it is always aligned to the four columns of PLBs directly to the left and right of the center line. Figures 3.6 and 3.7 show a logically connected (unrouted) BRAM BIST configuration in an FX12 and an LX25 device, respectively.

Figure 3.6: FX12 BRAM BIST (labeled regions: MMTPG1, MMTPG2, ORAs, BRAM)

Figure 3.7: LX25 BRAM BIST

To control each BIST configuration, V4BRAMBIST instantiates two BSCAN modules: one configured as USER1 and the other as USER2. The USER1 BSCAN module uses TDI to reset the TPG and TCK to apply clock cycles to the BRAMs, ORAs, and the two MMTPGs. The USER2 BSCAN module uses its TDI and TCK to serially shift control data into each of the MMTPG control registers. Two different BSCAN modules are used such that when changing the MMTPG mode, the rest of the BIST circuitry is clock inhibited. This prevents the BIST circuitry from going into an unknown state that could occur as the control string is shifted into the control register.

The output of V4BRAMBIST is a logically connected (unrouted) XDL file that is a template for all BRAM configurations. This XDL file must be converted to an NCD file type such that Xilinx's place and route tool (PAR) can operate on the file.
The template design must then be modified to configure all instantiated BRAMs such that their configuration corresponds to the settings shown in Table 3.2. These modifications are performed automatically by another program called V4BRAMMOD. V4BRAMMOD retains the current physical routing between configurations, which is necessary to reduce the number of partial configuration bits. With the combination of Xilinx's CAD tools, V4BRAMTPG, V4BRAMBIST, and V4BRAMMOD, all six BIST configurations can be generated. The following procedure outlines the process of generating a set of BRAM BIST configurations:

1. Synthesize the VHDL TPG.
2. Place and route the TPG in an FX12 with the TPG constrained to the first four columns of PLBs to the left of the center column.
3. Convert the TPG to XDL format using XDL with the -ncd2xdl -nopips -nocom -cfg_brief arguments (see Table 3.4 for a summary of XDL arguments).
4. Run V4BRAMTPG to parse and extract the TPG.
5. Build the BRAM BIST template with V4BRAMBIST.
6. Convert the BRAM BIST template to NCD format using XDL with the -xdl2ncd -force -nodrc arguments.
7. Fully route the BIST template using PAR.
8. Convert the BIST template to XDL format using XDL with the -ncd2xdl -nocom -cfg_brief arguments.
9. Run V4BRAMMOD for each of the six BRAM BIST configurations.
10. Convert each BIST configuration to NCD format using XDL with the -xdl2ncd -force -nodrc arguments.
11. Generate configuration bit-files for each BIST configuration using BITGEN.

Table 3.4: XDL Argument Summary

XDL argument   Description
-ncd2xdl       Selects conversion from NCD to XDL
-xdl2ncd       Selects conversion from XDL to NCD
-nopips        Routing is removed from NCD when converted to XDL
-nocom         XDL file will not contain comment blocks
-cfg_brief     Unused configuration options are not listed during NCD to XDL conversion
-force         Force conversion of XDL to NCD despite design rule errors
-nodrc         Disables design rule checking during XDL to NCD conversion
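Steps 9 and 10 of the procedure above are repeated for every configuration, so they are natural candidates for scripting. The Python sketch below only assembles the command strings; it is a hypothetical helper, not one of the thesis tools. The phase names are those listed in the V4BRAMMOD usage text (Figure 3.10), the V4BRAMMOD argument order follows the example in Figure 3.11, and the file names and the positional input/output order given to xdl are assumptions.

```python
# Phase names as listed in the V4BRAMMOD usage text (Figure 3.10).
PHASES = ["MLRA", "MLRB", "DUALP", "8K", "16K", "MATS512"]


def per_phase_commands(template):
    """Build the command lines for steps 9 and 10 of the procedure.

    `template` is the base name of the routed BIST template XDL file,
    for example "BRAM_lx60" (a hypothetical file name).
    """
    commands = []
    for phase in PHASES:
        modified = f"{template}_{phase}"
        # Step 9: modify the BRAM configuration options, keeping the routing.
        commands.append(
            f"V4BRAMMOD {template}.xdl {modified}.xdl {phase.lower()}"
        )
        # Step 10: convert back to NCD; the file argument order is assumed.
        commands.append(
            f"xdl -xdl2ncd -force -nodrc {modified}.xdl {modified}.ncd"
        )
    return commands


for line in per_phase_commands("BRAM_lx60"):
    print(line)
```

The list can be written to a batch file or passed to a process launcher; PAR and BITGEN invocations are omitted because their arguments are not specified in the procedure.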
Figures 3.8, 3.9, and 3.10 show the command-line options available in V4BRAMTPG, V4BRAMBIST, and V4BRAMMOD, respectively. Each BRAM BIST configuration can be constrained to test a subset of BRAMs in a device as long as there are four BRAMs in each circular comparison ORA chain. Figure 3.11 demonstrates the use of these tools to create the BIST configuration shown in Figure 3.12. Testing a subset of BRAMs in a device can be used to increase the timing performance of a configuration and can also lower the BIST power consumption.

    V4BRAMTPG.exe
    V4BRAMTPG processes XDL from synthesis for use in XDL generation program
    command line format:
    V4BRAMTPG
    notes: assumes input XDL file generated with 'xdl -ncd2xdl -cfg_brief -nocom'

Figure 3.8: V4BRAMTPG Syntax

    V4BRAMBIST.exe
    V4BRAMBIST - generates template file for BRAM BIST config in any Virtex 4
    command line format:
    V4BRAMBIST
    dev part rows cols    dev part rows cols    dev part rows cols
    lx  15   64   31      sx  25   64   55      fx  12   64   31
    lx  25   96   35      sx  35   96   55      fx  20   64   47
    lx  40   128  43      sx  55   128  69      fx  40   96   65
    lx  60   128  61                            fx  60   128  67
    lx  80   160  65                            fx  100  160  85
    lx  100  192  73                            fx  140  192  103
    lx  160  192  98
    lx  200  192  127

Figure 3.9: V4BRAMBIST Syntax

    V4BRAMMOD.exe
    V4BRAMMOD - modifies routed XDL for BRAM BIST
    command line format:
    V4BRAMMOD
    where phase = MLRA,MLRB,DUALP,8K,16K,MATS512

Figure 3.10: V4BRAMMOD Syntax

    V4BRAMTPG TPG.xdl parsedTPG.xdl
    V4BRAMBIST BRAM_lx60 64 32 128 61 lx 60 parsedTPG.xdl
    V4BRAMMOD BRAM_lx60.xdl BRAM_lx60_DUALP.xdl dualp

Figure 3.11: Example BIST program execution

Figure 3.12: Partial BRAM BIST in LX60

Three different types of configuration bitstreams can be generated using BITGEN: full, compressed, and partial. Full configuration bit-files contain frame data for every addressable frame in a given device. Full configurations are the default configuration bit-file type for BITGEN. Executing BITGEN with the '-g Compress'
argument allows BITGEN to take advantage of the multi-frame write capabilities that are usually used when generating partial configuration files. To create a partial configuration bit-file, the '-r previousconfig.bit' argument must be used, where 'previousconfig.bit' is the reference bit-file against which the partial configuration is generated.

Routing is one of the most time-consuming parts of generating BIST configurations. Garimella's approach for generating BRAM BIST configurations also required the synthesis of the entire BIST architecture as well as the placement and routing. These three processes were executed for each configuration. For Virtex 4, however, only the synthesis of the TPG is required and only one configuration must be routed. As mentioned earlier, each subsequent configuration retains the same routing.

3.4 Running BIST Configurations

Once all of the configurations have been generated using the aforementioned BIST tools, the configurations can be downloaded to the device via boundary scan. Table 3.5 lists the six BRAM BIST configurations and the MMTPG control string needed to configure the TPG, along with the number of BIST clock cycles needed to run each march test to completion. The three least significant bits of each control string define the march test for the MMTPG, as also shown in Table 3.1.

Table 3.5: BRAM BIST Execution Detail

Config  March Test          MMTPG Control       BIST
Num.                        String [MSB:LSB]    Clock Cycles
1       March LR (Init A)   111000              60,000
2       March LR (Init B)   111000              550
3a      s2pf                111011              7,200
3b      d2pf                111100              5,000
4       MATS+ (16K)         111010              165,000
5       MATS+ (8K)          000001              82,000
6       MATS+ (512)         000101              5,500
Total BRAM BIST Clock Cycles = 325,250

A procedure for running BRAM BIST configurations is as follows:

1. Download the BRAM BIST configuration to the device.
2. Go to the USER2 access register.
3. Clock in the MMTPG control string LSB first and assert TMS on the MSB.
4. Go to the USER1 access register.
5. Toggle TDI to reset the MMTPG (active high asynchronous reset).
6. Apply BIST clock cycles.
7. Retrieve ORA results via configuration memory readback.
8. Repeat steps 1-7 for each additional configuration.

Performing a configuration memory readback at the end of each BIST configuration can indicate the mode of failure for a BUT. Delaying the configuration memory readback until the last BIST configuration shortens the time required to perform BIST at the expense of diagnostic resolution, because the mode of failure recorded becomes uncertain.

3.5 ORA Results Retrieval

The CAPTURE module is instantiated to transfer ORA flip-flop contents to the configuration memory for configuration memory readback. Xilinx's Virtex 4 Configuration Guide [2] provides a procedure for reading specific frames of configuration memory through boundary scan. It is important to point out that the readback flip-flop data is inverted. During configuration bit-file creation, the '-l' BITGEN argument creates a logic allocation file that reports the configuration memory frame bit associated with each ORA. Retrieving ORA results is efficient since all of the flip-flop contents for a column of 16 PLBs are located within a single frame. Since four BRAMs span the height of 16 PLBs and each set of ORAs for a single BRAM is contained in three PLB columns, only 3 frames are read for each column of four BRAMs.

3.6 BIST Results

Previously in [17], Garimella calculated the total number of BIST clock cycles for Virtex 2 BRAMs to be 485,888. From Table 3.5, the total number of BIST clock cycles needed for Virtex 4 BRAMs is 325,250, which represents an overall savings of 160,638 clock cycles. Table 3.6 summarizes the configuration bit-file sizes and the associated download and test times for an LX60 device. The test clock frequency of 50 MHz is used because it is the maximum BSCAN clock frequency supported in all Virtex 4 devices [3].
Table 3.6 also compares three different BIST download techniques: full, compressed, and partial reconfigurations. In full configurations, all addresses are written with frame data, while the compressed technique allows for a reduction in configuration file size by using the multi-frame write capabilities. The partial configurations are generated from a set of either full or compressed configurations and further reduce the configuration bit-file size by only writing frame data that differs between two given configurations. Figure 3.13 illustrates the BIST speed increase associated with using compressed and partial configuration techniques over full configurations. Garimella's BIST configurations for Virtex 2 BRAMs can only be compared to full BRAM BIST configurations in Virtex 4, as also seen in Figure 3.13. It is clear that the BIST architecture for Virtex 4 is superior to that used in Virtex 2.

Figure 3.13: LX60 BRAM BIST Speed-up factors

Figures 3.14 and 3.15 summarize the timing analysis of each BIST configuration for several Virtex 4 devices. The last configuration, "512", in Figure 3.15 refers to the MATS+ 512 x 36-bit configuration and is approximately one-half as fast as the rest of the BRAM BIST configurations. This is due to the BUTs' inverted clock input, which acts to halve the available propagation delay. However, this configuration is necessary in order to test BRAMs configured for falling-edge triggered operation.
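The download times in Table 3.6 follow directly from the bit-file sizes and the 50 MHz test clock. The Python sketch below reproduces that arithmetic for the LX60 values; it assumes one configuration bit is shifted in per clock cycle, and the function names are illustrative only. The ratio it prints covers download bits alone, so it need not equal the speed-up factors plotted in Figure 3.13, which may also account for test clock cycles and readback.

```python
BSCAN_CLOCK_HZ = 50_000_000  # test clock frequency stated in the text

# Download sizes in bits for the six LX60 configurations, from Table 3.6.
FULL_BITS = [17_717_632] * 6
COMPRESSED_BITS = [9_516_672] * 6
PARTIAL_BITS = [9_516_672, 446_688, 24_032, 24_032, 122_688, 24_032]


def download_seconds(bits, clock_hz=BSCAN_CLOCK_HZ):
    """Time to shift in a bit-file, assuming one bit per clock cycle."""
    return bits / clock_hz


def total_seconds(sizes):
    """Total download time for a sequence of configurations."""
    return sum(download_seconds(bits) for bits in sizes)


def download_ratio(baseline, improved):
    """Ratio of total download bits between two techniques."""
    return sum(baseline) / sum(improved)


for name, sizes in [("full", FULL_BITS),
                    ("compressed", COMPRESSED_BITS),
                    ("partial", PARTIAL_BITS)]:
    print(name, sum(sizes), round(total_seconds(sizes), 3))

print(round(download_ratio(FULL_BITS, PARTIAL_BITS), 2))
```

The first partial entry is a compressed download because the initial configuration has no earlier configuration to serve as a reference.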
Table 3.6: Summary of LX60 BRAM BIST Download Size and Test Times

                      Download Size (Bits)                      Configuration Time @ 50 MHz (seconds)
BIST Configuration    Full          Compressed    Partial       Full    Compressed   Partial
March LR (Init A)     17,717,632    9,516,672     9,516,672     0.354   0.190        0.190
March LR (Init B)     17,717,632    9,516,672     446,688       0.354   0.190        0.00893
s2pf/d2pf             17,717,632    9,516,672     24,032        0.354   0.190        0.000481
MATS+ (16K)           17,717,632    9,516,672     24,032        0.354   0.190        0.000481
MATS+ (8K)            17,717,632    9,516,672     122,688       0.354   0.190        0.00245
MATS+ (512)           17,717,632    9,516,672     24,032        0.354   0.190        0.000481
TOTAL                 106,305,792   57,100,032    10,158,144    2.13    1.14         0.203

Figure 3.14: Timing Analysis (Slowest/Fastest)

Figure 3.15: Timing Analysis per BRAM BIST Configuration

3.7 BRAM BIST Summary

This chapter has described a BIST architecture for testing BRAMs in Virtex 4 FPGAs. The architecture consists of BRAMs tested by two TPGs driving alternating rows of BRAMs. A circular comparison based ORA architecture was implemented such that each column of BRAMs formed a separate circular comparison chain. Each TPG in a BRAM BIST configuration can apply multiple march tests depending on a TPG control register that communicates the current configuration of each BRAM such that an appropriate march test is applied.

To implement the BRAM BIST architecture, several BIST programs were also developed to facilitate generating BRAM BIST configurations for any Virtex 4 device. As shown in the following chapters, these programs will be modified to support BIST for BRAMs operating in FIFO, ECC, and cascade modes of operation.

Chapter 4 Virtex 4 FIFO BIST Implementation

A BIST approach developed for BRAMs configured in a FIFO mode of operation in Virtex 4 FPGAs is presented in this chapter. The BIST architecture will be discussed as well as a TPG for FIFO testing. Finally, results from applying BIST to Virtex 4 devices are given.
4.1 Virtex 4 FIFO BIST Architecture

Each Virtex 4 BRAM can operate in a FIFO mode of operation, which allows for the same number of BUTs in both the FIFO and BRAM BIST architectures. Also, the overall BRAM BIST architecture developed in the previous chapter can be applied to FIFO BIST. The logical connections from TPG to BUT and from BUT to ORA must be modified slightly because each FIFO has its own dedicated inputs and outputs. While there are 72 outputs per BRAM, each FIFO has only 66 outputs. Fewer BUT outputs translate into fewer ORAs per BUT. Figure 4.1 illustrates the location of the BUT output signal comparisons by the ORAs. As with BRAM BIST, 9 PLBs are still required for FIFO BIST ORAs, but in the ninth PLB, only a single slice is used. In addition, the ORA placement exceptions for SX and FX device families discussed in Chapter 3 are also present in FIFO BIST configurations.

Figure 4.1: FIFO ORA Placement

4.2 FIFO TPG Development

In [35], a test algorithm is described for Atmel FIFOs without programmable ALMOSTFULL and ALMOSTEMPTY status flags. The test algorithm of length 6N (where N is the number of address locations) is given below.

Atmel FIFO Test Algorithm (Test Length = 6N) [35]:

Step 1: Reset the FIFO.
Step 2: Repeat N times: Write a word with all zeros. Observe the FULL flag assertion after N writes and the EMPTY flag deassertion after the first word written.
Step 3: Repeat N times: Read a word with all zeros and then write a word with all ones. The FULL flag should toggle in between each read and write operation.
Step 4: Repeat N times: Read a word with all ones and then write a word with all zeros. The FULL flag should toggle in between each read and write operation.
Step 5: Repeat N times: Read a word expecting all zeros. Observe the EMPTY flag assertion after N reads and the FULL flag deassertion after the first word read.
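The five steps above can be exercised against a simple behavioral model to confirm the 6N test length and the expected flag behavior. The Python sketch below is an illustration only: the `IdealFifo` class is a hypothetical zero-latency model whose FULL and EMPTY flags update immediately, and all-zeros and all-ones data words are abstracted to the values 0 and 1.

```python
from collections import deque


class IdealFifo:
    """Hypothetical FIFO model with flags that update with no latency."""

    def __init__(self, depth):
        self.depth = depth
        self.words = deque()

    @property
    def full(self):
        return len(self.words) == self.depth

    @property
    def empty(self):
        return len(self.words) == 0

    def reset(self):
        self.words.clear()

    def write(self, word):
        if self.full:
            raise OverflowError("write to a full FIFO")
        self.words.append(word)

    def read(self):
        if self.empty:
            raise IndexError("read from an empty FIFO")
        return self.words.popleft()


def atmel_fifo_test(fifo):
    """Apply Steps 1-5 and return the number of read and write operations."""
    n = fifo.depth
    ops = 0
    fifo.reset()                                   # Step 1
    assert fifo.empty
    for i in range(n):                             # Step 2: N writes of zeros
        fifo.write(0)
        ops += 1
        assert not fifo.empty                      # EMPTY deasserts after first write
        assert fifo.full == (i == n - 1)           # FULL asserts after N writes
    for expected, new in ((0, 1), (1, 0)):         # Steps 3 and 4
        for _ in range(n):
            word = fifo.read()
            ops += 1
            assert word == expected
            assert not fifo.full                   # FULL deasserts after the read
            fifo.write(new)
            ops += 1
            assert fifo.full                       # FULL reasserts after the write
    for i in range(n):                             # Step 5: N reads of zeros
        word = fifo.read()
        ops += 1
        assert word == 0
        assert not fifo.full
        assert fifo.empty == (i == n - 1)          # EMPTY asserts after N reads
    return ops


print(atmel_fifo_test(IdealFifo(8)))
```

The operation count is N + 2N + 2N + N = 6N, and every location is written and read with both data values. A FIFO whose flags respond with latency would fail the FULL checks in Steps 3 and 4 of this model.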
Steps 3 and 4 repeatedly read and write to the FIFO, which fully tests the FIFO read and write pointers by walking each pointer through the entire memory space. A comparison of the read and write pointers along with the previous operation determines if the EMPTY or FULL flag asserts. The above test also ensures opposite logic values are written to and read from each memory location.

In [36], the Atmel FIFO algorithm was generalized to test FIFOs with programmable ALMOSTFULL and ALMOSTEMPTY status flags. These additional status flags are tested in Step 2 by repeatedly reconfiguring the ALMOSTFULL flag from its minimum to its maximum value while continuing to write and read data and observing the ALMOSTFULL flag toggle after each partial reconfiguration. The ALMOSTEMPTY flag can be tested in Step 5, during which the ALMOSTEMPTY flag is repeatedly reconfigured from its maximum to its minimum allowed value while the FIFO is emptied with N reads. Also during Step 5, the ALMOSTEMPTY flag will toggle after each reconfiguration.

For testing Virtex 4 FIFOs, it is not critical that opposite logic values are both written to and read from each FIFO data word. March LR with BDS applied during BRAM BIST ensures the memory array is fault-free, which allows the Virtex 4 FIFO test algorithm to concentrate on testing the status logic and the read and write pointer logic. In Chapter 2, Table 2.7 summarizes the timing for FIFO status flag assertion and deassertion. The deassertion period for the FULL and EMPTY flags is at most 4 clock cycles when in the FWFT mode. This latency becomes problematic for FIFO test algorithms. Steps 3 and 4 from the Atmel FIFO test algorithm are designed to test the FULL flag generation logic, but if these steps were applied to a Virtex 4 FIFO, the FULL flag would not deassert in time for the next read and write operations. Fault coverage is reduced because the reassertion of the FULL flag would be masked since the FULL flag does not deassert until several clock cycles later.
In order to test the FULL and EMPTY flags in Virtex 4 FIFOs, several clock cycles in which no operation is performed are inserted between the read and write operations. These no-operation (NO-OP) clock cycles allow the FULL flag to deassert before the read and write sequence is repeated. The test algorithm for Virtex 4 FIFO testing is given below.

Virtex 4 FIFO Test Algorithm (Test Length = 8N):
Step 1: Reset the FIFO.
Step 2: Repeat N times: Write a word with all zeros. Observe the FULL flag assertion after N+2 writes and the EMPTY flag deassertion after the first word is written.
Step 3: Repeat N times:
    Read a word with all zeros
    NO-OP
    NO-OP
    NO-OP
    Write a word with all ones
    Write a word with all ones
Step 4: Repeat N times: Read a word expecting all ones.

With the 3 NO-OP clock cycles inserted, the FULL flag has time to deassert before the writes that follow. The repeated write in Step 3 asserts the write error (WRERR) flag for one clock cycle to test the logic associated with this error indication. Figure 4.2 shows a timing diagram that describes the FIFO response to Step 3 of the Virtex 4 FIFO Test Algorithm.

Figure 4.2: FULL Flag Transition Timing

A TPG implementing the above Virtex 4 FIFO test algorithm was written in VHDL and required 96 slices in an FX12 device. The TPG, named FIFOTPG, is able to generate the above test for the four different FIFO depth configurations. The VHDL source for the FIFOTPG is given in Appendix B. Like the BRAM MMTPG, the FIFOTPG is also able to invert the active level of the FIFO control signals so that any possible FIFO configuration can be tested with the FIFOTPG. In order to program the FIFOTPG for the current FIFO configuration, a TPG control register is used to communicate with the FIFOTPG. Figure 4.3 indicates the function of each bit in the shift register.
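The Virtex 4 FIFO test algorithm given above can be sketched as an operation sequence and replayed against a simple FIFO model. This is a minimal illustration, not the FIFOTPG: it assumes an ideal FIFO that holds exactly N words and has no flag latency (the real device asserts FULL after N+2 writes and needs the NO-OP cycles), and the names virtex4_fifo_ops and run_ideal_fifo are hypothetical.

```python
from collections import deque
from typing import List, Optional, Tuple

Op = Tuple[str, Optional[int]]  # ("write", data), ("read", expected) or ("nop", None)


def virtex4_fifo_ops(n: int) -> List[Op]:
    """Steps 2-4 of the algorithm; 0 stands for all zeros, 1 for all ones."""
    ops: List[Op] = [("write", 0)] * n                     # Step 2
    for _ in range(n):                                     # Step 3
        ops += [("read", 0), ("nop", None), ("nop", None), ("nop", None),
                ("write", 1), ("write", 1)]
    ops += [("read", 1)] * n                               # Step 4
    return ops


def run_ideal_fifo(ops: List[Op], depth: int) -> Tuple[int, int, int]:
    """Return (rejected writes, read mismatches, words left in the FIFO)."""
    fifo: deque = deque()
    write_errors = 0
    mismatches = 0
    for op, value in ops:
        if op == "write":
            if len(fifo) == depth:
                write_errors += 1      # write to a full FIFO: WRERR in hardware
            else:
                fifo.append(value)
        elif op == "read":
            if not fifo or fifo.popleft() != value:
                mismatches += 1
    return write_errors, mismatches, len(fifo)


if __name__ == "__main__":
    n = 8
    ops = virtex4_fifo_ops(n)
    print(len(ops))                  # 8 * n operations
    print(run_ideal_fifo(ops, n))    # (rejected writes, mismatches, words left)
```

In this model the second write of every Step 3 iteration is rejected, which corresponds to the one-cycle WRERR assertion described above.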
The three most significant bits control the active levels of the FIFOTPG control signals RDEN, WREN, and RST, while the two least significant bits determine the operational mode, which corresponds to one of the four configurable FIFO word depths. When writing to the control register, the control string value is shifted in LSB first. The control string values for each FIFO BIST configuration are summarized in Table 4.1.

Figure 4.3: FIFOTPG Control Register

4.3 FIFO BIST Configuration Development

Three custom BIST generation tools were developed to enable FIFO BIST configuration generation for all Virtex 4 devices using the same procedure discussed in Chapter 3, Section 3. V4FIFOTPG parses an XDL version of the FIFOTPG developed in the previous section. V4FIFOBIST places all of the BIST circuitry into a specified device. Figure 4.4 shows a logically connected (unrouted) FIFO BIST configuration in an LX60 device. The third program, V4FIFOMOD, modifies each FIFO BIST configuration such that the configuration options match those listed in Table 4.1. During each FIFO BIST configuration, the FIFOTPG to FIFO routing and the FIFO to ORA routing are not changed. The FIFOTPG also remains static throughout all of the configurations. The only portion of the BIST configuration changed during each modification is the FIFO configuration options. The command-line options for these three programs are the same as those shown for BRAMs in Figures 3.8, 3.9, and 3.10.

Figure 4.4: LX60 FIFO BIST Configuration (labeled regions: FIFOTPG1, FIFOTPG2, FIFOs and ORAs)

Table 4.1: Summary of Virtex 4 FIFO Configurations

BIST     FIFO             RST, RDEN, WREN   FWFT    ALMOST EMPTY   ALMOST FULL   FIFOTPG Control      Clock
Config   MODE             ACTIVE LEVEL              LEVEL          LEVEL         String [MSB:LSB]     Cycles
Num.
1        2K x 9-bits      LOW               TRUE    15             2,043         00001                16,384
2        512 x 36-bits    HIGH              FALSE   15             496           11111                4,096
3        1K x 18-bits     HIGH              FALSE   5              507           11110                8,192
4        4K x 4-bits      HIGH              FALSE   5              5             11100                7
5        4K x 4-bits      HIGH              FALSE   6              7             11100                8
6        4K x 4-bits      HIGH              FALSE   8              8             11100                10
7        4K x 4-bits      HIGH              FALSE   16             16            11100                18
8        4K x 4-bits      HIGH              FALSE   32             32            11100                34
9        4K x 4-bits      HIGH              FALSE   64             64            11100                66
10       4K x 4-bits      HIGH              FALSE   128            128           11100                130
11       4K x 4-bits      HIGH              FALSE   256            256           11100                258
12       4K x 4-bits      HIGH              FALSE   512            512           11100                514
13       4K x 4-bits      HIGH              FALSE   1,024          1,024         11100                1,026
14       4K x 4-bits      HIGH              FALSE   2,048          2,048         11100                2,050
15       4K x 4-bits      HIGH              FALSE   4,092          4,092         11100                32,768
TOTAL BIST CLOCK CYCLES = 65,561

The ALMOST FULL and ALMOST EMPTY flag values are each specified by 12 bits of configuration memory. The 4K x 4-bit operation mode is used in testing the ALMOST FULL and ALMOST EMPTY flags because it requires 12 bits to specify values up to 4K. In order to test these configuration bits, FIFO BIST configurations 4 through 15 move both the ALMOST FULL and ALMOST EMPTY flags higher such that each of the 12 configuration bits undergoes a transition from '1' to '0'. The number of BIST configurations is also minimized by configuring the ALMOST FULL and ALMOST EMPTY flags to step from the minimum allowed value in configuration 4 to the maximum allowed value in configuration 15. This minimization is possible because configurations 4 through 14 are designed to test only the ALMOST FULL and ALMOST EMPTY flags, so for those configurations only the number of clock cycles needed to reach the configured ALMOST values is applied. In configuration 15, 32,768 clock cycles are applied to fully execute the FIFO test algorithm and to test the final ALMOST FULL and ALMOST EMPTY flag values.

4.4 Running FIFO BIST Configurations

Running each of the FIFO BIST configurations is similar to the procedure described for BRAM BIST in Chapter 3.
A minor difference is that the FIFOTPG contains a 5-bit control register, compared to the 6-bit control register in each MMTPG. The following procedure summarizes running each FIFO configuration:

1. Download the FIFO BIST configuration to the device.
2. Go to the USER2 access register.
3. Clock in the FIFOTPG control string, LSB first, and assert TMS on the MSB.
4. Go to the USER1 access register.
5. Toggle TDI to reset the FIFOTPG (active high asynchronous reset).
6. Apply BIST clock cycles.
7. Retrieve ORA results via configuration memory readback.
8. Repeat Steps 1-7 for each additional configuration.

4.5 FIFO BIST Results

Table 4.2 provides a summary of all fifteen FIFO configurations for an LX60. Due to the larger number of configurations, FIFO BIST benefits more from partial reconfiguration than BRAM BIST. While BRAM BIST gained a speed-up factor of ten over full configurations, FIFO BIST gained a speed-up factor of over 32, as seen in Figure 4.6. The number of BIST clock cycles for both BRAM and FIFO BIST is 390,811. This total is still 95,077 fewer clock cycles than Garimella required for testing only Virtex 2 BRAMs.

An FX12 FIFO BIST configuration is shown in Figure 4.7. The two FIFOTPGs and their associated routing have been highlighted. The FIFOTPG to the left of the center column is highlighted in green, while the FIFOTPG to the right is highlighted in yellow. The FIFO to ORA routing is highlighted in blue. With the FIFOTPG routing highlighted, one can see that each FIFOTPG drives alternating rows of Virtex 4 FIFO modules.

The timing analysis for each FIFO BIST configuration is shown in Figure 4.5. The 4K x 4-bit FIFO mode is listed only once because each of the 12 configurations has the same timing analysis result, since each FIFO is identically configured.
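The control strings and clock-cycle counts of Table 4.1, which feed steps 3 and 6 of the procedure in Section 4.4, can be reproduced with a short sketch. The two-bit mode codes below are read off the control strings in Table 4.1 rather than taken from the FIFOTPG source, and the helper names (control_string, shift_order, full_test_cycles) are hypothetical. The sketch also assumes that RDEN, WREN, and RST always share one active level, as they do in every row of the table.

```python
from typing import List

# Mode codes inferred from the control strings listed in Table 4.1 (assumption).
MODE_BITS = {"4Kx4": "00", "2Kx9": "01", "1Kx18": "10", "512x36": "11"}
DEPTH = {"4Kx4": 4096, "2Kx9": 2048, "1Kx18": 1024, "512x36": 512}

# Clock cycles per configuration, copied from Table 4.1 (configurations 1-15).
TABLE_4_1_CYCLES = [16384, 4096, 8192, 7, 8, 10, 18, 34, 66, 130,
                    258, 514, 1026, 2050, 32768]


def control_string(mode: str, active_high: bool) -> str:
    """5-bit control string, written [MSB:LSB] as in Table 4.1."""
    level = "1" if active_high else "0"
    return level * 3 + MODE_BITS[mode]   # three level bits, then two mode bits


def shift_order(control: str) -> List[str]:
    """Bits in the order they are clocked into the register (LSB first)."""
    return list(reversed(control))


def full_test_cycles(mode: str) -> int:
    """Length of the complete Virtex 4 FIFO test algorithm, 8N."""
    return 8 * DEPTH[mode]


if __name__ == "__main__":
    print(control_string("2Kx9", active_high=False))   # configuration 1
    print(shift_order(control_string("1Kx18", True)))  # configuration 3
    print(full_test_cycles("4Kx4"))                    # configuration 15
    print(sum(TABLE_4_1_CYCLES))                       # total BIST clock cycles
```

Configurations 4 through 14 apply far fewer than 8N cycles because they run only long enough to reach the configured ALMOST FULL and ALMOST EMPTY values.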
Figure 4.5: FIFO BIST Timing Analysis (frequency in MHz for the 4K, 2K, 1K, and 512 FIFO modes on FX12, FX20, FX40, FX60, FX100, SX25, SX35, SX55, LX15, LX25, LX40, LX60, LX80, and LX100 devices)

Table 4.2: Summary of LX60 FIFO BIST Download Size and Test Times

BIST             Download Size (Bits)                         Configuration Time @ 50 MHz (seconds)
Configuration    FULL           Compress       Partial        FULL     Compress   Partial
1                17,717,632     7,777,728      7,777,728      0.354    0.156      0.1556
2                17,717,632     7,777,728      121,856        0.354    0.156      0.000464
3                17,717,632     7,777,728      23,200         0.354    0.156      0.000464
4                17,717,632     7,777,728      23,200         0.354    0.156      0.000464
5                17,717,632     7,777,728      23,200         0.354    0.156      0.000464
6                17,717,632     7,777,728      23,200         0.354    0.156      0.000464
7                17,717,632     7,777,728      23,200         0.354    0.156      0.000464
8                17,717,632     7,777,728      23,200         0.354    0.156      0.000464
9                17,717,632     7,777,728      23,200         0.354    0.156      0.000464
10               17,717,632     7,777,728      23,200         0.354    0.156      0.000464
11               17,717,632     7,777,728      23,200         0.354    0.156      0.000464
12               17,717,632     7,777,728      23,200         0.354    0.156      0.000464
13               17,717,632     7,777,728      23,200         0.354    0.156      0.000464
14               17,717,632     7,777,728      23,200         0.354    0.156      0.000464
15               17,717,632     7,777,728      23,200         0.354    0.156      0.000464
TOTAL            265,764,480    116,665,920    8,201,184      5.315    2.333      0.162

Figure 4.6: LX60 FIFO Speed-up Factors

Figure 4.7: Routed FX12 FIFO BIST Configuration

Chapter 5
Virtex 4 ECC and Cascade BIST Implementation

A BIST approach developed for BRAMs configured in both ECC and cascade modes of operation in Virtex 4 FPGAs is presented in this chapter. The BIST architecture is discussed, as well as the TPGs for ECC and cascade testing. Finally, results from applying BIST to Virtex 4 devices are given.

5.1 ECC and Cascade BIST Architecture

The BIST architectures presented in previous chapters considered a BUT to be a single BRAM. For ECC and cascade BIST configurations, the BUT is enlarged to encompass two adjacent BRAMs because both ECC and cascade modes require a pair of adjacent BRAMs.
In order for the BUT to be a pair of BRAMs, the BIST architecture developed for BRAM and FIFO BIST is modified such that, instead of driving alternating rows of BRAMs, the TPGs drive alternating pairs of BRAMs. BRAM to ORA routing is also modified such that the outputs of the lower BRAM pair are compared with the outputs of the next lower BRAM pair. The ORA comparison per BRAM output is identical to BRAM BIST. Figure 5.1 illustrates the configuration for the TPG to BUT and BUT to ORA connections.

All Virtex 4 devices contain an even number of BRAMs per column, which allows for complete ECC and cascade BRAM pairs per column. The only exception to this rule is in FX devices. For ECC BIST configurations, the BRAMs directly above and below each PPC module cannot operate in an ECC BRAM pair. As shown in Figure 5.1, ECC BRAM pairs in Virtex 4 devices are fixed and the even numbered rows are always configured as the LOWER ECC BRAM. For cascade configurations, problems arise when the cascade pair is separated by a PPC module.

A cascaded BRAM can be cascaded with the BRAM either directly above or directly below it. Testing this attribute requires instantiating the bottom BRAM as a LOWER BRAM and continuing to alternate between UPPER and LOWER configured BRAMs, as illustrated in Figure 5.1. A second configuration is required to test BRAMs that were configured as an UPPER BRAM in the LOWER cascade mode of operation and vice versa.

Due to the Virtex 4 BRAM cascade routing architecture, each cascade BIST configuration is expected to have ORA failures in a fault-free device. At the top of each BRAM column and directly below each PPC module, there are no cascade routes that wrap around to the bottom BRAM in a column or route through a PPC module to the next BRAM. This causes the bottom BRAM in a cascade pair located at the bottom of a BRAM column, and also directly above a PPC, to generate incorrect results when compared to the other BRAMs in a given configuration.
The incorrect results stem from the cascade implementation shown in Chapter 2, Figure 2.9. In this implementation, the bottom BRAM outputs data irrespective of whether the latched address targets the upper BRAM. The MSB of the address bus enables writing to the appropriate BRAM, and it also selects the corresponding output by controlling a multiplexer at the output of the upper BRAM in the cascade pair. In [2], Xilinx recommends leaving the outputs of a lower BRAM in a cascade pair unconnected. However, the BIST configurations for both ECC and cascade compare all of the BRAM outputs. For BRAMs not expected to generate ORA failures, this improves the BIST diagnostic resolution by increasing the observability of cascade pair outputs.

Figure 5.1: ECC and Cascade BIST Architecture

Figure 5.2 illustrates the expected ORA cascade failures in the LSB of each port's data outputs. In general, the number of expected failures can be calculated by Equation 5.1. Four failures are expected per BRAM column because each data output is observed by two ORAs. Eight failures are expected per PPC module because the width of the PPC module spans two BRAM columns.

# Expected ORA Failures = 4 × (# BRAM Columns) + 8 × (# PPC)    (5.1)

Figure 5.2: Expected Cascade ORA Failure Locations

5.2 ECC BRAM BIST Development

In [36], Stroud discusses a general testing methodology for ECC RAMs. ECC RAMs are typically implemented with ECC encode logic, which generates Hamming bits for written data, and ECC decode logic, which regenerates the Hamming bits when a data word is read and compares the regenerated Hamming bits to the stored Hamming bits.

The problem with testing ECC memories is that they are inherently fault tolerant. For example, in order to test whether an ECC BRAM can detect Hamming bit errors, actual Hamming bit errors would have to be introduced. Fortunately, Virtex 4 has two configuration bits, EN_ECC_WRITE and EN_ECC_READ, which can each be configured to TRUE or
When EN ECC WRITE is TRUE, the ECC encode logic is enabled and when EN ECC READ is TRUE, the ECC decode logic is enabled. The remaining issue with testing ECC BRAMs is how to test the ECC encode logic. Out of the 72 data bits, 64 are data and 8 are Hamming bits. Generating all 264 possible inputs to the ECC encode circuity is infeasible. The ECC encode logic consists of an XOR parity tree and testing a parity tree can be completed with four test vectors if the structure of the parity tree is known [37]. However, the structure of the parity tree in the ECC encode circuit is not given in any Xilinx documentation so a more generic parity tree test is needed that will yield high fault coverage irrespective of the parity tree structure. In [36] Stroud shows that the following test vectors will achieve 100% fault coverage for any parity tree implementation: Generic Parity Tree Test Vectors: All zeros All combinations of a 1 in a eld of zeros All combinations of two 1s in a eld of zeros In [6], Stroud implements a circuit that generates all of the above test vectors and is shown in Figure 5.3 with modi cations for use with Virtex 4 ECC BRAMs. Given the above circuit, a TPG, ECCTPG, was implemented in VHDL and required 192 slices when implemented in a FX12. The source code is available in Appendix C. In order to test the ECC decode and correction circuitry, all possible 28 Hamming bit values must be read from an ECC BRAM. Without knowing the parity connections for each Hamming bit, it is not feasible to write data to the ECC BRAM to generate all 86 Figure 5.3: Parity Tree TPG [6] 256 Hamming bit values. Fortunately, an ECC BRAM can be initialized to contain all 256 Hamming bit values. These preloaded Hamming bit values will cause the ECC BRAM to indicate and correct single-bit errors or indicate double-bit errors when each memory address is read. 
In order to generate the parity tree test vectors and read the preloaded Hamming bit values, a TPG with two test phases was developed. The ECC BRAM test algorithm is as follows:

ECC BRAM Test Algorithm:
Phase 1: Read each address and observe a single-bit or double-bit read error, along with the correction of single-bit errors when detected.
Phase 2: Write to and then read from a memory address the vectors listed for a generic parity tree. Observe single-bit or double-bit read errors if EN_ECC_WRITE is FALSE.

The first phase tests the error detection and correction circuitry in the ECC decode logic by initializing all ECC BRAMs with all 256 possible Hamming bit combinations and then reading the stored patterns. During this phase, a single-bit or double-bit error condition is expected to occur during the read traversal through memory. The second phase further tests the ECC decode parity tree and is also able to test the ECC encode circuitry, depending on the configuration of the ECC BRAM as discussed below.

Table 5.1: ECC BRAM BIST Configuration Settings

Config   EN_ECC_WRITE   EN_ECC_READ   Phase 1         Phase 2
Num.                                  Clock Cycles    Clock Cycles
1        TRUE           TRUE          512             5184
2        FALSE          TRUE          512             5184
Total ECC BIST Clock Cycles = 11,392

Virtex 4 ECC BRAM BIST consists of two configurations. Table 5.1 summarizes the configuration settings for the two configurations. During the first configuration, the first phase of the TPG causes the ECC BRAM to generate single-bit and double-bit read errors, while the second phase tests the ECC encode circuitry and does not cause read errors because, with EN_ECC_WRITE set to TRUE, Hamming bits are generated for each written data word. In the second configuration, EN_ECC_WRITE is set to FALSE, which tests the ECC decode parity tree because in this mode all 72 bits of a data word are written.
Phase 1 of the ECCTPG algorithm is not needed for this second configuration, but this phase requires only 512 clock cycles and it is applied during both configurations so that the same ECCTPG can be used. Applying the Phase 2 test vectors causes single-bit and double-bit read errors because, in this configuration mode, the TPG rather than the ECC encode circuitry writes directly to the Hamming bit locations.

5.3 Cascade TPG Development

Since the BRAM memory cell array is tested using March LR with BDS during BRAM BIST, only address decoding faults need to be tested in cascade BIST configurations. While MATS+ was used in BRAM BIST to detect all AFs, applying MATS+ to a 32K x 1-bit memory requires 327,680 clock cycles, which is almost the total number of clock cycles for all other BRAM BIST configurations.

A cascaded BRAM is two 16K x 1-bit BRAMs configured such that data in the bottom half of the 32K-bit memory space is in the lower BRAM and data in the upper half is in the upper BRAM. As seen in Figure 5.4, the MSB of the address bus and an inverter select the write enable for each BRAM. The MSB of the address bus also selects which cascade BRAM's data to output. Since all AFs are tested in BRAM BIST, cascade BIST only needs to test that opposite logic values can be written to and read from the upper and lower cascaded BRAMs. The MATS+ march test can still be used; however, the number of address locations is reduced to two: one location in the upper BRAM and one location in the lower BRAM. This simplification allows all but the MSB of the address bus to be grounded (set to logic zero), as also seen in Figure 5.4. Applying MATS+ to both the A and B BRAM ports of a cascaded BRAM requires only 20 clock cycles.

The MATS+ portion of the MMTPG was modified to create the cascade TPG, CASTPG. The CASTPG implementation required 15 slices and was constrained to four PLB columns to the left of the FX12 center column.
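The clock-cycle figures quoted for MATS+ in this section can be checked with a small model. The sketch below assumes the usual MATS+ element order (write 0 everywhere; ascending read 0, write 1; descending read 1, write 0), one operation per clock cycle, and the same sequence applied once per port. These assumptions are consistent with the 327,680 and 20 cycle figures above but are not taken from the CASTPG source, and the names mats_plus and passes are hypothetical.

```python
from typing import Iterable, List, Tuple

Op = Tuple[str, int, int]  # (kind "w" or "r", address, data written or expected)

PORTS = 2  # MATS+ is applied to both the A and B ports


def mats_plus(addresses: Iterable[int]) -> List[Op]:
    """MATS+ as an operation list: 5 operations per address."""
    addrs = list(addresses)
    ops: List[Op] = [("w", a, 0) for a in addrs]       # write 0 to every address
    for a in addrs:                                    # ascending: read 0, write 1
        ops += [("r", a, 0), ("w", a, 1)]
    for a in reversed(addrs):                          # descending: read 1, write 0
        ops += [("r", a, 1), ("w", a, 0)]
    return ops


def passes(ops: List[Op]) -> bool:
    """Replay the operations on a fault-free memory model."""
    memory = {}
    for kind, addr, value in ops:
        if kind == "w":
            memory[addr] = value
        elif memory.get(addr) != value:
            return False
    return True


if __name__ == "__main__":
    full = mats_plus(range(32 * 1024))       # every address of a 32K x 1 memory
    cascade = mats_plus([0, 1 << 14])        # only the address MSB changes
    print(len(full) * PORTS)                 # cycles for the full address space
    print(len(cascade) * PORTS)              # cycles for the two-address version
    print(passes(cascade))
```

Reducing the address set to two locations keeps the march structure intact while exercising only the MSB-controlled selection between the upper and lower BRAM.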
The VHDL source code for the CASTPG is included in Appendix D.

Figure 5.4: Cascade BRAM Operational Diagram

Table 5.2: Summary of Cascade BIST Configuration Settings

Config   Upper BRAM               Lower BRAM               BIST
Num.     RAM_EXTENSION_[A,B]      RAM_EXTENSION_[A,B]      Clock Cycles
1        UPPER                    LOWER                    20
2        LOWER                    UPPER                    20

5.4 BIST Configurations

Two sets of BIST generation programs were developed. V4ECCTPG, V4ECCBIST, and V4ECCMOD facilitate ECC BIST configuration generation, while V4CASTPG, V4CASBIST, and V4CASMOD generate cascade BIST configurations. Each program in the two sets follows the same procedure outlined for BRAM BIST and FIFO BIST. Unlike BRAM and FIFO BIST, ECC and cascade BIST configurations do not require a TPG control register. Table 5.1 summarizes the BRAM configuration settings for ECC BIST, and Table 5.2 outlines the BRAM configuration settings for cascade BIST. Figures 5.5 and 5.6 show unrouted ECC and cascade BIST configurations in an FX12, respectively.

Figure 5.5: FX12 ECC BIST

Figure 5.6: FX12 Cascade BIST

5.5 Running BIST Configurations

ECC and cascade BIST configurations use a single BSCAN module to apply BIST clock cycles and reset the TPG. Previously, BRAM and FIFO BIST configurations used a second BSCAN module to shift in a control string. This feature is not needed for ECC and cascade configurations since the same TPG algorithm is applied in each of the two configurations. Specifying whether the BUT is an upper or lower BRAM, or whether a portion of the ECC circuitry is enabled or disabled, does not necessitate a change in TPG outputs. The testing procedure for ECC and cascade configurations is given below:

1. Download the BIST configuration to the device.
2. Go to the USER1 access register.
3. Toggle TDI to reset the TPG (active high asynchronous reset).
4. Apply BIST clock cycles.
5. Retrieve ORA results via configuration memory readback.
6. Repeat Steps 1-5 for each additional configuration.
5.6 BIST Results

For cascade BIST, the expected ORA failures discussed previously were observed when ORA results were read back for both cascade configurations generated for LX60 and FX12 devices. The LX60 showed 20 failures, which is expected since the LX60 contains five columns of BRAMs. FX12 devices contain three columns of BRAMs and a single PPC, which also gives 20 expected failures.

Since both ECC and cascade BIST have only two configurations, the advantage of partial reconfiguration is reduced because, even with partial reconfiguration, the first configuration is a compressed full configuration. Tables 5.3 and 5.4 summarize the download sizes and test times for the cascade and ECC BIST configurations, respectively. Figures 5.7 and 5.8 illustrate the speed-up factors attained by using both compressed and partial reconfiguration techniques. ECC and cascade BIST configuration timing analyses for several Virtex 4 devices are shown in Figure 5.9 and Figure 5.10, respectively. For all devices, the slowest clock frequency is greater than the 50 MHz maximum boundary scan clock frequency.
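The expected failure counts quoted above follow from Equation 5.1. A minimal sketch of that calculation is shown below; the function name expected_ora_failures is hypothetical, and the column and PPC counts used in the example are the ones stated in this section for the LX60 and FX12.

```python
def expected_ora_failures(bram_columns: int, ppc_modules: int = 0) -> int:
    """Equation 5.1: 4 failures per BRAM column plus 8 per PPC module."""
    return 4 * bram_columns + 8 * ppc_modules


if __name__ == "__main__":
    print(expected_ora_failures(bram_columns=5))                 # LX60
    print(expected_ora_failures(bram_columns=3, ppc_modules=1))  # FX12
```

Any ORA failure count other than the value from Equation 5.1 would indicate a fault rather than the routing limitation at column ends and PPC boundaries.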
Table 5.3: Summary of LX60 CAS BIST Download Size and Test Times

BIST             Download Size (Bits)                        Configuration Time @ 50 MHz (seconds)
Configuration    FULL          Compress      Partial         FULL     Compress   Partial
1                17,717,632    7,808,352     7,808,352       0.354    0.156      0.156
2                17,717,632    7,808,352     36,256          0.354    0.156      0.001
TOTAL            35,435,264    15,616,704    7,844,608       0.709    0.312      0.157

Table 5.4: Summary of LX60 ECC BIST Download Size and Test Times

BIST             Download Size (Bits)                        Configuration Time @ 50 MHz (seconds)
Configuration    FULL          Compress      Partial         FULL     Compress   Partial
1                17,717,632    10,121,376    10,121,376      0.354    0.202      0.202
2                17,717,632    10,121,376    113,920         0.354    0.202      0.002
TOTAL            35,435,264    20,242,752    10,235,296      0.709    0.405      0.205

Figure 5.7: LX60 ECC BIST Speed-up Factors

Figure 5.8: LX60 CAS BIST Speed-up Factors

Figure 5.9: ECC BIST Timing Analysis

Figure 5.10: Cascade BIST Timing Analysis

Chapter 6
Summary and Conclusion

BIST results for all generated BIST configurations are summarized, along with a comparison to the final results attained by Garimella [17]. The BIST architecture detailed in this thesis can also be applied to more recent FPGAs such as the Xilinx Virtex 5. In Virtex 5, Xilinx has introduced several important testability improvements for BRAMs. A potential BIST architecture for Virtex 5 BRAMs outlines future work in this field.

6.1 Summary of Virtex 4 BIST Results

The work contained in this thesis developed a BIST architecture for Virtex 4 BRAMs in all modes of operation. Six BIST configurations were generated to test BRAMs configured to operate as a regular RAM. Fifteen BIST configurations were needed to test BRAMs configured to operate as a FIFO. Two BIST configurations each were generated to test the ECC and cascade BRAM operational modes. The 25 total BIST configurations were generated and downloaded to LX60, SX35, and FX12 devices.
Several faulty LX60 devices were tested using these BIST configurations, and in one of the devices an ORA failure indicated that a single BRAM's DOA[23] output was faulty.

The eight BIST configurations developed by Garimella for Virtex 2 required a total of 485,888 clock cycles [17]. The 25 BIST configurations presented in this thesis require 402,243 clock cycles, a savings of 83,645 clock cycles. This BIST approach was able to reduce the number of BIST clock cycles while testing BRAMs in more modes of operation. Selecting MATS+ instead of March LR for testing additional memory sizes greatly reduced the number of clock cycles. Moreover, the use of partial reconfiguration during each set of BIST configurations allowed for a considerable reduction in the total number of downloaded configuration bits. As shown in the previous chapters, the number of configuration bits is vastly greater than the number of BIST clock cycles. The Virtex 4 BIST configurations were able to take advantage of partial reconfiguration by keeping the TPG-to-BUT and BUT-to-ORA routing static. TPGs capable of adapting to different BRAM configurations were developed, and their physical placement was not modified during each set of BIST configurations. The speed-up factors for each configuration technique are shown in Figure 6.1. The speed-up factors are normalized to FULL configurations. Clearly, utilizing partial reconfiguration allows for significant gains in terms of test time and BIST configuration memory storage.

Figure 6.1: BIST Speed-up for LX60 (total BIST speed-up factors: Full 1.00, Compressed 1.80, Partial 12.15)

6.2 Application to Virtex 5

In 2006, Xilinx released the successor to Virtex 4, the Virtex 5. The main distinction between the two device families is the transition from a 4-input LUT to a 6-input LUT [38]. BRAMs for Virtex 5 have also been modified.
Each Virtex 5 BRAM consists of two Virtex 4 BRAMs, and each Virtex 5 BRAM can also be cascaded to form a 64K x 1-bit RAM. In Virtex 4, FIFO and BRAM output connections were in different locations, but in Virtex 5, all FIFO and BRAM connections are located together. Having all BRAM and FIFO outputs together allows the MMTPG and FIFOTPG developed for Virtex 4 to be combined to form an even larger TPG. A BIST architecture could be developed that contains an initial full or compressed configuration, with every configuration thereafter done through partial reconfiguration. This architecture would require two sets of ORAs: one set to compare BRAM, FIFO, and ECC modes, and a second set to compare the cascade BRAM outputs.

Another improvement in Virtex 5 is that the ECC encode and decode logic can be tested separately. The encode and decode logic can be configured such that its outputs bypass the BRAM memory. In addition, the Hamming bits generated for each data word are also available at the BRAM outputs. Virtex 4 did not output Hamming bits.

Virtex 5 represents an excellent platform for developing BRAM BIST due to its BIST-friendly architectural improvements over Virtex 4. Using partial reconfiguration, Virtex 5 BIST could potentially be more efficient.

Bibliography

[1] S. Trimberger, D. McCarty, and T. Whitney, Field Programmable Gate Array Technology, S. Trimberger, Ed. Kluwer, 1994.
[2] Virtex-4 User Guide, User Guide UG070 (v1.4), Xilinx, Inc., 2005 (available at www.xilinx.com).
[3] Virtex-4 Configuration Guide, Configuration Guide UG071, Xilinx, Inc., (available at www.xilinx.com).
[4] A. Van De Goor, I. Tlili, and S. Hamdioui, "Converting march tests for bit-oriented memories into tests for word-oriented memories," in Memory Technology, Design and Testing, 1998. Proceedings. International Workshop on, 24-25 Aug. 1998, pp. 46-52.
[5] S. Hamdioui and A. van de Goor, "Efficient tests for realistic faults in dual-port SRAMs," Computers, IEEE Transactions on, vol. 51, no. 5, pp. 460-473, May 2002.
[6] C. Stroud, A Designer's Guide to Built-In Self-Test. Kluwer Academic Publishers, 2002.
[7] D. Patterson and J. Hennessy, Computer Organization and Design: The Hardware/Software Interface. Elsevier, New York, 2005.
[8] International Technology Roadmap for Semiconductors 2001, http://public.itrs.net.
[9] Intel Corp., www.intel.com/products.
[10] Xilinx Corp., www.xilinx.com/products.
[11] M. Bushnell and V. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Kluwer Academic Publishers, 2000.
[12] M. Abramovici, J. Emmert, and C. Stroud, "Roving STARS: an integrated approach to on-line testing, diagnosis, and fault tolerance for FPGAs in adaptive computing systems," in Evolvable Hardware, 2001. Proceedings. The Third NASA/DoD Workshop on, 12-14 July 2001, pp. 73-92.
[13] M. Abramovici and C. Stroud, "BIST-based delay-fault testing in FPGAs," in On-Line Testing Workshop, 2002. Proceedings of the Eighth IEEE International, 8-10 July 2002, pp. 131-134.
[14] M. Abramovici and C. Stroud, "BIST-based test and diagnosis of FPGA logic blocks," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 9, no. 1, pp. 159-172, Feb. 2001.
[15] M. Abramovici and C. Stroud, "BIST-based detection and diagnosis of multiple faults in FPGAs," in Test Conference, 2000. Proceedings. International, 3-5 Oct. 2000, pp. 785-794.
[16] M. Abramovici, C. Stroud, and J. Emmert, "Online BIST and BIST-based diagnosis of FPGA logic blocks," Very Large Scale Integration (VLSI) Systems, IEEE Transactions on, vol. 12, no. 12, pp. 1284-1294, Dec. 2004.
[17] S. Garimella, "Built-in self test for regular structured embedded cores in system-on-chip," Master's thesis, Auburn University, 2005.
[18] S. Garimella and C. Stroud, "A system for automated built-in self-test of embedded memory cores in system-on-chip," in System Theory, 2005. SSST '05. Proceedings of the Thirty-Seventh Southeastern Symposium on, 20-22 March 2005, pp. 50-54.
[19] C. Stroud, S. Wijesuriya, C. Hamilton, and M. Abramovici, "Built-in self-test of FPGA interconnect," in Test Conference, 1998. Proceedings. International, 18-23 Oct. 1998, pp. 404-411.
[20] C. Stroud, M. Lashinsky, J. Nall, J. Emmert, and M. Abramovici, "On-line BIST and diagnosis of FPGA interconnect using roving STARS," in On-Line Testing Workshop, 2001. Proceedings. Seventh International, 9-11 July 2001, pp. 27-33.
[21] C. Stroud, K. Leach, and T. Slaughter, "BIST for Xilinx 4000 and Spartan series FPGAs: a case study," in Test Conference, 2003. Proceedings. ITC 2003. International, vol. 1, Sept. 30-Oct. 2, 2003, pp. 1258-1267.
[22] C. Stroud, E. Lee, and M. Abramovici, "BIST-based diagnostics of FPGA logic blocks," in Test Conference, 1997. Proceedings., International, 1-6 Nov. 1997, pp. 539-547.
[23] J. Rose, A. El Gamal, and A. Sangiovanni-Vincentelli, "Architecture of field-programmable gate arrays," Proceedings of the IEEE, vol. 81, no. 7, pp. 1013-1029, July 1993.
[24] Altera Corp., www.altera/products.
[25] S. Hamdioui, Testing Static Random Access Memories: Defects, Fault Models and Test Patterns. Kluwer Academic Publishers, 2004.
[26] A. Van de Goor, Testing Semiconductor Memories: Theory and Practice. John Wiley & Sons, 1991.
[27] S. Jain and C. Stroud, "Built-in self testing of embedded memories," Design & Test of Computers, IEEE, vol. 3, no. 5, pp. 27-37, 1986.
[28] C. Stroud and S. Garimella, "Built-in self-test and diagnosis of multiple embedded cores in SoCs," in Proc. International Conference on Embedded Systems and Applications, 2005.
[29] Virtex-II Pro / Virtex II Pro X Complete Data Sheet, Data Sheet DS083 (v4.5), Xilinx, Inc., 2005 (available at www.xilinx.com).
[30] Single Error Correction and Double Error Detection, Xilinx Application Note XAPP645, 2006 (available at www.xilinx.com).
[31] Development System Reference Guide (v8.2i), Xilinx, Inc., 2005 (available at www.xilinx.com).
[32] A. van de Goor, G. Gaydadjiev, V. Mikitjuk, and V. Yarmolik, "March LR: a test for realistic linked faults," in VLSI Test Symposium, 1996., Proceedings of 14th, 28 April-1 May 1996, pp. 272-280.
[33] C. Stroud, J. Sunwoo, S. Garimella, and J. Harris, "Built-in self-test for system-on-chip: a case study," in Test Conference, 2004. Proceedings. ITC 2004. International, 2004, pp. 837-846.
[34] S. Dhingra, "Built-in self-test of logic resources in field programmable gate arrays using partial reconfiguration," Master's thesis, Auburn University, 2006.
[35] Atmel Corp. Combined Megacell Testing, Application Note AN0696C, 1999.
[36] L. Wang, C. Stroud, and N. Touba, System-on-Chip Test Architectures: Nanometer Design for Testability. Elsevier, 2007.
[37] S. Mourad and E. McCluskey, "Testability of parity checkers," Industrial Electronics, IEEE Transactions on, vol. 36, no. 2, pp. 254-262, May 1989.
[38] Virtex-5 User Guide UG190 (v3.0), Xilinx, Inc., 2007 (available at www.xilinx.com).
Appendices

Appendix A
MMTPG VHDL Source code

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity fsm is
  port (
    Reset : in std_logic;
    TDI, DRCK, UPDATE, SHIFT : in std_logic;
    Clk : in std_logic;
    WEA : buffer std_logic;
    WEB : buffer std_logic;
    --OEN : out std_logic;
    DIA : out std_logic_vector(35 downto 0);
    DIB : out std_logic_vector(35 downto 0);
    ADDRA : out std_logic_vector(13 downto 0);
    ADDRB : out std_logic_vector(13 downto 0);
    EnA : out std_logic;
    EnB : out std_logic;
    SSR : out std_logic; --follows EN_LEVEL signal
    REGCE : out std_logic); --FINISHTEST: out std_logic);
end fsm;

architecture BEHAVIORAL of fsm is
  type testmodes is (MarchLR,MATS,March_s2pf,March_d2pf);
  type phases is (Init,dummy,Phase1,phase2,phase3,phase4,phase5,phase6,phase7,phase8,
                  phase9,phase10,phase11,phase12,phase13,phase14,phase15,phase16);
  type elements is (ele1,ele2,ele3,ele4,ele5);
  signal testmode : testmodes := MarchLR;
  signal phase : phases := dummy;
  signal Element : elements := ele1;
  signal Address : std_logic_vector(13 downto 0);
  signal AddressB : std_logic_vector(13 downto 0);
  signal MAXADDRESS : std_logic_vector(13 downto 0);
  constant MINADDRESS : std_logic_vector(13 downto 0) := (others => '0');
  signal portBtested : std_logic := '0';
  signal tempdata : std_logic_vector(35 downto 0); --to comply with design considerations by Xilinx
  signal tempdataB : std_logic_vector(35 downto 0); --to comply with design considerations by Xilinx
  signal ENATEMP,ENBTEMP,SSRTEMP,WEAtemp,WEBtemp,EN_LEVEL,WEN_ACTIVE,REGCE_ACTIVE : std_logic;
  signal MODE : std_logic_vector(2 downto 0);
  signal SR, PDO : std_logic_vector(5 downto 0);
begin

  bsync: process (DRCK, UPDATE, SHIFT)
  begin -- sync circuitry on BSCAN clock
    if (DRCK'event and DRCK = '1') then
      if (SHIFT = '1') then -- shift
        for I in 0 to 4 loop
          SR(I) <= SR(I+1);
        end loop;
        SR(5) <= TDI;
        -- TDO <= SR(0);
      end if;
    end if;
    if (UPDATE = '1') then
      PDO <= SR; -- update
    end if;
  end process bsync;

  EN_LEVEL <= PDO(3);
  WEN_ACTIVE <= PDO(5);
  REGCE_ACTIVE <= PDO(4);
  MODE(0) <= PDO(0);
  MODE(1) <= PDO(1);
  MODE(2) <= PDO(2);

  --begin
  p0: Process(clk,Reset,MODE,MAXADDRESS,tempdata,tempdataB,Address,AddressB,WEA)
  begin
    if ( Reset = '1' ) then
      ADDRB <= (others => '1');
      ADDRA <= (others => '1');
      DIA <= (others => '0');
      DIB <= (others => '0');
      REGCE <= REGCE_ACTIVE;
      EnA <= not EN_LEVEL;
      EnB <= not EN_LEVEL;
      WEA <= not WEN_ACTIVE;
      WEB <= not WEN_ACTIVE;
      SSR <= not EN_LEVEL;
    elsif (Clk = '1' and Clk'Event) then
      case MODE is
        when "000" =>
          MAXADDRESS <= "00000111111111"; --512 X 36 with BDS
          DIA <= tempdata;
          DIB <= tempdata;
          ADDRA <= Address(8 downto 0)&"11111";
          ADDRB <= Address(8 downto 0)&"11111";
          REGCE <= (REGCE_ACTIVE);
          WEA <= WEAtemp; WEB <= WEBtemp; SSR <= SSRtemp;
          EnA <= EnAtemp; EnB <= EnBtemp;
          testmode <= MarchLR;
        when "001" =>
          MAXADDRESS <= "01111111111111"; --8k X 2
          DIA <= "1111111111111111111111111111111111"&tempdata(1 downto 0);
          DIB <= "1111111111111111111111111111111111"&tempdata(1 downto 0);
          ADDRA <= Address(12 downto 0)&'1';
          ADDRB <= Address(12 downto 0)&'1';
          WEA <= WEAtemp; WEB <= WEBtemp; SSR <= SSRtemp;
          EnA <= EnAtemp; EnB <= EnBtemp;
          Testmode <= MATS;
          REGCE <= REGCE_ACTIVE;
        when "010" =>
          MAXADDRESS <= "11111111111111"; --16k X 1
          DIA <= "11111111111111111111111111111111111"&tempdata(0);
          DIB <= "11111111111111111111111111111111111"&tempdata(0);
          ADDRA <= Address(13 downto 0);
          ADDRB <= Address(13 downto 0);
          REGCE <= REGCE_ACTIVE;
          WEA <= WEAtemp; WEB <= WEBtemp; SSR <= SSRtemp;
          EnA <= EnAtemp; EnB <= EnBtemp;
          Testmode <= MATS;
        when "101" =>
          MAXADDRESS <= "00000111111111"; --512 X 36
          DIA <= tempdata;
          DIB <= tempdata;
          ADDRA <= Address(8 downto 0)&"11111";
          ADDRB <= Address(8 downto 0)&"11111";
          REGCE <= (REGCE_ACTIVE);
          Testmode <= MATS;
          WEA <= WEAtemp; WEB <= WEBtemp; SSR <= SSRtemp;
          EnA <= EnAtemp; EnB <= EnBtemp;
        when "011" =>
          MAXADDRESS <= "00000111111111"; --s2pf
          DIA <= tempdata;
          DIB <= tempdataB;
          ADDRB <= Address(8 downto 0)&"11111";
          ADDRA <= Address(8 downto 0)&"11111";
          REGCE <= (REGCE_ACTIVE);
          testmode <= March_s2pf;
          WEA <= WEAtemp; WEB <= WEBtemp; SSR <= SSRtemp;
          EnA <= EnAtemp; EnB <= EnBtemp;
        when "100" =>
          MAXADDRESS <= "00000111111111"; --d2pf
          DIA <= tempdata;
          DIB <= tempdataB;
          ADDRB <= AddressB(8 downto 0)&"11111";
          ADDRA <= Address(8 downto 0)&"11111";
          REGCE <= (REGCE_ACTIVE);
          testmode <= March_d2pf;
          WEA <= WEAtemp; WEB <= WEBtemp; SSR <= SSRtemp;
          EnA <= EnAtemp; EnB <= EnBtemp;
        when "110" =>
          MAXADDRESS <= "00111111111111"; --4k x 4
          DIA <= "11111111111111111111111111111111"&tempdata(3 downto 0);
          DIB <= "11111111111111111111111111111111"&tempdata(3 downto 0);
          ADDRB <= AddressB(11 downto 0)&"11";
          ADDRA <= Address(11 downto 0)&"11";
          REGCE <= (REGCE_ACTIVE);
          testmode <= MATS;
          WEA <= WEAtemp; WEB <= WEBtemp; SSR <= SSRtemp;
          EnA <= EnAtemp; EnB <= EnBtemp;
        when "111" =>
          MAXADDRESS <= "00011111111111"; --2k x 9
          DIA <= "111111111111111111111111111"&tempdata(8 downto 0);
          DIB <= "111111111111111111111111111"&tempdata(8 downto 0);
          ADDRB <= AddressB(10 downto 0)&"111";
          ADDRA <= Address(10 downto 0)&"111";
          REGCE <= (REGCE_ACTIVE);
          testmode <= MATS;
          WEA <= WEAtemp; WEB <= WEBtemp; SSR <= SSRtemp;
          EnA <= EnAtemp; EnB <= EnBtemp;
        when others =>
      end case;
    end if;
    -- end if;
  end process;

  p1: Process(Clk)
  begin
    if (Reset = '1') then
      tempdata <= (others => '0');
      tempdataB <= (others => '0');
      AddressB <= (others => '0');
      Address <= (others => '0');
      Element <= ele1;
      Phase <= dummy;
      WEAtemp <= not(WEN_ACTIVE);
      WEBtemp <= not(WEN_ACTIVE);
      SSRtemp <= not EN_LEVEL;
      EnAtemp <= not EN_LEVEL;
      EnBtemp <= not EN_LEVEL;
    --
    elsif (Clk = '1' and Clk'Event) then
      case testmode is
        ------------------------------MARCH LR------------------------------
        when MarchLR =>
          case Phase is
            when dummy =>
              EnAtemp <= EN_LEVEL;
              EnBtemp <= EN_LEVEL;
              SSRtemp <= EN_LEVEL;
              Phase <= Init;
            when Init =>
              SSRtemp <= not(EN_LEVEL);
              Address <= MINADDRESS;
              WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
              tempdata <= (others => '0');
              Element <= ele1;
              Phase <= Phase1;
              EnAtemp <= EN_LEVEL;
              EnBtemp <= not(EN_LEVEL);
            when phase1 => --
              -- U w 000000000000000000000000000000000000
              if ( Address /= MAXADDRESS ) then
                Address <= Address + '1';
                WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                Element <= ele1;
                tempdata <= (others => '0');
              else -- D r 000000000000000000000000000000000000
                Address <= MAXADDRESS;
                WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                tempdata <= (others => '0');
                Phase <= Phase2;
                Element <= ele2;
              end if;
            when phase2 =>
              -- D r 000000000000000000000000000000000000 w 111111111111111111111111111111111111
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= (others => '1');
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MINADDRESS ) then
                    Address <= Address - '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= (others => '0');
                  else -- U r 111111111111111111111111111111111111
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= (others => '1');
                    Phase <= Phase3;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase3 =>
              -- U r 111111111111111111111111111111111111 w 000000000000000000000000000000000000
              --   r 000000000000000000000000000000000000 r 000000000000000000000000000000000000
              --   w 111111111111111111111111111111111111
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= (others => '0');
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= (others => '0');
                  Element <= ele4;
                when ele4 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= (others => '0');
                  Element <= ele5;
                when ele5 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= (others => '1');
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MAXADDRESS ) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not WEN_ACTIVE;
                    Element <= ele2;
                    tempdata <= (others => '1');
                  else -- U r 111111111111111111111111111111111111
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= (others => '1');
                    Phase <= Phase4;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase4 =>
              -- U r 111111111111111111111111111111111111 w 000000000000000000000000000000000000
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= (others => '0');
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MAXADDRESS ) then
                    Address <= Address + '1'; --was -
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= (others => '1');
                  else -- U r 000000000000000000000000000000000000
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= (others => '0');
                    Phase <= Phase5;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase5 =>
              -- U r 000000000000000000000000000000000000 w 111111111111111111111111111111111111
              --   r 111111111111111111111111111111111111 r 111111111111111111111111111111111111
              --   w 000000000000000000000000000000000000
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= (others => '1');
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= (others => '1');
                  Element <= ele4;
                when ele4 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= (others => '1');
                  Element <= ele5;
                when ele5 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= (others => '0');
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MAXADDRESS ) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= (others => '0');
                  else -- D r 000000000000000000000000000000000000
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= (others => '0');
                    Phase <= Phase6;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase6 =>
              -- D r 000000000000000000000000000000000000 w 010101010101010101010101010101010101
              --   w 101010101010101010101010101010101010 r 101010101010101010101010101010101010
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "010101010101010101010101010101010101";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "101010101010101010101010101010101010";
                  Element <= ele4;
                when ele4 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "101010101010101010101010101010101010";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MINADDRESS ) then
                    Address <= Address - '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= (others => '0');
                  else -- U r 101010101010101010101010101010101010
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "101010101010101010101010101010101010";
                    Phase <= Phase7;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase7 =>
              -- U r 101010101010101010101010101010101010 w 010101010101010101010101010101010101
              --   r 010101010101010101010101010101010101
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "010101010101010101010101010101010101";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "010101010101010101010101010101010101";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MAXADDRESS ) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "101010101010101010101010101010101010";
                  else -- D r 010101010101010101010101010101010101
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "010101010101010101010101010101010101";
                    Phase <= Phase8;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase8 =>
              -- D r 010101010101010101010101010101010101 w 001100110011001100110011001100110011
              --   w 110011001100110011001100110011001100 r 110011001100110011001100110011001100
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "001100110011001100110011001100110011";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "110011001100110011001100110011001100";
                  Element <= ele4;
                when ele4 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "110011001100110011001100110011001100";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MINADDRESS ) then
                    Address <= Address - '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "010101010101010101010101010101010101";
                  else -- U r 110011001100110011001100110011001100
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "110011001100110011001100110011001100";
                    Phase <= Phase9;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase9 =>
              -- U r 110011001100110011001100110011001100 w 001100110011001100110011001100110011
              --   r 001100110011001100001100110011001100
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "001100110011001100110011001100110011";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "001100110011001100001100110011001100";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MAXADDRESS ) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "110011001100110011001100110011001100";
                  else -- D r 001100110011001100110011001100110011
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "001100110011001100110011001100110011";
                    Phase <= Phase10;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase10 =>
              -- D r 001100110011001100110011001100110011 w 000011110000111100001111000011110000
              --   w 111100001111000011110000111100001111 r 111100001111000011110000111100001111
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "000011110000111100001111000011110000";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "111100001111000011110000111100001111";
                  Element <= ele4;
                when ele4 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "111100001111000011110000111100001111";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MINADDRESS ) then
                    Address <= Address - '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "001100110011001100110011001100110011";
                  else -- U r 111100001111000011110000111100001111
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "111100001111000011110000111100001111";
                    Phase <= Phase11;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase11 =>
              -- U r 111100001111000011110000111100001111 w 000011110000111100001111000011110000
              --   r 000011110000111100001111000011110000
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "000011110000111100001111000011110000";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "000011110000111100001111000011110000";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MAXADDRESS ) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "111100001111000011110000111100001111";
                  else -- D r 000011110000111100001111000011110000
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "000011110000111100001111000011110000";
                    Phase <= Phase12;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase12 =>
              -- D r 000011110000111100001111000011110000 w 000000001111111100000000111111110000
              --   w 111111110000000011111111000000001111 r 111111110000000011111111000000001111
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "000000001111111100000000111111110000";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "111111110000000011111111000000001111";
                  Element <= ele4;
                when ele4 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "111111110000000011111111000000001111";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MINADDRESS ) then
                    Address <= Address - '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "000011110000111100001111000011110000";
                  else -- U r 111111110000000011111111000000001111
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "111111110000000011111111000000001111";
                    Phase <= Phase13;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase13 =>
              -- U r 111111110000000011111111000000001111 w 000000001111111100000000111111110000
              --   r 000000001111111100000000111111110000
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "000000001111111100000000111111110000";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "000000001111111100000000111111110000";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MAXADDRESS ) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "111111110000000011111111000000001111";
                  else -- D r 000000001111111100000000111111110000
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "000000001111111100000000111111110000";
                    Phase <= Phase14;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase14 =>
              -- D r 000000001111111100000000111111110000 w 000000000000000000111111111111111100
              --   w 111111111111111100000000000000001111 r 111111111111111100000000000000001111
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "000000000000000000111111111111111100";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "111111111111111100000000000000001111";
                  Element <= ele4;
                when ele4 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not WEN_ACTIVE;
                  tempdata <= "111111111111111100000000000000001111";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MINADDRESS ) then
                    Address <= Address - '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "000000001111111100000000111111110000";
                  else -- U r 111111111111111100000000000000001111
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "111111111111111100000000000000001111";
                    Phase <= Phase15;
                    Element <= ele2;
                  end if;
                when others =>
              end case;
            when phase15 =>
              -- U r 111111111111111100000000000000001111 w 000000000000000000111111111111111100
              --   r 000000000000000000111111111111111100
              case Element is
                when ele2 =>
                  WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                  tempdata <= "000000000000000000111111111111111100";
                  Element <= ele3;
                when ele3 =>
                  WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                  tempdata <= "000000000000000000111111111111111100";
                  Element <= ele1;
                when ele1 =>
                  if ( Address /= MAXADDRESS ) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    Element <= ele2;
                    tempdata <= "111111111111111100000000000000001111";
                  else -- D r 000000000000000000111111111111111100
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                    tempdata <= "000000000000000000111111111111111100";
                    Phase <= Phase16;
                    Element <= ele1;
                  end if;
                when others =>
              end case;
            when phase16 =>
              -- D r 000000000000000000111111111111111100
              if ( Address /= MINADDRESS ) then
                Address <= Address - '1';
                WEAtemp <= not(WEN_ACTIVE); WEBtemp <= not(WEN_ACTIVE);
                Element <= ele1;
                tempdata <= "000000000000000000111111111111111100";
              else -- U w 000000000000000000000000000000000000
                Address <= MINADDRESS;
                WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                tempdata <= (others => '0');
                if ( portBtested = '1' ) then -- change testing mode
                  portBtested <= '0';
                  phase <= Init;
                  Element <= ele1;
                else
                  portBtested <= '1';
                  Phase <= Phase1;
                  EnBtemp <= EN_LEVEL;
                  EnAtemp <= not(EN_LEVEL);
                  Element <= ele1;
                end if;
              end if;
          end case;
        ------------------------------MATS-------DTM----------------------
        when MATS =>
          case Phase is
            when dummy =>
              EnAtemp <= EN_LEVEL;
              EnBtemp <= EN_LEVEL;
              SSRtemp <= EN_LEVEL;
              Phase <= Init;
              portbtested <= '0';
            when Init =>
              Address <= MINADDRESS;
              WEAtemp <= WEN_ACTIVE; --changed from not
              WEBtemp <= WEN_ACTIVE;
              tempdata <= (others => '0');
              Phase <= Phase1;
              if (portbtested = '1') then
                EnAtemp <= not EN_LEVEL;
                EnBtemp <= (EN_LEVEL);
                portbtested <= '0';
              else
                EnAtemp <= EN_LEVEL;
                EnBtemp <= not(EN_LEVEL);
              end if;
              SSRtemp <= not(EN_LEVEL);
            when phase1 =>
              if ( Address /= MAXADDRESS ) then
                WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                tempdata <= (others => '0');
                Address <= Address + '1';
              elsif (Address = MAXADDRESS) then
                tempdata <= (others => '0');
                WEAtemp <= not WEN_ACTIVE; WEBtemp <= not WEN_ACTIVE;
                Phase <= Phase2;
                Address <= MINADDRESS;
              end if;
            when phase2 => --
              -- up R0 , W1
              if ( Address /= MAXADDRESS ) then
                case element is
                  when ele1 => --read zeros
                    WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                    tempdata <= (others => '1');
                    element <= ele2;
                  when ele2 => --write ones
                    WEAtemp <= not WEN_ACTIVE; WEBtemp <= not WEN_ACTIVE;
                    tempdata <= (others => '0');
                    Address <= Address + '1';
                    element <= ele1;
                  when others =>
                end case;
              elsif (Address = MAXADDRESS) then
                case element is
                  when ele1 => --read zeros
                    WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                    tempdata <= (others => '1');
                    element <= ele2;
                  when ele2 => --write ones
                    WEAtemp <= not WEN_ACTIVE; WEBtemp <= not WEN_ACTIVE;
                    --Address <= MAXADDRESS ; already at MAX!!!
                    tempdata <= (others => '1');
                    element <= ele1;
                    Phase <= Phase3;
                    Address <= MAXADDRESS;
                    tempdata <= (others => '1');
                  when others =>
                end case;
              end if;
            when phase3 => -- Down R1,WO
              if ( Address /= MINADDRESS ) then
                case element is
                  when ele1 => --read ones
                    WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                    tempdata <= (others => '0');
                    element <= ele2;
                  when ele2 => --write zeros
                    WEAtemp <= not WEN_ACTIVE; WEBtemp <= not WEN_ACTIVE;
                    tempdata <= (others => '0');
                    Address <= Address - '1';
                    element <= ele1;
                  when others =>
                end case;
              elsif (Address = MINADDRESS) then
                case element is
                  when ele1 => --read 1s
                    WEAtemp <= WEN_ACTIVE; WEBtemp <= WEN_ACTIVE;
                    tempdata <= (others => '0');
                    element <= ele2;
                  when ele2 => --write ones
                    WEAtemp <= not WEN_ACTIVE; WEBtemp <= not WEN_ACTIVE;
                    Phase <= phase4;
                    element <= ele1;
                  when others =>
                end case;
              end if;
            when phase4 =>
              if (portBtested = '1') then
                portBtested <= '0';
              else
                portBtested <= '1';
                Phase <= init;
              end if;
            when others =>
          end case;
        ----------------------------MARCH S2PF-------DTM----------------------
        when March_s2pf =>
          case Phase is
            when dummy =>
              EnAtemp <= EN_LEVEL;
              EnBtemp <= EN_LEVEL;
              SSRtemp <= EN_LEVEL;
              Phase <= Init;
            when Init =>
              element <= ele1;
              Address <= MINADDRESS;
              WEAtemp <= WEN_ACTIVE;
              WEBtemp <= not(WEN_ACTIVE);
              SSRtemp <= not EN_LEVEL;
              tempdata <= (others => '0');
              tempdatab <= (others => '0');
              Phase <= Phase1;
              EnAtemp <= EN_LEVEL;
              EnBtemp <= EN_LEVEL;
            when phase1 => -- M0 up write 0s
              if ( Address /= MAXADDRESS ) then
                Address <= Address + '1';
                WEAtemp <= WEN_ACTIVE;
                WEBtemp <= not(WEN_ACTIVE);
                tempdata <= (others => '0');
              else
                Address <= MINADDRESS;
                WEAtemp <= not(WEN_ACTIVE);
                tempdata <= (others => '0');
                Phase <= Phase2;
              end if;
            when Phase2 =>
              case element is
                when ele1 => --2
                  tempdata <= (others => '0');
                  element <= ele2;
                when ele2 => --3
                  element <= ele3;
                when ele3 => --4
                  WEAtemp <= (WEN_ACTIVE);
                  tempdata <= (others => '1');
                  tempdataB <= (others => '0'); --port B output should have zero's on it still.
                  element <= ele4;
                when ele4 => --goes to next RAM address or next march
                  if (Address /= MAXADDRESS) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE);
                    element <= ele1;
                  else -- done with March M1
                    Address <= MINADDRESS;
                    WEAtemp <= not(WEN_ACTIVE);
                    tempdata <= (others => '0');
                    element <= ele1;
                    Phase <= Phase3;
                  end if;
                when others =>
              end case;
            when Phase3 => --M2
              case element is
                when ele1 => --5
                  element <= ele2; --reading 1s from each port and then reading again.
                when ele2 => --6
                  element <= ele3; --done reading same address twice
                when ele3 => --7
                  WEAtemp <= (WEN_ACTIVE);
                  tempdata <= (others => '0');
                  tempdataB <= (others => '1'); --port B output should have 1s on it still.
                  element <= ele4;
                when ele4 =>
                  if (Address /= MAXADDRESS) then
                    Address <= Address + '1';
                    WEAtemp <= not(WEN_ACTIVE);
                    element <= ele1;
                  else -- done with March M2
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE);
                    --OEN <= OEN_ACTIVE;
                    tempdata <= (others => '0');
                    Phase <= Phase4;
                    element <= ele1;
                  end if;
                when others =>
              end case;
            when Phase4 => --M3
              case element is
                when ele1 => --8
                  element <= ele2; --reading zeros from each port and then reading again.
                when ele2 => --9
                  element <= ele3; --done reading same address twice
                when ele3 => --10
                  WEAtemp <= (WEN_ACTIVE);
                  tempdata <= (others => '1');
                  tempdataB <= (others => '0'); --port B output should have zero's on it still.
                  element <= ele4;
                when ele4 =>
                  if (Address /= MINADDRESS) then
                    Address <= Address - '1';
                    WEAtemp <= not(WEN_ACTIVE);
                    element <= ele1;
                  else -- done with March M3
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE);
                    tempdata <= (others => '0');
                    element <= ele1;
                    Phase <= Phase5;
                  end if;
                when others =>
              end case;
            when Phase5 =>
              case element is
                when ele1 => --11
                  element <= ele2; --reading zeros from each port and then reading again.
                when ele2 => --12
                  element <= ele3; --done reading same address twice
                when ele3 => --13
                  WEAtemp <= (WEN_ACTIVE);
                  tempdata <= (others => '0');
                  tempdataB <= (others => '1'); --port B output should have zero's on it still.
                  element <= ele4;
                when ele4 =>
                  if (Address /= MINADDRESS) then
                    Address <= Address - '1';
                    WEAtemp <= not(WEN_ACTIVE);
                    element <= ele1;
                  else -- done with March M4
                    Address <= MAXADDRESS;
                    WEAtemp <= not(WEN_ACTIVE);
                    --OEN <= OEN_ACTIVE;
                    tempdata <= (others => '0');
                    element <= ele1;
                    Phase <= Phase6;
                  end if;
                when others =>
              end case;
            when Phase6 => --14
              if ( Address /= MINADDRESS ) then
                Address <= Address - '1';
                WEAtemp <= not(WEN_ACTIVE);
                tempdata <= (others => '0');
              else
                Element <= ele1;
                Phase <= init;
                -- testmode <= March_s2pf;
                --ADDRESS already at LOWER BOUND for next test session
              end if;
            when others =>
          end case;
        -------------------------------MARCH D2PF------------------------------
        when March_d2pf =>
          case Phase is
            when dummy =>
              phase <= init;
              EnAtemp <= EN_LEVEL;
              EnBtemp <= EN_LEVEL;
              Phase <= Init;
              SSRtemp <= EN_LEVEL;
            when init =>
              SSRtemp <= not(EN_LEVEL);
              phase <= Phase1;
              Address <= MAXADDRESS;
              AddressB <= MAXADDRESS;
              WEAtemp <= WEN_ACTIVE;
              WEBtemp <= not(WEN_ACTIVE);
              tempdata <= (others => '0');
              tempdataB <= (others => '0');
              element <= ele2;
            when Phase1 =>
              if ( Address = MINADDRESS ) then
                Phase <= Phase2;
                AddressB <= MINADDRESS + '1';
                WEAtemp <= WEN_ACTIVE;
                WEBtemp <= not(WEN_ACTIVE);
                tempdata <= (others => '1');
                tempdataB <= (others => '0');
                element <= ele1;
              elsif (Address /= MINADDRESS and element /= ele2) then
                Address <= Address - '1';
                AddressB <= MAXADDRESS;
                WEAtemp <= WEN_ACTIVE;
                WEBtemp <= not(WEN_ACTIVE);
                tempdata <= (others => '0');
                tempdataB <= (others => '0');
              elsif ( Address /= MINADDRESS and element = ele2 ) then
                AddressB <= MAXADDRESS;
                WEAtemp <= WEN_ACTIVE;
                WEBtemp <= not(WEN_ACTIVE);
                tempdata <= (others => '0');
                tempdataB <= (others => '0');
                element <= ele1;
              end if;
            when Phase2 =>
              if ( element = ele1 ) then
                if ( Address = MINADDRESS ) then
                  AddressB <= MAXADDRESS; -- (r1_r:r0_MAX) for low address
                  WEBtemp <= not(WEN_ACTIVE);
                  tempdataB <= (others => '0');
                elsif ( Address /= MINADDRESS ) then
                  AddressB <= Address - '1';
                  WEBtemp <= WEN_ACTIVE;
                  tempdataB <= (others => '1');
                end if; -- r1_r : w1_r-1
                WEAtemp <= not WEN_ACTIVE;
                tempdata <= (others => '1');
                --tempdataB <= (others=>'1');
                element <= ele2;
              elsif ( element = ele2 ) then -- w0_r: r1_r-1
                if ( Address = MINADDRESS ) then
                  AddressB <= MAXADDRESS; -- (w0_r:r0_MAX) for low address
                  tempdataB <= (others => '0');
                elsif ( Address /= MINADDRESS ) then
                  AddressB <= Address - '1';
                  tempdataB <= (others => '1');
                end if;
                WEBtemp <= not(WEN_ACTIVE);
                WEAtemp <= WEN_ACTIVE;
                tempdata <= (others => '0');
                element <= ele3;
              elsif ( element = ele3 ) then --
r0_r:w0_r+1 AddressB <= Address + ?1?; WEAtemp <= not(WEN_ACTIVE); WEBtemp <= WEN_ACTIVE; tempdata <= (others=>?0?); tempdataB <= (others=>?0?); element <= ele4; elsif ( element = ele4) then if ( Address = MAXADDRESS ? ?1?) then Phase <= init; Address <= MAXADDRESS; AddressB <= MAXADDRESS; WEAtemp <= WEN_ACTIVE; WEBtemp <= not(WEN_ACTIVE); tempdata <= (others=>?0?); tempdataB <= (others=>?0?); element <= ele1; elsif ( Address /= MAXADDRESS ? ?1?) then Address <= Address + ?1?; AddressB <= Address + "10"; WEAtemp <= WEN_ACTIVE; WEBtemp <= not(WEN_ACTIVE); tempdata <= (others=>?1?); tempdataB <= (others=>?0?); element <= ele1; end if; end if; when others => end case; when others => end case; end if; end process; end; 122 Appendix B FIFOTPG VHDL Source code 123 library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use IEEE.STD_LOGIC_UNSIGNED.ALL; entity FIFO_TPG is Port ( CLK,DRCK,UPDATE,SHIFT,TDI : in STD_LOGIC; Reset: in STD_LOGIC; DI: out STD_LOGIC_VECTOR(31 downto 0); DIP: out STD_LOGIC_VECTOR(3 downto 0); RST,WREN,RDEN: out STD_LOGIC); end FIFO_TPG; architecture Behavioral of FIFO_TPG is type phases is (RESET_FIFO,Phase1,phase2,phase4); type elements is(ele1,ele2,ele3,ele4,ele5,ele6); signal element : elements :=ele1; signal phase : phases := reset_fifo; signal MAXCOUNT: STD_LOGIC_VECTOR(12 downto 0); signal MINCOUNT: STD_LOGIC_VECTOR(12 downto 0); signal COUNT: STD_LOGIC_VECTOR(12 downto 0); signal tempdata: STD_LOGIC_VECTOR(35 downto 0); signal SR, PDO : STD_LOGIC_VECTOR(4 downto 0); signal MODE: std_logic_vector(1 downto 0); signal RDENtemp,WRENtemp,RDEN_level,WREN_level, RST_level: std_logic; begin MINCOUNT <= (others =>?0?); bsync: process (DRCK, UPDATE,SHIFT,SR) begin if (DRCK?event and DRCK = ?1?) then if (SHIFT = ?1?) then for I in 0 to 3 loop SR(I) <= SR(I+1); end loop; SR(4) <= TDI; end if; end if; if (UPDATE = ?1?) 
then PDO <= SR; end if; end process bsync; RDEN_level <= PDO(2); WREN_level <=PDO(3); RST_level <= PDO(4); MODE(0)<=PDO(0); MODE(1)<= PDO(1); Process(MODE,tempdata,WRENtemp,RDENtemp) begin if (MODE="00") then MAXCOUNT <= "1000000000000"; ??4096 DI <= X"0000000"&tempdata(3 downto 0); DIP <= "0000"; WREN <= WRENtemp; RDEN <= RDENtemp; elsif (MODE="01") then MAXCOUNT <= "0100000000000"; ??2049 DI <= X"000000"&tempdata(7 downto 0); DIP <= "000"&tempdata(35); WREN <= WRENtemp; RDEN <= RDENtemp; elsif (MODE="10") then MAXCOUNT <= "0010000000000"; ??1025 DI <=X"0000"&tempdata(15 downto 0); DIP <= "00"&tempdata(35 downto 34); WREN <= WRENtemp; RDEN <= RDENtemp; elsif (MODE="11") then MAXCOUNT <= "0001000000000"; ??513 DI <= tempdata(31 downto 0); DIP <= tempdata(35 downto 32); 124 WREN <= WRENtemp; RDEN <= RDENtemp; end if; end process; Process(Reset, Clk,RDEN_level,WREN_level,RST_level ) begin if ( Reset = ?1? ) then tempdata<=(others =>?0?); COUNT <= MINCOUNT; phase <= RESET_FIFO; rst<=RST_level; RDENtemp<=not(RDEN_level); WRENtemp<=not(WREN_level); elsif (Clk = ?1? 
and Clk?Event) then case Phase is when RESET_FIFO => case element is when ele1 => rst <=RST_level; element <= ele2; when ele2 => rst <= RST_level; element <=ele3; when ele3 => element <= ele4; when ele4 => element <=ele5; when ele5 => phase <= phase1; element <=ele1; count <= MINCOUNT; RDENtemp <= not RDEN_level; WRENtemp <= WREN_level; rst <= not RST_level; when others => end case; when Phase1=> if ( COUNT <= MAXCOUNT) then COUNT <= COUNT + ?1?; WRENtemp <= WREN_level; tempdata <= (others => ?0?); else COUNT <=MINCOUNT; Phase <=Phase2; WRENtemp <= not (WREN_level); RDENtemp <= RDEN_level; tempdata <=(others =>?0?); end if; when phase2 => case Element is when ele1 => if ( COUNT <=MAXCOUNT) then RDENtemp <=not RDEN_level; WRENtemp <=not(WREN_level); tempdata <=(others =>?1?); Element <= ele2; end if; when ele2 => RDENtemp <=not RDEN_level; WRENtemp <=not(WREN_level); tempdata <=(others =>?1?); Element <= ele3; when ele3 => RDENtemp <=not RDEN_level; WRENtemp <=not(WREN_level); tempdata <=(others =>?1?); Element <= ele4; when ele4 => RDENtemp <=not RDEN_level; 125 WRENtemp <=(WREN_level); Element <= ele5; when ele5 => RDENtemp <= not RDEN_level; WRENtemp <=(WREN_level); element <= ele6; when ele6=> RDENtemp <=(RDEN_level); WRENtemp <=not WREN_level; tempdata <=(others =>?0?); COUNT <= COUNT + ?1?; if ( COUNT=MAXCOUNT) then Phase <= phase4; COUNT<=MINCOUNT; RDENtemp <=(RDEN_level); WRENtemp <=not WREN_level; tempdata <= (others => ?1?); end if; Element <= ele1; when others => end case; when phase4 => ??read 001s from FIFO if ( COUNT <= MAXCOUNT ) then RDENtemp <=RDEN_level; WRENtemp <=not(WREN_level); tempdata <=(others =>?1?); COUNT <= COUNT + ?1?; else COUNT<=MINCOUNT; WRENtemp <= not WREN_level; RDENtemp <= not RDEN_level; tempdata <= (others => ?0?); Phase <=reset_fifo; end if; end case; end if; end process; end Behavioral; 126 Appendix C ECCTPG VHDL Source code 127 library IEEE; use IEEE.STD_LOGIC_1164.ALL; use IEEE.STD_LOGIC_ARITH.ALL; use 
    IEEE.STD_LOGIC_UNSIGNED.ALL;

entity ECC_TPG is
  Port (CLK        : in  STD_LOGIC;
        RESET      : in  STD_LOGIC;
        Data       : out std_logic_vector(71 downto 0);
        ADDRB      : out std_logic_vector(8 downto 0);
        WREN, RDEN : out std_logic);
end ECC_TPG;

architecture Behavioral of ECC_TPG is
  signal CLK_EN : std_logic;
  type phases is (Init_RAMS, Write_RAMS);
  signal phase : phases := Init_RAMS;
  type elements is (ele1, ele2, ele3);
  signal element : elements := ele1;
  signal Address    : std_logic_vector(8 downto 0) := (others => '0');
  signal MAXADDRESS : std_logic_vector(8 downto 0) := (others => '1');
  component ECC_fsm is
    port (CLK      : in  std_logic;
          CLK_EN   : in  std_logic;
          RESET    : in  std_logic;
          DATA_OUT : out std_logic_vector(71 downto 0));
  end component;
begin
  ECC_fsm_inst: ECC_fsm port map(clk, CLK_EN, reset, data);

  Process(Reset, Clk)
  begin
    if (Reset = '1') then
      Address <= "111111110";
      CLK_EN  <= '1';
      RDEN    <= '0';
      WREN    <= '1';
      Phase   <= Init_RAMs;
      element <= ele1;
    elsif (Clk = '1' and Clk'Event) then
      case Phase is
        when Init_RAMS =>
          if (Address < MAXADDRESS) then
            case element is
              when ele1 =>
                Address <= "111111110";
                element <= ele2;
              when ele2 =>
                element <= ele3;
                CLK_EN  <= '0';
                WREN    <= '0';
                RDEN    <= '1';
                Address <= (others => '0');
              when ele3 =>
                Address <= Address + '1';
            end case;
          else
            RDEN    <= '0';
            WREN    <= '1';
            Phase   <= Write_RAMs;
            element <= ele1;
          end if;
        when Write_RAMS =>
          Address <= (others => '0');
          case Element is
            when ele1 =>
              CLK_EN  <= '1';
              RDEN    <= '0';
              WREN    <= '1';
              element <= ele2;
            when ele2 =>
              CLK_EN  <= '0';
              WREN    <= '0';
              RDEN    <= '1';
              element <= ele1;
            when others =>
          end case;
      end case;
    end if;
  end process;

  ADDRB <= Address;
end Behavioral;

entity ECC_fsm is
  port (CLK      : in  std_logic;
        CLK_EN   : in  std_logic;
        RESET    : in  std_logic;
        DATA_OUT : out std_logic_vector(71 downto 0));
end ECC_fsm;

architecture Behavioral of ECC_fsm is
  component shifter is
    Port (CLK_EN  : in  std_logic;
          CLK     : in  std_logic;
          RESET   : in  std_logic;
          ENABLE  : in  std_logic;
          BIT_IN  : in  std_logic;
          BIT_OUT : out std_logic;
          DOUT    : out std_logic_vector(71 downto
                                             0));
  end component;
  signal DONE_check, BIT_IN1, BIT_IN2, BIT_OUT1 : std_logic;
  signal DOUT1, DOUT2 : std_logic_vector(71 downto 0);
begin
  SR1: shifter port map(
    CLK_EN  => CLK_EN,
    CLK     => CLK,
    RESET   => RESET,
    ENABLE  => '1',
    BIT_IN  => BIT_IN1,
    BIT_OUT => BIT_OUT1,
    DOUT    => DOUT1
  );
  SR2: shifter port map(
    CLK_EN  => CLK_EN,
    CLK     => CLK,
    RESET   => RESET,
    ENABLE  => BIT_OUT1,
    BIT_IN  => BIT_IN2,
    DOUT    => DOUT2
  );

  -- BIT_IN1 and BIT_IN2 are '1' only while the corresponding
  -- shift register holds all zeros
  process(DOUT1, DOUT2, BIT_IN1, BIT_IN2)
    variable temp2 : std_logic;
    variable temp1 : std_logic;
  begin
    temp1 := '0';
    temp2 := '0';
    for i in 0 to 71 loop
      temp1 := temp1 OR DOUT1(i);
    end loop;
    for i in 0 to 71 loop
      temp2 := temp2 OR DOUT2(i);
    end loop;
    BIT_IN1 <= not temp1;
    BIT_IN2 <= not temp2;
  end process;

  DATA_OUT <= DOUT1 OR DOUT2;
end Behavioral;

entity shifter is
  Port (CLK_EN  : in  std_logic;
        CLK     : in  std_logic;
        RESET   : in  std_logic;
        ENABLE  : in  std_logic;
        BIT_IN  : in  std_logic;
        BIT_OUT : out std_logic;
        DOUT    : out std_logic_vector(71 downto 0));
end shifter;

architecture Behavioral of shifter is
  signal DATA : std_logic_vector(71 downto 0);
begin
  process(CLK, RESET)
    variable temp : std_logic := '0';
  begin
    temp := Data(63);
    if (RESET = '1') then
      DATA <= X"000000000000000000";
    elsif (CLK = '1' and CLK'event) then
      if (enable = '1' and CLK_EN = '1')
      then
        -- Shift DATA up by one bit and load BIT_IN into bit 0
        for i in 0 to 70 loop
          DATA(i+1) <= Data(i);
          Data(0)   <= BIT_IN;
        end loop;
      end if;
    end if;
    BIT_OUT <= temp;
  end process;

  DOUT <= DATA;
end Behavioral;

Appendix D
CASTPG VHDL Source code

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity fsm is
  generic(EN_Level   : std_logic := '1';
          WEN_ACTIVE : std_logic := '1');
  port (Reset : in     std_logic;
        Clk   : in     std_logic;
        WEA   : buffer std_logic;
        DIA   : out    std_logic_vector(0 downto 0);
        ADDRA : out    std_logic;
        EnA   : out    std_logic;
        EnB   : out    std_logic);
end fsm;

architecture BEHAVIORAL of fsm is
  type phases is (Init, dummy, Phase1, phase2, phase3, phase4, phase5, phase6);
  type elements is (ele1, ele2);
  signal phase       : phases := dummy;
  signal Element     : elements := ele1;
  signal Address     : std_logic;
  signal portBtested : std_logic := '0';
  signal tempdata    : std_logic;
  signal ENATEMP, ENBTEMP, WEAtemp : std_logic;
begin
  p0: Process(clk, Reset, tempdata, Address, WEA)
  begin
    if (Reset = '1') then
      ADDRA  <= '0';
      DIA(0) <= '0';
      EnA    <= not EN_LEVEL;
      EnB    <= not EN_LEVEL;
      WEA    <= not WEN_ACTIVE;
    elsif (Clk = '1' and Clk'Event) then
      DIA(0) <= tempdata;
      ADDRA  <= Address;
      WEA    <= WEAtemp;
      EnA    <= EnAtemp;
      EnB    <= EnBtemp;
    end if;
  end process;

  p1: Process(Clk)
  begin
    if (Reset = '1') then
      tempdata <= '0';
      Address  <= '0';
      Element  <= ele1;
      Phase    <= dummy;
      WEAtemp  <= not(WEN_ACTIVE);
      EnAtemp  <= not EN_LEVEL;
      EnBtemp  <= not EN_LEVEL;
    elsif (Clk = '1' and Clk'Event) then
      case Phase is
        when dummy =>
          EnAtemp     <= not EN_LEVEL;
          EnBtemp     <= not EN_LEVEL;
          Phase       <= Init;
          portbtested <= '0';
        when Init =>
          Address  <= '0';
          WEAtemp  <= WEN_ACTIVE;
          tempdata <= '0';
          Phase    <= Phase1;
          element  <= ele1;
          if (portbtested = '1') then
            EnAtemp     <= not EN_LEVEL;
            EnBtemp     <= (EN_LEVEL);
            portbtested <= '0';
          else
            EnAtemp <= EN_LEVEL;
            EnBtemp <= not(EN_LEVEL);
          end if;
        when phase1 =>
          case element is
            when ele1 =>
              Address <= '1';
              WEAtemp <= WEN_ACTIVE;
              element <= ele2;
            when ele2 =>
              Address  <= '0';
              WEAtemp  <= not WEN_ACTIVE;
              tempdata <= '1';
              Phase    <= Phase2;
              element  <= ele1;
          end case;
        when phase2 =>
          case element is
            when ele1 =>
              Address <= '0';
              WEAtemp <= WEN_ACTIVE;
              element <= ele2;
            when ele2 =>
              Address <= '1';
              WEAtemp <= not WEN_ACTIVE;
              Phase   <= Phase3;
              element <= ele1;
          end case;
        when phase3 =>
          case element is
            when ele1 =>
              Address <= '1';
              WEAtemp <= WEN_ACTIVE;
              element <= ele2;
            when ele2 =>
              Address  <= '0';
              WEAtemp  <= not WEN_ACTIVE;
              tempdata <= '0';
              Phase    <= Phase4;
              element  <= ele1;
          end case;
        when phase4 =>
          case element is
            when ele1 =>
              Address <= '0';
              WEAtemp <= WEN_ACTIVE;
              element <= ele2;
            when ele2 =>
              Address  <= '1';
              WEAtemp  <= not WEN_ACTIVE;
              tempdata <= '0';
              Phase    <= Phase5;
              element  <= ele1;
          end case;
        when phase5 =>
          case element is
            when ele1 =>
              Address <= '1';
              WEAtemp <= WEN_ACTIVE;
              element <= ele1;
              Phase   <= Phase6;
            when others =>
          end case;
        when phase6 =>
          if (portBtested = '1')
          then
            portBtested <= '0';
          else
            portBtested <= '1';
            Phase       <= init;
          end if;
        when others =>
      end case;
    end if;
  end process;
end;

Appendix E
List of Acronyms

ATE - Automatic Test Equipment
BIST - Built-in Self Test
BRAM - Block RAM
BSCAN - Boundary Scan
BUT - Block under Test
CAD - Computer-aided Design
CUT - Circuit under Test
DFT - Design for Test
DSP - Digital Signal Processor
DUT - Device under Test
ECC - Error Correcting Code
FF - Flip-flop
FIFO - First-in First-out
FPGA - Field Programmable Gate Array
FWFT - First-Word-Fall-Through
GUI - Graphical User Interface
HDL - Hardware Description Language
I/O - Input / Output
IC - Integrated Circuit
IP - Intellectual Property
LUT - Look-up Table
LSB - Least Significant Bit
MMTPG - Multi-march Test Pattern Generator
MSB - Most Significant Bit
ORA - Output Response Analyzer
PIP - Programmable Interconnect Point
PLB - Programmable Logic Block
PPC - PowerPC
RAM - Random Access Memory
SERDES - Serializer / Deserializer
SoC - System-on-Chip
SRAM - Static Random Access Memory
TCK - Test Clock
TDI - Test Data In
TDO - Test Data Out
TMS - Test Mode Select
TPG - Test Pattern Generator
VLSI - Very Large Scale Integration